[Feature Request]: Support for YOLO #332

Closed
mhornsby opened this issue Jun 19, 2024 · 8 comments

@mhornsby

mhornsby commented Jun 19, 2024

Feature Name

Support for YOLO multiple boxes

Feature Description

Hi,
Following on from issue #85:

I found that the example code fails with the error "df_annot must contain unique filenames, found repeating filenames" when there are multiple boxes for the same image file, for example:

            filename  img_w  img_h  label  bbox_x  bbox_y  bbox_w  bbox_h
2  Cocaktoo14563.jpg   1200    800      3     727     337     190     425
3  Cocaktoo14563.jpg   1200    800      3     238      40     206     441

Is there a good way to handle this?

Thanks

Contact Information [Optional]

No response

@dnth dnth self-assigned this Jun 19, 2024
@dbickson
Collaborator

Hi @mhornsby, this error happens because the column names differ from what is expected.
After converting from YOLO, the output columns should look like this example:

filename                               col_x  row_y  width  height  label
Kitti/raw/training/image_2/006149.png      0    240    135     133  Car
Kitti/raw/training/image_2/006149.png    608    169     59      43  Car

Please let us know which example code you are using so we can fix it.
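As an illustration of the renaming described above, here is a minimal sketch that maps the column names from the original post onto the names in the example. It uses plain dictionaries so it runs without dependencies. The mapping itself is an assumption: it only makes sense if the bbox_* values are already absolute top-left x/y plus width/height in pixels. With pandas, `DataFrame.rename(columns=...)` does the same job.

```python
# Hypothetical mapping from the column names in the original post to the
# names shown in the example above. Assumes the bbox_* values are already
# absolute top-left x/y plus width/height in pixels.
COLUMN_MAP = {
    "bbox_x": "col_x",
    "bbox_y": "row_y",
    "bbox_w": "width",
    "bbox_h": "height",
}


def rename_columns(rows, column_map=COLUMN_MAP):
    """Return new row dicts with keys renamed according to column_map."""
    return [{column_map.get(k, k): v for k, v in row.items()} for row in rows]


# Two boxes for the same image stay as two separate rows.
rows = [
    {"filename": "Cocaktoo14563.jpg", "label": 3,
     "bbox_x": 727, "bbox_y": 337, "bbox_w": 190, "bbox_h": 425},
    {"filename": "Cocaktoo14563.jpg", "label": 3,
     "bbox_x": 238, "bbox_y": 40, "bbox_w": 206, "bbox_h": 441},
]
renamed = rename_columns(rows)
print(renamed[0])
```

Columns that are not in the mapping (filename, label) pass through unchanged.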

@dnth
Contributor

dnth commented Jun 19, 2024

@mhornsby you could refer to our example notebook here too - https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb

Also, we currently only support COCO-style bounding boxes, i.e. the xywh format.

The dataframe should consist of the following columns:

  • col_x: the top left corner x coordinate of the bounding box
  • row_y: the top left corner y coordinate of the bounding box
  • width: the width of the bounding box
  • height: the height of the bounding box
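To make the four columns above concrete, here is a minimal sketch of converting one YOLO box to this xywh layout. It assumes the usual YOLO label convention of a box centre plus width and height, all relative to the image size (values between 0 and 1). The function name `yolo_to_xywh` is hypothetical and not part of fastdup.

```python
def yolo_to_xywh(cx_rel, cy_rel, w_rel, h_rel, img_w, img_h):
    """Convert a relative, centre-based YOLO box to absolute top-left xywh."""
    width = round(w_rel * img_w)
    height = round(h_rel * img_h)
    # Move from the box centre to the top-left corner before scaling.
    col_x = round((cx_rel - w_rel / 2) * img_w)
    row_y = round((cy_rel - h_rel / 2) * img_h)
    return col_x, row_y, width, height


# A box centred in a 1200x800 image, covering half of each dimension.
print(yolo_to_xywh(0.5, 0.5, 0.5, 0.5, 1200, 800))  # (300, 200, 600, 400)
```

The returned tuple is ordered as col_x, row_y, width, height, matching the column list above.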

@mhornsby
Author

mhornsby commented Jun 20, 2024

Hi @dbickson, the example code is in issue #85.

I've been working on it and so far have the code below, which no longer errors and loads OK. However, I am not seeing boxes on the images, e.g. when I list duplicates, so I suspect I have something wrong.

import os

import fastdup
import pandas as pd
from PIL import Image

# These should come from the yaml file
image_dir = '/content/sample/dataset/train/images'
label_dir = '/content/sample/dataset/train/labels'
label_mapping = ["Magpie", "Black Cockatoo", "White Ibis", "Cockatoo"]


def parse_object(obj_str, img_w, img_h):
    # A YOLO label line is "class_id cx cy w h", with coordinates relative to the image size
    item_list = obj_str.split()
    class_id = int(item_list[0])
    cx_rel, cy_rel, w_rel, h_rel = [float(o) for o in item_list[1:]]

    # Convert the relative, centre-based box to an absolute top-left xywh box
    x = round(img_w * (cx_rel - w_rel / 2))
    y = round(img_h * (cy_rel - h_rel / 2))
    w = round(img_w * w_rel)
    h = round(img_h * h_rel)
    return [x, y, w, h, label_mapping[class_id]]


img_file_list = [f for f in os.listdir(image_dir) if f.endswith('.jpg')]
annotation_list = []

for img_fn in img_file_list:
    img_full_path = os.path.join(image_dir, img_fn)
    with Image.open(img_full_path) as img:
        img_w, img_h = img.size

    # The label file shares the image's base name, with a .txt extension
    anot_full_path = os.path.join(label_dir, os.path.splitext(img_fn)[0] + '.txt')
    with open(anot_full_path, 'r') as f:
        for o in f:
            if not o.strip():
                continue  # skip blank lines
            bbox_field_list = parse_object(o, img_w, img_h)
            # One row per box, so an image with several boxes appears several times
            annotation_list.append([img_fn] + bbox_field_list)

columns = ['filename', 'col_x', 'row_y', 'width', 'height', 'label']

annotation_df = pd.DataFrame(annotation_list, columns=columns)
annotation_df['split'] = 'train'  # Only train files were loaded

print(annotation_df)

fd = fastdup.create("/content/work_dir", input_dir=image_dir)

fd.run(annotations=annotation_df, overwrite=True)

@Tompil3r
Collaborator

Hi @mhornsby, I'm not seeing anything out of the ordinary in how you're running fastdup with annotations, but could you share a printout of your annotations dataframe so I can be sure everything is as it should be? Could you also share how you're viewing the duplicates? Thanks
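Before sharing a printout, a quick self-check of the annotation rows can catch obvious problems. The sketch below is only an illustration: the function name and the specific checks (required columns present, positive box size, non-negative top-left corner) are assumptions, not fastdup's own validation. It works on plain dictionaries; with pandas, `annotation_df.to_dict("records")` produces rows in this shape.

```python
REQUIRED = ("filename", "col_x", "row_y", "width", "height", "label")


def find_annotation_problems(rows):
    """Return human-readable problems found in a list of annotation dicts."""
    problems = []
    for i, row in enumerate(rows):
        missing = [c for c in REQUIRED if c not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue
        if row["width"] <= 0 or row["height"] <= 0:
            problems.append(f"row {i}: non-positive box size")
        if row["col_x"] < 0 or row["row_y"] < 0:
            problems.append(f"row {i}: negative top-left corner")
    return problems


# Two boxes on the same image; the second one is deliberately broken.
rows = [
    {"filename": "a.jpg", "col_x": 10, "row_y": 20,
     "width": 30, "height": 40, "label": "Cockatoo"},
    {"filename": "a.jpg", "col_x": -5, "row_y": 20,
     "width": 0, "height": 40, "label": "Magpie"},
]
problems = find_annotation_problems(rows)
print(problems)
```

An empty list means none of these particular checks failed; it does not guarantee the boxes are drawn where expected.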

@dnth
Contributor

dnth commented Jun 20, 2024

@mhornsby I made a tutorial notebook on Kaggle that runs on the traffic detection dataset in YOLO format. Since the dataset is on Kaggle, you can also fork the notebook and run it end-to-end if you have a Kaggle account.

https://www.kaggle.com/code/dnth90/fastdup-traffic-det

Feel free to adapt the notebook to your dataset.

The gallery should look like the following:
[image]

Let me know if this helps.

@mhornsby
Author

Many thanks @dnth, I successfully used your Kaggle notebook on my dataset. The bounding boxes were missing in my Colab because I was not using draw_bbox=True. My error, I wasn't aware of that option.
Thanks for your help.

@dnth
Contributor

dnth commented Jun 26, 2024

Happy to know it helped. Feel free to re-open if there are other issues related to YOLO annotations.

@dnth dnth closed this as completed Jun 26, 2024