[Bug]: Fastdup will create a copy of all images in the 'cdn' folder inside work_dir. #334

Closed
shantanusingh16 opened this issue Jun 21, 2024 · 4 comments
Labels
bug Something isn't working

@shantanusingh16

What happened?

When running fastdup on a dataset, it copies every image into sub-directories of a 'cdn' directory inside the specified work_dir. This consumes extra disk space and becomes a bottleneck on network volumes with slow read/write speeds.

What did you expect to see?

Expected fastdup to not create copies of all images inside work-dir.

What version of fastdup were you running on?

2.3

What version of Python were you running on?

Python 3.10

Operating System

Ubuntu 22.04

Reproduction steps

  1. Download an image dataset.
  2. Run fastdup on this dataset using the command:
     import fastdup
     fd = fastdup.create(input_dir=f"{data_dir}/images/", work_dir=f"{data_dir}/work_dir")
     fd.run()
  3. Navigate to the directory work_dir/cdn. It contains subdirectories where all the images have been copied.
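
To confirm the last step without browsing the directory by hand, something like the following sketch could be used. It relies only on the standard library and does not call fastdup itself. The helper names (`count_images`, `cdn_copy_report`) and the extension list are illustrative assumptions, not part of fastdup. The only detail taken from this report is that the copies land under `work_dir/cdn`.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff", ".webp"}


def count_images(root):
    """Count files with an image extension anywhere under root (0 if root is missing)."""
    root = Path(root)
    if not root.is_dir():
        return 0
    return sum(
        1
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


def cdn_copy_report(input_dir, work_dir):
    """Compare the image count in input_dir with the count under work_dir/cdn."""
    n_input = count_images(input_dir)
    n_cdn = count_images(Path(work_dir) / "cdn")
    return {
        "input_images": n_input,
        "cdn_images": n_cdn,
        "copies_present": n_cdn > 0,
    }
```

Calling `cdn_copy_report(f"{data_dir}/images/", f"{data_dir}/work_dir")` after `fd.run()` would then show whether image files exist under `cdn` and how their count compares to the input.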

Relevant log output

No response

Attach a screenshot [Optional]

No response

Contact Details [Optional]

[email protected]

@shantanusingh16 shantanusingh16 added the bug Something isn't working label Jun 21, 2024
@dnth
Contributor

dnth commented Jun 21, 2024

This is a valid concern. Thanks for reporting!

@dnth
Contributor

dnth commented Jun 27, 2024

@shantanusingh16 we've released fastdup==2.5 which addressed this issue. Would you please update fastdup and see if this is still an issue?
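
For anyone retesting after `pip install -U fastdup`, a rough sketch like the one below can confirm that the installed version is at least 2.5. It reads the version through the standard library's `importlib.metadata` rather than any fastdup API. The helper names and the simplified numeric version comparison are assumptions made for illustration. Pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v):
    """Turn '2.5.1' into (2, 5, 1); stops at the first non-numeric segment."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def has_fix(installed, fixed_in="2.5"):
    """True if the installed version is at least the release named in this thread."""
    return version_tuple(installed) >= version_tuple(fixed_in)


def installed_fastdup_version():
    """Return the installed fastdup version string, or None if it is not installed."""
    try:
        return version("fastdup")
    except PackageNotFoundError:
        return None
```

If `installed_fastdup_version()` returns a string, passing it to `has_fix` indicates whether the environment already has the release mentioned above.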

@shantanusingh16
Author

Hey @dnth . I was able to verify that this problem is solved with fastdup==2.5. Thank you for the prompt fix!

@dnth
Contributor

dnth commented Jul 9, 2024

Thanks for confirming again! I will close this issue.

@dnth dnth closed this as completed Jul 9, 2024