[Bug]: Failed to execute fd.run() #340

Closed
lycika-5mzw opened this issue Jul 10, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@lycika-5mzw

What happened?

Code:

def start(progressbar: Progress, begin: int = 0, limit: int = 0, batch_size: int = 0, deduplicate: bool = False):
    split_image(progressbar, begin, limit)
    if deduplicate:
        fd = fastdup.create(work_dir=FASTDUP_WORK_PATH, input_dir=CANDIDATE_SPLIT_PATH)
        fd.run(overwrite=True)
        fd.explore()

Error Message:

fastdup.fastdup_runner.utilities.ExplorationError: "Error: Insufficient number of valid images in dataset (0). Minimum required images: 10". Please note that only the following image formats are supported: ['.png', '.jpg', '.jpeg', '.gif', '.giff', '.tif', '.tiff', '.heic', '.heif', '.bmp', '.webp', '.jfif']

I have hundreds of pictures named in the <HASH>_<TIMESTAMP>_<XAXIS>,<YAXIS>.jpg format, for example "0a847185848d6e7fc6e967b96a5af457_1720439139294554000_0,0.jpg". They are all 100x87 pixels in size, but fastdup seems to fail to read these images.
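
For readers who want to rule out the filenames themselves, here is a minimal sketch that parses the naming convention described above. The pattern is an assumption inferred from the single example filename (a 32-character lowercase hex hash, an integer timestamp, and two integer coordinates separated by a comma); `TILE_NAME`, `TileName`, and `parse_tile_name` are hypothetical names, not part of fastdup.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: <32 hex chars>_<integer timestamp>_<x>,<y>.jpg
# (inferred from the one example filename in this report).
TILE_NAME = re.compile(
    r"^(?P<digest>[0-9a-f]{32})_(?P<timestamp>\d+)_(?P<x>\d+),(?P<y>\d+)\.jpg$"
)


class TileName(NamedTuple):
    digest: str
    timestamp: int
    x: int
    y: int


def parse_tile_name(filename: str) -> Optional[TileName]:
    """Return the parsed parts of a tile filename, or None if it does not match."""
    match = TILE_NAME.match(filename)
    if match is None:
        return None
    return TileName(
        match["digest"],
        int(match["timestamp"]),
        int(match["x"]),
        int(match["y"]),
    )
```

Running `parse_tile_name` over a directory listing and collecting the names that return `None` would show whether any file deviates from the expected pattern.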

By the way, I cannot provide more information because the source code is obfuscated by pyarmor.

What did you expect to see?

Images Loaded, WebUI Started

What version of fastdup were you running on?

2.5

What version of Python were you running on?

Other

Operating System

macOS 14.3.1 M1

Reproduction steps

No response

Relevant log output

No response

Attach a screenshot [Optional]

No response

Contact Details [Optional]

No response

lycika-5mzw added the bug (Something isn't working) label Jul 10, 2024
@dbickson
Collaborator

dbickson commented Jul 10, 2024

Hi @lycika-5mzw, can you specify what is in CANDIDATE_SPLIT_PATH?
Please send us the output of find <candidate_split_path_folder> -name '*.jpg'.
We have checked your filenames and they seem to work fine. Please call run() with verbose=1 and send us the full output.
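
A rough Python equivalent of the `find` check requested above, as a sketch only: it counts files by extension using the supported-format list and the minimum of 10 images quoted in the error message. Unlike `find -name '*.jpg'`, the match here is case-insensitive. `SUPPORTED`, `MIN_IMAGES`, and `count_images` are hypothetical names for illustration and are not fastdup APIs.

```python
from collections import Counter
from pathlib import Path

# Extensions copied from the fastdup error message quoted in this issue.
SUPPORTED = {".png", ".jpg", ".jpeg", ".gif", ".giff", ".tif", ".tiff",
             ".heic", ".heif", ".bmp", ".webp", ".jfif"}
MIN_IMAGES = 10  # minimum named in the same error message


def count_images(folder):
    """Recursively count files whose extension is in SUPPORTED (case-insensitive)."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        ext = path.suffix.lower()
        if path.is_file() and ext in SUPPORTED:
            counts[ext] += 1
    return counts


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    counts = count_images(folder)
    total = sum(counts.values())
    print(f"{total} supported image files under {Path(folder).resolve()}: {dict(counts)}")
    if total < MIN_IMAGES:
        print(f"Fewer than {MIN_IMAGES} images, below the minimum named in the error message")
```

If this reports thousands of files while fastdup still reports 0 valid images, the problem is unlikely to be the folder contents.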

@lycika-5mzw
Author

@dbickson Thanks for the reply; I've just been busy. Sorry for not rendering the verbose=1 output as a code block: I don't know how to render a code block within the <details> tag.

find command output

find ./assets/candidates_split/ -name '*.jpg'
./assets/candidates_split//0b9e8d01c653efebbb16dcf8dbd7e0e7_1720439636011954000_2,2.jpg
./assets/candidates_split//07f17c0fa416f38cf9c517cbb3ebc8d3_1720439078829525000_0,1.jpg
./assets/candidates_split//3cc7bba743a00bc0fcf41ebd7b8dc8b1_1720439161270461000_2,0.jpg
./assets/candidates_split//2627c50f79f43a1321bf8488acfb7390_1720439142586106000_0,0.jpg
./assets/candidates_split//9df02e74ea16c790510eac082cbdf7d9_1720440019576070000_1,0.jpg
./assets/candidates_split//7cf19dfc5e9c65db6c8ad153ecd1a6e4_1720439649441145000_2,2.jpg
./assets/candidates_split//cb8884398be34fe4734c5106aa93ff01_1720439819882526000_1,2.jpg
./assets/candidates_split//2483118f645a08d7c744dab796b79fad_1720439987313116000_1,2.jpg
./assets/candidates_split//2ba7dc2f6ee12681a674e57ed23ce5ce_1720439847878324000_1,2.jpg
./assets/candidates_split//b80feac615da6b3ab4e4e0589d6fffd2_1720439999224942000_1,0.jpg

fd.run(verbose=True) log

fastdup By Visual Layer, Inc. 2024. All rights reserved.

A fastdup dataset object was created!

Input directory is set to "assets/candidates_split"
Work directory is set to "fastdup"

The next steps are:

  1. Analyze your dataset with the .run() function of the dataset object
  2. Interactively explore your data on your local machine with the .explore() function of the dataset object

For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.

fastdup By Visual Layer, Inc. 2024. All rights reserved.
Using crashpad handler: /PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/lib/crashpad_handler
2024-07-11 00:16:39 [DEBUG] Read model file UndisclosedFastdupModel.ort input layer name input output layer name global_average_pooling2d_12
2024-07-11 00:16:39 [DEBUG] out_dims[0] = -1
2024-07-11 00:16:39 [DEBUG] out_dims[1] = 576
2024-07-11 00:16:39 [DEBUG] Model dimensions are -1 224 224 3 is_nchw? 0 is b/w? 0
2024-07-11 00:16:39 [INFO] Going to loop over dir assets/candidates_split
2024-07-11 00:16:39 [DEBUG] find -L "assets/candidates_split" -type f | egrep -i '.bmp$|.jpg$|.jp2$|.tiff$|.giff$|.jpeg$|.png$|.tif$|.tar$|.tar.gz$|.zip$|.tgz$|.mp4$|.avi$|.m4a$|.m4v$|.mov$|.dav$|.heif$|.heic$|.webp$|.jfif$|.mkv$|.flv$|.wmv$|.webm$|.mpg$|.mpeg$|.3gp$'| sort > fastdup/tmp/files0.txt
2024-07-11 00:16:39 [DEBUG] Read a total of 4500 lines from fastdup/tmp/files0.txt
2024-07-11 00:16:39 [DEBUG] Total images read so far 4500
2024-07-11 00:16:39 [INFO] Found total 4500 images to run on, 4500 train, 0 test, name list 4500, counter 4500
2024-07-11 00:16:39 [DEBUG] Going to init pool
2024-07-11 00:16:39 [DEBUG] Starting to run with 8 threads
2024-07-11 00:16:39 [DEBUG] Going to init quad array of size 4500
2024-07-11 00:16:39 [DEBUG] Going to init jobs
2024-07-11 00:16:39 [DEBUG] Run inference assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_0,0.jpg 0 batch size 1
2024-07-11 00:16:39 [DEBUG] Run inference assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_1,0.jpg 3 batch size 1
2024-07-11 00:16:39 [DEBUG] Going to run inference batch 0 0 1 start off 0 orig off 0 len 1
2024-07-11 00:16:39 [DEBUG] Going to run inference batch 0 0 1 start off 3 orig off 3 len 1
2024-07-11 00:16:39 [DEBUG] Run inference assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_1,1.jpg 4 batch size 1
2024-07-11 00:16:39 [DEBUG] Going to run inference batch 0 0 1 start off 4 orig off 4 len 1
2024-07-11 00:16:39 [DEBUG] Run inference assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_0,2.jpg 2 batch size 1
2024-07-11 00:16:39 [DEBUG] Going to run inference batch 0 0 1 start off 2 orig off 2 len 1
2024-07-11 00:16:39 [DEBUG] Run inference assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_0,1.jpg 1 batch size 1
2024-07-11 00:16:39 [DEBUG] Going to run inference batch 0 0 1 start off 1 orig off 1 len 1
2024-07-11 00:16:39 [DEBUG] Image load and resize took 2 from assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_0,2.jpg
2024-07-11 00:16:39 [DEBUG] Read image took 2
2024-07-11 00:16:39 [DEBUG] Image load and resize took 2 from assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_0,0.jpg
2024-07-11 00:16:39 [DEBUG] Image load and resize took 2 from assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_1,0.jpg
2024-07-11 00:16:39 [DEBUG] Read image took 2
2024-07-11 00:16:39 [DEBUG] Image load and resize took 2 from assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_0,1.jpg
2024-07-11 00:16:39 [DEBUG] Read image took 2
2024-07-11 00:16:39 [DEBUG] Image load and resize took 2 from assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_1,1.jpg
2024-07-11 00:16:39 [DEBUG] Read image took 2

original 100x87:
[[
2024-07-11 00:16:39 [DEBUG] Read image took 2

255original 100x87:
[[255,
original 100x87:
[[255, 255, 255], [255, 255, 255], [255, 255, 255, original 100x87:255, 255
]
255, 255]]]
[[255, 255, 255], , [[255255, 255, 255], 255, 255], [255, 255, 255]]
, [, [255, 255, [[255, 255, 255255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, [[255, 255255], [255, 255, 255], [255, 255, 251, 255], 255, [255, 255, original 100x87, :
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255]]], [255, 255, 255], [255, 255, 255]]

255, 255

255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255255, , 255], [255, 255, 255]]
[[253, 255, 255], [253, 255, 255], [253, 255, 255]]

], [251, 255, 255], [253, 255, 255]]
[[251, 255, 255], [251, 255, 255], [253, 255, 255]]
[[251, 255, 255], [251, 255, 255], [253, 255, 255]]

255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

2024-07-11 00:16:39 [DEBUG] Computed stats 16807.019531 151.179581 105.936523
2024-07-11 00:16:39 [DEBUG] Computed stats 21843.732422 197.045013 91.041908
2024-07-11 00:16:39 [DEBUG] Computed stats 7786.753418 240.129578 42.584698
2024-07-11 00:16:39 [DEBUG] Computed stats 12685.631836 129.354675 98.006622
2024-07-11 00:16:39 [DEBUG] Computed stats 19594.638672 177.504135 94.880737

resized 224:
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

resized 224:
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

resized 224:
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

resized 224:
[[251, 255, 255], [251, 255, 255], [251, 255, 255]]
[[251, 255, 255], [251, 255, 255], [251, 255, 255]]
[[251, 255, 255], [251, 255, 255], [251, 255, 255]]

resized 224:
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

RGB:
[[255, 255, 251], [255,
RGB:
[[
255, 251], [
RGB:
[[255, 255, 255, 255, 251]]
[[255, 255255255, ], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255RGB:
[[, 251255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255], [255, 255, 251], [255, 255, 251]
RGB:
[[, 255, 255]]

]
[[255, 255, 251], [255, 255, 251], [255, 255, 251]]

, 255]]

255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

0 :[255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000]
255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

0 :[255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000]
0 :[255.0000, 255.0000, 251.0000, 255.0000, 255.0000, 251.0000, 255.0000, 255.0000, 251.0000, 255.0000]
0 :[255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000]
0 :[255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000]
2024-07-11 00:16:39 [DEBUG] Inner inference took 21 (test? 0)
output_tensor4 :[0.2104, 0.1376, -0.1381, 0.6034, 1.5306, -0.0473, 1.2749, 0.7011, 0.2332, 0.1807]
output_tensor_end4 :[1.6542, -0.0701, 0.9269, 0.2174]
2024-07-11 00:16:39 [DEBUG] Quad array 0x297a70000 4 start_offset 0
features4 :[0.2104, 0.1376, -0.1381, 0.6034]
2024-07-11 00:16:39 [DEBUG] Finished inference fine 4 (test 0)!!
2024-07-11 00:16:39 [DEBUG] Inner inference took 23 (test? 0)
output_tensor1 :[0.2169, 0.1785, -0.1234, 0.2666, 0.0978, 0.1854, 0.2726, 2.0044, -0.0423, 0.2064]
output_tensor_end1 :[0.5314, 0.1115, 1.7439, 0.1835]
2024-07-11 00:16:39 [DEBUG] Quad array 0x297a70000 1 start_offset 0
features1 :[0.2169, 0.1785, -0.1234, 0.2666]
2024-07-11 00:16:39 [DEBUG] Finished inference fine 1 (test 0)!!
2024-07-11 00:16:39 [DEBUG] Inner inference took 24 (test? 0)
output_tensor0 :[0.3633, 1.1611, 0.2634, 0.0288, 0.0052, 0.1590, 0.8868, 3.0300, -0.0705, 1.3699]
output_tensor_end0 :[0.4783, -0.0353, 1.7936, 0.2264]
2024-07-11 00:16:39 [DEBUG] Quad array 0x297a70000 0 start_offset 0
features0 :[0.3633, 1.1611, 0.2634, 0.0288]
2024-07-11 00:16:39 [DEBUG] Finished inference fine 0 (test 0)!!
2024-07-11 00:16:39 [DEBUG] Inner inference took 25 (test? 0)
output_tensor3 :[1.0767, -0.0195, 0.1717, -0.0528, -0.0154, 1.1650, 0.5571, 0.6300, 0.9296, -0.0045]
output_tensor_end3 :[-0.1771, -0.1224, -0.0969, 1.7246]
2024-07-11 00:16:39 [DEBUG] Quad array 0x297a70000 3 start_offset 0
features3 :[1.0767, -0.0195, 0.1717, -0.0528]
2024-07-11 00:16:39 [DEBUG] Finished inference fine 3 (test 0)!!
2024-07-11 00:16:39 [DEBUG] Inner inference took 34 (test? 0)
output_tensor2 :[0.2455, 0.0602, -0.0981, 0.7068, 0.0038, 0.9310, 1.1705, 1.7438, 0.0154, 0.6575]
output_tensor_end2 :[1.0321, 0.0210, 1.3554, 0.3052]
2024-07-11 00:16:39 [DEBUG] Quad array 0x297a70000 2 start_offset 0
features2 :[0.2455, 0.0602, -0.0981, 0.7068]
2024-07-11 00:16:39 [DEBUG] Finished inference fine 2 (test 0)!!
2024-07-11 00:16:48 [DEBUG] Going to store results
Quad array 0x297a70000 i 0 FL 576
features0 :[0.3633, 1.1611, 0.2634, 0.0288]
features-end572 :[0.4783, -0.0353, 1.7936, 0.2264]
Quad array 0x297a70000 i 1 FL 576
features0 :[0.2169, 0.1785, -0.1234, 0.2666]
features-end572 :[0.5314, 0.1115, 1.7439, 0.1835]
Quad array 0x297a70000 i 2 FL 576
features0 :[0.2455, 0.0602, -0.0981, 0.7068]
features-end572 :[1.0321, 0.0210, 1.3554, 0.3052]
Quad array 0x297a70000 i 3 FL 576
features0 :[1.0767, -0.0195, 0.1717, -0.0528]
features-end572 :[-0.1771, -0.1224, -0.0969, 1.7246]
Quad array 0x297a70000 i 4 FL 576
features0 :[0.2104, 0.1376, -0.1381, 0.6034]
features-end572 :[1.6542, -0.0701, 0.9269, 0.2174]
2024-07-11 00:16:48 [DEBUG] Wrote total of 4500 features , found 0 bad images, total so far 4500, filename fastdup/atrain_features.dat
stats width: 100 height: 87 unique: 256 blur: 12685.631836, mean: 129.354675 min: 0.000000 max: 255.000000 stdv: 98.006622 file_fize: 3823
stats width: 100 height: 87 unique: 255 blur: 16807.019531, mean: 151.179581 min: 0.000000 max: 255.000000 stdv: 105.936523 file_fize: 3432
stats width: 100 height: 87 unique: 256 blur: 19594.638672, mean: 177.504135 min: 0.000000 max: 255.000000 stdv: 94.880737 file_fize: 3784
stats width: 100 height: 87 unique: 252 blur: 7786.753418, mean: 240.129578 min: 0.000000 max: 255.000000 stdv: 42.584698 file_fize: 2065
stats width: 100 height: 87 unique: 256 blur: 21843.732422, mean: 197.045013 min: 0.000000 max: 255.000000 stdv: 91.041908 file_fize: 3081
2024-07-11 00:16:48 [DEBUG] Wrote total of 4500 stats in fastdup/atrain_stats.csv
2024-07-11 00:16:48 [DEBUG] Done store results
2024-07-11 00:16:48 [INFO] Found total 4500 images to run on
2024-07-11 00:16:48 [DEBUG] Going to init quad array of size 1000
2024-07-11 00:16:48 [DEBUG] Going to run 5 batches with reminder 500
2024-07-11 00:16:48 [DEBUG] Going to run single thread normalization of 1000 from offet 0
2024-07-11 00:16:48 [DEBUG] Going to run single thread normalization of 1000 from offet 576000
2024-07-11 00:16:48 [DEBUG] Going to run single thread normalization of 1000 from offet 1152000
2024-07-11 00:16:48 [DEBUG] Going to run single thread normalization of 500 from offet 2304000
2024-07-11 00:16:48 [DEBUG] Finished single thread normalization
after normalization10 :[0.0170, 0.0543, 0.0123, 0.0013]
2024-07-11 00:16:48 [DEBUG] 3) Going to train NN model. Train sample factor 1.000000 howmany 4500
2024-07-11 00:16:48 [DEBUG] 3) Finished train() NN model
2024-07-11 00:16:49 [DEBUG] 265) Finished add() NN model
2024-07-11 00:16:49 [DEBUG] Total data points added= 4500
2024-07-11 00:16:49 [INFO] 268) Finished write_index() NN model
2024-07-11 00:16:49 [INFO] Stored nn model index file fastdup/nnf.index
2024-07-11 00:16:49 [DEBUG] 349) Finished search() NN model
2024-07-11 00:16:49 [DEBUG] KNN results
0 : 1.00000 1361 : 0.92043 596 : 0.90034
1 : 1.00000 1360 : 0.99972 2074 : 0.85255
2 : 1.00000 3604 : 0.97096 4164 : 0.87843
3 : 1.00000 1198 : 0.96876 3457 : 0.76420
4 : 1.00000 3605 : 0.99133 2628 : 0.94568
5 : 1.00000 1973 : 0.98401 2811 : 0.83700
6 : 1.00000 971 : 0.99720 4303 : 0.72761
7 : 1.00000 3205 : 0.95192 4303 : 0.76204
8 : 1.00000 4213 : 0.93754 4304 : 0.92917
9 : 1.00000 3980 : 0.99856 1313 : 0.79341
2024-07-11 00:16:49 [DEBUG] Found total results 9081
2024-07-11 00:16:49 [DEBUG] Replacing lower threshold 0.050000 with position 8627 top_k.size() 9081 loc pos: 0.742899 last pos: 0.600066 0.950000 8626.949993
2024-07-11 00:16:49 [DEBUG] Going to print top_k of len 9081 to fastdup/similarity.csv
2024-07-11 00:16:49 [DEBUG] Found from=to 699 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1069 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 700 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1068 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 698 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1063 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1062 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 701 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 696 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 695 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 406 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 407 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 693 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 409 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 410 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1016 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 411 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1015 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1014 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1013 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1011 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3987 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3991 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1008 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3993 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1988 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1987 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1985 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1984 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1983 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1981 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1980 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 2511 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 2514 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 2515 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 2516 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1942 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1941 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 2517 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 2518 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1940 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1939 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1938 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1937 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1936 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3365 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3363 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3362 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3360 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3358 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3357 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 2512 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1986 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1943 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 2519 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3994 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1065 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1010 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3990 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 413 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 412 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3364 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 405 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3361 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3359 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1066 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1935 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1009 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1064 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3989 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3988 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 408 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 694 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3995 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1982 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1012 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 697 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3992 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 2513 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1070 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1067 with run_mode 0
2024-07-11 00:16:49 [INFO] Total time took 9422 ms
2024-07-11 00:16:49 [INFO] Found a total of 1833 fully identical images (d>0.990), which are 20.37 % of total graph edges
2024-07-11 00:16:49 [INFO] Found a total of 624 nearly identical images(d>0.980), which are 6.93 % of total graph edges
2024-07-11 00:16:49 [INFO] Found a total of 5348 above threshold images (d>0.900), which are 59.42 % of total graph edges
2024-07-11 00:16:49 [INFO] Found a total of 454 outlier images (d<0.050), which are 5.04 % of total graph edges
2024-07-11 00:16:49 [INFO] Min similarity found 0.600 max similarity 1.000
2024-07-11 00:16:49 [INFO]

Example similar files
from,to,distance
700,682,1.000000
699,681,1.000000
1069,1051,1.000000
1068,1050,1.000000
2024-07-11 00:16:49 [INFO] Running connected components for ccthreshold 0.960000
2024-07-11 00:16:49 [DEBUG] 9081 After removing edges removed 4815 edges remained with 4266 h 0
2024-07-11 00:16:49 [INFO] .2024-07-11 00:16:49 [INFO] 02024-07-11 00:16:49 [DEBUG] Last component id was 2787
2024-07-11 00:16:49 [DEBUG] Total component stats size is 2787 last component was 2787
2024-07-11 00:16:49 [DEBUG] Going to store components to file fastdup/connected_components.csv
0%| | 0/3 [00:00<?, ?it/s]
Extracting metadata: 0%| | 0/3 [00:00<?, ?it/s]
NoneType: None
Traceback (most recent call last):
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_runner/run.py", line 178, in
do_visual_layer
run_pipeline(input_dir, pbar)
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_runner/fastdup_runner_pipeline.py",
line 38, in run_pipeline
Settings.DATASET_SIZE_BYTES = normalize_dataset(Settings.DATASET_ID, input_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/vl/utils/useful_decorators.py", line 113, in
wrapper
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "", line 124, in normalize_dataset
File "", line 112, in done
File "", line 76, in fatal_error
fastdup.pipeline.common.dataset_db_updater.PipelineFatalError: "Error: Insufficient number of valid images in dataset (0). Minimum required
images: 10". Please note that only the following image formats are supported: ['.png', '.jpg', '.jpeg', '.gif', '.giff', '.tif', '.tiff',
'.heic', '.heif', '.bmp', '.webp', '.jfif']

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/sentry.py", line 135, in inner_function
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_controller.py", line 630, in run
do_visual_layer(work_dir=self._work_dir, input_dir=vl_input,
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_runner/run.py", line 194, in
do_visual_layer
raise ExplorationError(e) from e
fastdup.fastdup_runner.utilities.ExplorationError: "Error: Insufficient number of valid images in dataset (0). Minimum required images: 10".
Please note that only the following image formats are supported: ['.png', '.jpg', '.jpeg', '.gif', '.giff', '.tif', '.tiff', '.heic', '.heif',
'.bmp', '.webp', '.jfif']
Traceback (most recent call last):
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_runner/run.py", line 178, in
do_visual_layer
run_pipeline(input_dir, pbar)
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_runner/fastdup_runner_pipeline.py",
line 38, in run_pipeline
Settings.DATASET_SIZE_BYTES = normalize_dataset(Settings.DATASET_ID, input_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/vl/utils/useful_decorators.py", line 113, in
wrapper
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "", line 124, in normalize_dataset
File "", line 112, in done
File "", line 76, in fatal_error
fastdup.pipeline.common.dataset_db_updater.PipelineFatalError: "Error: Insufficient number of valid images in dataset (0). Minimum required
images: 10". Please note that only the following image formats are supported: ['.png', '.jpg', '.jpeg', '.gif', '.giff', '.tif', '.tiff',
'.heic', '.heif', '.bmp', '.webp', '.jfif']

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/PATH/TO/PROJECT/main.py", line 49, in
main()
File "/PATH/TO/PROJECT/main.py", line 35, in main
ps(pb, SPLIT_BEGIN, SPLIT_LIMIT, SPLIT_BATCH_SIZE, SPLIT_DEDUPLICATE)
File "/PATH/TO/PROJECT/captcha_process.py", line 70, in start
fd.run(overwrite=True, verbose=True)
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/engine.py", line 157, in run
return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/sentry.py", line 148, in inner_function
raise ex
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/sentry.py", line 135, in inner_function
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_controller.py", line 630, in run
do_visual_layer(work_dir=self._work_dir, input_dir=vl_input,
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_runner/run.py", line 194, in
do_visual_layer
raise ExplorationError(e) from e
fastdup.fastdup_runner.utilities.ExplorationError: "Error: Insufficient number of valid images in dataset (0). Minimum required images: 10".
Please note that only the following image formats are supported: ['.png', '.jpg', '.jpeg', '.gif', '.giff', '.tif', '.tiff', '.heic', '.heif',
'.bmp', '.webp', '.jfif']
Exception ignored in: <function tqdm.del at 0x155416d40>
Traceback (most recent call last):
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/tqdm/std.py", line 1148, in del
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/tqdm/std.py", line 1302, in close
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/tqdm/std.py", line 1495, in display
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/tqdm/std.py", line 459, in print_status
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/tqdm/std.py", line 453, in fp_write
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/tqdm/utils.py", line 196, in inner
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/rich/file_proxy.py", line 53, in flush
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/rich/console.py", line 1674, in print
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/rich/console.py", line 1535, in _collect_renderables
File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/rich/protocol.py", line 28, in rich_cast
ImportError: sys.meta_path is None, Python is likely shutting down

@galbarnissan

@lycika-5mzw, thanks for reporting this. That’s indeed a bug in the recent Fastdup version (2.5) with relative file paths, and we’re about to release a fixed version for macOS (2.6) probably by tomorrow. In the meantime, you can use absolute paths as a workaround.
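
A minimal sketch of the absolute-path workaround mentioned above, assuming the same `fastdup.create(work_dir=..., input_dir=...)` and `fd.run(overwrite=True)` calls used in the original report. The helper names `absolute_dirs` and `run_fastdup` are hypothetical; `fastdup` is imported lazily so the path helper can be used on its own.

```python
from pathlib import Path


def absolute_dirs(work_dir, input_dir):
    """Return (work_dir, input_dir) as absolute path strings."""
    work_abs = Path(work_dir).expanduser().resolve()
    input_abs = Path(input_dir).expanduser().resolve()
    return str(work_abs), str(input_abs)


def run_fastdup(work_dir, input_dir):
    """Create and run a fastdup dataset object using absolute paths only."""
    import fastdup  # imported here so absolute_dirs works without fastdup installed

    work_abs, input_abs = absolute_dirs(work_dir, input_dir)
    # Same calls as in the original report, but with absolute paths.
    fd = fastdup.create(work_dir=work_abs, input_dir=input_abs)
    fd.run(overwrite=True)
    return fd
```

With the directories from the log, this would be called as `run_fastdup("fastdup", "assets/candidates_split")`; both values are resolved against the current working directory before they reach fastdup.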

@dbickson
Collaborator

Hi @lycika-5mzw, version 2.6 is out and should fix your issue. Please continue to report any errors; your feedback helps us improve!
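
To confirm that the installed package is at least the fixed release, something like the following sketch could be used. It relies only on the standard-library `importlib.metadata`; the simple numeric version comparison and the helper names (`parse_version`, `is_at_least`, `installed_fastdup_version`) are assumptions for illustration, and pre-release suffixes are not handled.

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.6' or '2.6.1' into a tuple of ints, stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(installed, minimum="2.6"):
    """True if the installed version is numerically >= the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def installed_fastdup_version():
    """Return the installed fastdup version string, or None if it is not installed."""
    try:
        return metadata.version("fastdup")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_fastdup_version()
    if version is None:
        print("fastdup is not installed")
    elif is_at_least(version):
        print(f"fastdup {version} is at least 2.6")
    else:
        print(f"fastdup {version} is older than 2.6; consider upgrading or using absolute paths")
```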
