@JirenJin
Contributor

@JirenJin JirenJin commented Jul 1, 2017

As mentioned in #192, the current make_dataset function for ImageFolder may create datasets with different orderings, because os.listdir(dir) returns entries in arbitrary order and the result can differ between machines.

This is inconvenient if the user wants to compare or ensemble outputs over the full dataset (for example, the validation set), since the same index may refer to different samples on different machines.

A simple modification, for target in sorted(os.listdir(dir)):, solves the problem. And since the loading procedure itself can still be shuffled, I have not found any problems that this modification could cause.
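To illustrate the idea, here is a minimal sketch of a directory scan that sorts the listing before assigning class indices. It is not the actual make_dataset code: list_samples, build_demo_tree and demo are hypothetical names, and sorting the file names inside each class folder is an extra assumption beyond the one-line change described above.

```python
import os
import tempfile


def list_samples(root):
    """Hypothetical, simplified stand-in for make_dataset.

    Returns (path, class_index) pairs in a deterministic order.
    """
    # Sort the class folders so indices do not depend on os.listdir order.
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {name: i for i, name in enumerate(classes)}

    samples = []
    for target in classes:
        target_dir = os.path.join(root, target)
        # Assumption: file names are sorted as well, for a fully stable order.
        for fname in sorted(os.listdir(target_dir)):
            samples.append((os.path.join(target_dir, fname), class_to_idx[target]))
    return samples


def build_demo_tree(root):
    """Create a tiny class-per-folder layout with empty placeholder files."""
    layout = {"dog": ["b.jpg", "a.jpg"], "cat": ["z.jpg", "y.jpg"]}
    for cls, files in layout.items():
        os.makedirs(os.path.join(root, cls))
        for fname in files:
            open(os.path.join(root, cls, fname), "w").close()


def demo():
    """Build the demo tree and return samples with paths relative to the root."""
    with tempfile.TemporaryDirectory() as root:
        build_demo_tree(root)
        return [(os.path.relpath(p, root), idx) for p, idx in list_samples(root)]
```

With the sorted listing, demo() should return the same sequence on any machine, regardless of the order in which the file system reports the entries.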

@JirenJin
Contributor Author

JirenJin commented Jul 1, 2017

Also, without a fixed order it is difficult to reproduce results across machines. Even setting the random seed will not produce the same shuffled order in training, because the original order of the data differs.
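A small sketch of why the seed alone is not enough: a seeded shuffle applies the same permutation of positions, so two differently ordered inputs give different outputs. This uses Python's random module purely for illustration (an assumption; a real training loop would use the framework's own sampler), and listing_a, listing_b and shuffled are hypothetical names.

```python
import random


def shuffled(items, seed):
    """Return a shuffled copy of items using a dedicated seeded generator."""
    rng = random.Random(seed)
    out = list(items)
    rng.shuffle(out)
    return out


# Hypothetical listings of the same folder as returned on two machines.
listing_a = ["cat", "dog", "bird", "fish"]
listing_b = ["fish", "bird", "dog", "cat"]

# Same seed, different starting order: the shuffled results differ.
differs_without_sort = shuffled(listing_a, 0) != shuffled(listing_b, 0)

# Sorting first gives both machines the same starting order,
# so the same seed yields the same shuffle.
matches_with_sort = shuffled(sorted(listing_a), 0) == shuffled(sorted(listing_b), 0)
```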

@soumith soumith merged commit 08b1f59 into pytorch:master Jul 2, 2017
@soumith
Member

soumith commented Jul 2, 2017

thanks!
