-
Notifications
You must be signed in to change notification settings - Fork 1.9k
LoadImages not warning that input column is empty and ignoring imageFolder parameter in such a case #4429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi Antonio, PR #4665 should fix the bug you raise with the |
Hi Antonio, In addition, the LoadImages parameter here is being ignored when it is empty because of the snippet below: machinelearning/src/Microsoft.ML.ImageAnalytics/ExtensionsCatalog.cs Lines 55 to 73 in fa75f4f
If inputColumnName is empty/null, then instead of inputColumnName being returned, it is instead outputColumnName that is being returned. The |
Hi @mstfbl . Thanks for taking a look into this. As I've commented on your PR, I am afraid I didn't explained clearly the issue. My problem isn't with the So if you take the sample I provided in the first post of this issue, and run it, and take
As you can see If you look into the other rows (which also had an empty As I've mentioned in the first post, I understand this is desired when working with big datasets where some rows have empty values in their A solution to this might be to modify the ImageLoader.cs file content in order to count how many empty inputs where there, and then check if the number of rows is the same as the number of empty columns... if they're the same, then throw an exception, or at least show a warning. |
Also, I've just added a solution.zip file inside my first post in this issue, so to make it easier for you to debug this. Thanks! |
PR #4691 did fix the problem of throwing an exception when the imageFolder doesn't exist, but it didn't fix the problem of not warning that the input column had only empty values, so I am reopening the issue. |
Hey @antoniovs1029 , I ran the sample you provided, and saw that the |
Closing this issue as #4831 is merged. |
I don't know if this is actually an issue, but it's something that I noticed when working with the LoadImages method, and perhaps it is necessary to warn the user that this could happen, whether at runtime, or in the documentation.
Issue
As I show in the source code I provide below, if all the values of the input column of a LoadImages transform are empty when fitting a pipeline, then the code will still run and not give any warning whatsoever, even though no image is actually loaded to train the model. The transform would also appear to work when transforming an input Data View which uses an empty column as input of the LoadImages, and, in the example I provide, the pipeline would still assign a predicted label to each row of the input data view bein transformed, even if no image was actually loaded.
I show this by exemplifying two main cases in which this could happen:
When the user loads its data through a method such as LoadFromTextFile, with an input file that only has 2 columns, but the user specifies that the ImagePath column (to be used as input column of the LoadImages method) is the 3rd column inside of the file. This kind of scenario could happen if the user makes a typo in the ModelInput class, or if the user (perhaps mistakenly) passes an input file that doesn't contain an image path column.
When the user loads its data through a method such as LoadFromEnumerable from an array where all of the objects provide either a null or an empty string value to the ImagePath column.
Also notice in my code that in both cases the LoadImages transform also ignores whatever is passed as the imageFolder parameter, since because there are actually no ImagePaths, it will never try to load images. If there was at least one ImagePath in the input dataview, then LoadImages would actually try to load that image using the imageFolder parameter, and an exception is correctly thrown if the folder doesn't exist.
Why is this a problem?
I would understand this behavior happening if the input data view doesn't provide an image path for some of the rows, specially when working with big datasets. But I think it becomes a problem if there's actually no image loaded, and the whole thing appears to work without a warning, like in the example I provided. If a user unknowingly makes a mistake that leads to this problem, then s/he might believe that the model was actually correctly trained with actual images, or that it actually transformed an input where no imagePath was provided. This problem might be harder to spot in more complex pipelines or input files.
Also the fact that the imageFolder parameter gets ignored in this case seems odd to me, as I would have expected an exception to be thrown if a user passes an inexistent folder path to the LoadImages transformer, regardless of the content of the ImagePath column.
Source code and input file
Download solution:
solution.zip
Or look into the code in here:
https://gist.github.com/antoniovs1029/997ca183411f173e81a131f09722b092
The text was updated successfully, but these errors were encountered: