Support lazy loading MLImage when training. #6474

Open
@vpenades

Is your feature request related to a problem? Please describe.
When training with images, a large collection of images has to be fed into a dataset column. This is typically done by storing the file path of each image, which is then loaded lazily when it is needed.
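For reference, the path-based approach looks roughly like this. It is a minimal sketch, assuming the `Microsoft.ML` and `Microsoft.ML.ImageAnalytics` packages are referenced; the `ImagePathRow` class, the column names, and the `images` folder are made up for illustration.

```csharp
using Microsoft.ML;

// Illustrative row type: only the path is stored, not the pixels.
public sealed class ImagePathRow
{
    public string ImagePath { get; set; } = "";
    public string Label { get; set; } = "";
}

public static class PathBasedLoading
{
    // Wraps a few in-memory rows in an IDataView. No image file is touched here.
    public static IDataView BuildData(MLContext mlContext)
    {
        var rows = new[]
        {
            new ImagePathRow { ImagePath = "cat.jpg", Label = "cat" },
            new ImagePathRow { ImagePath = "dog.jpg", Label = "dog" },
        };
        return mlContext.Data.LoadFromEnumerable(rows);
    }

    // Builds the estimator that turns the path column into an image column.
    // The "images" folder is an assumption; paths are resolved relative to it.
    public static IEstimator<ITransformer> BuildPipeline(MLContext mlContext)
    {
        return mlContext.Transforms.LoadImages(
            outputColumnName: "Image",
            imageFolder: "images",
            inputColumnName: nameof(ImagePathRow.ImagePath));
    }
}
```

This only works because every image can be addressed by a file path, which is exactly the limitation described next.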

But this assumes the image is stored on the hard drive. It does not cover images that live somewhere else, that require some custom transformation, or that do not exist at all and are procedurally generated.

For such scenarios the suggested solution is to use custom transformers. I've tried them and found them needlessly complex and hard to understand, most probably due to the lack of documentation and of proper examples dealing with images.

Describe the solution you'd like
I think a simpler solution would be to support Lazy<MLImage> (or some kind of factory interface, or a Func) as a dataset column. That would greatly simplify development and would not require the use of custom transformers.
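To make the request concrete, here is a rough sketch of what such a row could look like. This is hypothetical: as far as I know, ML.NET does not accept `Lazy<MLImage>` as a column type today, and `LazyImageRow` and `LazyImageRows` are names invented for this example. Only `MLImage.CreateFromFile` and `MLImage.CreateFromPixels` are existing calls.

```csharp
using System;
using Microsoft.ML.Data;

// Hypothetical row type: the image is produced on first access to Image.Value.
public sealed class LazyImageRow
{
    public LazyImageRow(string label, Func<MLImage> factory)
    {
        Label = label;
        Image = new Lazy<MLImage>(factory);
    }

    public string Label { get; }
    public Lazy<MLImage> Image { get; }
}

public static class LazyImageRows
{
    // Image stored on disk: nothing is read until Image.Value is requested.
    public static LazyImageRow FromFile(string path, string label) =>
        new LazyImageRow(label, () => MLImage.CreateFromFile(path));

    // Image that does not exist anywhere: generated in memory on demand.
    // The blank (all zero) BGRA buffer stands in for any procedural generator.
    public static LazyImageRow FromGenerator(int width, int height, string label) =>
        new LazyImageRow(label, () =>
        {
            var pixels = new byte[width * height * 4]; // 4 bytes per BGRA pixel
            return MLImage.CreateFromPixels(width, height, MLPixelFormat.Bgra32, pixels);
        });
}
```

With something like this, both rows could sit in the same collection, and the trainer would only pay for an image when it actually reads the row.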

Describe alternatives you've considered
Using custom transformers, which is ugly.

Additional context
Another point of concern is that MLImage is a disposable object. I have no clue at which point ML disposes of the images already used for training (or if it ever does), or who is responsible for disposing of them, ML or the developer. Having column data types that need to be disposed certainly makes things a lot more complicated than they should be. I believe MLImage should have been made non-disposable in the first place.

So maybe by supporting this kind of late image creation/loading and disposing, the memory management of images would be easier to understand.

Furthermore, allowing MLImage to be late loaded would also let the developer preprocess the image in any way they see fit, using any image processing library available, without being limited to the image transformations provided by ML. In this case, it could be useful to have an interface or a function that is passed the expected input image size and format.

Labels: P2, enhancement, image
