-
Notifications
You must be signed in to change notification settings - Fork 1.9k
CreatePredictionEngine overrides inputSchemaDefinition by calling into DataViewConstructionUtils.GetSchemaDefinition #3234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for writing this up @codemzs, but I still have a few questions till I understand the issue fully, specifically, I do not yet understand your first example, which I quote here: var trainedModel = mlContext.Model.Load(stream, out var inputSchema);
var = mlContext.Model.CreatePredictionEngine<TInput, TOutput>(trainedModel, inputSchemaDefinition: inputSchema); The first line appears to be a usage of this method:
The second line appears to be a usage of this method: machinelearning/src/Microsoft.ML.Data/Model/ModelOperationsCatalog.cs Lines 284 to 285 in fdd24cf
I can tell it is that specifically, rather than the overload a few lines below, due to the explicit usage of the parameter My guess is that you meant to use the one with I'm going to move forward under the assumption that my guess is correct, since I'm about 90% sure about that. Anyway, if I suppose you really meant the second overload, I see this method.
I'm also guessing that your desire is to make the input type defined from the input. If you want to have the types of the input schema definition taken from the input schema that seems like a positive change to me. There is an additional subtlety you'll need to take care of, and that is inconsistencies w.r.t. the schema types in the input schema, as well as checking for consistency. For example, imagine the situation where we have columns Then imagine we have this input type: public sealed class Instance
{
public float[] A;
[VectorType]
public float[] B;
[VectorType(5)]
public float[] C;
[VectorType(10)]
public float[] D;
public double[] E;
} I would expect if you are inferring types from the input schema as well, it should succeed in remapping column You'll want to do something similar with keys, though I'd mark that as less urgent. There is another additional problem that I notice just by reading that that method, and that is that in the loop, hidden columns are not skipped, and they kind of have to be... otherwise annotation and type information will be layered on top of each other. Hopefully the above discussion clear of what a desirable design for this would be, let me (or whoever works on it) know if you have questions, and thanks for bringing this up. |
@TomFinley's assumption is correct, the problem is with the overload of To reproduce, change the test LoadModelWithOptionalColumnTransform to use the |
I can reproduce this issue on latest code base with below extra error: System.InvalidOperationException : Type should contain a member named Label after ignore column "Label" on purpose, get below exception: Incompatible features column type: 'Vector<Single>' vs 'Vector<Single, 38>' so 2 issues here, one is the schema override issue, the other is we are loading extra column "Label" from model. |
It seems
CreatePredictionEngine
overridesinputSchemaDefinition
by calling intoDataViewConstructionUtils.GetSchemaDefinition
. This is problematic if some of the features of input schema is determined at runtime, example, length of feature vector.CC: @TomFinley
The text was updated successfully, but these errors were encountered: