Skip to content

Issue 3234: use model schema type instead of class definition schema #5228

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 15, 2020

Conversation

frank-dong-ms-zz
Copy link
Contributor

address issue: #3234

@@ -217,6 +217,7 @@ public class ModelInput
public string[] CategoricalFeatures;
public float[] NumericalFeatures;
#pragma warning restore SA1401
public float Label;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Label [](start = 25, length = 5)

add extra column Label as the model modelwithoptionalcolumntransform's schema contains an extra Label column

@frank-dong-ms-zz frank-dong-ms-zz marked this pull request as ready for review June 11, 2020 04:37
@frank-dong-ms-zz frank-dong-ms-zz requested a review from a team as a code owner June 11, 2020 04:37
@codecov
Copy link

codecov bot commented Jun 11, 2020

Codecov Report

Merging #5228 into master will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5228      +/-   ##
==========================================
- Coverage   73.47%   73.45%   -0.02%     
==========================================
  Files        1010     1010              
  Lines      187988   187984       -4     
  Branches    20262    20261       -1     
==========================================
- Hits       138118   138092      -26     
- Misses      44385    44404      +19     
- Partials     5485     5488       +3     
Flag Coverage Δ
#Debug 73.45% <100.00%> (-0.02%) ⬇️
#production 69.27% <100.00%> (-0.02%) ⬇️
#test 87.38% <100.00%> (+<0.01%) ⬆️
Impacted Files Coverage Δ
...soft.ML.Data/DataView/DataViewConstructionUtils.cs 85.51% <100.00%> (+0.02%) ⬆️
test/Microsoft.ML.Functional.Tests/ModelFiles.cs 96.38% <100.00%> (+0.12%) ⬆️
....ML.AutoML/PipelineSuggesters/PipelineSuggester.cs 71.42% <0.00%> (-11.77%) ⬇️
...rosoft.ML.AutoML/Utils/SweepableParamAttributes.cs 51.26% <0.00%> (-6.73%) ⬇️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0.00%> (-1.46%) ⬇️
...Microsoft.ML.Transforms/OptionalColumnTransform.cs 76.87% <0.00%> (-0.87%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 89.13% <0.00%> (-0.16%) ⬇️
src/Microsoft.ML.Core/Data/Repository.cs 79.77% <0.00%> (+0.05%) ⬆️
...L.AutoML/TrainerExtensions/TrainerExtensionUtil.cs 86.89% <0.00%> (+1.74%) ⬆️


//Always use column type from model as this type can be more specific.
//This can be corner case:
//For example, we can load an model whose schema contains Vector<Single, 38>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be a check or at least logging for this corner case? Maybe we should put in a check for arrays of size 0, which is what float[] would be. Also is there a check for the size of provided arrays matching the size of the corresponding Vector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test LoadModelWithOptionalColumnTransform is to test this case.
for length check, it is applied at below:
https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Scorers/SchemaBindablePredictorWrapper.cs#L129-L147


In reply to: 439585282 [](ancestors = 439585282)

@@ -230,15 +231,17 @@ public class ModelOutput
[Fact]
public void LoadModelWithOptionalColumnTransform()
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this fix invalidate the old test?

It appears that the the previous test was testing this using SchemaDefinition and to reproduce this issue you are using the one DataViewSchema.

Does it make sense to test both code paths not replace the previous one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the old test is still valid. I'm ok with test both code path not just replace as I don't find a test is test the previous case directly.

This test was originally created at below PR:
#3700

From the conversation you can see the original intention was to use the modified way to write test(load model with schema output then use the schema to create prediction engine), but with the type bug this test was written in the way with extra schema definition.

I will made the change to test both ways of create prediction engine, thanks


In reply to: 440394844 [](ancestors = 440394844)

Copy link
Contributor

@harishsk harishsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@frank-dong-ms-zz frank-dong-ms-zz merged commit 0fcf7ae into dotnet:master Jun 15, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants