
CrossValidation Macros stops working in 1.5.0 #5221


Closed
ganik opened this issue Jun 9, 2020 · 1 comment · Fixed by #5227
Labels
bug Something isn't working P0 Priority of the issue for triage purpose: IMPORTANT, needs to be fixed right away.


@ganik (Member) commented Jun 9, 2020

Version: ML.NET 1.5.0

Steps to reproduce:
1. Build NimbusML against ML.NET 1.5.0 (here is the PR that does this).
2. Run the tests in test.cv.
3. test_default_label2 fails with this error:
Error: *** System.InvalidOperationException: 'Column 'GroupId' not found'
StackTrace:
at Microsoft.ML.EntryPoints.TrainerEntryPointsUtils.FindColumn(IExceptionContext ectx, DataViewSchema schema, Optional`1 value)
at Microsoft.ML.EntryPoints.TrainerEntryPointsUtils.Train[TArg,TOut](IHost host, TArg input, Func`1 createTrainer, Func`1 getLabel, Func`1 getWeight, Func`1 getGroup, Func`1 getName, Func`1 getCustom, ICalibratorTrainerFactory calibrator, Int32 maxCalibrationExamples)
at Microsoft.ML.Trainers.LightGbm.LightGbm.TrainRanking(IHostEnvironment env, Options input)

This is a regression compared with ML.NET 1.5.0.preview2.

I did some debugging: it appears that once the macro is expanded, the ColumnSelector transform drops GroupId. The ColumnSelector appears to be added by the macro expansion.

antoniovs1029 added the bug and P0 labels Jun 10, 2020

@antoniovs1029 (Member) commented:

This issue was introduced in #4828, which was included in the latest 1.5.0 release. Specifically, the problem is the "return stratificationColumn;" here:

        if (data.Schema[stratificationColumn].IsNormalized() || (type != NumberDataViewType.Single && type != NumberDataViewType.Double))
            return stratificationColumn;   // path added by #4828: returns the name of an existing column
        data = new NormalizingEstimator(host,
            new NormalizingEstimator.MinMaxColumnOptions(stratCol, stratificationColumn, ensureZeroUntouched: true))
            .Fit(data).Transform(data);
    }
}
return stratCol;   // returns the name of the newly created column

The problem is that this method (CreateStratificationColumn) used to always create a new column using the HashJoiningTransform and return the new column's name. That new column was then dropped here, in the "Models.CrossValidatorDatasetSplitter" entrypoint:

output.TrainData[i] = ColumnSelectingTransformer.CreateDrop(host, trainData, stratCol);

With the changes in #4828, there is now a path in CreateStratificationColumn that does not create a new column and instead returns the name of a pre-existing column (in this issue, GroupId). The "Models.CrossValidatorDatasetSplitter" entrypoint then drops GroupId, which breaks LightGBM ranking training (LightGbm.TrainRanking in the stack trace), because that trainer requires the group column.
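To make the failure mode concrete, here is a minimal, self-contained sketch of the two code paths described above. It does not use ML.NET at all: a "schema" is modeled as a set of column names, and the helpers (a simplified stand-in named after CreateStratificationColumn, plus SplitBuggy, SplitGuarded, the canReuse flag and the "StratificationColumn" name) are hypothetical. The guarded variant shows one possible way to avoid the regression, dropping the column only when it was created for stratification; it is an assumption, not necessarily how #5227 fixed it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model: a "schema" is just a set of column names.
// None of these helpers are ML.NET APIs.

// Stand-in for CreateStratificationColumn after #4828: it either reuses an existing
// column (returning its name) or adds a new column and returns the new name.
(string Name, bool Created) CreateStratificationColumn(
    HashSet<string> schema, string stratificationColumn, bool canReuse)
{
    if (canReuse)
        return (stratificationColumn, false);       // existing column, e.g. GroupId

    const string stratCol = "StratificationColumn"; // hypothetical temporary name
    schema.Add(stratCol);
    return (stratCol, true);
}

// Mirrors the splitter behavior described above: always drop the returned column.
HashSet<string> SplitBuggy(HashSet<string> schema, string stratCol)
{
    var result = new HashSet<string>(schema);
    result.Remove(stratCol);
    return result;
}

// One possible guard (an assumption): only drop the column if it was created
// for stratification, so a pre-existing column such as GroupId survives.
HashSet<string> SplitGuarded(HashSet<string> schema, string stratCol, bool wasCreated)
{
    var result = new HashSet<string>(schema);
    if (wasCreated)
        result.Remove(stratCol);
    return result;
}

var dataColumns = new HashSet<string> { "Label", "Features", "GroupId" };
var (strat, created) = CreateStratificationColumn(dataColumns, "GroupId", canReuse: true);

var buggy = SplitBuggy(dataColumns, strat);
var guarded = SplitGuarded(dataColumns, strat, created);

bool buggyKeeps = buggy.Contains("GroupId");
bool guardedKeeps = guarded.Contains("GroupId");
Console.WriteLine($"buggy split keeps GroupId: {buggyKeeps}");     // False
Console.WriteLine($"guarded split keeps GroupId: {guardedKeeps}"); // True
```

Returning a "created" flag alongside the column name keeps the ownership decision with the code that created the column, so the splitter never has to guess whether a name refers to a temporary column or to user data.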
