AutoML Regression Experiment fails after 67 iterations #4906

francescomazzurco · 2020-03-02T17:42:05Z

Hi,

When running a Regression Experiment, AutoML systematically fails after 67 iterations, raising the exception "All instances skipped due to missing features". From reading other issues, I suspect that the SmacSweeper could be the cause. The stack trace also points to it:

in Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl.MakeBoundariesAndCheckLabels(Int64& missingInstances, Int64& totalInstances)
   in Microsoft.ML.Trainers.FastTree.DataConverter.MemImpl..ctor(RoleMappedData data, IHost host, Double[][] binUpperBounds, Single maxLabel, Boolean dummy, Boolean noFlocks, PredictionKind kind, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
   in Microsoft.ML.Trainers.FastTree.DataConverter.Create(RoleMappedData data, IHost host, Int32 maxBins, Single maxLabel, Boolean diskTranspose, Boolean noFlocks, Int32 minDocsPerLeaf, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeatureIndices, Boolean categoricalSplit)
   in Microsoft.ML.Trainers.FastTree.ExamplesToFastTreeBins.FindBinsAndReturnDataset(RoleMappedData data, PredictionKind kind, IParallelTraining parallelTraining, Int32[] categoricalFeaturIndices, Boolean categoricalSplit)
   in Microsoft.ML.Trainers.FastTree.FastTreeTrainerBase`3.ConvertData(RoleMappedData trainData)
   in Microsoft.ML.Trainers.FastTree.FastForestRegressionTrainer.TrainModelCore(TrainContext context)
   in Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   in Microsoft.ML.AutoML.SmacSweeper.FitModel(IEnumerable`1 previousRuns)
   in Microsoft.ML.AutoML.SmacSweeper.ProposeSweeps(Int32 maxSweeps, IEnumerable`1 previousRuns)
   in Microsoft.ML.AutoML.PipelineSuggester.SampleHyperparameters(MLContext context, SuggestedTrainer trainer, IEnumerable`1 history, Boolean isMaximizingMetric)
   in Microsoft.ML.AutoML.PipelineSuggester.GetNextInferredPipeline(MLContext context, IEnumerable`1 history, DatasetColumnInfo[] columns, TaskKind task, Boolean isMaximizingMetric, CacheBeforeTrainer cacheBeforeTrainer, IEnumerable`1 trainerWhitelist)
   in Microsoft.ML.AutoML.Experiment`2.Execute()
   in Microsoft.ML.AutoML.ExperimentBase`2.Execute(ColumnInformation columnInfo, DatasetColumnInfo[] columns, IEstimator`1 preFeaturizer, IProgress`1 progressHandler, IRunner`1 runner)
   in Microsoft.ML.AutoML.ExperimentBase`2.Execute(IDataView trainData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)

However, unlike in the other issues, I'm running a console application, I'm loading data from a database with no missing values, and I believe I have the right NuGet dependencies:

Microsoft.ML.AutoML and Microsoft.ML.Recommender: 0.16.0
Microsoft.ML and all the other ML packages: 1.4.0

I understand that the problem might be caused by one of the third-party libraries ML.NET depends on, but isn't it at least possible to ignore the exception thrown by a single trainer without compromising the whole regression experiment? I would like to be able to access the BestRun object and choose the best of the first 67 runs without having to go back to the CacheDirectory.
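One possible way to keep the completed runs when `Execute` throws is to collect them through the `progressHandler` parameter of `Execute`. The following is only a minimal sketch, assuming the Microsoft.ML.AutoML package is referenced and that each finished run is reported to the handler before the failure occurs. `RunCollector` and `BestByRSquared` are hypothetical names, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Hypothetical helper: remembers every run reported by the experiment.
public sealed class RunCollector : IProgress<RunDetail<RegressionMetrics>>
{
    private readonly List<RunDetail<RegressionMetrics>> _runs =
        new List<RunDetail<RegressionMetrics>>();
    private readonly object _gate = new object();

    // Called by the experiment each time it reports a run.
    public void Report(RunDetail<RegressionMetrics> run)
    {
        lock (_gate) { _runs.Add(run); }
    }

    // Returns the collected run with the highest R-squared,
    // or null if no run with metrics has been reported.
    public RunDetail<RegressionMetrics> BestByRSquared()
    {
        lock (_gate)
        {
            RunDetail<RegressionMetrics> best = null;
            foreach (var run in _runs)
            {
                if (run.ValidationMetrics == null) continue; // run without metrics
                if (best == null ||
                    run.ValidationMetrics.RSquared > best.ValidationMetrics.RSquared)
                {
                    best = run;
                }
            }
            return best;
        }
    }
}
```

With this in place, calling `experiment.Execute(dataView, progressHandler: collector)` inside a try/catch should leave `collector.BestByRSquared()` usable even if the experiment itself throws. R-squared is used here only as an example metric.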

If necessary, I can generate a csv with all the data used for training.

Thanks


mstfbl · 2020-03-02T18:42:28Z

Hi @francescomazzurco , please send along a .csv example with which we can reproduce this issue.

francescomazzurco · 2020-03-03T08:42:31Z

Hi @mstfbl, I'm creating a small working example along with the .csv, but I'm having difficulty reproducing the issue. I'll dig into it and update you by the end of the day.

francescomazzurco · 2020-03-03T09:57:22Z

OK, I found the problem. I could reproduce the exception on only one of our computers, so I finally realised that the issue is related to culture, even when the data is loaded from memory and no parsing is involved. In the project I attached, data is parsed and loaded using the invariant culture. Then, a non-English culture is set just before running the experiment.

   var mlContext = new MLContext();
   List<Model> models = ReadCsv(@"data\data.csv");
   var dataView = BuildDataView(mlContext, models);
   var experimentSettings = new RegressionExperimentSettings
   {
       MaxExperimentTimeInSeconds = 600,
       CacheDirectory = new DirectoryInfo(@".\cache"),
   };
   var experiment = mlContext.Auto().CreateRegressionExperiment(experimentSettings);
   // Data has already been parsed using invariant culture 
   CultureInfo.DefaultThreadCurrentCulture = CultureInfo.CreateSpecificCulture("it-IT");
   var bestRun = experiment.Execute(dataView).BestRun;

The exception is thrown after the 67th iteration.
TestML.zip

I've since seen other issues related to culture. I'm not sure whether they report the same problem, but if they do, feel free to close this issue. Thanks

justinormont · 2020-03-03T12:33:24Z

@francescomazzurco: This should be fixed in the next release (v1.5.0-preview2). A fix was added in January to use the invariant culture when sweeping parameter values (#4635).
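To illustrate the general class of bug (this is not the actual ML.NET code, and the names below are hypothetical), here is a small sketch of what can happen when a numeric value is written as text under one culture and read back under another:

```csharp
using System;
using System.Globalization;

public static class CultureRoundTrip
{
    // Formats a value with one culture and parses it with another,
    // mimicking a component that passes a parameter value around as text.
    public static double FormatThenParse(
        double value, CultureInfo writeCulture, CultureInfo readCulture)
    {
        // Under it-IT, 0.25 is written as "0,25".
        string text = value.ToString(writeCulture);

        // Under the invariant culture the comma is a group separator,
        // so "0,25" is not read back as 0.25.
        return double.Parse(text, readCulture);
    }
}
```

Using the same culture on both sides (ideally `CultureInfo.InvariantCulture`) preserves the value, whereas formatting with `it-IT` and parsing with the invariant culture silently changes it.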

You can test against the nightly NuGet feed by adding https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json as a NuGet source in Visual Studio. Feed details: https://dev.azure.com/dnceng/public/_packaging?_a=connect&feed=MachineLearning.

francescomazzurco · 2020-03-03T13:46:31Z

Hi @justinormont, thanks for your reply.
I tested against the nightly build. No exception is thrown anymore, but the regression experiment hangs forever and never completes the 68th training run. Nothing happens even after MaxExperimentTimeInSeconds has elapsed (I expected the experiment to abort at that point).
Interestingly, this behaviour only occurs when a non-English culture is set, so it seems that culture still affects the SmacSweeper.

I published the working example here: https://github.com/francescomazzurco/TestML

justinormont · 2020-03-03T17:52:57Z

@LittleLittleCloud: Do you have time to investigate?

LittleLittleCloud · 2020-03-03T21:38:03Z

I will take a look

DiegoStefanon · 2020-04-28T01:33:29Z

Hi, I am Diego S., from Italy.
I have the same issue.
CreateBinaryClassificationExperiment works.
CreateRegressionExperiment fails.
It works only if I set
CultureInfo.DefaultThreadCurrentCulture = CultureInfo.CreateSpecificCulture("en-EN");
The data is good, with no nulls.
So I think it is a bug.
I get the data from a database.
Package: ML.AutoML 0.16
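The workaround described above can be scoped so that the application's own culture is restored afterwards. The following is a minimal sketch using only `System.Globalization`. `CultureScope` and `RunWithInvariantCulture` are hypothetical names, and whether switching cultures avoids the failure on a given AutoML version is based only on the reports in this thread.

```csharp
using System;
using System.Globalization;

public static class CultureScope
{
    // Runs `action` with the invariant culture set on the current thread
    // and as the default for other threads, then restores the old values.
    public static T RunWithInvariantCulture<T>(Func<T> action)
    {
        CultureInfo previousCurrent = CultureInfo.CurrentCulture;
        CultureInfo previousDefault = CultureInfo.DefaultThreadCurrentCulture;
        try
        {
            CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;
            CultureInfo.DefaultThreadCurrentCulture = CultureInfo.InvariantCulture;
            return action();
        }
        finally
        {
            // Restore the caller's settings even if `action` throws.
            CultureInfo.CurrentCulture = previousCurrent;
            CultureInfo.DefaultThreadCurrentCulture = previousDefault;
        }
    }
}
```

For example, `var bestRun = CultureScope.RunWithInvariantCulture(() => experiment.Execute(dataView).BestRun);` would run the experiment from the earlier snippet under the invariant culture.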

francescomazzurco · 2020-09-03T13:44:43Z

Quick update: I just tested against v0.17.1 and the bug is still there. Same behavior: the 68th iteration hangs forever and never completes.

justinormont · 2020-11-03T10:08:33Z

@francescomazzurco: I believe this is fixed now. The fix will be available in the next release, or you can run against the nightly build, as outlined above.

francescomazzurco · 2020-11-27T09:21:07Z

@justinormont I just tested against v.0.17.3-29420-1 from October 20th, but the bug is still there. I see there are newer builds, but I am not able to install them because NuGet cannot find the package MlNetMklDepsCode.

justinormont · 2020-11-27T11:33:07Z

@francescomazzurco: You'll need a nightly build or release after 2020-10-30 as the fix went in then.

@harishsk: Any guess why the nightly won't install for @francescomazzurco?

antoniovs1029 · 2020-11-30T18:21:47Z

@justinormont @francescomazzurco

As part of moving to Arcade, we've published some NuGet packages that have a bug: they require the MlNetMklDepsCode package to work. We're working on fixing it. Those packages should be ignored for the time being.

Also, there have been some problems with publishing NuGet packages from master (which are the ones @francescomazzurco needs), so I believe no package has been published correctly from master since October 20th. I therefore don't think any public package includes the October 30 change that Justin is referring to. The problem was on the Azure DevOps side and should be fixed now. I'll run a manual build to publish packages from the master branch, and hopefully it will work. I'll update this thread once I know more. Thanks.

antoniovs1029 · 2020-11-30T19:05:23Z

There are some problems with our NuGet publishing pipeline. I'm working on that now and will update this thread once the package is published.

antoniovs1029 · 2020-11-30T21:56:43Z

The NuGet packages have just been published to the public feed.
@francescomazzurco, please try version 0.17.3-29530-4 from the feed; it should work now.
Thanks.

francescomazzurco · 2020-12-02T09:04:21Z

I was able to install the most recent build from today (0.17.3-29602-5), which does indeed fix the bug. Feel free to close the issue. Thanks for the support.

mstfbl added the Azure AutoML, bug, and P2 labels on Mar 2, 2020

justinormont added the AutoML.NET label and removed the Azure AutoML label on Mar 3, 2020

frank-dong-ms-zz assigned LittleLittleCloud Oct 20, 2020

justinormont linked a pull request Oct 27, 2020 that will close this issue

Auto.ML: Fix issue when parsing float string fails on pl-PL culture set using Regression Experiment #5163

Merged

antoniovs1029 closed this as completed in #5163 Oct 30, 2020

ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
