IMonitor.ReportRunningTrial reports incorrect pipeline #6425

Open
andrasfuchs opened this issue Nov 2, 2022 · 4 comments
Labels
AutoML.NET Automating various steps of the machine learning process
Comments

@andrasfuchs
Contributor

andrasfuchs commented Nov 2, 2022

System Information (please complete the following information):

  • OS & Version: Windows 11 [Version 10.0.22621.675]
  • ML.NET Version: ML.NET v2.0.0-preview.22525.3
  • .NET Version: .NET 6.0.9

Describe the bug
I created a custom class that implements the Microsoft.ML.AutoML.IMonitor interface to monitor my training. Its ReportRunningTrial method has the following simple implementation:
_logger.LogInformation($"{"Running".PadLeft(9)} Trial #{setting.TrialId.ToString().PadLeft(4)} - Pipeline: {_pipeline.ToString(setting.Parameter).PadRight(70)}");

The other, ReportBestTrial, ReportCompletedTrial and ReportFailTrial methods look very similar.

When I run my training, it looks like the ReportRunningTrial method's pipeline object wasn't updated, so it doesn't show the pipeline of the current trial.

To Reproduce
Steps to reproduce the behavior:

  1. Create a custom class that implements IMonitor
  2. Create an ML.NET experiment, and use that monitor class, like this:
var experiment = mlContext.Auto().CreateExperiment();

experiment
    .SetPipeline(pipeline)
    .SetTrainingTimeInSeconds(trainingTimeInSeconds)
    .SetRegressionMetric(RegressionMetric.RSquared, labelColumn: "Label")
    .SetDataset(trainTestData.TrainSet, trainTestData.TestSet)
    .SetMonitor(monitor)
    .SetMaximumMemoryUsageInMegaByte(20 * 1024);
  3. Make sure that you allow more than one type of (regression) algorithm
  4. Run the experiment
  5. Check the output: you will see that the setting.TrialId property values look good, but _pipeline.ToString(setting.Parameter) returns an invalid value in the ReportRunningTrial method.

Expected behavior
The _pipeline.ToString(setting.Parameter) should return the just-started trial's pipeline.

Screenshots, Code, Sample Projects
image

Additional context
You can see the problem in the screenshot:
Running Trial 0's pipeline looks like it was FastTreeRegression, but in reality it's FastForestRegression.
Running Trial 1's pipeline looks like it was FastForestRegression, but it's FastTreeRegression.
Running Trial 2's pipeline looks like it was FastForestRegression, but it's LightGbmRegression.
etc.

Edit: After testing a little more, I'm no longer sure which methods of the IMonitor interface report the wrong pipeline. It is possible that ReportRunningTrial is the correct one and the others are incorrect.

@ghost ghost added the untriaged New issue has not been triaged label Nov 2, 2022
@michaelgsharp michaelgsharp added this to the ML.NET 3.0 milestone Nov 28, 2022
@ghost ghost removed the untriaged New issue has not been triaged label Nov 28, 2022
@michaelgsharp michaelgsharp added the AutoML.NET Automating various steps of the machine learning process label Nov 28, 2022
@luisquintanilla
Contributor

@LittleLittleCloud thoughts on this one?

@andrasfuchs
Contributor Author

After I looked into the ML.NET code a little more, my current understanding is that we don't know the last estimator when the IMonitor's ReportRunningTrial() event is called. The last estimator is set only after we call runner.RunAsync() and that's why the "Running Trial" lines in my log show the pipeline incorrectly (they have a previously set value).

I modified my event handler so that it no longer shows the pipeline at that point, but I think it would be a little better if the TrialSettings class that is passed to all IMonitor event handlers contained a nullable property with the last (or all) estimator(s).
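
For illustration, the workaround described above might look roughly like the following. This is only a sketch under assumptions: the class name `LoggingMonitor` is hypothetical, the `IMonitor` member signatures are written as I understand them for the 2.0 preview and may differ in other versions, and `ILogger` is assumed to come from Microsoft.Extensions.Logging. It does not confirm that the other callbacks report the right pipeline; it only avoids printing the pipeline when a trial starts.

```csharp
using System;
using Microsoft.Extensions.Logging;
using Microsoft.ML.AutoML;

// Hypothetical monitor: logs only the trial ID when a trial starts, and the
// pipeline text in the later callbacks.
public sealed class LoggingMonitor : IMonitor
{
    private readonly ILogger _logger;
    private readonly SweepablePipeline _pipeline;

    public LoggingMonitor(ILogger logger, SweepablePipeline pipeline)
    {
        _logger = logger;
        _pipeline = pipeline;
    }

    public void ReportRunningTrial(TrialSettings setting)
    {
        // The pipeline text is deliberately omitted here, because the value
        // printed at this point looked stale in the logs.
        _logger.LogInformation("Running Trial #{TrialId}", setting.TrialId);
    }

    public void ReportCompletedTrial(TrialResult result)
    {
        LogWithPipeline("Completed", result.TrialSettings);
    }

    public void ReportBestTrial(TrialResult result)
    {
        LogWithPipeline("Best", result.TrialSettings);
    }

    public void ReportFailTrial(TrialSettings settings, Exception exception = null)
    {
        _logger.LogError(exception, "Failed Trial #{TrialId}", settings.TrialId);
    }

    private void LogWithPipeline(string status, TrialSettings setting)
    {
        // Same _pipeline.ToString(setting.Parameter) call as in the issue description.
        _logger.LogInformation(
            "{Status} Trial #{TrialId} - Pipeline: {Pipeline}",
            status,
            setting.TrialId,
            _pipeline.ToString(setting.Parameter));
    }
}
```

An instance would be passed to the experiment through `.SetMonitor(monitor)`, as in the reproduction steps.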

@LittleLittleCloud
Contributor

@andrasfuchs does the original monitor also print an incorrect pipeline?

@andrasfuchs
Contributor Author

You mean the MLContextMonitor class in IMonitor.cs? Yes, I think so.
