OnlineGradientDescent crash #4363

daholste · 2019-10-22T04:16:40Z

System information

OS version/distro: Windows 10
.NET Version (e.g., dotnet --info): .NET Core 2.2

Issue

What did you do?
I ran the script:

var mlContext = new MLContext();
var textLoader = mlContext.Data.CreateTextLoader(new TextLoader.Options()
{
	Columns = new TextLoader.Column[]
	{
		new TextLoader.Column("Label", DataKind.Single, 0),
		new TextLoader.Column("0", DataKind.String, 1),
		new TextLoader.Column("1", DataKind.String, 2),
		new TextLoader.Column("2", DataKind.Single, 3),
		new TextLoader.Column("3", DataKind.Single, 4),
		new TextLoader.Column("4", DataKind.Single, 5),
		new TextLoader.Column("5", DataKind.Single, 5),
	},
	HasHeader = true,
	Separators = new char[] {','}
});
var dataView = textLoader.Load(@"dataset.csv");
var pipeline = mlContext.Transforms.Categorical.OneHotHashEncoding("0")
	.Append(mlContext.Transforms.Categorical.OneHotEncoding("1", "3"))
	.Append(mlContext.Transforms.Concatenate("Features", "0", "1", "2", "3", "4", "5"))
	.Append(mlContext.Transforms.NormalizeMinMax("Features"))
	.Append(mlContext.Regression.Trainers.OnlineGradientDescent(new OnlineGradientDescentTrainer.Options()
	{
		LearningRate = 1.0f,
		DecreaseLearningRate = false,
	}));
pipeline.Fit(dataView);

(I can provide the data on request.)

What happened?
I get the following exception:

System.InvalidOperationException
  HResult=0x80131509
  Message=The weights/bias contain invalid values (NaN or Infinite). Potential causes: high learning rates, no normalization, high initial weights, etc.
  Source=Microsoft.ML.StandardTrainers
  StackTrace:
   at Microsoft.ML.Trainers.OnlineLinearTrainer`2.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at OnlineGradientDescentCrash.Program.Main(String[] args) in C:\Users\daholste\source\repos\OnlineGradientDescentCrash\OnlineGradientDescentCrash\Program.cs:line 38

What did you expect?
Successful training


gvashishtha · 2019-11-01T16:37:17Z

Does this still happen if you set DecreaseLearningRate to true? Any reason you want a constant learning rate?

justinormont · 2019-11-04T20:00:43Z

> Any reason you want a constant learning rate?

Constant learning rates work well in many cases. In AutoML, we sweep over both choices of true/false.

My top guess is that the combination of LearningRate = 1.0f and DecreaseLearningRate = false doesn't work well on this dataset without L2 regularization. The defaults are LearningRate = 0.1f and DecreaseLearningRate = true.
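For comparison, a more conservative trainer configuration might look like the sketch below. The LearningRate and DecreaseLearningRate values are the defaults quoted above; the L2Regularization value is an arbitrary illustration chosen for this example, not a setting recommended anywhere in this thread, and whether it avoids the exception on this dataset is untested.

```csharp
// Requires the Microsoft.ML NuGet package.
using Microsoft.ML;
using Microsoft.ML.Trainers;

var mlContext = new MLContext();

// Same trainer as in the report, but with the default learning-rate
// settings and a small L2 penalty added.
var trainer = mlContext.Regression.Trainers.OnlineGradientDescent(
    new OnlineGradientDescentTrainer.Options()
    {
        LearningRate = 0.1f,          // default quoted in this thread
        DecreaseLearningRate = true,  // default quoted in this thread
        L2Regularization = 0.01f,     // assumption: illustrative value only
    });
```

The resulting trainer can be appended to the pipeline in place of the OnlineGradientDescent call from the original script.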

I'd recommend building a meta-model to predict failures given a position in the hyperparameter space. Then AutoML can avoid areas that generally throw an error.

It would be nice to attach a debugger and see what the trainer is doing. I'm guessing the model weights are oscillating and growing towards +/- Infinity, though it's possible there's a bug in the weight update code that leads to the infinity. Stepping through the trainer's weight update code will tell us.
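The oscillation guess can be made concrete with a textbook single-step analysis. This is only a sketch under assumptions not confirmed in this thread: a plain stochastic gradient step on the squared loss $\tfrac{1}{2}(w \cdot x - y)^2$, with no bias term, no weight averaging and no regularization. The trainer's actual update may differ by constant factors or extra terms, which would shift the thresholds below.

```latex
% One SGD step on example (x_t, y_t) with constant learning rate \eta:
w_{t+1} = w_t - \eta \,(w_t \cdot x_t - y_t)\, x_t

% Residual on that same example after the step:
w_{t+1} \cdot x_t - y_t
  = (w_t \cdot x_t - y_t) - \eta \,(w_t \cdot x_t - y_t)\, \lVert x_t \rVert^2
  = \bigl(1 - \eta \lVert x_t \rVert^2\bigr)\,(w_t \cdot x_t - y_t)

% The step overshoots (the residual changes sign) when
\eta \lVert x_t \rVert^2 > 1
% and the residual grows in magnitude when
\eta \lVert x_t \rVert^2 > 2
```

Under these assumptions, $\eta = 1$ puts any row with $\lVert x \rVert^2 > 2$ in the growing regime. The Features vector in the script concatenates two one-hot encodings with several numeric columns, so a squared norm above 2 is plausible even after min-max normalization. A decaying learning rate shrinks $\eta$ over time, which would eventually bring the product back under the threshold. This does not rule out a bug in the update code; it only shows that divergence is possible without one.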

mstfbl · 2020-01-27T10:52:05Z

Hi @daholste, has your question been addressed by @justinormont's comments? If so, please feel free to close the issue.

mstfbl · 2020-03-19T19:34:36Z

Closing this issue as the issue seems to be resolved.

yaeldekel added the P0 label (Priority of the issue for triage purpose: IMPORTANT, needs to be fixed right away.) on Jan 9, 2020

mstfbl closed this as completed Mar 19, 2020

aforoughi1 mentioned this issue Feb 5, 2021

How to get around the exception caused because of #5506 fix? #5612

Closed

ghost locked as resolved and limited conversation to collaborators Mar 20, 2022
