Application crashes inside Docker container when using .Fit() #5155

Closed
shywa opened this issue May 23, 2020 · 2 comments
Comments

@shywa

shywa commented May 23, 2020

When creating a model for a recommender system that uses MatrixFactorization, the Docker container on an Ubuntu server crashes without any error message.

The only entry in the kernel log is:
kernel: [12922.080806] traps: dotnet[30957] trap invalid opcode ip:7f07d81b5efc sp:7ffdc5965110 error:0 in libMatrixFactorizationNative.so[7f07d81a5000+2a000]
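An "invalid opcode" trap means the CPU was asked to execute an instruction it does not recognize. One possible cause, not confirmed in this thread, is a native library built for SIMD extensions (such as AVX) that the server's physical or virtual CPU does not expose. The minimal sketch below assumes .NET Core 3.0 or later and prints what the runtime reports through the `IsSupported` properties in `System.Runtime.Intrinsics.X86`. The class name `CpuFeatureReport` is hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics.X86;

public static class CpuFeatureReport
{
    // Summarizes the x86 SIMD instruction sets the .NET runtime reports as usable.
    // On non-x86 processors every IsSupported property simply returns false.
    public static string Describe()
    {
        return $"Arch={RuntimeInformation.ProcessArchitecture}, " +
               $"SSE4.1={Sse41.IsSupported}, SSE4.2={Sse42.IsSupported}, " +
               $"AVX={Avx.IsSupported}, AVX2={Avx2.IsSupported}, FMA={Fma.IsSupported}";
    }

    public static void Main()
    {
        // Run this both on the server (inside the container) and on the local machine.
        Console.WriteLine(Describe());
    }
}
```

Containers share the host's CPU, so comparing the output on the Ubuntu server with the output on the local machine, where training works, can show whether the two CPUs differ in the instruction sets they expose.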

When run locally (outside Docker), the recommender system and the model training work.

System information

  • OS version/distro:
    Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-101-generic x86_64)
  • .NET Version (e.g., dotnet --info):
    Host (useful for support):
    Version: 3.1.4
    Commit: 0c2e69caa6
    .NET Core SDKs installed:
    No SDKs were found.
    .NET Core runtimes installed:
    Microsoft.AspNetCore.App 3.1.4 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
    Microsoft.NETCore.App 3.1.4 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

Issue

  • What did you do?
    Started training a new model using matrix factorization.
  • What happened?
    The application and the Docker container crash without any message or exception.
  • What did you expect?
    A trained model, or at least an exception.

Source code / logs

Implementation:

Log.Information("Extracting train data...");
var trainingData = GetDataView(trainData);

var options = new MatrixFactorizationTrainer.Options
{
    MatrixColumnIndexColumnName = UserIdEncoding,
    MatrixRowIndexColumnName = MusicIdEncoding,
    LabelColumnName = "Label",
    NumberOfIterations = 20,
    ApproximationRank = 100,
    //Quiet = false
};
Log.Information("Setting Matrix Factorization");
var trainingPipeline = trainingData.Transformer.Append(
    MLContext.Recommendation().Trainers.MatrixFactorization(options));

Log.Information("Starting training...");
ITransformer trainedModel = trainingPipeline.Fit(trainingData.DataView);

Log.Information("Saving model...");
MLContext.Model.Save(trainedModel, trainingData.DataView.Schema, ModelPath);

Log.Information("Extracting test data...");
var testingData = GetDataView(testData);

Log.Information("Starting model testing...");
var testingTransform = trainedModel.Transform(testingData.DataView);

Log.Information("Evaluating model");
return MLContext.Recommendation().Evaluate(testingTransform);

Container Logs:

[13:45:25 Information]
Preparing prediction Model

[13:45:25 Information]
Starting Model Training...

[13:45:25 Information]
Extracting train data...

[13:45:25 Information]
Setting Matrix Factorization

[13:45:25 Information]
Starting training...

Warning: insufficient blocks may slow down the trainingprocess (4*nr_threads^2+1 blocks is suggested)
Warning: insufficient blocks may slow down the trainingprocess (4*nr_threads^2+1 blocks is suggested)
--> Application crash
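The warning text itself gives the suggested block count as 4*nr_threads^2+1. The small sketch below evaluates that formula for a few thread counts. It only computes the number printed in the warning and makes no claim about how the native trainer actually partitions the rating matrix. The class name `LibmfBlockHint` is hypothetical.

```csharp
using System;

public static class LibmfBlockHint
{
    // Block count suggested by the warning message: 4 * nr_threads^2 + 1.
    public static long SuggestedBlocks(int threads)
    {
        if (threads < 1) throw new ArgumentOutOfRangeException(nameof(threads));
        return 4L * threads * threads + 1;
    }

    public static void Main()
    {
        // Print the suggestion for a few fixed thread counts and for this machine's core count.
        foreach (int t in new[] { 1, 4, 8, Environment.ProcessorCount })
            Console.WriteLine($"{t} threads -> {SuggestedBlocks(t)} blocks suggested");
    }
}
```

For example, 4 threads give 4 * 16 + 1 = 65 suggested blocks. By its own wording, the warning concerns training speed rather than correctness.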
@antoniovs1029
Member

antoniovs1029 commented May 24, 2020

I wonder whether this is a duplicate of #5067.

If it is, that issue was already fixed in #5071, and the fix should be available in the upcoming ML.NET 1.5 release.

@mstfbl
Contributor

mstfbl commented May 26, 2020

Hey @shywa,

I set up an Ubuntu Docker container and installed the latest ML.NET build from our current master branch. I then ran the code snippet you provided, using our training and test datasets for testing MF, and did not get the warning you listed. As @antoniovs1029 said, this issue has been fixed in PR #5071, and the fix will be available in our upcoming v1.5 release!

As such, I'm closing this issue. Please feel free to reopen it if you still get errors with v1.5. Thanks.

mstfbl closed this as completed May 26, 2020
ghost locked this as resolved and limited conversation to collaborators Mar 18, 2022