LightGBM Save/Load round trip loses Softmax OneVersusAllModelParameters #3647

Closed
rauhs opened this issue May 2, 2019 · 2 comments · Fixed by #4472
Labels: bug (Something isn't working) · P1 (Priority of the issue for triage purposes: needs to be fixed soon)

Comments

rauhs (Contributor) commented May 2, 2019

version: 0.11

Related: #1424

When training a multiclass classifier with LightGBM with the softmax option enabled, a save/load round trip appears to lose the softmax output:

Inspecting the loaded model also shows that it uses ImplRaw rather than ImplSoftmax.

Reproduce:

    public class GenericSample
    {
      public string A { get; set; }
      public string Label { get; set; }
    }
    public static void ReproduceLightGbmPersistenceBug()
    {
      var data = Enumerable.Range(1, 100).Select(x => new GenericSample { A = $"{x % 20}", Label = $"{x % 10}" });
      var ctx = new MLContext();

      var options = new Options { // the LightGBM multiclass trainer's options
        UseSoftmax = true,
      };
      var pipe = ctx.Transforms.Categorical.OneHotEncoding("A")
        .Append(ctx.Transforms.Concatenate("Features", "A"))
        .Append(ctx.Transforms.Conversion.MapValueToKey("Label"))
        .Append(ctx.MulticlassClassification.Trainers.LightGbm(options));
      var dataView = ctx.Data.LoadFromEnumerable(data);
      ITransformer model = pipe.Fit(dataView);
      var scores = model.Transform(dataView).GetColumn<float[]>(ctx,"Score");

      Console.WriteLine($"Min: {scores.Select(x => x.Min()).Min()}");
      Console.WriteLine($"Max: {scores.Select(x => x.Max()).Max()}");

      var memoryStream = new MemoryStream();
      ctx.Model.Save(model, memoryStream);
      memoryStream.Position = 0; // rewind the stream before loading
      model = ctx.Model.Load(memoryStream);

      scores = model.Transform(dataView).GetColumn<float[]>(ctx,"Score");
      Console.WriteLine($"Min: {scores.Select(x => x.Min()).Min()}");
      Console.WriteLine($"Max: {scores.Select(x => x.Max()).Max()}");
    }

Output (first pair before the save/load round trip, second pair after):

Min: 0.001027671
Max: 0.9907509
Min: -4.843706
Max: 2.027462
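The two ranges above illustrate the bug: before the round trip the Score column holds softmax probabilities in [0, 1]; after loading it holds raw per-class margin scores. As a language-agnostic sketch (NumPy, with hypothetical raw scores chosen to resemble the post-load range above), applying softmax to the raw scores recovers values in (0, 1) that sum to 1:

```python
import numpy as np

def softmax(raw):
    # Subtract the max for numerical stability, then normalize.
    e = np.exp(raw - np.max(raw))
    return e / e.sum()

# Hypothetical raw per-class scores, similar in range to the
# post-load output above (roughly -4.8 .. 2.0).
raw_scores = np.array([-4.843706, 0.5, 2.027462])
probs = softmax(raw_scores)

print(probs)        # every entry lies in (0, 1)
print(probs.sum())  # the entries sum to 1
```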
@najeeb-kazmi najeeb-kazmi added the bug Something isn't working label May 8, 2019
najeeb-kazmi (Member) commented:
Reproduces with 1.0.0 as well. Classifying as bug.

This behavior is not present in the other multiclass trainers SdcaMaximumEntropy, LbfgsMaximumEntropy, and OneVersusAll. It could be related to calibration not being set to true in the TrainerInfo for LightGbmTrainerBase.

@wschin wschin added the P1 Priority of the issue for triage purpose: Needs to be fixed soon. label May 21, 2019
drake7707 commented:
Is there a workaround for this? I'd like to combine multiple models, all trained with LightGbm, but currently I can't combine their predictions because each model's min/max score range differs. If both models output the 0-1 range, I could aggregate the probabilities more easily.
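Until the fix lands, one possible workaround (an assumption, not confirmed ML.NET behavior) is to normalize each loaded model's raw Score rows yourself before aggregating across models. A minimal sketch of that post-processing, with hypothetical score rows:

```python
import numpy as np

def normalize_scores(score_rows):
    """Apply softmax row-wise so each model's scores land in (0, 1)
    and can be aggregated across models on a common scale."""
    scores = np.asarray(score_rows, dtype=float)
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Two hypothetical models' raw score rows for the same two examples.
model_a = [[-4.8, 0.5, 2.0], [1.2, -0.3, 0.1]]
model_b = [[0.9, -1.1, 0.4], [2.2, 0.0, -0.7]]

# Average the per-class probabilities after normalization;
# each resulting row again sums to 1.
ensemble = (normalize_scores(model_a) + normalize_scores(model_b)) / 2
print(ensemble)
```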
