Skip to content

LDSVM trainer #4060

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Feb 5, 2020
Merged

LDSVM trainer #4060

merged 11 commits into from
Feb 5, 2020

Conversation

yaeldekel
Copy link

Add an LDSVM trainer.
Fixes #4031 .
I will add documentation and samples in the next iterations.

@yaeldekel yaeldekel requested review from eerhardt and codemzs August 4, 2019 11:17
@@ -873,5 +873,32 @@ private static ICalibratorTrainer GetCalibratorTrainerOrThrow(IExceptionContext
Contracts.CheckValue(catalog, nameof(catalog));
return new PriorTrainer(CatalogUtils.GetEnvironment(catalog), labelColumnName, exampleWeightColumnName);
}

/// <summary>
///
Copy link
Member

@eerhardt eerhardt Aug 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to fill these docs out? #Resolved

public static LdSvmTrainer LdSvm(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog, LdSvmTrainer.Options options)
=> new LdSvmTrainer(catalog.GetEnvironment(), options);

public static LdSvmTrainer LdSvm(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog,
Copy link
Member

@eerhardt eerhardt Aug 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This public method should have docs too. #Resolved

=> new LdSvmTrainer(catalog.GetEnvironment(), options);

public static LdSvmTrainer LdSvm(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog,
string labelColumnName = DefaultColumnNames.Label,
Copy link
Member

@eerhardt eerhardt Aug 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't you using labelColumnName, featureColumnName, and exampleWeightColumnName in the method body? #Resolved

IValueMapper,
ICanSaveModel
{
public const string LoaderSignature = "LDSVMBinaryPredictor";
Copy link
Member

@eerhardt eerhardt Aug 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public? #Resolved

// badly written and the first thing is it just grabs all instances. If this
// ever changes, do review this return value to make sure it is still
// appropriate.
private static TrainerInfo _info = new TrainerInfo(calibration: true, caching: false);
Copy link
Member

@eerhardt eerhardt Aug 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be private readonly static ? #Resolved

}

// REVIEW: This does not need caching, but only because it's very
// badly written and the first thing is it just grabs all instances. If this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

badly written and the first thing is it just grabs all instances

Does this mean this trainer can't stream data, but instead it needs to load all the data into memory before it can train on it? Is that acceptable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When looking thru the code, it seemed like a sizable amount of work to bring this to a streaming mode. I didn't see an obvious method to be performant and streaming -- seems like a lots of passes over the dataset. I very well may be missing the obvious though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the most recent iteration I added two code paths: one does streaming and one does caching. From running some experiments, the streaming version is around 2x slower. Could you take a look and let me know whether it's worth keeping both?

@justinormont, @harishsk


In reply to: 313606773 [](ancestors = 313606773)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the streaming version 2x slower even in the case of large data? If there is a real tradeoff for large data sizes, then it is worth keeping both.


In reply to: 368572412 [](ancestors = 368572412,313606773)

[Argument(ArgumentType.AtMostOnce, HelpText = "No bias", ShortName = "noBias")]
[TlcModule.SweepableDiscreteParam("NoBias", null, isBool: true)]
public bool NoBias = Defaults.NoBias;

Copy link
Contributor

@harishsk harishsk Sep 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a double negative in naming here. The NoBias term defaults to false. Would it be better to name this variable UseBias and default it to true? #Resolved

@codecov
Copy link

codecov bot commented Dec 26, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@c4e4263). Click here to learn what that means.
The diff coverage is 94.93%.

@@            Coverage Diff            @@
##             master    #4060   +/-   ##
=========================================
  Coverage          ?   75.91%           
=========================================
  Files             ?      953           
  Lines             ?   172850           
  Branches          ?    18663           
=========================================
  Hits              ?   131221           
  Misses            ?    36444           
  Partials          ?     5185
Flag Coverage Δ
#Debug 75.91% <94.93%> (?)
#production 71.54% <94.71%> (?)
#test 90.66% <100%> (?)
Impacted Files Coverage Δ
...est/Microsoft.ML.Predictor.Tests/TestPredictors.cs 69.94% <ø> (ø)
...ft.ML.Tests/TrainerEstimators/TrainerEstimators.cs 93.54% <100%> (ø)
test/Microsoft.ML.TestFramework/Learners.cs 91.35% <100%> (ø)
...oft.ML.StandardTrainers/StandardTrainersCatalog.cs 92.92% <100%> (ø)
....ML.StandardTrainers/LdSvm/LdSvmModelParameters.cs 91.27% <91.27%> (ø)
...icrosoft.ML.StandardTrainers/LdSvm/LdSvmTrainer.cs 95.84% <95.84%> (ø)

Copy link
Contributor

@harishsk harishsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@yaeldekel yaeldekel merged commit c819d77 into dotnet:master Feb 5, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a Local Deep Kernel Learning trainer
5 participants