-
Notifications
You must be signed in to change notification settings - Fork 1.9k
LDSVM trainer #4060
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LDSVM trainer #4060
Conversation
@@ -873,5 +873,32 @@ private static ICalibratorTrainer GetCalibratorTrainerOrThrow(IExceptionContext | |||
Contracts.CheckValue(catalog, nameof(catalog)); | |||
return new PriorTrainer(CatalogUtils.GetEnvironment(catalog), labelColumnName, exampleWeightColumnName); | |||
} | |||
|
|||
/// <summary> | |||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you going to fill these docs out? #Resolved
public static LdSvmTrainer LdSvm(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog, LdSvmTrainer.Options options) | ||
=> new LdSvmTrainer(catalog.GetEnvironment(), options); | ||
|
||
public static LdSvmTrainer LdSvm(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This public method should have docs too. #Resolved
=> new LdSvmTrainer(catalog.GetEnvironment(), options); | ||
|
||
public static LdSvmTrainer LdSvm(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog, | ||
string labelColumnName = DefaultColumnNames.Label, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't you using labelColumnName
, featureColumnName
, and exampleWeightColumnName
in the method body? #Resolved
IValueMapper, | ||
ICanSaveModel | ||
{ | ||
public const string LoaderSignature = "LDSVMBinaryPredictor"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be public? #Resolved
// badly written and the first thing is it just grabs all instances. If this | ||
// ever changes, do review this return value to make sure it is still | ||
// appropriate. | ||
private static TrainerInfo _info = new TrainerInfo(calibration: true, caching: false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be private readonly static
? #Resolved
} | ||
|
||
// REVIEW: This does not need caching, but only because it's very | ||
// badly written and the first thing is it just grabs all instances. If this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
badly written and the first thing is it just grabs all instances
Does this mean this trainer can't stream data, but instead it needs to load all the data into memory before it can train on it? Is that acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When looking thru the code, it seemed like a sizable amount of work to bring this to a streaming mode. I didn't see an obvious method to be performant and streaming -- seems like a lots of passes over the dataset. I very well may be missing the obvious though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the most recent iteration I added two code paths: one does streaming and one does caching. From running some experiments, the streaming version is around 2x slower. Could you take a look and let me know whether it's worth keeping both?
In reply to: 313606773 [](ancestors = 313606773)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the streaming version 2x slower even in the case of large data? If there is a real tradeoff for large data sizes, then it is worth keeping both.
In reply to: 368572412 [](ancestors = 368572412,313606773)
[Argument(ArgumentType.AtMostOnce, HelpText = "No bias", ShortName = "noBias")] | ||
[TlcModule.SweepableDiscreteParam("NoBias", null, isBool: true)] | ||
public bool NoBias = Defaults.NoBias; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a double negative in naming here. The NoBias term defaults to false. Would it be better to name this variable UseBias and default it to true? #Resolved
Codecov Report
@@ Coverage Diff @@
## master #4060 +/- ##
=========================================
Coverage ? 75.91%
=========================================
Files ? 953
Lines ? 172850
Branches ? 18663
=========================================
Hits ? 131221
Misses ? 36444
Partials ? 5185
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an LDSVM trainer.
Fixes #4031 .
I will add documentation and samples in the next iterations.