-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fixing parameters in ML.NET Public API #2665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing parameters in ML.NET Public API #2665
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2665 +/- ##
=========================================
Coverage ? 71.65%
=========================================
Files ? 804
Lines ? 141872
Branches ? 16111
=========================================
Hits ? 101654
Misses ? 35781
Partials ? 4437
|
@@ -29,7 +29,7 @@ public static class KMeansClusteringExtensions | |||
/// </example> | |||
public static KMeansPlusPlusTrainer KMeans(this ClusteringCatalog.ClusteringTrainers catalog, | |||
string featureColumn = DefaultColumnNames.Features, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
featureColumn [](start = 18, length = 13)
So we're calling this exampleWeightColumnName
, but calling the above featureColumn
. I'm not terribly insistent one way or the other, but we should at least be consistent I think.
In the original issue you proposed calling this * exampleWeightColumn
, which seems to me to be consistent with the existing behavior here. Yet elsewhere I see that we are calling things labelColumn
and labelColumnName
here and there.
Is this on someone's radar? If we are not going to fix it (we should!) a second best alternative would be to be at least locally consistent. Not sure what @rogancarr and @eerhardt and @sfilipi think? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue #2177 proposes we use suffix ColumnName consistently
In this PR I am making that change for weight, groupid parameters. In a follow up PR I will do it for label and features.
So once both PRs go in, we will mark #2177 as complete, and the following convention will be used
- labelColumnName
- featureColumnName
- exampleWeightColumnName
- rowGroupColumnName
This is certainly on my radar, as we drive towards consistency in the APIs . I have also updated the description of the issue so it has reference to issue #2177 . Hope that sounds good ? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good, thank you @abgoswam ! Sorry missed that discussion. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomFinley. I have consolidated all the changes in this PR. So this PR addresses {features, labels, groupid, weight} column names in a consistent manner.
In reply to: 258779314 [](ancestors = 258779314)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomFinley. I have consolidated all the changes in this PR. So this PR addresses {features, labels, groupid, weight} column names in a consistent manner.
In reply to: 258776812 [](ancestors = 258776812)
{ | ||
Contracts.CheckValue(catalog, nameof(catalog)); | ||
var env = CatalogUtils.GetEnvironment(catalog); | ||
var options = new OlsLinearRegressionTrainer.Options | ||
{ | ||
LabelColumn = labelColumnName, | ||
FeatureColumn = featureColumnName, | ||
WeightColumn = weightColumn != null ? Optional<string>.Explicit(weightColumn) : Optional<string>.Implicit(DefaultColumnNames.Weight) | ||
WeightColumn = exampleWeightColumnName != null ? Optional<string>.Explicit(exampleWeightColumnName) : Optional<string>.Implicit(DefaultColumnNames.Weight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh oh. Am I wrong or is there literally no way to have non-weighted training if you happen to have a column named Weight
????
It looks like there's an existing issue #1065 that is capturing this. Not your code, so don't have to fix, I just notice this way. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I see it's incidentally assigned to you @abgoswam . 😄 Maybe after this PR we can think about how to address this. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. We should fix this behavior as part of #1065 (in a separate PR). #Closed
src/Microsoft.ML.PCA/PCACatalog.cs
Outdated
/// <param name="rank">The number of principal components.</param> | ||
/// <param name="overSampling">Oversampling parameter for randomized PrincipalComponentAnalysis training.</param> | ||
/// <param name="center">If enabled, data is centered to be zero mean.</param> | ||
/// <param name="seed">The seed for random number generation.</param> | ||
public static PrincipalComponentAnalysisEstimator ProjectToPrincipalComponents(this TransformsCatalog.ProjectionTransforms catalog, | ||
string outputColumnName, | ||
string inputColumnName = null, | ||
string weightColumn = PrincipalComponentAnalysisEstimator.Defaults.WeightColumn, | ||
string exampleWeightColumnName = PrincipalComponentAnalysisEstimator.Defaults.WeightColumn, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaulting this to null
will be covered in #1065, right? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, perhaps #1065 would be the right place to set this to null
as default value, since that issue related to changing the behavior of the weights parameters.
The issues addressed by this PR are all related to consistency in API naming convention. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <param name="weights">The optional weights column.</param> | ||
/// <param name="labelColumnName">The name of the label column.</param> | ||
/// <param name="featureColumnName">The name of the feature column.</param> | ||
/// <param name="rowGroupColumnName">The name of the group column.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't necessary for this PR, but I wonder if we need some documentation around the row group
column. I'm obviously not an expert, but I have no idea what this is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi...#2536 gives some context on this. we can add more documentation around this in follow up documentation focused PRs.
@@ -513,7 +513,7 @@ private static void RunEndToEnd(MLContext mlContext, IDataView trainData, string | |||
// Construct the learning pipeline. Note that we are now providing a contract name for the custom mapping: | |||
// otherwise we will not be able to save the model. | |||
var estimator = mlContext.Transforms.CustomMapping<InputRow, OutputRow>(CustomMappings.IncomeMapping, nameof(CustomMappings.IncomeMapping)) | |||
.Append(mlContext.BinaryClassification.Trainers.FastTree(labelColumn: "Label")); | |||
.Append(mlContext.BinaryClassification.Trainers.FastTree(labelColumnName: "Label")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This usually means the cookbook .md file needs to be updated as well. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making me take a second look at this. Updated the cookbook .md file in a couple of places.
p.s. my initial search for "labelColumn" in the cookbook .md file did not return relevant results. Apparently it used to be called just "label" at some point :)
In reply to: 259626158 [](ancestors = 259626158)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #2257, #2660, #2177
Also related to #2680 #2613
Using ApiReview tool, went through the public surface area making the changes (as described in the issue)