-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Cleaning TrainCatalog and RecommenderCatalog #2973
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2973 +/- ##
==========================================
+ Coverage 72.35% 72.71% +0.36%
==========================================
Files 803 803
Lines 143290 145370 +2080
Branches 16155 16766 +611
==========================================
+ Hits 103673 105710 +2037
- Misses 35191 35231 +40
- Partials 4426 4429 +3
|
test/Microsoft.ML.Tests/Scenarios/Api/CookbookSamples/CookbookSamplesDynamicApi.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDataView data, IEstimator<ITransformer> estimator, int numFolds = 5, string labelColumn = DefaultColumnNames.Label, | ||
string stratificationColumn = null, int? seed = null) | ||
IDataView data, IEstimator<ITransformer> estimator, int numberOfFolds = 5, string labelColumnName = DefaultColumnNames.Label, | ||
string samplingKeyColumn = null, int? seed = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
samplingKeyColumn [](start = 19, length = 17)
Would you consider FoldColumnName? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wei-Sheng suggested partitionColumnName`, which I think reflects the role of the column better.
In reply to: 266177816 [](ancestors = 266177816)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#2537
#2536
We had discussion regarding name of this column.
If you want to change it again, I would advice ask opinion of @rogancarr and @TomFinley
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I also considered in the issues @Ivanidzo4ka linked the merits of "partition" in my own thinking, though I did not write it down (as I had internally rejected it). I will now write down why I rejected it here, and you can see if you agree or disagree.
The major objection is that, if I look forward, I anticipate a future where this same sort of key is useful in any sampling utilities. (There are multiple ways to sample a dataset than simple partitionings!) For the sake of that more forward looking perspective, I wanted it named something else. And we ultimately settled on "sampling key."
So I would prefer if we did not act on this suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incidentally I wouldn't consider it a disaster if we did this. (Keeping it as stratification would have been a disaster! 😄) It merely is slightly misleading I think. But of course I also see @wschin's point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Ivanidzo4ka, and @TomFinley for your comments on this, I am reverting the commit to leave the name of samplingKeyColumnName
as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only have a comment to naming. Please let know if it doesn't make sense to you.
822686e
to
9021a94
Compare
Fixes #2972.
In this PR I clean
TrainCatalog
(commit 1) andRecommenderCatalog
(commit 2).The other commits are simply the adjustments in the code that need to be done to compile.