Skip to content

Cleaning TrainCatalog and RecommenderCatalog #2973

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 19, 2019

Conversation

artidoro
Copy link
Contributor

Fixes #2972.

In this PR I clean TrainCatalog (commit 1) and RecommenderCatalog (commit 2).
The other commits are simply the adjustments in the code that need to be done to compile.

@artidoro artidoro added the API Issues pertaining the friendly API label Mar 15, 2019
@artidoro artidoro added this to the 0319 milestone Mar 15, 2019
@artidoro artidoro self-assigned this Mar 15, 2019
@codecov
Copy link

codecov bot commented Mar 15, 2019

Codecov Report

Merging #2973 into master will increase coverage by 0.36%.
The diff coverage is 76.81%.

@@            Coverage Diff             @@
##           master    #2973      +/-   ##
==========================================
+ Coverage   72.35%   72.71%   +0.36%     
==========================================
  Files         803      803              
  Lines      143290   145370    +2080     
  Branches    16155    16766     +611     
==========================================
+ Hits       103673   105710    +2037     
- Misses      35191    35231      +40     
- Partials     4426     4429       +3
Flag Coverage Δ
#Debug 72.71% <76.81%> (+0.36%) ⬆️
#production 68.43% <71.42%> (+0.36%) ⬆️
#test 88.8% <100%> (+0.28%) ⬆️
Impacted Files Coverage Δ
...cenariosWithDirectInstantiation/TensorflowTests.cs 90.81% <100%> (ø) ⬆️
test/Microsoft.ML.Tests/Scenarios/Api/TestApi.cs 97.61% <100%> (ø) ⬆️
test/Microsoft.ML.Functional.Tests/Validation.cs 100% <100%> (ø) ⬆️
...ests/TrainerEstimators/MatrixFactorizationTests.cs 97.17% <100%> (+0.25%) ⬆️
...s/Api/CookbookSamples/CookbookSamplesDynamicApi.cs 93.12% <100%> (-0.41%) ⬇️
...soft.ML.Data/DataLoadSave/DataOperationsCatalog.cs 71.11% <100%> (-2.13%) ⬇️
test/Microsoft.ML.Functional.Tests/Evaluation.cs 100% <100%> (ø) ⬆️
src/Microsoft.ML.Recommender/RecommenderCatalog.cs 70.83% <42.85%> (ø) ⬆️
src/Microsoft.ML.Data/TrainCatalog.cs 84.11% <72.72%> (ø) ⬆️
...rosoft.ML.Data/DataLoadSave/CompositeDataLoader.cs 73.23% <0%> (-16.35%) ⬇️
... and 23 more

Copy link
Member

@abgoswam abgoswam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

IDataView data, IEstimator<ITransformer> estimator, int numFolds = 5, string labelColumn = DefaultColumnNames.Label,
string stratificationColumn = null, int? seed = null)
IDataView data, IEstimator<ITransformer> estimator, int numberOfFolds = 5, string labelColumnName = DefaultColumnNames.Label,
string samplingKeyColumn = null, int? seed = null)
Copy link
Member

@wschin wschin Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

samplingKeyColumn [](start = 19, length = 17)

Would you consider FoldColumnName? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wei-Sheng suggested partitionColumnName`, which I think reflects the role of the column better.


In reply to: 266177816 [](ancestors = 266177816)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2537
#2536
We had discussion regarding name of this column.
If you want to change it again, I would advice ask opinion of @rogancarr and @TomFinley

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I also considered in the issues @Ivanidzo4ka linked the merits of "partition" in my own thinking, though I did not write it down (as I had internally rejected it). I will now write down why I rejected it here, and you can see if you agree or disagree.

The major objection is that, if I look forward, I anticipate a future where this same sort of key is useful in any sampling utilities. (There are multiple ways to sample a dataset than simple partitionings!) For the sake of that more forward looking perspective, I wanted it named something else. And we ultimately settled on "sampling key."

So I would prefer if we did not act on this suggestion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally I wouldn't consider it a disaster if we did this. (Keeping it as stratification would have been a disaster! 😄) It merely is slightly misleading I think. But of course I also see @wschin's point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Ivanidzo4ka, and @TomFinley for your comments on this, I am reverting the commit to leave the name of samplingKeyColumnName as is.

Copy link
Member

@wschin wschin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only have a comment to naming. Please let know if it doesn't make sense to you.

@artidoro artidoro force-pushed the scrubbing branch 2 times, most recently from 822686e to 9021a94 Compare March 19, 2019 06:27
@artidoro artidoro merged commit 71693b3 into dotnet:master Mar 19, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants