Rename CV and TrainTest "stratification" parameter #2537

rogancarr · 2019-02-13T21:26:00Z

This PR changes the CrossValidation and TrainTest parameter StratificationColumn to be GroupPreservationColumn and updates the docstrings to give a clearer explanation.

Fixes #2536

See conversation in #2536

Related to #1204, but that issue might be asking for further coverage.

rogancarr · 2019-02-13T21:26:56Z

Something has gone weird with my master branch, and I'm seeing every merge from upstream as a commit. I'll look into this, but let's work with it for now ;)

codecov · 2019-02-13T23:13:42Z

Codecov Report

Merging #2537 into master will increase coverage by 0.17%.
The diff coverage is 76.19%.

@@            Coverage Diff             @@
##           master    #2537      +/-   ##
==========================================
+ Coverage   71.26%   71.44%   +0.17%     
==========================================
  Files         797      801       +4     
  Lines      141292   141855     +563     
  Branches    16118    16141      +23     
==========================================
+ Hits       100692   101346     +654     
+ Misses      36138    36041      -97     
- Partials     4462     4468       +6

| Flag | Coverage Δ | |
|---|---|---|
| #Debug | `71.44% <76.19%> (+0.17%)` | ⬆️ |
| #production | `67.73% <73.68%> (+0.14%)` | ⬆️ |
| #test | `85.53% <100%> (+0.17%)` | ⬆️ |

Ivanidzo4ka · 2019-02-15T01:23:46Z

src/Microsoft.ML.Data/TrainCatalog.cs

-        /// If the <paramref name="stratificationColumn"/> is not provided, the random numbers generated to create it, will use this seed as value.
-        /// And if it is not provided, the default value will be used.</param>
-        public TrainTestData TrainTestSplit(IDataView data, double testFraction = 0.1, string stratificationColumn = null, uint? seed = null)
+        /// <param name="groupPreservationColumn">Name of a column to use as an ID for grouping rows. If two examples share the same value of the <paramref name="groupPreservationColumn"/>,


I think you left that description from the previous iteration with idColumn.
Do you want to rephrase it, since ID looks weird? #Resolved

Also, ID is already an established concept in IDataView.cs (see below), so using it to name something different would be super confusing.

machinelearning/src/Microsoft.Data.DataView/IDataView.cs

Line 132 in 1f29d2c

public abstract ValueGetter<DataViewRowId> GetIdGetter();
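
For readers unfamiliar with the behavior the docstring in this diff describes (rows that share a value in the chosen column stay together), here is a minimal Python sketch of such a split. It is an illustration only and not ML.NET's implementation: the function name `split_preserving_groups`, the hash-based bucketing, and the sample rows are all assumptions made for the example.

```python
import hashlib


def split_preserving_groups(rows, key, test_fraction=0.1, seed=0):
    """Split rows into (train, test) so rows sharing row[key] stay together.

    Hypothetical helper for illustration; not part of ML.NET.
    """
    train, test = [], []
    for row in rows:
        # Hash the key value (plus the seed) to a number in [0, 1).
        digest = hashlib.sha256(f"{seed}:{row[key]}".encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        # Every row with the same key value gets the same bucket,
        # so the whole group lands on one side of the split.
        (test if bucket < test_fraction else train).append(row)
    return train, test


# Made-up sample data: "user" plays the role of the grouping column.
rows = [
    {"user": "a", "label": 1},
    {"user": "a", "label": 0},
    {"user": "b", "label": 1},
    {"user": "c", "label": 0},
    {"user": "c", "label": 1},
]
train, test = split_preserving_groups(rows, key="user", test_fraction=0.5, seed=42)
train_users = {r["user"] for r in train}
test_users = {r["user"] for r in test}
```

In this sketch `test_fraction` applies to key values rather than to rows, so the realized row fraction is only approximate when groups differ in size. No user can appear in both `train_users` and `test_users`, which is the property the parameter is meant to guarantee.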

TomFinley

Hi @rogancarr, thanks for looking at this.

Group column is already an established name, used to describe actual things that are intended to form a group. This parameter is more general than that, so I do not like the name "group preservation." It's far too close. "Hey, I specified this group column, how come my ranking metrics are all funky?" "Ah, you did not specify the group column, but the group preservation column!" Looks silly.

I'd almost prefer a random name drawn out of a dictionary to this. That would just look weird, but this will be consistently confusing. I'll post suggestions in the issue.

TomFinley · 2019-02-15T17:01:25Z

> Something has gone weird with my master branch, and I'm seeing every merge from upstream as a commit. I'll look into this, but let's work with it for now ;)

You committed directly to the master branch on your own fork about a week ago, somehow didn't catch it, pushed that change, and you've been doing pulls ever since.

Once you're ready to restore your master branch, the easiest way is to run `git reset --hard upstream/master`, which will update your local branch. (This assumes you named the original remote upstream, as most people here seem to; if you gave the original source repo's remote some other name, obviously replace that.) Then run `git push -f` to force push that update to the master branch on your origin. Thenceforth remember that committing directly to master is a really bad idea.

rogancarr · 2019-02-15T17:54:23Z

> You committed directly to the master branch on your own fork about a week ago, somehow didn't catch it, pushed that change, and you've been doing pulls ever since.
>
> Once you're ready to restore your master branch, the easiest way is to run `git reset --hard upstream/master`, which will update your local branch. (This assumes you named the original remote upstream, as most people here seem to; if you gave the original source repo's remote some other name, obviously replace that.) Then run `git push -f` to force push that update to the master branch on your origin. Thenceforth remember that committing directly to master is a really bad idea.

Yes. I fixed that a couple days ago, but this PR still has the remnants :)

TomFinley

Thank you @rogancarr, this makes me a bit more comfortable.

singlis

Rogan Carr added 10 commits February 6, 2019 14:57

Updating docstrings

7557a83

Merge remote-tracking branch 'upstream/master'

e27e20f

Merge remote-tracking branch 'upstream/master'

22c2f1f

Merge remote-tracking branch 'upstream/master'

de92547

Merge remote-tracking branch 'upstream/master'

d43cf49

Merge remote-tracking branch 'upstream/master'

6dd83c1

Merge remote-tracking branch 'upstream/master'

82ace78

Merge remote-tracking branch 'upstream/master'

a178861

Merge remote-tracking branch 'upstream/master'

7ed44b7

Changing StratificationColumn to IdColumn

a864c93

rogancarr requested a review from Ivanidzo4ka February 13, 2019 21:26

Merge branch 'master' into 2536_rename_stratification_parameter

b2db3a3

rogancarr requested a review from TomFinley February 13, 2019 22:11

Renaming column.

67986c7

rogancarr mentioned this pull request Feb 15, 2019

StratificationColumn in CrossValidation and TrainTestSplit #2536

Closed

rogancarr requested a review from artidoro February 15, 2019 01:04

Ivanidzo4ka reviewed Feb 15, 2019

Addressing PR comments.

b8254ca

TomFinley suggested changes Feb 15, 2019

Addressing PR comments.

2782e1c

Addressing PR comments.

f72a103

TomFinley approved these changes Feb 15, 2019

singlis approved these changes Feb 15, 2019

rogancarr merged commit 832ecad into dotnet:master Feb 15, 2019

rogancarr deleted the 2536_rename_stratification_parameter branch February 15, 2019 21:56

Ivanidzo4ka mentioned this pull request Mar 18, 2019

Cleaning TrainCatalog and RecommenderCatalog #2973

Merged

ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
