StratificationColumn in CrossValidation and TrainTestSplit #2536
Comments
Do we have any idea what the new name should be? |
@Ivanidzo4ka good question! In the above, I've made a suggestion for "IdColumn". |
Sorry, I guess you mentioned it in the other issue; I don't see it here. |
How about
|
Row Set Preservation Society. That would be a good name for my second album. |
If I heard something was renamed to Is there another industry term for this? We can't be the first. |
Closest I see in scikit-learn is https://scikit-learn.org/stable/modules/cross_validation.html#group-shuffle-split Another route is to rename the |
Speaking of renaming. @Dmitry-A was saying earlier today that |
@justinormont what |
The column purpose of I'm unsure we have brought the concept to ML.NET. |
Ah, that |
I see it listed here:
No idea if we utilize the concept though. |
Let's keep this discussion on potential names for |
So far we have
@TomFinley @shauheen @glebuk @yaeldekel Any thoughts? |
I renamed it to |
By itself not an acceptable name. If you somehow clarified the "group" column to mean something else. @justinormont's suggestion of Anyway, the problem is that what type of "group" is considered relevant is very context dependent. If you can make a case that "group" is used in other contexts to refer to this specifically, I could potentially change my mind. But as far as I can see, the case depends on a 5-character substring of a method from Python taken completely out of the context that made it clear what type of group you were talking about. Maybe |
I like @TomFinley naming suggestions:
|
It would be nice to make those names consistent as well. |
CrossValidation and TrainTestSplit have a parameter called StratificationColumn that is used to preserve groupings of columns across splits (as discussed in #2487). This isn't actually stratification, so we should rename the column.
This is a forked sub-issue from #2487
Related to #1204
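To illustrate the behavior described above (keeping rows that share a column value together on one side of a split), here is a minimal pure-Python sketch. It is not ML.NET code: the function name `group_preserving_split`, the `user` key, and the toy rows are all hypothetical and only show the idea.

```python
import random
from collections import defaultdict


def group_preserving_split(rows, group_key, test_fraction=0.25, seed=0):
    """Split rows so that all rows sharing a group value land on the same side.

    Hypothetical helper for illustration only; not part of any library.
    """
    # Bucket the rows by the value of the grouping column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    # Shuffle the group keys (not the rows) so whole groups move together.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    # The fraction is applied to the number of groups, not the number of rows.
    n_test = max(1, round(len(keys) * test_fraction))
    test_keys = set(keys[:n_test])

    train = [r for k in keys if k not in test_keys for r in groups[k]]
    test = [r for k in keys if k in test_keys for r in groups[k]]
    return train, test


# Toy data: four users with two rows each.
rows = [{"user": u, "label": i % 2} for i, u in enumerate("aabbccdd")]
train, test = group_preserving_split(rows, "user")
train_users = {r["user"] for r in train}
test_users = {r["user"] for r in test}
```

No user appears in both `train_users` and `test_users`. Stratification in the usual sense is a different operation: it keeps the proportion of each label roughly equal across the splits, which this sketch makes no attempt to do.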