Classification stratificationColumn not supported for boolean column #1204
Comments
Stratification column is not what you think. Actually, if you read the documentation, it states that
As you can see, it is ALWAYS a bad idea to use That said, currently we only support float, key and string values for stratification (and I'm not sure about string). We should expand the coverage of the Hash estimator (see #1031).
It would be good to be able to specify the "
I agree with Pete. Closing this.
For imbalanced datasets, stratified splitting divides the data so that a share of the rows for each target column value is placed in both the training and the test dataset.
However, the following line of code throws an error if the 'Label' column is Boolean, which is very common for binary classification.
(trainData, testData) = classification.TrainTestSplit(data, testFraction: 0.2, stratificationColumn: "Label");
It would work if the Label column were float or another supported type.
I might be missing something, but why is Boolean not supported for the stratificationColumn? Can we support it, since it is a common scenario for binary classification?
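To make the request concrete, here is a minimal sketch of the per-class split described above, written in plain C# with LINQ. It is only an illustration: StratifiedSplitSketch and Split are hypothetical names, not part of ML.NET, and the sketch makes no claim about what stratificationColumn actually does inside TrainTestSplit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not an ML.NET API: splits rows so that each Boolean
// label value contributes roughly the same fraction of its rows to the test set.
public static class StratifiedSplitSketch
{
    public static (List<T> Train, List<T> Test) Split<T>(
        IEnumerable<T> rows, Func<T, bool> label, double testFraction, int seed = 0)
    {
        var rng = new Random(seed);
        var train = new List<T>();
        var test = new List<T>();

        // Handle each class (true / false) separately.
        foreach (var group in rows.GroupBy(label))
        {
            // Shuffle within the class, then cut off the test share.
            var shuffled = group.OrderBy(_ => rng.Next()).ToList();
            int testCount = (int)Math.Round(shuffled.Count * testFraction);
            test.AddRange(shuffled.Take(testCount));
            train.AddRange(shuffled.Skip(testCount));
        }

        return (train, test);
    }
}
```

With 100 rows of which 10 have Label = true and testFraction = 0.2, this puts 2 of the positive rows and 18 of the negative rows into the test set, so the minority class appears in both partitions.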