-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Added baselines for the TestDatasets.breastCancerPipeMissing dataset, and their unit tests #4785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Added baselines for TestDatasets.breastCancerPipeMissing for the LightGBMClassificationTest, GossLightGBMTest, and DartLightGBMTest.
Hi, @mstfbl . Can you please explain here why these baselines are needed, if the ones with missing values already exist? Specially since, as you've stated, these baselines won't actually test the changes introduced in your other PR. |
Hi @antoniovs1029 , I wasn't as clear as I could have been in the original post. There I meant that the change made in PR #4695 with In addition, they are a second set of baselines that are tested in the already-existing set of LightGbm tests, and I think having extra cases to test isn't a bad thing at all. |
This is not right way to go. Add unit test in your original PR #4695. Take any already generated model file and add it to your PR. Also add just one (for one dataset) established baseline. Load the model in Unit test and run with the new LightGbm code, compare the baselines. In reply to: 582580955 [](ancestors = 582580955) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
Related to Issue #4681 and PR #4695.
This PR adds the baseline figures for the
TestDatasets.breastCancerPipeMissing
dataset (which is the same dataset asTestDatasets.breastCancerPipe
but without missing values due to theNAHandle
transformer), and adds these baseline figures to the required tests inTestPredictors.cs
.The tests that these baselines satisfy are:
LightGBMClassificationTest
,GossLightGBMTest
,DartLightGBMTest
,FastTreeBinaryClassificationTest
,FastTreeHighMinDocsTest
MulticlassNaiveBayes
These tests are also originally the only unit tests that utilize
TestDatasets.breastCancerPipe
.Since the
TestDatasets.breastCancerPipeMissing
dataset does not contain missing?
values, the change in PR #4695 of allowingCursOpt.AllFeatures
will not impact the test cases mentioned above.**Edit: ** In the last sentence above, I mean that since there are no missing values in
TestDatasets.breastCancerPipeMissing
, these datasets are better suited to test these LightGbm tests asCursOpt.AllFeatures
combined with theHandleMissingValues
flag inLightGbmTrainerBase.cs
introduce multiple modifiers to the case of handling missing values, which thisTestDatasets.breastCancerPipeMissing
dataset is not impacted by.