Skip to content

Added baselines for the TestDatasets.breastCancerPipeMissing dataset, and their unit tests #4785

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

mstfbl
Copy link
Contributor

@mstfbl mstfbl commented Feb 5, 2020

Related to Issue #4681 and PR #4695.

This PR adds the baseline figures for the TestDatasets.breastCancerPipeMissing dataset (which is the same dataset as TestDatasets.breastCancerPipe but without missing values due to the NAHandle transformer), and adds these baseline figures to the required tests in TestPredictors.cs.

The tests that these baselines satisfy are:

  • LightGBMClassificationTest,
  • GossLightGBMTest,
  • DartLightGBMTest,
  • FastTreeBinaryClassificationTest,
  • FastTreeHighMinDocsTest
  • MulticlassNaiveBayes

These tests are also originally the only unit tests that utilize TestDatasets.breastCancerPipe.

Since the TestDatasets.breastCancerPipeMissing dataset does not contain missing ? values, the change in PR #4695 of allowing CursOpt.AllFeatures will not impact the test cases mentioned above.

**Edit: ** In the last sentence above, I mean that since there are no missing values in TestDatasets.breastCancerPipeMissing, these datasets are better suited to test these LightGbm tests as CursOpt.AllFeatures combined with the HandleMissingValues flag in LightGbmTrainerBase.cs introduce multiple modifiers to the case of handling missing values, which this TestDatasets.breastCancerPipeMissing dataset is not impacted by.

Added baselines for TestDatasets.breastCancerPipeMissing for the LightGBMClassificationTest, GossLightGBMTest, and DartLightGBMTest.
@mstfbl mstfbl requested a review from a team as a code owner February 5, 2020 17:43
@mstfbl mstfbl changed the title Added baselines for the TestDatasets.breastCancerPipeMissing dataset Added baselines for the TestDatasets.breastCancerPipeMissing dataset, and their unit tests Feb 5, 2020
@antoniovs1029
Copy link
Member

Hi, @mstfbl . Can you please explain here why these baselines are needed, if the ones with missing values already exist? Specially since, as you've stated, these baselines won't actually test the changes introduced in your other PR.

@mstfbl
Copy link
Contributor Author

mstfbl commented Feb 5, 2020

Hi @antoniovs1029 , I wasn't as clear as I could have been in the original post. There I meant that the change made in PR #4695 with CursOpt.AllFeatures won't break the tests with these baselines, as CursOpt.AllFeatures includes a peculiar case with the handling of missing values. I am trying to say that these baselines are a better set of metrics to test these LightGbm tests.

In addition, they are a second set of baselines that are tested in the already-existing set of LightGbm tests, and I think having extra cases to test isn't a bad thing at all.

@ganik
Copy link
Member

ganik commented Feb 5, 2020

This is not right way to go. Add unit test in your original PR #4695. Take any already generated model file and add it to your PR. Also add just one (for one dataset) established baseline. Load the model in Unit test and run with the new LightGbm code, compare the baselines.


In reply to: 582580955 [](ancestors = 582580955)

ganik
ganik previously requested changes Feb 5, 2020
Copy link
Member

@ganik ganik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

@ganik ganik dismissed their stale review February 5, 2020 21:07

revoking review

Copy link
Member

@ganik ganik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

@mstfbl mstfbl closed this Feb 10, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants