Fix Flaky Euler-Maruyama Tests #6287
Conversation
So I took a look at the differences @ricardoV94... The implementation of the logp seems to be the same besides the addition of the init_dist's logp. I viewed this as a bug fix, but I haven't dug up that discussion; not sure if you were involved there. EDIT: Looking at it again, I guess the formulation was just changed a bit to include the init_dist, which had previously been omitted from the logp. Other than that, the port seems to be as intended. I view the random seeding as guarding against the small chance that the parameter used to generate the data falls outside the posterior credible interval.
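For reference, a minimal sketch of the formulation being discussed, assuming a Normal Euler-Maruyama transition density; the function name and signature here are illustrative, not PyMC's actual implementation:

```python
import numpy as np
from scipy import stats

def euler_maruyama_logp(x, f, g, dt, init_logp):
    # Contribution of the initial distribution -- the piece the V4 port adds
    logp = init_logp(x[0])
    # Each Euler-Maruyama step is Normal(x[t] + f(x[t]) * dt, g(x[t]) * sqrt(dt))
    mu = x[:-1] + f(x[:-1]) * dt
    sd = g(x[:-1]) * np.sqrt(dt)
    logp += stats.norm.logpdf(x[1:], loc=mu, scale=sd).sum()
    return logp
```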
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #6287      +/-   ##
==========================================
- Coverage   94.17%   90.39%     -3.78%
==========================================
  Files         111      111
  Lines       23908    23869        -39
==========================================
- Hits        22515    21577       -938
- Misses       1393     2292       +899
```
I am also looking into the failure rate as mentioned in the issue. WIP

I modified the checks to

```python
{
    "lamh": (lo < lam) and (lam < hi),
    "zh": ((lo < z) * (z < hi)).mean() > 0.95,
}
```

and ran the test 50 times, storing each experiment as a row in a DataFrame:

```
>>> df.mean()
lamh    0.88
zh      1.00
dtype: float64
```

And with tag 3.11.5:

```
>>> df.mean()
lamh    0.96
zh      1.00
dtype: float64
```

So it looks like the `lam` interval check passes less often on main (0.88) than on the 3.11.5 tag (0.96).
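A sketch of what such a repeated-run experiment could look like; `run_once` is a hypothetical helper standing in for the test body, returning the estimated `lam`, the draws `z`, and the interval bounds `lo`/`hi`:

```python
import pandas as pd

rows = []
for seed in range(50):
    # Hypothetical helper: re-runs the test body under a given seed
    lam, z, lo, hi = run_once(seed)
    rows.append({
        "lamh": (lo < lam) and (lam < hi),
        "zh": ((lo < z) * (z < hi)).mean() > 0.95,
    })

df = pd.DataFrame(rows)
print(df.mean())  # fraction of the 50 runs in which each check passed
```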
Also seeing the previous test uses … Differences so far: …
And now another test is failing, which could also be fixed with seeding... However, the specific assertion is

```
y_eval = draw(y, draws=2)
assert y_eval[0].shape == (batch_size, steps)
> assert not np.any(np.isclose(y_eval[0], y_eval[1]))
```

...which I think is just trying to check that the two samples are distinct. In that case, I think the assertion is too strict: even though two draws are unlikely to be the same across all the entries, a single coincidental near-match shouldn't fail the test.
```diff
  assert y_eval[0].shape == (batch_size, steps)
- assert not np.any(np.isclose(y_eval[0], y_eval[1]))
+ assert np.any(~np.isclose(y_eval[0], y_eval[1]))
```
```python
import numpy as np

np.any(~np.isclose([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))    # -> False since all the same
np.any(~np.isclose([5, 4, 3, 2, 1], [1, 2, 3, 4, 5]))    # -> True since at least one different
np.any(~np.isclose([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))   # -> True since all different
```
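For contrast, the previous assertion rejects the pair as soon as any single pair of entries happens to coincide:

```python
import numpy as np

# One coincidental match (the 3s) is enough to make the old check fail
not np.any(np.isclose([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -> False
```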
Do you reckon this check is better than checking that all values are different?
Surprisingly, I had the previous check (that all values are different) fail on my end. Not sure of the chances of that happening, but it did: the two arrays were the same in at least one spot. In the case of an intersection between the two ys, I saw this as better capturing that they are different arrays.
The test should now be deterministic, no?
Yes. There are two changes:
- Deterministic because of the random seed
- A change in the assert which, I thought, better checks that the two draws are different, based on the logic described. If you disagree, I will change it back.
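A sketch of what the seeded version of the check could look like; the shapes and seed are illustrative, with a plain Normal standing in for the EulerMaruyama variable used in the actual test:

```python
import numpy as np
import pymc as pm

# Stand-in variable; the real test uses an EulerMaruyama distribution
y = pm.Normal.dist(mu=0, sigma=1, shape=(4, 10))

y_eval = pm.draw(y, draws=2, random_seed=1234)  # seeded, so deterministic
assert y_eval[0].shape == (4, 10)
assert np.any(~np.isclose(y_eval[0], y_eval[1]))
```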
It still passes under the seed with the previous assert too, though. But so would any test catered to do so.
I don't have a strong preference, but all values being different is certainly stronger evidence that it's random :D
Fair. 😃
Looking good!
One less flaky boi :)
I have to quickly rebase with the new main since the …
Failed test in test_gp.py.
That's a known flaky test: #6299. We can just override …
What is this PR about?
This PR addresses the flaky tests in the EulerMaruyama class, which were brought up here and here.
This PR seeds the current tests and ensures that the V4 port didn't change the functionality.
Checklist
Major / Breaking Changes
Bugfixes / New features
Docs / Maintenance
- Fixed a `:` in the Parameters section for better rendering