
Fix Flaky Euler-Maruyama Tests #6287


Merged
merged 9 commits into pymc-devs:main on Nov 19, 2022

Conversation

williambdean
Contributor

@williambdean williambdean commented Nov 11, 2022

What is this PR about?
This PR addresses the flaky tests for the EulerMaruyama class, which were brought up here and here.

This PR includes seeding the current tests and ensuring that the V4 port didn't change the functionality.
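
A minimal sketch of what seeding means here (toy model, not the actual test code): pass a fixed random_seed so the sampler output is reproducible across CI runs.

import pymc as pm

with pm.Model():
    lam = pm.Normal("lam", 0.0, 1.0)
    # with a fixed seed, repeated runs of the test see the same posterior draws
    idata = pm.sample(draws=500, tune=500, random_seed=1234, progressbar=False)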

Checklist

Major / Breaking Changes

  • ...

Bugfixes / New features

  • ...

Docs / Maintenance

  • Seeding the Euler-Maruyama linear_model test
  • Adding a space before the : in the Parameters section for better rendering
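
As context for the second item, numpydoc expects a space before the colon ("name : type") in a Parameters section so the type renders separately; a minimal illustration (hypothetical entry, not the actual diff):

Parameters
----------
sde_fn : callable
    Function returning the drift and diffusion terms of the SDE.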

@williambdean
Contributor Author

williambdean commented Nov 11, 2022

So I took a look at the differences @ricardoV94...
The linear_model test seems to be a direct port from the last tagged version of V3.
Old test -> Latest V4 test

Also, the implementation of the logp seems to be the same besides the addition of the init_dist's logp. I view this as a bug fix, but I haven't dug up that discussion. Not sure if you were involved there. EDIT: Looking at it again, I guess it's just a slightly different formulation that includes the init_dist; I had thought it was simply omitted from the logp.
Old logp -> Latest V4 logp
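
For reference, a conceptual sketch of the logp (plain NumPy/SciPy, not the actual PyMC implementation): the Euler-Maruyama logp is a sum of Normal transition terms, and the V4 version additionally scores the first value under init_dist.

import numpy as np
from scipy import stats

def em_logp(x, f, g, dt, init_dist=None):
    # transition terms: x[t+1] ~ Normal(x[t] + f(x[t]) * dt, g(x[t]) * sqrt(dt))
    mu = x[:-1] + f(x[:-1]) * dt
    sigma = g(x[:-1]) * np.sqrt(dt)
    logp = stats.norm(mu, sigma).logpdf(x[1:]).sum()
    if init_dist is not None:
        # the term added in V4: account for the first value explicitly
        logp += init_dist.logpdf(x[0])
    return logp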

Other than that, it seems the port was as intended. I view the random seeding as guarding against the small chance that the parameter used to generate the data falls outside the posterior credible interval.

@codecov

codecov bot commented Nov 11, 2022

Codecov Report

Merging #6287 (ca7b5bd) into main (4acd98e) will decrease coverage by 3.77%.
The diff coverage is 0.00%.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #6287      +/-   ##
==========================================
- Coverage   94.17%   90.39%   -3.78%     
==========================================
  Files         111      111              
  Lines       23908    23869      -39     
==========================================
- Hits        22515    21577     -938     
- Misses       1393     2292     +899     
Impacted Files Coverage Δ
pymc/distributions/timeseries.py 28.46% <ø> (-66.00%) ⬇️
pymc/tests/distributions/test_timeseries.py 0.00% <0.00%> (-99.51%) ⬇️
pymc/tests/step_methods/hmc/test_quadpotential.py 0.00% <0.00%> (-95.82%) ⬇️
pymc/model_graph.py 66.47% <0.00%> (-12.36%) ⬇️
pymc/step_methods/hmc/quadpotential.py 73.84% <0.00%> (-6.78%) ⬇️
pymc/distributions/logprob.py 61.72% <0.00%> (-5.56%) ⬇️
pymc/distributions/shape_utils.py 97.44% <0.00%> (-0.43%) ⬇️
pymc/step_methods/hmc/hmc.py 92.85% <0.00%> (-0.13%) ⬇️
... and 11 more

@williambdean
Contributor Author

williambdean commented Nov 11, 2022

I am also looking into the failure rate as mentioned in the issue. WIP

I modified the linear_model test to store the two booleans values. i.e.

{
    "lamh: (lo < lam) and (lam < hi), 
    "zh": ((lo < z) * (z < hi)).mean() > 0.95, 
}

and ran the test 50 times, storing each experiment as a row in a DataFrame.
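
Roughly like this, where run_linear_model_test is a hypothetical helper standing in for the body of the existing test:

import pandas as pd

rows = []
for _ in range(50):
    # hypothetical helper returning the interval bounds, the true lam, and the ppc draws z
    lo, hi, lam, z = run_linear_model_test()
    rows.append({
        "lamh": (lo < lam) and (lam < hi),
        "zh": ((lo < z) * (z < hi)).mean() > 0.95,
    })
df = pd.DataFrame(rows)
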
For the main branch,

>>> df.mean()
lamh    0.88
zh     1.00
dtype: float64

And with tag 3.11.5

>>> df.mean()
lamh    0.96
zh     1.00
dtype: float64

So lamh is the culprit for the assertion errors. However, there is indeed a higher failure rate on the main branch across these two samples (and potentially under the assumption of a 0.95 success rate; it takes a bit of time to run these).
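
One way to make the "0.95 success rate" comparison concrete (not something done in this PR) is a quick binomial check on the 50 runs:

from scipy import stats

# main branch: 44/50 runs passed the lamh check; v3.11.5: 48/50
print(stats.binomtest(k=44, n=50, p=0.95).pvalue)
print(stats.binomtest(k=48, n=50, p=0.95).pvalue)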

@williambdean
Contributor Author

williambdean commented Nov 11, 2022

Also, the previous test uses advi+adapt_diag instead of NUTS, but I'm not sure atm how that might impact the results.

Differences so far:

  1. ADVI+adapt_diag vs NUTS
  2. Accounting for first value based on init_dist in the logp
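
For reference, a sketch of the two sampling setups being compared (toy model, not the actual test):

import pymc as pm

with pm.Model():
    x = pm.Normal("x", 0.0, 1.0)
    # V3-era test: ADVI-based initialization for NUTS with a diagonal mass matrix
    idata_advi = pm.sample(tune=500, draws=500, init="advi+adapt_diag", random_seed=1234)
    # current test: PyMC's default initialization (jitter+adapt_diag)
    idata_default = pm.sample(tune=500, draws=500, random_seed=1234)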

@williambdean
Contributor Author

williambdean commented Nov 11, 2022

And now another test is failing: pymc/tests/distributions/test_timeseries.py::TestEulerMaruyama::test_batched_size[False-2] lol
There is no comparable test in the previous version.

Which could also be fixed with seeding... However, the specific test is

    y_eval = draw(y, draws=2)
    assert y_eval[0].shape == (batch_size, steps)
>   assert not np.any(np.isclose(y_eval[0], y_eval[1]))

...which I think is just trying to check that the samples are distinct. In that case, I think np.any(~np.isclose(y_eval[0], y_eval[1])) might be better, covering the case where the two time series are close at some point but are still different overall.

Even though two draws are unlikely to be the same across all the steps, I think a seed is still a wise choice here too.

assert y_eval[0].shape == (batch_size, steps)
assert not np.any(np.isclose(y_eval[0], y_eval[1]))
assert np.any(~np.isclose(y_eval[0], y_eval[1]))
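
A sketch of how the seeded check might look (toy distribution standing in for the EulerMaruyama variable; pm.draw does accept a random_seed argument):

import numpy as np
import pymc as pm

y = pm.Normal.dist(mu=0.0, sigma=1.0, shape=(5, 10))  # stand-in for the time series
y_eval = pm.draw(y, draws=2, random_seed=1234)
assert y_eval[0].shape == (5, 10)
assert np.any(~np.isclose(y_eval[0], y_eval[1]))  # seeded, but the two draws still differ
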
Contributor Author


np.any(~np.isclose([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))   # -> False since all the same
np.any(~np.isclose([5, 4, 3, 2, 1], [1, 2, 3, 4, 5]))   # -> True since at least one different 
np.any(~np.isclose([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))  # -> True since all different 

Member


Do you reckon this check is better than checking that all values are different?

Contributor Author

@williambdean williambdean Nov 18, 2022


Surprisingly, I had it fail on my end when checking that all values are different (the previous test). Not sure what the chances of that happening are, but it did. That is, the two arrays were the same in at least one spot. In the case of an intersection between the two ys, I see this as better capturing that they are different arrays.
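
To illustrate the difference on arrays that intersect at a single point (toy numbers):

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 5.0, 7.0])      # distinct draws that happen to coincide at one position
not np.any(np.isclose(a, b))       # False -> the previous assert fails here
np.any(~np.isclose(a, b))          # True  -> the new assert still passes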

Member


The test should now be deterministic, no?

Contributor Author


Yes. There are two changes:

  1. Deterministic because of the random seed
  2. Change in the assert which, I thought, better checks that the two draws are different based on the logic described. If you disagree, I will change back

Contributor Author


It still passes under the seed with the previous assert, though, but so can any test catered to do so.

Member


I don't have a strong preference, but all values being different is certainly stronger evidence that it's random :D

Contributor Author


Fair. 😃

@williambdean williambdean marked this pull request as ready for review November 11, 2022 16:32
Member

@michaelosthege michaelosthege left a comment


Looking good!

One less flaky boi :)

@williambdean
Contributor Author

I have to quickly rebase onto the new main since the seed args in the tests were just changed to random_seed (by me).

@williambdean
Contributor Author

Failed test on test_gp.py

@michaelosthege
Member

Failed test on test_gp.py

That's a known flaky test #6299.

We can just override it.

@michaelosthege michaelosthege merged commit cb37afa into pymc-devs:main Nov 19, 2022