update expected results of slow tests #268
Conversation
Thanks for the fixes @kashif!
@patrickvonplaten IIRC you wanted to double-check something before updating the VE tests?
I can run the tests now on a GPU to double-check as well... will report soon
The documentation is not available anymore as the PR was closed or merged.
Hmm, looks like the
Ah yeah, lemme run the pipeline real quick to make sure this all still works
tests/test_scheduler.py (Outdated)

```diff
@@ -714,7 +714,7 @@ def test_full_loop_no_noise(self):
     result_sum = torch.sum(torch.abs(sample))
     result_mean = torch.mean(torch.abs(sample))

-    assert abs(result_sum.item() - 14379591680.0) < 1e-2
+    assert abs(result_sum.item() - 14379589632.0) < 1e-2
```
Circle CI seemed to be happy with 14379591680.0: https://github.com/huggingface/diffusers/runs/8068605613?check_suite_focus=true So I'm not sure about this change => maybe we should increase the tolerance here instead?
Yes, I was thinking that too, so perhaps abs < 2.5e3 or so?
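A quick sanity check of why the 1e-2 tolerance can never pass across GPUs at this magnitude (a NumPy sketch; the two sums are the values quoted in this thread):

```python
import numpy as np

# Two fp32 result sums of ~1.4e10 reported by different GPUs in this thread.
result_a = 14379591680.0
result_b = 14379589632.0

# Near 1.4e10, the spacing between adjacent representable fp32 values is 1024,
# so an absolute tolerance of 1e-2 is far below what fp32 can even resolve.
assert np.spacing(np.float32(result_a)) == 1024.0

# The two GPU results differ by two ulps (2048.0): abs(...) < 1e-2 can never
# hold across devices, while the proposed 2.5e3 tolerance does.
diff = abs(result_a - result_b)
assert diff == 2048.0
assert diff < 2.5e3

# A relative comparison expresses the same intent without a magnitude-specific
# constant (the PR's commit list shows np.isclose was eventually adopted).
assert np.isclose(result_a, result_b, rtol=1e-6)
```

The takeaway: for large-magnitude sums, either scale the absolute tolerance to the value's ulp, or use a relative tolerance.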
tests/test_scheduler.py (Outdated)

```diff
@@ -714,8 +714,8 @@ def test_full_loop_no_noise(self):
     result_sum = torch.sum(torch.abs(sample))
     result_mean = torch.mean(torch.abs(sample))

-    assert abs(result_sum.item() - 14379591680.0) < 1e-2
-    assert abs(result_mean.item() - 18723426.0) < 1e-3
+    assert abs(result_sum.item() - 14379591680.0) < 2.5e3
```
Can we add a comment here that says the results are flaky depending on the GPU, and that this is why we have such high tolerances?
Actually, upon taking a second look, I think the reason is that we don't make the test deterministic with generators: both the DDPM and Score-VE schedulers make use of `torch.randn(...)`.
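A minimal sketch of that fix: route every random draw through an explicitly seeded generator object instead of the global RNG, so repeated runs produce identical noise. NumPy is used here for portability; in the schedulers the analogous call would be `torch.randn(shape, generator=torch.Generator().manual_seed(seed))`. The `sample_noise` helper is hypothetical, for illustration only.

```python
import numpy as np

def sample_noise(shape, generator):
    # Every stochastic step pulls from the passed-in generator,
    # never from the global RNG, so the caller controls determinism.
    return generator.standard_normal(shape)

# Two independently constructed generators with the same seed...
gen1 = np.random.default_rng(0)
gen2 = np.random.default_rng(0)

a = sample_noise((4, 4), gen1)
b = sample_noise((4, 4), gen2)

# ...produce bit-identical noise, so a test comparing against
# fixed expected values becomes reproducible across runs.
assert np.array_equal(a, b)
```

The same pattern applies per-test: construct a fresh seeded generator at the top of each test so earlier draws can't shift later ones.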
@natolambert @patrickvonplaten I tested now on a bunch of servers and a laptop, and all the slow tests are passing for me...
Great, it has been a bit since I tested them all, so hopefully it's more stable now!
It'd be nice if we make the random scheduler tests deterministic by passing the generator correctly. Regarding the pipeline test it'd be nice to reduce the number of steps to speed it up a bit :-)
I think merging #289 can unblock this PR here.
@kashif looks like a couple of VE tests are failing now, could you check please? https://github.com/huggingface/diffusers/runs/8158641999?check_suite_focus=true When those are back, I think we can carefully merge this :)
Sure! Let me get it over the line.
Should we still try to get this one merged?
Yes please! I believe some of the images on the hub might need updating? Let me fix the conflict.
Can we fix the quality checks here and then run all slow tests to be sure?
Awesome, thanks for making all the changes! Feel free to merge whenever :-)
* update expected results of slow tests
* relax sum and mean tests
* Print shapes when reporting exception
* formatting
* fix sentence
* relax test_stable_diffusion_fast_ddim for gpu fp16
* relax flakey tests on GPU
* added comment on large tolerences
* black
* format
* set scheduler seed
* added generator
* use np.isclose
* set num_inference_steps to 50
* fix dep. warning
* update expected_slice
* preprocess if image
* updated expected results
* updated expected from CI
* pass generator to VAE
* undo change back to orig
* use orignal
* revert back the expected on cpu
* revert back values for CPU
* more undo
* update result after using gen
* update mean
* set generator for mps
* update expected on CI server
* undo
* use new seed every time
* cpu manual seed
* reduce num_inference_steps
* style
* use generator for randn

Co-authored-by: Patrick von Platen <[email protected]>
Updated the two failing slow tests