
make benchmarks more stable #77661


Closed
0x8f701 opened this issue Sep 19, 2020 · 6 comments
Labels
A-libtest: Area: `#[test]` / the `test` library
T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.

Comments

0x8f701 commented Sep 19, 2020

Describe the problem you are trying to solve

Currently cargo bench isn't very stable: it doesn't run long enough, and the results can vary a lot (20-30%), which makes it hard to tell whether there is really a regression or not.

Describe the solution you'd like
no sorry

Notes

Eh2406 (Contributor) commented Sep 19, 2020

cargo bench is a wrapper around functionality in rustc. If you want to change the behavior, https://github.com/rust-lang/rust is probably a better place to discuss it. On the other hand, bench is unstable precisely because it is not robust or flexible enough; there is work in rustc to move it toward more of a plugin system. The recommendation at this time is to use https://crates.io/crates/criterion.
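
As a rough sketch of what the criterion approach looks like (assuming criterion is added as a dev-dependency and the bench target is declared with `harness = false` in Cargo.toml; `fibonacci` here is only a placeholder workload):

```rust
// benches/my_benchmark.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Placeholder function under test; substitute the code you actually want to measure.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn criterion_benchmark(c: &mut Criterion) {
    // black_box keeps the optimizer from constant-folding the call away.
    c.bench_function("fib 20", |b| b.iter(|| fibonacci(black_box(20))));
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
```

Criterion collects many samples per benchmark and reports statistics (including outliers and the change relative to the previous run), which is what helps with run-to-run variance.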

ehuss changed the title from "make cargo bench more stable" to "make benchmarks more stable" on Oct 7, 2020
ehuss transferred this issue from rust-lang/cargo on Oct 7, 2020
ehuss (Contributor) commented Oct 7, 2020

Transferred to the rust-lang/rust repository, as that is where the libtest harness lives. Unfortunately, I don't think it is likely there will be much work done on libtest's benchmarking, as its future is currently uncertain (see #29553 and #66287). You will likely find better support with external benchmarking frameworks like criterion.

ehuss added the A-libtest and T-libs-api labels Oct 7, 2020
the8472 (Member) commented Oct 7, 2020

#[bench] measures iterations per wall-time interval, more or less.
So if you don't want to switch to a different benchmark crate that supports instruction counting or does more sophisticated analysis, you'll have to bring your system into a state that produces less variance: shut down background tasks, disable CPU clock boosting, and check for thermal throttling, which is often a problem when benchmarking on laptops.
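
For reference, a minimal sketch of what the nightly-only libtest harness measures (`work` is just a placeholder workload; the harness times the closure passed to `iter` over wall-clock time and reports ns/iter with a deviation):

```rust
// Nightly only: requires #![feature(test)] at the crate root (e.g. a file in benches/).
#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

// Placeholder workload; replace with the code you actually want to measure.
fn work(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc ^ x.wrapping_mul(2654435761))
}

#[bench]
fn bench_work(b: &mut Bencher) {
    // black_box keeps the optimizer from removing the computation.
    b.iter(|| black_box(work(black_box(1_000))));
}
```

Because the numbers are wall-clock based, anything that perturbs the machine (turbo boost, background tasks, thermal throttling) shows up directly in the results.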

0x8f701 (Author) commented Oct 8, 2020

@the8472 even with all that, the results can still change a lot :)

the8472 (Member) commented Oct 8, 2020

At least in the Vec-related work I have been doing recently, I have seen variance for a null run in the 2-10% range, with two outliers around 20% (among dozens of benchmarks). But those are pure CPU/memory throughput benchmarks; if you start doing syscalls or randomized allocations, things will become noisier.

Mark-Simulacrum (Member) commented
I'm going to go ahead and close this issue, as it seems to me that it's largely a consequence of the overall bench design (wall time, not instruction counts, for example) which seems unlikely to get much more sophisticated inside the standard library. And, realistically, unless you're doing software emulation of some kind, most larger benchmarks will have some amount of uncertainty, especially if they have syscalls or the like.
