
make benchmarks more stable #77661


Closed
0x8f701 opened this issue Sep 19, 2020 · 6 comments
Labels
A-libtest: Area: `#[test]` / the `test` library
T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.

Comments

0x8f701 commented Sep 19, 2020

Describe the problem you are trying to solve

Currently cargo bench isn't very stable: it doesn't run long enough, and the results can vary a lot (20-30%), which makes it hard to tell whether there is really a regression or not.

Describe the solution you'd like
no sorry

Notes

Eh2406 (Contributor) commented Sep 19, 2020

cargo bench is a wrapper around functionality in rustc. If you want to change the behavior, https://github.com/rust-lang/rust is probably a better place to discuss it. On the other hand, bench is unstable precisely because it is not robust or flexible enough; there is work in rustc to move it toward more of a plugin system. The recommendation at this time is to use https://crates.io/crates/criterion.
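
As a rough sketch of what the criterion approach looks like (assuming criterion is added as a dev-dependency and the bench target is declared with `harness = false` in Cargo.toml; `fibonacci` here is only a placeholder workload):

```rust
// benches/my_benchmark.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Placeholder function under test; substitute the code you actually want to measure.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn criterion_benchmark(c: &mut Criterion) {
    // black_box keeps the optimizer from constant-folding the call away.
    c.bench_function("fib 20", |b| b.iter(|| fibonacci(black_box(20))));
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
```

Criterion collects many samples per benchmark and reports statistics (including outliers and the change relative to the previous run), which is what helps with run-to-run variance.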

ehuss changed the title from "make cargo bench more stable" to "make benchmarks more stable" on Oct 7, 2020
ehuss transferred this issue from rust-lang/cargo on Oct 7, 2020
ehuss (Contributor) commented Oct 7, 2020

Transferred to the rust-lang/rust repository, as that is where the libtest harness lives. Unfortunately, I don't think it is likely there will be much work done on libtest's benchmarking, as its future is currently uncertain (see #29553 and #66287). You will likely find better support with external benchmarking frameworks like criterion.

ehuss added the A-libtest and T-libs-api labels Oct 7, 2020
the8472 (Member) commented Oct 7, 2020

#[bench] measures iterations per wall-time interval, more or less.
So if you don't want to switch to a different benchmark crate that supports instruction counting or does more sophisticated analysis, you'll have to bring your system into a state that produces less variance: shut down background tasks, disable CPU clock boosting, and check for thermal throttling, which is often a problem when benchmarking on laptops.
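
For reference, a minimal sketch of what the nightly-only libtest harness measures (`work` is just a placeholder workload; the harness times the closure passed to `iter` over wall-clock time and reports ns/iter with a deviation):

```rust
// Nightly only: requires #![feature(test)] at the crate root (e.g. a file in benches/).
#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

// Placeholder workload; replace with the code you actually want to measure.
fn work(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc ^ x.wrapping_mul(2654435761))
}

#[bench]
fn bench_work(b: &mut Bencher) {
    // black_box keeps the optimizer from removing the computation.
    b.iter(|| black_box(work(black_box(1_000))));
}
```

Because the numbers are wall-clock based, anything that perturbs the machine (turbo boost, background tasks, thermal throttling) shows up directly in the results.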

0x8f701 (Author) commented Oct 8, 2020

@the8472 even with all that, the results can still change a lot :)

the8472 (Member) commented Oct 8, 2020

At least in the Vec-related work I have been doing recently, I have seen variance for a null run in the 2-10% range, with two outliers around 20% (among dozens of benchmarks). But those are pure CPU/memory throughput benchmarks; if you start doing syscalls or randomized allocations, things will become noisier.

Mark-Simulacrum (Member) commented
I'm going to go ahead and close this issue, as it seems to me that it's largely a consequence of the overall bench design (wall time, not instruction counts, for example) which seems unlikely to get much more sophisticated inside the standard library. And, realistically, unless you're doing software emulation of some kind, most larger benchmarks will have some amount of uncertainty, especially if they have syscalls or the like.
