Creating an array can be made 2x faster #139875
Labels
A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-optimization: Category: An issue highlighting optimization opportunities or PRs implementing such
I-heavy: Issue: Problems and improvements with respect to binary size of generated code.
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
Consider this simple function:
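The original snippet is not preserved in this copy. As a minimal sketch of the kind of function being discussed, assuming a fixed length `N` and the name `make_array` (both hypothetical, chosen only for illustration), it could look like this:

```rust
// Hypothetical array length; the issue only speaks of "large arrays"
// (more than a few hundred elements).
pub const N: usize = 1024;

/// Builds an array in which every element is `2u64`.
///
/// The byte pattern of `2u64` is 02 00 00 00 00 00 00 00, so the bytes of
/// the array are not all identical and a single-byte `memset` cannot
/// produce it.
pub fn make_array() -> [u64; N] {
    [2u64; N]
}
```

Calling `make_array()` returns an `N`-element array whose elements all equal 2. How the fill is lowered (vectorized loop or otherwise) depends on the compiler version, target, and optimization level.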
Because `2u64` doesn't have the same bytes throughout, the compiler can't call `memset` and instead creates a vectorized loop.
However, from my testing, using the `rep stosq` instruction is over twice as fast for large arrays (more than a few hundred elements). Here is a faster version of the same function:
Benchmarking both with Criterion:
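Neither the faster version nor the benchmark output is preserved in this copy. The following is only a sketch of how a `rep stosq` fill and a rough comparison might be written, not the author's code. The names `make_array_baseline`, `make_array_stosq`, and `time_it`, and the length `N`, are hypothetical. The timing helper uses `std::time::Instant` instead of Criterion so the block has no external dependencies. It is far less rigorous than a Criterion benchmark.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;
use std::hint::black_box;
use std::mem::MaybeUninit;
use std::time::{Duration, Instant};

// Hypothetical array length, not taken from the issue.
pub const N: usize = 1024;

/// Baseline: the plain array-repeat expression.
pub fn make_array_baseline() -> [u64; N] {
    [2u64; N]
}

/// Sketch of a fill using `rep stosq`, which stores RAX to [RDI]
/// RCX times, advancing RDI by 8 bytes after each store.
#[cfg(target_arch = "x86_64")]
pub fn make_array_stosq() -> [u64; N] {
    let mut out = MaybeUninit::<[u64; N]>::uninit();
    unsafe {
        asm!(
            "rep stosq",
            // Element count; decremented to zero by the instruction.
            inout("rcx") N => _,
            // Destination pointer; advanced past the end by the instruction.
            inout("rdi") out.as_mut_ptr() => _,
            // Value stored into every element.
            in("rax") 2u64,
            // The instruction writes memory, so `nomem` must not be used.
            options(nostack, preserves_flags),
        );
        // All N elements have been written at this point.
        out.assume_init()
    }
}

/// Fallback for other targets so the sketch still compiles there.
#[cfg(not(target_arch = "x86_64"))]
pub fn make_array_stosq() -> [u64; N] {
    let out = MaybeUninit::new([2u64; N]);
    unsafe { out.assume_init() }
}

/// Very rough timing helper: runs `f` `iters` times and returns the
/// total elapsed time. `black_box` keeps the result from being
/// optimized away.
pub fn time_it(f: fn() -> [u64; N], iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed()
}
```

To compare the two, call `time_it(make_array_baseline, iters)` and `time_it(make_array_stosq, iters)` in a release build. The relative result will depend on the CPU, the array length, and the compiler version, so treat any numbers from this helper as indicative only.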
Compare both of them on Godbolt.