3 x Compile time regression with Clang-17 and -inline-threshold=1000 on LLVM-git - not seen when using Clang-15 #61684

Closed
ms178 opened this issue Mar 24, 2023 · 8 comments
Labels
build-problem, incomplete (Issue not complete, e.g. missing a reproducer, build arguments, etc.), slow-compile

Comments


ms178 commented Mar 24, 2023

Hi, the build time of my LLVM-17-git package has increased dramatically over the past several weeks with the same compile flags and config options, but only when using Clang-17 as the compiler.

When I use a heavily optimized (LTO+PGO+BOLTed) Clang-15 to compile LLVM-17-git, the build takes around 1h 10min, but with my system's less optimized Clang-17 (651b405) it took 3h 36min. That's more than three times slower! Recently, I have observed similarly bad compile times when using an older LTO+PGO+BOLTed LLVM-17 build of mine (19f74c9) to compile LLVM-git. Some weeks ago, that same LTO+PGO+BOLTed LLVM-17 build compiled LLVM-git just as fast as the Clang-15 compiler, but now it does not.

The only difference between the fast Clang-15 setup and the slow Clang-17 setup is that with Clang-17 I use a PGO profile file that is not compatible with the older Clang version.

CPU: Xeon E5-2696V3 (Haswell-EP)
RAM: 64 GB DDR3-ECC 1866

PKGBUILD.txt

export CC=clang
export CXX=clang++
export CC_LD=lld
export CXX_LD=lld
export AR=llvm-ar
export NM=llvm-nm
export STRIP=llvm-strip
export OBJCOPY=llvm-objcopy
export OBJDUMP=llvm-objdump
export READELF=llvm-readelf
export RANLIB=llvm-ranlib
export HOSTCC=clang
export HOSTCXX=clang++
export HOSTAR=llvm-ar
export CPPFLAGS="-D_FORTIFY_SOURCE=0"
export CFLAGS="-O3 -march=native -mtune=native -mllvm -inline-threshold=1000 -maes -mllvm -extra-vectorizer-passes -mllvm -enable-cond-stores-vec -mllvm -slp-vectorize-hor-store -mllvm -enable-loopinterchange -mllvm -enable-loop-distribute -mllvm -enable-unroll-and-jam -mllvm -enable-loop-flatten -mllvm -interleave-small-loop-scalar-reduction -mllvm -unroll-runtime-multi-exit -mllvm -aggressive-ext-opt -mllvm -enable-interleaved-mem-accesses -mllvm -enable-masked-interleaved-mem-accesses -fno-math-errno -fno-trapping-math -falign-functions=32 -fno-semantic-interposition -fcf-protection=none -mharden-sls=none -fomit-frame-pointer -mprefer-vector-width=256 -flto -fprofile-instr-use=/home/marcus/Downloads/llvm17.profdata"
export CXXFLAGS="${CFLAGS}"
export LDFLAGS="-Wl,--lto-O3,-O3,-Bsymbolic-functions,--as-needed -mllvm -extra-vectorizer-passes -mllvm -enable-cond-stores-vec -mllvm -slp-vectorize-hor-store -mllvm -enable-loopinterchange -mllvm -enable-loop-distribute -mllvm -enable-unroll-and-jam -mllvm -enable-loop-flatten -mllvm -interleave-small-loop-scalar-reduction -mllvm -unroll-runtime-multi-exit -mllvm -aggressive-ext-opt -mllvm -enable-interleaved-mem-accesses -mllvm -enable-masked-interleaved-mem-accesses -march=native -maes -flto -fuse-ld=lld -Wl,-zmax-page-size=0x200000 -fprofile-instr-use=/home/marcus/Downloads/llvm17.profdata"
CCLDFLAGS="$LDFLAGS"
CXXLDFLAGS="$LDFLAGS"
export ASFLAGS="-D__AVX__=1 -D__AVX2__=1 -msse2avx -D__FMA__=1"
Contributor

nikic commented Mar 24, 2023

This is not really actionable without a reduced test case. (Where "reduced" here means reduced to one file and a compilation command, not necessarily anything beyond that.)

Though even then, I'm not sure a compiler invocation that contains -inline-threshold=1000 is even worth investigating.

@EugeneZelenko added the incomplete label (Issue not complete, e.g. missing a reproducer, build arguments, etc.) on Mar 24, 2023
Author

ms178 commented Mar 24, 2023

@nikic Sorry, here is a link to my repo for the additional files needed: https://github.com/ms178/archpkgbuilds/tree/main/toolchain-experimental/llvm-git

With all the information given (edit makepkg.conf to use the CFLAGS mentioned above), this should be easily reproducible on any Arch system with makepkg -si --cleanbuild --skippgpcheck --skipchecksums. Using -inline-threshold=1000 did not significantly regress the overall build time when building LLVM-git with Clang-15, so I really don't understand why it should not be worth investigating.

The optimized Clang-15 used was from: https://aur.cachyos.org/llvm-bolt-15.tar.zst
The system Clang-17 was compiled with the Clang-15 build from above and the CFLAGS from above.
Just use the produced Clang-17 to compile LLVM-git with the same CFLAGS again, and you will see that huge compile time regression.
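
To put numbers on the reproduction steps above, the package build can be wrapped in a small timer. This is only a sketch: `fmt_duration` and `timed` are hypothetical helper names that do not appear in the PKGBUILD, and `date +%s` is assumed to be available (it is on typical Arch systems).

```shell
#!/bin/sh
# Format a duration in seconds the way this thread reports it, e.g. "3h 36min".
fmt_duration() {
    secs=$1
    printf '%dh %dmin\n' "$(( secs / 3600 ))" "$(( (secs % 3600) / 60 ))"
}

# Hypothetical wrapper: run a command, then report its wall-clock time on stderr.
# The wrapped command's exit status is passed through unchanged.
timed() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $(fmt_duration "$(( end - start ))")" >&2
    return "$status"
}
```

It could then be used as `timed makepkg -si --cleanbuild --skippgpcheck --skipchecksums`, once with each compiler, to compare the two builds on equal terms.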

Contributor

nikic commented Mar 24, 2023

> @nikic Sorry, here is a link to my repo for the additional files needed: https://github.com/ms178/archpkgbuilds/tree/main/toolchain-experimental/llvm-git

By "single file" I meant one C/C++ file on which a significant regression can be seen.
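
One way to look for such a file, sketched under the assumption that the build is driven by Ninja and leaves a `.ninja_log` in the build directory: list the build steps with the longest recorded duration. The helper name `slowest_steps` is hypothetical, and the script relies on the tab-separated log layout (start in ms, end in ms, mtime, output, command hash).

```shell
#!/bin/sh
# Hypothetical helper: print the N slowest build steps from a Ninja log,
# as "<duration in ms><TAB><output file>", slowest first.
# Assumes tab-separated fields: start_ms, end_ms, mtime, output, command hash.
slowest_steps() {
    log=$1
    count=${2:-10}
    # Skip the "# ninja log" header line and any malformed lines.
    awk -F '\t' '!/^#/ && NF >= 4 { printf "%d\t%s\n", $2 - $1, $4 }' "$log" |
        sort -rn | head -n "$count"
}
```

For example, `slowest_steps build/.ninja_log 5` (the `build/` path is a placeholder) would show the five slowest steps. The command for the top entry could then be re-run on its own with both compilers, optionally with Clang's `-ftime-trace`, to get the one-file-plus-command reproducer asked for here. Note that with `-flto` a large share of the optimization work may move to the link step, so the slowest entries may be link steps rather than single source files.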

Author

ms178 commented Mar 24, 2023

Sorry, I can't provide that. I am just a user, not a programmer. But if you want to know where the build spends a lot of its time now, it is while linking libclang.

Author

ms178 commented Mar 24, 2023

I've tried again without -inline-threshold=1000, and indeed it has a very positive effect on compile times. A build with Clang-17 (ff426a6) took only 1h 2min, which is in line with my expectations. Why that setting started to regress build times so much recently is still open for analysis, though.
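
The with/without comparison can be made repeatable by deriving the second flag set from the first instead of editing it by hand. A minimal sketch, assuming the flag always appears in the `-mllvm -inline-threshold=<number>` form used in the CFLAGS above; `strip_inline_threshold` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: return the given flag string with every
# "-mllvm -inline-threshold=<n>" pair removed, leaving all other flags intact.
strip_inline_threshold() {
    printf '%s\n' "$1" |
        sed -E 's/-mllvm[[:space:]]+-inline-threshold=[0-9]+[[:space:]]*//g'
}

# Example A/B setup; file.cpp is a placeholder for any translation unit:
#   CFLAGS_BASE=$(strip_inline_threshold "$CFLAGS")
#   time clang++ $CFLAGS      -c file.cpp -o /dev/null
#   time clang++ $CFLAGS_BASE -c file.cpp -o /dev/null
```

Running both timings with Clang-15 and with Clang-17 would show whether the slowdown comes from the flag alone or from its combination with the newer compiler.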

@ms178 changed the title from "3 x Compile time regression with Clang-17 on LLVM-git - not seen when using Clang-15" to "3 x Compile time regression with Clang-17 and -inline-threshold=1000 on LLVM-git - not seen when using Clang-15" on Mar 24, 2023
Author

ms178 commented May 10, 2025

@ms178 ms178 closed this as completed May 10, 2025
@tstellar
Collaborator

@ms178 If you are unhappy with the way these issues have been handled, I would recommend taking a break from filing new issues.

Author

ms178 commented May 10, 2025

@tstellar Thanks, that is exactly what the closure of my issues is all about: showing that decisions like the one I talk about in the blog post alienate external users and are ultimately harmful to the LLVM community.

4 participants