AArch64 float and bool code regression from LLVM 18 #138212

smeenai · 2025-05-01T23:02:21Z

We recently upgraded an internal codebase from LLVM 18 to LLVM 19 and observed regressions in several benchmarks. LLVM 20 and trunk exhibit the same regressions as well.

The simplest reduction demonstrating an issue is https://godbolt.org/z/hG3oh4sjY, where the code generated by Clang 19 seems strictly worse than the code generated by Clang 18. Clang trunk's code is larger than 18's (although smaller than 19's) but branchless, so I guess it could be better or worse depending on the input data.

https://godbolt.org/z/ns6rqhese is a more realistic reduction, which shows LLVM 18 generating code that takes advantage of vector instructions, which later versions fail to do. https://godbolt.org/z/e4q4E6xE7 is an intermediate simplification where 19's code seems strictly worse than 18's, with trunk's code being the largest but also having the fewest branches.

I bisected all these regressions from 18 to 19 to #84628. CC the diff author @YanWQ-monad and reviewers @goldsteinn, @dtcxzyw and @nikic.

llvmbot · 2025-05-01T23:27:12Z

@llvm/issue-subscribers-backend-aarch64

Author: Shoaib Meenai (smeenai)

We recently upgraded an internal codebase from LLVM 18 to LLVM 19 and observed regressions in several benchmarks. LLVM 20 and trunk exhibit the same regressions as well.

The simplest reduction demonstrating an issue is https://godbolt.org/z/hG3oh4sjY, where the code generated by Clang 19 seems strictly worse than the code generated by Clang 18. Clang trunk's code is larger than 18's (although smaller than 19's) but branchless, so I guess it could be better or worse depending on the input data.

https://godbolt.org/z/ns6rqhese is a more realistic reduction, which shows LLVM 18 generating code that takes advantage of vector instructions, which later versions fail to do. https://godbolt.org/z/e4q4E6xE7 is an intermediate simplification where 19's code seems strictly worse than 18's, with trunk's code being the largest but also having the fewest branches.

I bisected all these regressions from 18 to 19 to #84628. CC the diff author @YanWQ-monad and reviewers @goldsteinn, @dtcxzyw and @nikic.

nikic · 2025-05-02T07:46:36Z

Missing fold: https://alive2.llvm.org/ce/z/PnHPtH

smeenai · 2025-05-06T05:09:45Z

How general should the fold be? Assuming one arm of each select is 0, the condition I can think of for the other constants is that their OR must be different from either constant (i.e. each constant has at least one bit set that's not set in the other constant, https://alive2.llvm.org/ce/z/HfCQx8). You could extend that to non-constants if you know enough bits to validate that condition. You could come up with conditions that work if the other arm of the select is non-zero, or for logical operations other than or. What did you have in mind here?

nikic · 2025-05-06T07:46:02Z

Yeah, there are a lot of variants that could be implemented here. I think it's fine to limit it to constants, but requiring one arm to be zero isn't really necessary.

One possibility would be to match a general icmp pred (select a ? c1 : c2) or (select b ? c3 : c4), c5 pattern and then just brute-force evaluate all four variants of icmp pred c1 or c2, c5 to create a truth table, which can then be converted back to code using the createLogicFromTable() helper. (If we're using the constant folding API, then replacing or with any binop should be essentially free as well).

antoniofrighetto · 2025-05-06T15:43:57Z

@smeenai Can take a look this later if you wish.

smeenai · 2025-05-06T16:26:47Z

That would be amazing, thank you!

Match icmps of binops where both operands are select with constant arms, i.e., `icmp pred (select A ? C1 : C2) binop (select B ? C3 : C4), C5`. Fold such patterns by creating a truth table of the possible four constant variants, and materialize back the optimal logic from it via `createLogicFromTable` helper. Proofs: https://alive2.llvm.org/ce/z/_kkUfJ. Fixes: llvm#138212.

Match icmps of binops where both operands are select with constant arms, i.e., `icmp pred (select A ? C1 : C2) binop (select B ? C3 : C4), C5`. Fold such patterns by creating a truth table of the possible four constant variants, and materialize back the optimal logic from it via `createLogicFromTable` helper. This also generalizes an existing fold, which has therefore been dropped. Proofs: https://alive2.llvm.org/ce/z/_kkUfJ. Fixes: llvm#138212.

…Table` folding Match icmps of binops where both operands are select with constant arms, i.e., `icmp pred (select A ? C1 : C2) binop (select B ? C3 : C4), C5`. Fold such patterns by creating a truth table of the possible four constant variants, and materialize back the optimal logic from it via `createLogicFromTable` helper. This also generalizes an existing fold, which has therefore been dropped. Proofs: https://alive2.llvm.org/ce/z/NS7Vzu. Fixes: llvm/llvm-project#138212.

llvmbot added the new issue label May 1, 2025

EugeneZelenko added backend:AArch64 and removed new issue labels May 1, 2025

thesamesam added regression:18 Regression in 18 release regression:14 Regression in clang 14 release and removed regression:14 Regression in clang 14 release labels May 2, 2025

nikic added llvm:instcombine missed-optimization and removed backend:AArch64 labels May 2, 2025

antoniofrighetto self-assigned this May 6, 2025

antoniofrighetto mentioned this issue May 8, 2025

[InstCombine] Introduce foldICmpBinOpWithConstantViaTruthTable folding #139109

Merged

antoniofrighetto closed this as completed in adfd59f May 13, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AArch64 float and bool code regression from LLVM 18 #138212

AArch64 float and bool code regression from LLVM 18 #138212

smeenai commented May 1, 2025

llvmbot commented May 1, 2025

nikic commented May 2, 2025

smeenai commented May 6, 2025 •

edited

Loading

nikic commented May 6, 2025

antoniofrighetto commented May 6, 2025

smeenai commented May 6, 2025

AArch64 float and bool code regression from LLVM 18 #138212

AArch64 float and bool code regression from LLVM 18 #138212

Comments

smeenai commented May 1, 2025

llvmbot commented May 1, 2025

nikic commented May 2, 2025

smeenai commented May 6, 2025 • edited Loading

nikic commented May 6, 2025

antoniofrighetto commented May 6, 2025

smeenai commented May 6, 2025

smeenai commented May 6, 2025 •

edited

Loading