Skip to content

AArch64 float and bool code regression from LLVM 18 #138212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
smeenai opened this issue May 1, 2025 · 6 comments · Fixed by #139109
Closed

AArch64 float and bool code regression from LLVM 18 #138212

smeenai opened this issue May 1, 2025 · 6 comments · Fixed by #139109
Assignees

Comments

@smeenai
Copy link
Collaborator

smeenai commented May 1, 2025

We recently upgraded an internal codebase from LLVM 18 to LLVM 19 and observed regressions in several benchmarks. LLVM 20 and trunk exhibit the same regressions as well.

The simplest reduction demonstrating an issue is https://godbolt.org/z/hG3oh4sjY, where the code generated by Clang 19 seems strictly worse than the code generated by Clang 18. Clang trunk's code is larger than 18's (although smaller than 19's) but branchless, so I guess it could be better or worse depending on the input data.

https://godbolt.org/z/ns6rqhese is a more realistic reduction, which shows LLVM 18 generating code that takes advantage of vector instructions, which later versions fail to do. https://godbolt.org/z/e4q4E6xE7 is an intermediate simplification where 19's code seems strictly worse than 18's, with trunk's code being the largest but also having the fewest branches.

I bisected all these regressions from 18 to 19 to #84628. CC the diff author @YanWQ-monad and reviewers @goldsteinn, @dtcxzyw and @nikic.

@llvmbot
Copy link
Member

llvmbot commented May 1, 2025

@llvm/issue-subscribers-backend-aarch64

Author: Shoaib Meenai (smeenai)

We recently upgraded an internal codebase from LLVM 18 to LLVM 19 and observed regressions in several benchmarks. LLVM 20 and trunk exhibit the same regressions as well.

The simplest reduction demonstrating an issue is https://godbolt.org/z/hG3oh4sjY, where the code generated by Clang 19 seems strictly worse than the code generated by Clang 18. Clang trunk's code is larger than 18's (although smaller than 19's) but branchless, so I guess it could be better or worse depending on the input data.

https://godbolt.org/z/ns6rqhese is a more realistic reduction, which shows LLVM 18 generating code that takes advantage of vector instructions, which later versions fail to do. https://godbolt.org/z/e4q4E6xE7 is an intermediate simplification where 19's code seems strictly worse than 18's, with trunk's code being the largest but also having the fewest branches.

I bisected all these regressions from 18 to 19 to #84628. CC the diff author @YanWQ-monad and reviewers @goldsteinn, @dtcxzyw and @nikic.

@thesamesam thesamesam added regression:18 Regression in 18 release regression:14 Regression in clang 14 release and removed regression:14 Regression in clang 14 release labels May 2, 2025
@nikic
Copy link
Contributor

nikic commented May 2, 2025

Missing fold: https://alive2.llvm.org/ce/z/PnHPtH

@smeenai
Copy link
Collaborator Author

smeenai commented May 6, 2025

How general should the fold be? Assuming one arm of each select is 0, the condition I can think of for the other constants is that their OR must be different from either constant (i.e. each constant has at least one bit set that's not set in the other constant, https://alive2.llvm.org/ce/z/HfCQx8). You could extend that to non-constants if you know enough bits to validate that condition. You could come up with conditions that work if the other arm of the select is non-zero, or for logical operations other than or. What did you have in mind here?

@nikic
Copy link
Contributor

nikic commented May 6, 2025

Yeah, there are a lot of variants that could be implemented here. I think it's fine to limit it to constants, but requiring one arm to be zero isn't really necessary.

One possibility would be to match a general icmp pred (select a ? c1 : c2) or (select b ? c3 : c4), c5 pattern and then just brute-force evaluate all four variants of icmp pred c1 or c2, c5 to create a truth table, which can then be converted back to code using the createLogicFromTable() helper. (If we're using the constant folding API, then replacing or with any binop should be essentially free as well).

@antoniofrighetto
Copy link
Contributor

@smeenai Can take a look this later if you wish.

@smeenai
Copy link
Collaborator Author

smeenai commented May 6, 2025

That would be amazing, thank you!

@antoniofrighetto antoniofrighetto self-assigned this May 6, 2025
antoniofrighetto added a commit to antoniofrighetto/llvm-project that referenced this issue May 8, 2025
Match icmps of binops where both operands are select with constant arms,
i.e., `icmp pred (select A ? C1 : C2) binop (select B ? C3 : C4), C5`.
Fold such patterns by creating a truth table of the possible four
constant variants, and materialize back the optimal logic from it via
`createLogicFromTable` helper.

Proofs: https://alive2.llvm.org/ce/z/_kkUfJ.

Fixes: llvm#138212.
antoniofrighetto added a commit to antoniofrighetto/llvm-project that referenced this issue May 10, 2025
Match icmps of binops where both operands are select with constant arms,
i.e., `icmp pred (select A ? C1 : C2) binop (select B ? C3 : C4), C5`.
Fold such patterns by creating a truth table of the possible four
constant variants, and materialize back the optimal logic from it via
`createLogicFromTable` helper. This also generalizes an existing fold,
which has therefore been dropped.

Proofs: https://alive2.llvm.org/ce/z/_kkUfJ.

Fixes: llvm#138212.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue May 13, 2025
…Table` folding

Match icmps of binops where both operands are select with constant arms,
i.e., `icmp pred (select A ? C1 : C2) binop (select B ? C3 : C4), C5`.
Fold such patterns by creating a truth table of the possible four
constant variants, and materialize back the optimal logic from it via
`createLogicFromTable` helper. This also generalizes an existing fold,
which has therefore been dropped.

Proofs: https://alive2.llvm.org/ce/z/NS7Vzu.

Fixes: llvm/llvm-project#138212.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants