-
Notifications
You must be signed in to change notification settings - Fork 13.5k
AArch64 float and bool code regression from LLVM 18 #138212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@llvm/issue-subscribers-backend-aarch64 Author: Shoaib Meenai (smeenai)
We recently upgraded an internal codebase from LLVM 18 to LLVM 19 and observed regressions in several benchmarks. LLVM 20 and trunk exhibit the same regressions as well.
The simplest reduction demonstrating an issue is https://godbolt.org/z/hG3oh4sjY, where the code generated by Clang 19 seems strictly worse than the code generated by Clang 18. Clang trunk's code is larger than 18's (although smaller than 19's) but branchless, so I guess it could be better or worse depending on the input data. https://godbolt.org/z/ns6rqhese is a more realistic reduction, which shows LLVM 18 generating code that takes advantage of vector instructions, which later versions fail to do. https://godbolt.org/z/e4q4E6xE7 is an intermediate simplification where 19's code seems strictly worse than 18's, with trunk's code being the largest but also having the fewest branches. I bisected all these regressions from 18 to 19 to #84628. CC the diff author @YanWQ-monad and reviewers @goldsteinn, @dtcxzyw and @nikic. |
Missing fold: https://alive2.llvm.org/ce/z/PnHPtH |
How general should the fold be? Assuming one arm of each select is 0, the condition I can think of for the other constants is that their OR must be different from either constant (i.e. each constant has at least one bit set that's not set in the other constant, https://alive2.llvm.org/ce/z/HfCQx8). You could extend that to non-constants if you know enough bits to validate that condition. You could come up with conditions that work if the other arm of the select is non-zero, or for logical operations other than |
Yeah, there are a lot of variants that could be implemented here. I think it's fine to limit it to constants, but requiring one arm to be zero isn't really necessary. One possibility would be to match a general |
@smeenai Can take a look this later if you wish. |
That would be amazing, thank you! |
Match icmps of binops where both operands are select with constant arms, i.e., `icmp pred (select A ? C1 : C2) binop (select B ? C3 : C4), C5`. Fold such patterns by creating a truth table of the possible four constant variants, and materialize back the optimal logic from it via `createLogicFromTable` helper. Proofs: https://alive2.llvm.org/ce/z/_kkUfJ. Fixes: llvm#138212.
Match icmps of binops where both operands are select with constant arms, i.e., `icmp pred (select A ? C1 : C2) binop (select B ? C3 : C4), C5`. Fold such patterns by creating a truth table of the possible four constant variants, and materialize back the optimal logic from it via `createLogicFromTable` helper. This also generalizes an existing fold, which has therefore been dropped. Proofs: https://alive2.llvm.org/ce/z/_kkUfJ. Fixes: llvm#138212.
…Table` folding Match icmps of binops where both operands are select with constant arms, i.e., `icmp pred (select A ? C1 : C2) binop (select B ? C3 : C4), C5`. Fold such patterns by creating a truth table of the possible four constant variants, and materialize back the optimal logic from it via `createLogicFromTable` helper. This also generalizes an existing fold, which has therefore been dropped. Proofs: https://alive2.llvm.org/ce/z/NS7Vzu. Fixes: llvm/llvm-project#138212.
We recently upgraded an internal codebase from LLVM 18 to LLVM 19 and observed regressions in several benchmarks. LLVM 20 and trunk exhibit the same regressions as well.
The simplest reduction demonstrating an issue is https://godbolt.org/z/hG3oh4sjY, where the code generated by Clang 19 seems strictly worse than the code generated by Clang 18. Clang trunk's code is larger than 18's (although smaller than 19's) but branchless, so I guess it could be better or worse depending on the input data.
https://godbolt.org/z/ns6rqhese is a more realistic reduction, which shows LLVM 18 generating code that takes advantage of vector instructions, which later versions fail to do. https://godbolt.org/z/e4q4E6xE7 is an intermediate simplification where 19's code seems strictly worse than 18's, with trunk's code being the largest but also having the fewest branches.
I bisected all these regressions from 18 to 19 to #84628. CC the diff author @YanWQ-monad and reviewers @goldsteinn, @dtcxzyw and @nikic.
The text was updated successfully, but these errors were encountered: