-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[SCEV] Try to re-use existing LCSSA phis when expanding SCEVAddRecExpr. #147214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
If an AddRec is expanded outside a loop with a single exit block, check if any of the (lcssa) phi nodes in the exit block match the AddRec. If that's the case, simply use the existing lcssa phi. This can reduce the number of instruction created for SCEV expansions, mainly for runtime checks generated by the loop vectorizer. Compile-time impact should be mostly neutral https://llvm-compile-time-tracker.com/compare.php?from=48c7a3187f9831304a38df9bdb3b4d5bf6b6b1a2&to=cf9d039a7b0db5d0d912e0e2c01b19c2a653273a&stat=instructions:u
@llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesIf an AddRec is expanded outside a loop with a single exit block, check if any of the (lcssa) phi nodes in the exit block match the AddRec. If that's the case, simply use the existing lcssa phi. This can reduce the number of instruction created for SCEV expansions, mainly for runtime checks generated by the loop vectorizer. Compile-time impact should be mostly neutral Full diff: https://github.com/llvm/llvm-project/pull/147214.diff 3 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h b/llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h
index a101151eed7cc..39fef921a9590 100644
--- a/llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h
+++ b/llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h
@@ -530,6 +530,7 @@ class SCEVExpander : public SCEVVisitor<SCEVExpander, Value *> {
bool isExpandedAddRecExprPHI(PHINode *PN, Instruction *IncV, const Loop *L);
+ Value *tryToReuseLCSSAPhi(const SCEVAddRecExpr *S);
Value *expandAddRecExprLiterally(const SCEVAddRecExpr *);
PHINode *getAddRecExprPHILiterally(const SCEVAddRecExpr *Normalized,
const Loop *L, Type *&TruncTy,
diff --git a/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp b/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
index 24fe08d6c3e4e..438ca6cb785be 100644
--- a/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
+++ b/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp
@@ -1223,6 +1223,24 @@ Value *SCEVExpander::expandAddRecExprLiterally(const SCEVAddRecExpr *S) {
return Result;
}
+Value *SCEVExpander::tryToReuseLCSSAPhi(const SCEVAddRecExpr *S) {
+ const Loop *L = S->getLoop();
+ BasicBlock *EB = L->getExitBlock();
+ if (!EB || !EB->getSinglePredecessor() ||
+ !SE.DT.dominates(EB, Builder.GetInsertBlock()))
+ return nullptr;
+
+ for (auto &PN : EB->phis()) {
+ if (!SE.isSCEVable(PN.getType()) || PN.getType() != S->getType())
+ continue;
+ auto *ExitV = SE.getSCEV(&PN);
+ if (S == ExitV)
+ return &PN;
+ }
+
+ return nullptr;
+}
+
Value *SCEVExpander::visitAddRecExpr(const SCEVAddRecExpr *S) {
// In canonical mode we compute the addrec as an expression of a canonical IV
// using evaluateAtIteration and expand the resulting SCEV expression. This
@@ -1262,6 +1280,11 @@ Value *SCEVExpander::visitAddRecExpr(const SCEVAddRecExpr *S) {
return V;
}
+ // If there S is expanded outside the defining loop, check if there is a
+ // matching LCSSA phi node for it.
+ if (Value *V = tryToReuseLCSSAPhi(S))
+ return V;
+
// {X,+,F} --> X + {0,+,F}
if (!S->getStart()->isZero()) {
if (isa<PointerType>(S->getType())) {
diff --git a/llvm/test/Transforms/LoopVectorize/reuse-lcssa-phi-scev-expansion.ll b/llvm/test/Transforms/LoopVectorize/reuse-lcssa-phi-scev-expansion.ll
index 2747895f06a7b..ce4270dc4b7fa 100644
--- a/llvm/test/Transforms/LoopVectorize/reuse-lcssa-phi-scev-expansion.ll
+++ b/llvm/test/Transforms/LoopVectorize/reuse-lcssa-phi-scev-expansion.ll
@@ -18,11 +18,9 @@ define void @reuse_lcssa_phi_for_add_rec1(ptr %head) {
; CHECK-NEXT: [[IV_NEXT]] = add nuw i64 [[IV]], 1
; CHECK-NEXT: br i1 [[EC_1]], label %[[PH:.*]], label %[[LOOP_1]]
; CHECK: [[PH]]:
-; CHECK-NEXT: [[IV_2_LCSSA:%.*]] = phi i32 [ [[IV_2]], %[[LOOP_1]] ]
; CHECK-NEXT: [[IV_LCSSA:%.*]] = phi i64 [ [[IV]], %[[LOOP_1]] ]
-; CHECK-NEXT: [[IV_2_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_2_NEXT]], %[[LOOP_1]] ]
+; CHECK-NEXT: [[TMP0:%.*]] = phi i32 [ [[IV_2_NEXT]], %[[LOOP_1]] ]
; CHECK-NEXT: [[SRC_2:%.*]] = tail call noalias noundef dereferenceable_or_null(8) ptr @calloc(i64 1, i64 8)
-; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[IV_2_LCSSA]], 1
; CHECK-NEXT: [[SMIN:%.*]] = call i32 @llvm.smin.i32(i32 [[TMP0]], i32 1)
; CHECK-NEXT: [[TMP1:%.*]] = sub i32 [[TMP0]], [[SMIN]]
; CHECK-NEXT: [[TMP2:%.*]] = zext i32 [[TMP1]] to i64
|
; CHECK-NEXT: [[IV_2_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_2_NEXT]], %[[LOOP_1]] ] | ||
; CHECK-NEXT: [[TMP0:%.*]] = phi i32 [ [[IV_2_NEXT]], %[[LOOP_1]] ] | ||
; CHECK-NEXT: [[SRC_2:%.*]] = tail call noalias noundef dereferenceable_or_null(8) ptr @calloc(i64 1, i64 8) | ||
; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[IV_2_LCSSA]], 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? Isn't tmp0 = iv_2_lcssa + 1
where the + 1
is dropped now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original version has tmp0 = %iv.2 + 1
, in the new version tmp0 = iv.2.next
which should also be %iv.2 + 1
Hm, this has no impact in the real world: dtcxzyw/llvm-opt-benchmark#2550. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this has no impact in the real world: dtcxzyw/llvm-opt-benchmark#2550.
AFAIK llvm-opt-benchmark
disables LoopVectorize, which is the main source of expansions where it could be beneficial.
Real-world code where this applies includes a few instances in ffmpeg
, brotli
, freetype
, llvm's APFloat.cpp and a few others.
I will also share a follow-up patch that allows re-using existing LCSSA phi nodes in more cases.
; CHECK-NEXT: [[IV_2_NEXT_LCSSA:%.*]] = phi i32 [ [[IV_2_NEXT]], %[[LOOP_1]] ] | ||
; CHECK-NEXT: [[TMP0:%.*]] = phi i32 [ [[IV_2_NEXT]], %[[LOOP_1]] ] | ||
; CHECK-NEXT: [[SRC_2:%.*]] = tail call noalias noundef dereferenceable_or_null(8) ptr @calloc(i64 1, i64 8) | ||
; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[IV_2_LCSSA]], 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original version has tmp0 = %iv.2 + 1
, in the new version tmp0 = iv.2.next
which should also be %iv.2 + 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand the implementation approach here. At a high level, I'd expect the reuse to be for the in-loop expansion, and then reuse of an existing LCSSA phi node to be an optimization during LCSSA fixup. Is the idea here that we can't find existing expansions (that are not part of the ExprValue map) in the loop efficiently, so instead we focus on only the out-of-loop case, because that one only has to inspect LCSSA phi nodes? And this is also limited to addrecs only for the same reason?
@@ -1262,6 +1280,11 @@ Value *SCEVExpander::visitAddRecExpr(const SCEVAddRecExpr *S) { | |||
return V; | |||
} | |||
|
|||
// If there S is expanded outside the defining loop, check if there is a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If there S is expanded outside the defining loop, check if there is a | |
// If S is expanded outside the defining loop, check if there is a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
@@ -1223,6 +1223,24 @@ Value *SCEVExpander::expandAddRecExprLiterally(const SCEVAddRecExpr *S) { | |||
return Result; | |||
} | |||
|
|||
Value *SCEVExpander::tryToReuseLCSSAPhi(const SCEVAddRecExpr *S) { | |||
const Loop *L = S->getLoop(); | |||
BasicBlock *EB = L->getExitBlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Limitation to single-exit loops is unfortunate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible to extend this to multi-exit loops, and just check for the exit dominating the insertion point as follow-up?
The main thing is that we need phis with single incoming values.
Or is the addrec expansion actually in the loop and the problem is that we currently don't support reusing expansions if they require LCSSA phi nodes due to https://github.com/fhahn/llvm-project/blob/8c8c5509e948130432d0b76ce7679a837cfa735c/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L1495-L1502 If that's the case, I think we should add support for that. Doesn't make sense to create a new expansion if all we need to do is (maybe) insert LCSSA phi nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is the addrec expansion actually in the loop and the problem is that we currently don't support reusing expansions if they require LCSSA phi nodes due to https://github.com/fhahn/llvm-project/blob/8c8c5509e948130432d0b76ce7679a837cfa735c/llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp#L1495-L1502 If that's the case, I think we should add support for that. Doesn't make sense to create a new expansion if all we need to do is (maybe) insert LCSSA phi nodes.
The expansion is outside the loop, they originate from runtime checks of another loop that uses a value out of the earlier loop.
The reason we don't have entries in the ExprValue map, is because the lcssa phi node we can re-use hasn't been visited by SCEV, so won't get added. The runtime check uses the BTC of the second loop, which will get computed w/o visiting the lcssa phi for IV + 1.
We could extend the phi check for all expressions that are defined in a loop and used outside one, but I am not sure how we would best detect that for non-AddRec exprs?
@@ -1262,6 +1280,11 @@ Value *SCEVExpander::visitAddRecExpr(const SCEVAddRecExpr *S) { | |||
return V; | |||
} | |||
|
|||
// If there S is expanded outside the defining loop, check if there is a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks
@@ -1223,6 +1223,24 @@ Value *SCEVExpander::expandAddRecExprLiterally(const SCEVAddRecExpr *S) { | |||
return Result; | |||
} | |||
|
|||
Value *SCEVExpander::tryToReuseLCSSAPhi(const SCEVAddRecExpr *S) { | |||
const Loop *L = S->getLoop(); | |||
BasicBlock *EB = L->getExitBlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible to extend this to multi-exit loops, and just check for the exit dominating the insertion point as follow-up?
The main thing is that we need phis with single incoming values.
If an AddRec is expanded outside a loop with a single exit block, check if any of the (lcssa) phi nodes in the exit block match the AddRec. If that's the case, simply use the existing lcssa phi.
This can reduce the number of instruction created for SCEV expansions, mainly for runtime checks generated by the loop vectorizer.
Compile-time impact should be mostly neutral
https://llvm-compile-time-tracker.com/compare.php?from=48c7a3187f9831304a38df9bdb3b4d5bf6b6b1a2&to=cf9d039a7b0db5d0d912e0e2c01b19c2a653273a&stat=instructions:u