[LV] Improve code in selectInterleaveCount (NFC) #128002
@llvm/pr-subscribers-vectorizers
Author: Ramkumar Ramachandra (artagnon)
Changes
Full diff: https://github.com/llvm/llvm-project/pull/128002.diff
1 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 8c41f896ad622..0c3afab724b34 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4922,7 +4922,6 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
if (Legal->hasUncountableEarlyExit())
return 1;
- auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop);
const bool HasReductions = !Legal->getReductionVars().empty();
// If we did not calculate the cost for VF (because the user selected the VF)
@@ -4998,51 +4997,53 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
}
unsigned EstimatedVF = getEstimatedRuntimeVF(VF, VScaleForTuning);
- unsigned KnownTC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
- if (KnownTC > 0) {
- // At least one iteration must be scalar when this constraint holds. So the
- // maximum available iterations for interleaving is one less.
- unsigned AvailableTC =
- requiresScalarEpilogue(VF.isVector()) ? KnownTC - 1 : KnownTC;
-
- // If trip count is known we select between two prospective ICs, where
- // 1) the aggressive IC is capped by the trip count divided by VF
- // 2) the conservative IC is capped by the trip count divided by (VF * 2)
- // The final IC is selected in a way that the epilogue loop trip count is
- // minimized while maximizing the IC itself, so that we either run the
- // vector loop at least once if it generates a small epilogue loop, or else
- // we run the vector loop at least twice.
-
- unsigned InterleaveCountUB = bit_floor(
- std::max(1u, std::min(AvailableTC / EstimatedVF, MaxInterleaveCount)));
- unsigned InterleaveCountLB = bit_floor(std::max(
- 1u, std::min(AvailableTC / (EstimatedVF * 2), MaxInterleaveCount)));
- MaxInterleaveCount = InterleaveCountLB;
-
- if (InterleaveCountUB != InterleaveCountLB) {
- unsigned TailTripCountUB =
- (AvailableTC % (EstimatedVF * InterleaveCountUB));
- unsigned TailTripCountLB =
- (AvailableTC % (EstimatedVF * InterleaveCountLB));
- // If both produce same scalar tail, maximize the IC to do the same work
- // in fewer vector loop iterations
- if (TailTripCountUB == TailTripCountLB)
- MaxInterleaveCount = InterleaveCountUB;
- }
- } else if (BestKnownTC && *BestKnownTC > 0) {
+
+ // Try to get the exact trip count, or an estimate based on profiling data or
+ // ConstantMax from PSE, failing that.
+ if (auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop)) {
// At least one iteration must be scalar when this constraint holds. So the
// maximum available iterations for interleaving is one less.
unsigned AvailableTC = requiresScalarEpilogue(VF.isVector())
? (*BestKnownTC) - 1
: *BestKnownTC;
- // If trip count is an estimated compile time constant, limit the
- // IC to be capped by the trip count divided by VF * 2, such that the vector
- // loop runs at least twice to make interleaving seem profitable when there
- // is an epilogue loop present. Since exact Trip count is not known we
- // choose to be conservative in our IC estimate.
- MaxInterleaveCount = bit_floor(std::max(
+ unsigned InterleaveCountLB = bit_floor(std::max(
1u, std::min(AvailableTC / (EstimatedVF * 2), MaxInterleaveCount)));
+
+ if (PSE.getSE()->getSmallConstantTripCount(TheLoop) > 0) {
+ // If the estimated trip count is actually an exact one we select between
+ // two prospective ICs, where
+ //
+ // 1) the aggressive IC is capped by the trip count divided by VF
+ // 2) the conservative IC is capped by the trip count divided by (VF * 2)
+ //
+ // The final IC is selected in a way that the epilogue loop trip count is
+ // minimized while maximizing the IC itself, so that we either run the
+ // vector loop at least once if it generates a small epilogue loop, or
+ // else we run the vector loop at least twice.
+
+ unsigned InterleaveCountUB = bit_floor(std::max(
+ 1u, std::min(AvailableTC / EstimatedVF, MaxInterleaveCount)));
+ MaxInterleaveCount = InterleaveCountLB;
+
+ if (InterleaveCountUB != InterleaveCountLB) {
+ unsigned TailTripCountUB =
+ (AvailableTC % (EstimatedVF * InterleaveCountUB));
+ unsigned TailTripCountLB =
+ (AvailableTC % (EstimatedVF * InterleaveCountLB));
+ // If both produce same scalar tail, maximize the IC to do the same work
+ // in fewer vector loop iterations
+ if (TailTripCountUB == TailTripCountLB)
+ MaxInterleaveCount = InterleaveCountUB;
+ }
+ } else {
+ // If trip count is an estimated compile time constant, limit the
+ // IC to be capped by the trip count divided by VF * 2, such that the
+ // vector loop runs at least twice to make interleaving seem profitable
+ // when there is an epilogue loop present. Since exact Trip count is not
+ // known we choose to be conservative in our IC estimate.
+ MaxInterleaveCount = InterleaveCountLB;
+ }
}
assert(MaxInterleaveCount > 0 &&
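For readers following the diff, the capping logic can be worked through outside of LLVM. The following is a minimal standalone sketch, not the code in LoopVectorize.cpp: `bitFloor` and `capInterleaveCount` are hypothetical names, the `TCIsExact` flag stands in for the `getSmallConstantTripCount(TheLoop) > 0` check, and the scalar-epilogue requirement is passed in as a plain bool.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical stand-in for llvm::bit_floor: the largest power of two that is
// less than or equal to X, or 0 when X is 0.
static unsigned bitFloor(unsigned X) {
  if (X == 0)
    return 0;
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Hypothetical free function mirroring the shape of the refactored block.
// Returns the capped maximum interleave count. Assumes a nonzero trip count
// whenever BestKnownTC holds a value.
static unsigned capInterleaveCount(std::optional<unsigned> BestKnownTC,
                                   bool TCIsExact, bool RequiresScalarEpilogue,
                                   unsigned EstimatedVF,
                                   unsigned MaxInterleaveCount) {
  assert(EstimatedVF > 0 && "VF must be nonzero");
  if (!BestKnownTC)
    return MaxInterleaveCount; // Nothing known: leave the cap untouched.

  // One iteration is reserved for the scalar epilogue when it is required.
  unsigned AvailableTC =
      RequiresScalarEpilogue ? *BestKnownTC - 1 : *BestKnownTC;

  // Conservative bound: the vector loop runs at least twice.
  unsigned LB = bitFloor(std::max(
      1u, std::min(AvailableTC / (EstimatedVF * 2), MaxInterleaveCount)));
  if (!TCIsExact)
    return LB; // Estimated trip count: stay conservative.

  // Aggressive bound: the vector loop runs at least once.
  unsigned UB = bitFloor(
      std::max(1u, std::min(AvailableTC / EstimatedVF, MaxInterleaveCount)));

  // Prefer the aggressive bound only if it leaves the same scalar tail.
  if (UB != LB &&
      AvailableTC % (EstimatedVF * UB) == AvailableTC % (EstimatedVF * LB))
    return UB;
  return LB;
}
```

With an exact trip count of 32, an estimated VF of 4 and a cap of 8, both bounds leave a tail of 0, so the aggressive value 8 is chosen. With an exact trip count of 24 the aggressive bound (4) leaves a tail of 8 while the conservative bound (2) leaves none, so the result is 2. If the trip count of 32 is only an estimate, the result is the conservative 4.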
Can you add more description to this PR explaining why the newer version is better?
Done.
c0a966b to c49546b
Gentle ping.
Gentle ping.
Gentle ping.
Gentle ping.
LGTM! I just had a minor comment about wording in a code comment, but apart from that looks fine.
LGTM, thanks
Use the fact that getSmallBestKnownTC returns an exact trip count, if possible, and falls back to returning an estimate, to factor some code in selectInterleaveCount.
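The fallback order that the description relies on (exact trip count first, then a profile-based estimate, then the constant maximum, as the new code comment in the diff puts it) can be sketched as follows. This is an illustration under stated assumptions, not the signature or body of getSmallBestKnownTC: `TripCountInfo` and `bestKnownTC` are hypothetical names, and using 0 to mean "unknown" is an assumption made for the sketch.

```cpp
#include <optional>

// Hypothetical summary of what is known about a loop's trip count.
struct TripCountInfo {
  unsigned ExactTC = 0;              // 0 means "not a known constant".
  std::optional<unsigned> ProfileTC; // Estimate from profile data, if any.
  unsigned ConstantMaxTC = 0;        // 0 means "no known constant maximum".
};

// Hypothetical helper: returns the exact trip count when available, otherwise
// the best available estimate, otherwise nothing.
static std::optional<unsigned> bestKnownTC(const TripCountInfo &Info) {
  if (Info.ExactTC > 0)
    return Info.ExactTC;
  if (Info.ProfileTC)
    return Info.ProfileTC;
  if (Info.ConstantMaxTC > 0)
    return Info.ConstantMaxTC;
  return std::nullopt;
}
```

Because the exact count takes priority in such a chain, a caller can compute the shared values once from the returned trip count and only then ask whether that count is exact, which is the shape the refactored code takes.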