-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes #118638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes #118638
Conversation
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Luke Lau (lukel97) ChangesThe motivation of this PR is to make #115274 easier to implement. It's similar to the idea in #82021 (but admittedly I didn't notice it until I had already written this!), and should allow us to add EVL support by just passing EVL to the VF operand. The current difficulty with widening IVs with EVL is that VPWidenIntOrFpInductionRecipe generates its own backedge value. Since it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which means we can't use the EVL since it's defined in the loop body. The gist in this PR is to take the approach in #114305 and expand VPWidenIntOrFpInductionRecipe into several recipes for the initial value, phi and backedge value just before execution. I.e. this example:
gets expanded to:
This allows us to a value defined in the loop in the backedge value, and also means we can just reuse the existing backedge fixups in VPlan::execute without having to specially handle it ourselves. I initially tried just splitting up VPWidenIntOrFpInductionRecipe immediately in I also tried to avoid the start and increment recipes by expanding them directly into VPInstructions/VPWidenIntrinsicRecipe/VPScalarCastRecipes, but I ran into some difficulties when trying to broadcast a scalar type to use in a widened binary op. (I'm happy to give this another try though!) I hoped to make this an NFC, but unfortunately some of the splatted values get shuffled about in the preheaders. I may be duplicating other work here, and this was just the solution I stumbled upon. Open to any other ideas or approaches! Patch is 84.38 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118638.diff 19 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index b801d1863e252c..4e4c4dfd461824 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1043,18 +1043,12 @@ void VPlan::execute(VPTransformState *State) {
if (isa<VPWidenPHIRecipe>(&R))
continue;
- if (isa<VPWidenPointerInductionRecipe>(&R) ||
- isa<VPWidenIntOrFpInductionRecipe>(&R)) {
- PHINode *Phi = nullptr;
- if (isa<VPWidenIntOrFpInductionRecipe>(&R)) {
- Phi = cast<PHINode>(State->get(R.getVPSingleValue()));
- } else {
- auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
- assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()) &&
- "recipe generating only scalars should have been replaced");
- auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi));
- Phi = cast<PHINode>(GEP->getPointerOperand());
- }
+ if (isa<VPWidenPointerInductionRecipe>(&R)) {
+ auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
+ assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()) &&
+ "recipe generating only scalars should have been replaced");
+ auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi));
+ PHINode *Phi = cast<PHINode>(GEP->getPointerOperand());
Phi->setIncomingBlock(1, VectorLatchBB);
@@ -1062,10 +1056,6 @@ void VPlan::execute(VPTransformState *State) {
// consistent placement of all induction updates.
Instruction *Inc = cast<Instruction>(Phi->getIncomingValue(1));
Inc->moveBefore(VectorLatchBB->getTerminator()->getPrevNode());
-
- // Use the steps for the last part as backedge value for the induction.
- if (auto *IV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&R))
- Inc->setOperand(0, State->get(IV->getLastUnrolledPartOperand()));
continue;
}
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e1d828f038f9a2..513e973af8fc1a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -2089,7 +2089,9 @@ class VPHeaderPHIRecipe : public VPSingleDefRecipe {
};
/// A recipe for handling phi nodes of integer and floating-point inductions,
-/// producing their vector values.
+/// producing their vector values. This won't execute any LLVM IR and will get
+/// expanded later into VPWidenIntOrFpInitialRecipe, VPWidenIntOrFpPHIRecipe and
+/// VPWidenIntOrFpBackedgeRecipe.
class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
PHINode *IV;
TruncInst *Trunc;
@@ -2122,9 +2124,10 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)
- /// Generate the vectorized and scalarized versions of the phi node as
- /// needed by their users.
- void execute(VPTransformState &State) override;
+ void execute(VPTransformState &State) override {
+ llvm_unreachable("cannot execute this recipe, should be expanded via "
+ "expandVPWidenIntOrFpInductionRecipe");
+ }
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
/// Print the recipe.
@@ -2180,10 +2183,152 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
}
/// Returns the VPValue representing the value of this induction at
- /// the last unrolled part, if it exists. Returns itself if unrolling did not
+ /// the last unrolled part, if it exists. Returns nullptr if unrolling did not
/// take place.
VPValue *getLastUnrolledPartOperand() {
- return getNumOperands() == 5 ? getOperand(4) : this;
+ return getNumOperands() == 5 ? getOperand(4) : nullptr;
+ }
+};
+
+/// A recipe to compute the initial value for a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionInitialRecipe : public VPSingleDefRecipe {
+ Instruction *IV;
+ const InductionDescriptor &ID;
+
+public:
+ VPWidenIntOrFpInductionInitialRecipe(Instruction *IV, VPValue *Start,
+ VPValue *Step,
+ const InductionDescriptor &ID)
+ : VPSingleDefRecipe(VPDef::VPWidenIntOrFpInductionStartSC, {Start, Step}),
+ IV(IV), ID(ID) {
+ assert((isa<PHINode>(IV) || isa<TruncInst>(IV)) &&
+ "Expected either an induction phi-node or a truncate of it!");
+ }
+
+ ~VPWidenIntOrFpInductionInitialRecipe() override = default;
+
+ VPWidenIntOrFpInductionInitialRecipe *clone() override {
+ return new VPWidenIntOrFpInductionInitialRecipe(IV, getOperand(0),
+ getOperand(1), ID);
+ }
+
+ VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionStartSC)
+
+ void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+ /// Print the recipe.
+ void print(raw_ostream &O, const Twine &Indent,
+ VPSlotTracker &SlotTracker) const override;
+#endif
+
+ VPValue *getStartValue() { return getOperand(0); }
+ const VPValue *getStartValue() const { return getOperand(0); }
+
+ VPValue *getStepValue() { return getOperand(1); }
+ const VPValue *getStepValue() const { return getOperand(1); }
+
+ /// Returns the scalar type of the induction.
+ Type *getScalarType() const { return IV->getType(); }
+
+ bool onlyFirstLaneUsed(const VPValue *Op) const override {
+ assert(is_contained(operands(), Op) &&
+ "Op must be an operand of the recipe");
+ return true;
+ }
+};
+
+/// A recipe to generate the PHI of a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionPHIRecipe : public VPHeaderPHIRecipe {
+ Instruction *IV;
+
+public:
+ VPWidenIntOrFpInductionPHIRecipe(Instruction *IV, VPValue *Start)
+ : VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionPHISC, IV, Start),
+ IV(IV) {
+ assert((isa<PHINode>(IV) || isa<TruncInst>(IV)) &&
+ "Expected either an induction phi-node or a truncate of it!");
+ }
+
+ ~VPWidenIntOrFpInductionPHIRecipe() override = default;
+
+ VPWidenIntOrFpInductionPHIRecipe *clone() override {
+ auto *R = new VPWidenIntOrFpInductionPHIRecipe(IV, getOperand(0));
+ R->addOperand(getBackedgeValue());
+ return R;
+ }
+
+ VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionPHISC)
+
+ void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+ /// Print the recipe.
+ void print(raw_ostream &O, const Twine &Indent,
+ VPSlotTracker &SlotTracker) const override;
+#endif
+};
+
+/// A recipe to compute the backedge value for a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionBackedgeRecipe : public VPSingleDefRecipe {
+ Instruction *IV;
+ const InductionDescriptor &ID;
+
+public:
+ VPWidenIntOrFpInductionBackedgeRecipe(Instruction *IV, VPValue *Step,
+ VPValue *VF, VPValue *Prev,
+ VPValue *SplatVF,
+ const InductionDescriptor &ID)
+ : VPSingleDefRecipe(VPDef::VPWidenIntOrFpInductionSC, {Step, VF, Prev}),
+ IV(IV), ID(ID) {
+ assert((isa<PHINode>(IV) || isa<TruncInst>(IV)) &&
+ "Expected either an induction phi-node or a truncate of it!");
+ if (SplatVF)
+ addOperand(SplatVF);
+ }
+
+ ~VPWidenIntOrFpInductionBackedgeRecipe() override = default;
+
+ VPWidenIntOrFpInductionBackedgeRecipe *clone() override {
+ return new VPWidenIntOrFpInductionBackedgeRecipe(
+ IV, getOperand(0), getOperand(1), getOperand(2), getOperand(3), ID);
+ }
+
+ VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionIncSC)
+
+ void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+ /// Print the recipe.
+ void print(raw_ostream &O, const Twine &Indent,
+ VPSlotTracker &SlotTracker) const override;
+#endif
+
+ VPValue *getStepValue() { return getOperand(0); }
+ const VPValue *getStepValue() const { return getOperand(0); }
+
+ VPValue *getVFValue() { return getOperand(1); }
+ const VPValue *getVFValue() const { return getOperand(1); }
+
+ VPValue *getPrevValue() { return getOperand(2); }
+ const VPValue *getPrevValue() const { return getOperand(2); }
+
+ VPValue *getSplatVFValue() {
+ // If the recipe has been unrolled (4 operands), return the VPValue for the
+ // induction increment.
+ return getNumOperands() == 4 ? getOperand(3) : nullptr;
+ }
+
+ /// Returns the scalar type of the induction.
+ Type *getScalarType() const { return IV->getType(); }
+
+ bool onlyFirstLaneUsed(const VPValue *Op) const override {
+ assert(is_contained(operands(), Op) &&
+ "Op must be an operand of the recipe");
+ return Op == getOperand(0) || Op == getOperand(1);
}
};
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 969d07b229e469..8a9e64b00850e2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -214,14 +214,17 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
.Case<VPActiveLaneMaskPHIRecipe, VPCanonicalIVPHIRecipe,
VPFirstOrderRecurrencePHIRecipe, VPReductionPHIRecipe,
VPWidenPointerInductionRecipe, VPEVLBasedIVPHIRecipe,
- VPScalarPHIRecipe>([this](const auto *R) {
- // Handle header phi recipes, except VPWidenIntOrFpInduction
- // which needs special handling due it being possibly truncated.
- // TODO: consider inferring/caching type of siblings, e.g.,
- // backedge value, here and in cases below.
- return inferScalarType(R->getStartValue());
- })
- .Case<VPWidenIntOrFpInductionRecipe, VPDerivedIVRecipe>(
+ VPScalarPHIRecipe, VPWidenIntOrFpInductionPHIRecipe>(
+ [this](const auto *R) {
+ // Handle header phi recipes, except VPWidenIntOrFpInduction
+ // which needs special handling due it being possibly truncated.
+ // TODO: consider inferring/caching type of siblings, e.g.,
+ // backedge value, here and in cases below.
+ return inferScalarType(R->getStartValue());
+ })
+ .Case<VPWidenIntOrFpInductionRecipe, VPDerivedIVRecipe,
+ VPWidenIntOrFpInductionInitialRecipe,
+ VPWidenIntOrFpInductionBackedgeRecipe>(
[](const auto *R) { return R->getScalarType(); })
.Case<VPReductionRecipe, VPPredInstPHIRecipe, VPWidenPHIRecipe,
VPScalarIVStepsRecipe, VPWidenGEPRecipe, VPVectorPointerRecipe,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index ef5f6e22f82206..8ae1e7382bce38 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1631,47 +1631,73 @@ static Constant *getSignedIntOrFpConstant(Type *Ty, int64_t C) {
: ConstantFP::get(Ty, C);
}
-void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {
+void VPWidenIntOrFpInductionInitialRecipe::execute(VPTransformState &State) {
assert(!State.Lane && "Int or FP induction being replicated.");
- Value *Start = getStartValue()->getLiveInIRValue();
- const InductionDescriptor &ID = getInductionDescriptor();
- TruncInst *Trunc = getTruncInst();
+ Value *Start = State.get(getStartValue(), true);
IRBuilderBase &Builder = State.Builder;
- assert(IV->getType() == ID.getStartValue()->getType() && "Types must match");
assert(State.VF.isVector() && "must have vector VF");
- // The value from the original loop to which we are mapping the new induction
- // variable.
- Instruction *EntryVal = Trunc ? cast<Instruction>(Trunc) : IV;
-
// Fast-math-flags propagate from the original induction instruction.
IRBuilder<>::FastMathFlagGuard FMFG(Builder);
- if (ID.getInductionBinOp() && isa<FPMathOperator>(ID.getInductionBinOp()))
+ if (isa_and_nonnull<FPMathOperator>(ID.getInductionBinOp()))
Builder.setFastMathFlags(ID.getInductionBinOp()->getFastMathFlags());
// Now do the actual transformations, and start with fetching the step value.
Value *Step = State.get(getStepValue(), VPLane(0));
- assert((isa<PHINode>(EntryVal) || isa<TruncInst>(EntryVal)) &&
- "Expected either an induction phi-node or a truncate of it!");
-
- // Construct the initial value of the vector IV in the vector loop preheader
- auto CurrIP = Builder.saveIP();
- BasicBlock *VectorPH = State.CFG.getPreheaderBBFor(this);
- Builder.SetInsertPoint(VectorPH->getTerminator());
- if (isa<TruncInst>(EntryVal)) {
- assert(Start->getType()->isIntegerTy() &&
- "Truncation requires an integer type");
- auto *TruncType = cast<IntegerType>(EntryVal->getType());
- Step = Builder.CreateTrunc(Step, TruncType);
- Start = Builder.CreateCast(Instruction::Trunc, Start, TruncType);
- }
-
+ // Construct the initial value of the vector IV
Value *Zero = getSignedIntOrFpConstant(Start->getType(), 0);
Value *SplatStart = Builder.CreateVectorSplat(State.VF, Start);
Value *SteppedStart = getStepVector(
SplatStart, Zero, Step, ID.getInductionOpcode(), State.VF, State.Builder);
+ State.set(this, SteppedStart);
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenIntOrFpInductionInitialRecipe::print(
+ raw_ostream &O, const Twine &Indent, VPSlotTracker &SlotTracker) const {
+ O << Indent;
+ printAsOperand(O, SlotTracker);
+ O << " = WIDEN-INDUCTION-START ";
+ printOperands(O, SlotTracker);
+}
+#endif
+
+void VPWidenIntOrFpInductionPHIRecipe::execute(VPTransformState &State) {
+ BasicBlock *VectorPH = State.CFG.getPreheaderBBFor(this);
+
+ Value *Start = State.get(getOperand(0));
+ PHINode *Phi = State.Builder.CreatePHI(Start->getType(), 2, "vec.ind");
+ Phi->addIncoming(Start, VectorPH);
+ Phi->setDebugLoc(IV->getDebugLoc());
+ State.set(this, Phi);
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenIntOrFpInductionPHIRecipe::print(raw_ostream &O,
+ const Twine &Indent,
+ VPSlotTracker &SlotTracker) const {
+ O << Indent;
+ printAsOperand(O, SlotTracker);
+ O << " = WIDEN-INDUCTION-PHI ";
+ printOperands(O, SlotTracker);
+}
+#endif
+
+void VPWidenIntOrFpInductionBackedgeRecipe::execute(VPTransformState &State) {
+ IRBuilderBase &Builder = State.Builder;
+
+ // Fast-math-flags propagate from the original induction instruction.
+ IRBuilder<>::FastMathFlagGuard FMFG(Builder);
+ if (isa_and_nonnull<FPMathOperator>(ID.getInductionBinOp()))
+ Builder.setFastMathFlags(ID.getInductionBinOp()->getFastMathFlags());
+
+ Value *Step = State.get(getStepValue(), VPLane(0));
+
+ auto CurrIP = Builder.saveIP();
+ BasicBlock *VectorPH = State.CFG.getPreheaderBBFor(this);
+ Builder.SetInsertPoint(VectorPH->getTerminator());
// We create vector phi nodes for both integer and floating-point induction
// variables. Here, we determine the kind of arithmetic we will perform.
@@ -1706,29 +1732,27 @@ void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {
}
Builder.restoreIP(CurrIP);
-
- // We may need to add the step a number of times, depending on the unroll
- // factor. The last of those goes into the PHI.
- PHINode *VecInd = PHINode::Create(SteppedStart->getType(), 2, "vec.ind");
- VecInd->insertBefore(State.CFG.PrevBB->getFirstInsertionPt());
- VecInd->setDebugLoc(EntryVal->getDebugLoc());
- State.set(this, VecInd);
+ Value *PrevVal = State.get(getPrevValue());
Instruction *LastInduction = cast<Instruction>(
- Builder.CreateBinOp(AddOp, VecInd, SplatVF, "vec.ind.next"));
- if (isa<TruncInst>(EntryVal))
- State.addMetadata(LastInduction, EntryVal);
- LastInduction->setDebugLoc(EntryVal->getDebugLoc());
+ Builder.CreateBinOp(AddOp, PrevVal, SplatVF, "vec.ind.next"));
+ if (isa<TruncInst>(IV))
+ State.addMetadata(LastInduction, IV);
+ LastInduction->setDebugLoc(IV->getDebugLoc());
- VecInd->addIncoming(SteppedStart, VectorPH);
- // Add induction update using an incorrect block temporarily. The phi node
- // will be fixed after VPlan execution. Note that at this point the latch
- // block cannot be used, as it does not exist yet.
- // TODO: Model increment value in VPlan, by turning the recipe into a
- // multi-def and a subclass of VPHeaderPHIRecipe.
- VecInd->addIncoming(LastInduction, VectorPH);
+ State.set(this, LastInduction);
}
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenIntOrFpInductionBackedgeRecipe::print(
+ raw_ostream &O, const Twine &Indent, VPSlotTracker &SlotTracker) const {
+ O << Indent;
+ printAsOperand(O, SlotTracker);
+ O << " = WIDEN-INDUCTION-INC ";
+ printOperands(O, SlotTracker);
+}
+#endif
+
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
void VPWidenIntOrFpInductionRecipe::print(raw_ostream &O, const Twine &Indent,
VPSlotTracker &SlotTracker) const {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index cee83d1015b536..9abf5f28936d53 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1820,12 +1820,91 @@ void VPlanTransforms::createInterleaveGroups(
}
}
+/// Expand a VPWidenIntOrFpInduction into separate recipes for the initial
+/// value, phi and backedge value. In the followng example:
+///
+/// vector.ph:
+/// Successor(s): vector loop
+///
+/// <x1> vector loop: {
+/// vector.body:
+/// WIDEN-INDUCTION %i = phi %bc.resume.val, %i.next, ir<1>, ir<%5>
+/// ...
+/// EMIT branch-on-count vp<%index.next>, ir<%n.vec>
+/// No successors
+/// }
+///
+/// WIDEN-INDUCTION will get expanded to:
+///
+/// vector.ph:
+/// vp<%0> = WIDEN-INDUCTION-START ir<0>, ir<1>
+/// Successor(s): vector loop
+///
+/// <x1> vector loop: {
+/// vector.body:
+/// ir<%i> = WIDEN-INDUCTION-PHI vp<%0>, vp<%4>
+/// ...
+/// vp<%4> = WIDEN-INDUCTION-INC ir<1>, ir<%5>, ir<%i>
+/// EMIT branch-on-count vp<%index.next>, ir<%n.vec>
+/// No successors
+/// }
+static void
+expandVPWidenIntOrFpInduction(VPWidenIntOrFpInductionRecipe *WidenIVR) {
+ VPlan *Plan = WidenIVR->getParent()->getPlan();
+ PHINode *PHI = WidenIVR->getPHINode();
+ VPValue *Start = WidenIVR->getStartValue();
+ VPValue *Step = WidenIVR->getStepValue();
+ VPValue *VF = WidenIVR->getVFValue();
+ const InductionDescriptor &ID = WidenIVR->getInductionDescriptor();
+ TruncInst *Trunc = WidenIVR->getTruncInst();
+
+ // The value from the original loop to which we are mapping the new induction
+ // variable.
+ Instruction *IV = Trunc ? cast<Instruction>(Trunc) : PHI;
+
+ // If the phi is truncated, truncate the start and step values.
+ VPBuilder Builder(Plan->getVectorPreheader());
+ if (isa<TruncInst>(IV)) {
+ assert(Start->getUnderlyingValue()->getType()->isIntegerTy() &&
+ "Truncation requires an integer type");
+ auto *TruncType = cast<IntegerType>(IV->getType());
+ Step = Builder.createScalarCast(Instruction::Trunc, Step, TruncType);
+ Start = Builder.createScalarCast(Instruction::Trunc, Start, TruncType);
+ }
+
+ // Construct the initial value of the vector IV in the vector loop preheader.
+ auto *StartR = new VPWidenIntOrFpInductionInitialRecipe(IV, Start, Step, ID);
+ Plan->getVectorPreheader()->insert(StartR, Builder.getInsertPoint());
+
+ // Create the widened phi of the vector IV.
+ auto *PhiR = new VPWidenIntOrFpInductionPHIRecipe(IV, StartR);
+ PhiR->insertBefore(WidenIVR);
+
+ // Create the backedge value for the vector IV.
+ VPValue *Prev = PhiR;
+ // If unrolled, use the last unrolled part in the increment.
+ if (auto *UnrolledPart = WidenIVR->getLastUnrolledPartOperand())
+ Prev = UnrolledPart;
+ auto *IncR = new VPWidenIntOrFpInductionBackedgeRecipe(
+ IV, Step, VF, Prev, WidenIVR->getSplatVFValue(), ID);
+ VPBasicBlock *ExitingBB = Plan->getVectorLoopRegion()->getExitingBasicBlock();
+ ExitingBB->insert(IncR, ExitingBB->getTer...
[truncated]
|
We can reuse VPWidenPHI in llvm#118638, but it requires us to allow it in the non-native path. We also need to propagate the DebugLoc and use a different name in the generated PHI, so this splits these parts off in case we want it. We lose some debug info in dbg-outer-loop-vect.ll, but I think this is because the underlying phi node didn't have a DebugLoc to begin with. I think the current version is just carrying over the DebugLoc from the previous state.
|
||
/// A recipe to generate the PHI of a widened IV, expanded from | ||
/// VPWidenIntOrFpInductionRecipe. | ||
class VPWidenIntOrFpInductionPHIRecipe : public VPHeaderPHIRecipe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall direction aligns well with current and future VPlan roadmap. As is the current version adds a bunch of complexity though, will need some time to see how/if this could be further simplified |
// TODO: Model increment value in VPlan, by turning the recipe into a | ||
// multi-def and a subclass of VPHeaderPHIRecipe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we first have a separate patch focused specifically on completing this TODO? If I understand correctly, this seems to be the primary issue at hand.
Additionally, I’d like to understand the difference between introducing a new recipe, VPWidenIntOrFpInductionBackedgeRecipe
, and combining VPInstruction::Mul
and VPInstruction::Add
(#82021). What are the key distinctions, advantages, or trade-offs between these two approaches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we first have a separate patch focused specifically on completing this TODO? If I understand correctly, this seems to be the primary issue at hand.
The VPHeaderPHIRecipe part of this todo is already done, but I'm not sure if the part about turning this into a multi-def will fix the issue.
If I'm understanding this correctly, if we had a multi-def recipe we would still be executing the increment/backedge value in the header before the rest of the loop body, and so we wouldn't be able to use the EVL in it.
Additionally, I’d like to understand the difference between introducing a new recipe, VPWidenIntOrFpInductionBackedgeRecipe, and combining VPInstruction::Mul and VPInstruction::Add(#82021).
I actually initially tried to avoid the new recipes and just use VPInstructions but I struggled with trying to splat two scalar VPInstructions into a vector operand. But actually now that I realise that #82021 does it I'll give it another try!
I would prefer to avoid adding extra recipes if possible, I don't see any particular advantage to having explicit recipes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in general at this late stage there's less advantage from specialized recipes, better to use simpler recipes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in general at this late stage there's less advantage from specialized recipes, better to use simpler recipes
Appreciate the effort! Just wondering - for EVL transform, do you expect to generate vp intrinsics during the backedge recipe execution, or will a separate transformation be needed to convert the backedge recipes into WidenIntrinsicRecipes? |
I'm trying to get rid of the backedge recipe and just replace it with regular VPInstructions + VPWidenRecipes. In that case we could then rerun transformRecipestoEVLRecipes after the expansion? Although I just want to double check, I was thinking that doing the regular widening for now would still be correct. Can we convert it to VP intrinsics as a separate task after the EVL transform or is it needed for correctness? As a side note on RISC-V, I wonder if RISCVLOptimizer.cpp from #108640 might take care of removing the VL toggle for us in the backend as well |
Yes, VPInstructions should be good enough for correctness. Transform mul/ add to vp.mul/ vp.add is for performance.
I have the same thought, but it doesn't seem to be accepted. |
I don't think regular widening of these inductions (i.e. without EVL) will be correct. The main problem is vsetvl's 6.3.2 so that if increment is VLMAX, last two iteration may result to overincrement. |
This will allow us to use EVL as the increment value in the backedge value to deal with vsetvl's 6.3.2, instead of VF, e.g. something like %phi.next = add <vscale x 2 x i32> %phi.cur, %evl <- from @llvm.experimental.get.vector.length My thinking is that it shouldn't matter if it's a vp.add or a regular widened add since the phi value should be correct across all lanes for the next iteration? |
I see, I did misunderstand you then. Yes, than it should be correct to use regular add there. However, it does make more sense to stick with same approach: either emit vp-intrinsics always out of vectorizer or to have a minimal subset + postprocessing pass after vectorizer. Current vision is to do former. |
Agreed, that seems like a sensible way forwards. And just to echo what @Mel-Chen and @arcbbb are saying, we can do the EVL-recipe conversion as a separate incremental follow up to #115274 (which hopefully in turn can become a follow up to this PR) |
cd020fd
to
4247eaa
Compare
I've reworked this to remove the IV specific initial value and backedge recipes, and instead they're modelled in VPlan during expansion. I had to add two new recipes to model splats and step vectors in VPlan. I needed the former because although it looks like you can broadcast underlying scalar IR values, I couldn't see a way to broadcast a scalar VPValue with no underlying instruction. We could also get rid of the PHI recipe if we relax the native-path restriction on VPWidenPHIRecipe, see #118662 There's some extra diff now due to how this doesn't emit an identity add when building the initial value, I've also opened a separate PR to split this off: #119668 |
For the induction step, there's already code the materializes it (for unrolling), #119284 generalizes it to a VPInstruction opcode. Would that help? |
I think I can reuse that here, that would take care of the casting part. And I guess we would get the benefit of the constant step optimization too. I'll rebase this PR on top of it I think we would need to expand the VPWidenIntOrFpInductionRecipes first in convertToConcreteRecipes so that the WideIVSteps get a chance to be expanded afterwards? |
…VPWidenIntOrFpInductionRecipe
…uctionRecipe Split off from llvm#118638, this adds a new VPInstruction for integer step vectors (0,1,2,...), so that we can eventually model all the separate parts of VPWidenIntOrFpInductionRecipe in VPlan. The type of the element is specified through a sentinel value as is done in llvm#119284. This is then used by VPWidenIntOrFpInductionRecipe, where we add it just before execution in convertToConcreteRecipes. We need a dummy placeholder operand so we have somewhere to pass it, but this should go away when #llvm#118638 lands.
This allows a different IR name for the generated phi to be used. This is split off from llvm#118638 and helps remove some of the diffs in it.
…pes, remove stray .
This allows a different IR name for the generated phi to be used. This is split off from #118638 and helps remove some of the diffs in it.
This allows a different IR name for the generated phi to be used. This is split off from llvm#118638 and helps remove some of the diffs in it.
…uctionRecipe Split off from llvm#118638, this adds a new VPInstruction for integer step vectors (0,1,2,...), so that we can eventually model all the separate parts of VPWidenIntOrFpInductionRecipe in VPlan. This is then used by VPWidenIntOrFpInductionRecipe, where we add it just before execution in convertToConcreteRecipes. We need a dummy placeholder operand so we have somewhere to pass it, but this should go away when #llvm#118638 lands.
…uctionRecipe Split off from llvm#118638, this adds a new VPInstruction for integer step vectors (0,1,2,...), so that we can eventually model all the separate parts of VPWidenIntOrFpInductionRecipe in VPlan. This is then used by VPWidenIntOrFpInductionRecipe, where we add it just before execution in convertToConcreteRecipes. We need a dummy placeholder operand so we have somewhere to pass it, but this should go away when #llvm#118638 lands.
…uctionRecipe Split off from llvm#118638, this adds a new VPInstruction for integer step vectors (0,1,2,...), so that we can eventually model all the separate parts of VPWidenIntOrFpInductionRecipe in VPlan. This is then used by VPWidenIntOrFpInductionRecipe, where we add it just before execution in convertToConcreteRecipes. We need a dummy placeholder operand so we have somewhere to pass it, but this should go away when #llvm#118638 lands.
…uctionRecipe (#129508) Split off from #118638, this adds VPInstruction::StepVector, which generates integer step vectors (0,1,2,...,VF). This is a step towards eventually modelling all the separate parts of VPWidenIntOrFpInductionRecipe in VPlan. This is then used by VPWidenIntOrFpInductionRecipe, where we materialize it just before unrolling so the operands stay in a fixed position. The need for a separate operand in VPWidenIntOrFpInductionRecipe, as well as the need to update it in optimizeVectorInductionWidthForTCAndVFUF, should be removed with #118638 when everything is expanded in convertToConcreteRecipes.
…ze/split-VPWidenIntOrFpInductionRecipe
* Remove temp stepvector operand from VPWidenIntOrFpInductionRecipe * Use VPInstruction::Broadcast
✅ With the latest revision this PR passed the C/C++ code formatter. |
Builder.createScalarCast(Instruction::CastOps::Trunc, VF, StepTy, DL); | ||
|
||
Inc = Builder.createNaryOp(MulOp, {Step, VF}, FMFs); | ||
Inc = Builder.createNaryOp(VPInstruction::Broadcast, Inc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm reusing VPInstruction::Broadcast here to splat the vectors, hope that's ok. I also experimented with moving materializeBroadcasts
to after convertToConcreteRecipes
to see if we could avoid the need for the explicit broadcasts here, but it looks like it only handles live-ins and VPValues in the entry block at the moment.
@@ -966,6 +966,7 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const { | |||
case VPInstruction::BranchOnCount: | |||
case VPInstruction::BranchOnCond: | |||
case VPInstruction::ResumePhi: | |||
case VPInstruction::Broadcast: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was needed since we're now using VPInstruction::Broadcasts for arbitrary VPValues
switch (getOpcode()) { | ||
case Instruction::ZExt: | ||
case Instruction::Trunc: { | ||
if (isScalarCast()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed because one of the recipes expanded to is Instruction::UIToFP. Happy to split this off to an NFC if wanted.
@@ -2762,8 +2762,7 @@ LoopVectorizationCostModel::getVectorIntrinsicCost(CallInst *CI, | |||
|
|||
void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) { | |||
// Fix widened non-induction PHIs by setting up the PHI operands. | |||
if (EnableVPlanNativePath) | |||
fixNonInductionPHIs(State); | |||
fixNonInductionPHIs(State); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed because we're now using VPWidenPHI on the non-vplan-native path. Happy to split this off into an NFC as well if wanted
@@ -1874,12 +1874,13 @@ class VPWidenInductionRecipe : public VPHeaderPHIRecipe { | |||
}; | |||
|
|||
/// A recipe for handling phi nodes of integer and floating-point inductions, | |||
/// producing their vector values. | |||
/// producing their vector values. This won't execute any LLVM IR and will get |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to adjust to something like the below, trying to be consistent with the existing terminology?
/// producing their vector values. This won't execute any LLVM IR and will get | |
/// producing their vector values. This is an abstract recipe and must be converted to concrete recipes before executing. |
@@ -2026,7 +2018,7 @@ class VPWidenPHIRecipe : public VPSingleDefRecipe, public VPPhiAccessors { | |||
public: | |||
/// Create a new VPWidenPHIRecipe for \p Phi with start value \p Start and | |||
/// debug location \p DL. | |||
VPWidenPHIRecipe(PHINode *Phi, VPValue *Start = nullptr, DebugLoc DL = {}, | |||
VPWidenPHIRecipe(Instruction *Phi, VPValue *Start = nullptr, DebugLoc DL = {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
@@ -2390,6 +2380,129 @@ void VPlanTransforms::createInterleaveGroups( | |||
} | |||
} | |||
|
|||
/// Expand a VPWidenIntOrFpInduction into executable recipes. for the initial |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Expand a VPWidenIntOrFpInduction into executable recipes. for the initial | |
/// Expand a VPWidenIntOrFpInduction into executable recipes, for the initial |
@@ -2390,6 +2380,129 @@ void VPlanTransforms::createInterleaveGroups( | |||
} | |||
} | |||
|
|||
/// Expand a VPWidenIntOrFpInduction into executable recipes. for the initial | |||
/// value, phi and backedge value. In the followng example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// value, phi and backedge value. In the followng example: | |
/// value, phi and backedge value. In the following example: |
/// | ||
/// vector.ph: | ||
/// ... | ||
/// vp<%induction> = ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// vp<%induction> = ... | |
/// vp<%induction.start> = ... |
/// vector.ph: | ||
/// ... | ||
/// vp<%induction> = ... | ||
/// vp<%inc> = ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// vp<%inc> = ... | |
/// vp<%induction.increment> = ... |
// If the phi is truncated, truncate the start and step values. | ||
VPBuilder Builder(Plan->getVectorPreheader()); | ||
if (isa<TruncInst>(IV)) { | ||
assert(Start->getUnderlyingValue()->getType()->isIntegerTy() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use VPlan type inference instead the underlying value here?
VPValue *SplatStart = Builder.createNaryOp(VPInstruction::Broadcast, Start); | ||
VPValue *SplatStep = Builder.createNaryOp(VPInstruction::Broadcast, Step); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could materializeBroadcasts take care of this? I assume there's currently a phase-ordering issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah materializeBroadcasts runs before convertToConcrete recipes, but it also only works on live ins and values defined in the entry block. The increment is calculated in the loop and it needs to be broadcast too
// Move VPWidenPointerInductionRecipes to the back of the phis | ||
// since it may insert non-phi instructions in place, which will | ||
// interfere with other header phis if they come after. | ||
// | ||
// TODO: Expand out VPWidenPointerInductionRecipe into multiple | ||
// recipes here and remove this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? Could we keep track of the current firstNonPhi insert pos instead and use this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think firstNonPhi is enough, since any VPWidenIntOrFpInductionRecipe that gets executed after a VPWidenPointerInductionRecipe will end up generating something like:
%pointer.phi = phi ptr [ null, %vector.ph ], [ %ptr.ind, %vector.ph ]
%ptr.ind = getelementptr i8, ptr %pointer.phi, i64 16
%vector.gep = getelementptr i8, ptr %pointer.phi, <4 x i64> <i64 0, i64 4, i64 8, i64 12>
%vec.ind = phi <4 x i32> ; VPWidenIntOrFpInductionRecipe's VPWidenPHI
Alternatively, we could tweak VPWidenPHI to always insert its phi at firstNonPhi during execution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like fixing up the location in VPWidenPHI actually removes some test diffs, so I've gone ahead and added that in ec5fe59
; VF4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4 | ||
; VF4-NEXT: [[VEC_IND_NEXT]] = add <4 x i8> [[VEC_IND]], splat (i8 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we keep wide IV increment before canonical IV increment, to reduce test diff?
…hi in VPWidenPHI::execute
The motivation of this PR is to make #115274 easier to implement. It's similar to the idea in #82021 (but admittedly I didn't notice it until I had already written this!), and should allow us to add EVL support by just passing EVL to the VF operand.
The current difficulty with widening IVs with EVL is that VPWidenIntOrFpInductionRecipe generates its own backedge value. Since it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which means we can't use the EVL since it's defined in the loop body.
The gist in this PR is to take the approach in #114305 and expand VPWidenIntOrFpInductionRecipe into several recipes for the initial value, phi and backedge value just before execution. I.e. this example:
gets expanded to:
This allows us to a value defined in the loop in the backedge value, and also means we can just reuse the existing backedge fixups in VPlan::execute without having to specially handle it ourselves.
After this #115274 should just become a matter of setting the VF operand to EVL (and building the increment step in the loop body, not the preheader).
I initially tried just splitting up VPWidenIntOrFpInductionRecipe immediately in
createWidenInductionRecipes
, but as pointed out in #114305 (comment) it turns out to be a total pain trying to detect these in other places likeremoveRedundantInductionCasts
andunrollHeaderPHIByUF
. In a way it's kind of like a pseudo-instruction at the MachineInstr level.This reuses VPWidenPHIRecipe and adds a new VPInstruction for generating a step vector.
I may be duplicating other work or missing something here, this was just the solution I stumbled upon. Open to any other ideas or approaches!