[VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes #118638

lukel97 · 2024-12-04T13:51:07Z

The motivation of this PR is to make #115274 easier to implement. It's similar to the idea in #82021 (but admittedly I didn't notice it until I had already written this!), and should allow us to add EVL support by just passing EVL to the VF operand.

The current difficulty with widening IVs with EVL is that VPWidenIntOrFpInductionRecipe generates its own backedge value. Since it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which means we can't use the EVL since it's defined in the loop body.

The gist of this PR is to take the approach in #114305 and expand VPWidenIntOrFpInductionRecipe into several recipes for the initial value, phi and backedge value just before execution. I.e. this example:

  Live-in ir<%5> = VF

  vector.ph:
  Successor(s): vector loop

  <x1> vector loop: {
    vector.body:
      WIDEN-INDUCTION %i = phi %bc.resume.val, %i.next, ir<1>, ir<%5>
      ...
      EMIT branch-on-count vp<%index.next>, ir<%n.vec>
    No successors
  }

gets expanded to:

VPlan 'Final VPlan for VF={vscale x 1,vscale x 2},UF={1}' {
  Live-in ir<%5> = VF

  vector.ph:
    vp<%0> = WIDEN-INDUCTION-START ir<0>, ir<1>
  Successor(s): vector loop

  <x1> vector loop: {
    vector.body:
      ir<%i> = WIDEN-PHI vp<%0>, vp<%4>
      ...
      vp<%4> = WIDEN-INDUCTION-INC ir<1>, ir<%5>, ir<%i>
      EMIT branch-on-count vp<%index.next>, ir<%n.vec>
    No successors
  }

This allows us to use a value defined in the loop in the backedge value, and also means we can just reuse the existing backedge fixups in VPlan::execute without having to specially handle it ourselves.

After this #115274 should just become a matter of setting the VF operand to EVL (and building the increment step in the loop body, not the preheader).
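To make the split concrete, here is a minimal standalone sketch of what the three expanded recipes compute, modelled on plain integer vectors rather than on VPlan itself. The function names (widenInductionStart, widenInductionInc, runWidenedIV) are hypothetical and exist only for this illustration; they are not part of the patch or of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Vec = std::vector<int64_t>;

// Models WIDEN-INDUCTION-START (preheader):
// splat(Start) + <0, 1, ..., VF-1> * Step.
Vec widenInductionStart(int64_t Start, int64_t Step, unsigned VF) {
  Vec V(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    V[Lane] = Start + static_cast<int64_t>(Lane) * Step;
  return V;
}

// Models WIDEN-INDUCTION-INC (end of loop body): Prev + splat(Step * VF).
Vec widenInductionInc(const Vec &Prev, int64_t Step, unsigned VF) {
  Vec Next(Prev);
  for (int64_t &Lane : Next)
    Lane += Step * static_cast<int64_t>(VF);
  return Next;
}

// Runs the toy vector loop and returns every lane the phi held, in order.
Vec runWidenedIV(int64_t Start, int64_t Step, unsigned VF,
                 unsigned Iterations) {
  assert(VF > 0 && "need at least one lane");
  Vec Phi = widenInductionStart(Start, Step, VF); // incoming from preheader
  Vec Seen;
  for (unsigned I = 0; I < Iterations; ++I) {
    Seen.insert(Seen.end(), Phi.begin(), Phi.end()); // loop body uses the phi
    Phi = widenInductionInc(Phi, Step, VF);          // backedge value
  }
  return Seen;
}
```

With Start = 0, Step = 1, VF = 4 and two iterations, the lanes seen by the loop body are 0 through 7, the same values the scalar induction variable would take. Because the increment is its own step at the end of the body, the VF argument could in principle be replaced by a value computed inside the loop.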

I initially tried just splitting up VPWidenIntOrFpInductionRecipe immediately in createWidenInductionRecipes, but as pointed out in #114305 (comment) it turns out to be a total pain trying to detect these in other places like removeRedundantInductionCasts and unrollHeaderPHIByUF. In a way it's kind of like a pseudo-instruction at the MachineInstr level.
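The pseudo-instruction idea can be sketched with a toy plan, assuming recipes are just opcode strings and the loop body always ends in a terminator. ToyPlan, Recipe and expandWidenInductions are hypothetical names for this illustration only; the real transform works on VPlan recipes and operands.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Recipe {
  std::string Opcode;
};

struct ToyPlan {
  std::vector<Recipe> Preheader;
  std::vector<Recipe> LoopBody; // assumed to end with the terminator
};

// Late expansion: earlier transforms only ever see the single
// "WIDEN-INDUCTION" recipe. Just before execution it is replaced by a start
// recipe in the preheader, a phi in its original position, and an increment
// placed right before the loop terminator.
void expandWidenInductions(ToyPlan &Plan) {
  std::vector<Recipe> NewBody;
  unsigned NumExpanded = 0;
  for (const Recipe &R : Plan.LoopBody) {
    if (R.Opcode == "WIDEN-INDUCTION") {
      Plan.Preheader.push_back({"WIDEN-INDUCTION-START"});
      NewBody.push_back({"WIDEN-INDUCTION-PHI"});
      ++NumExpanded;
    } else {
      NewBody.push_back(R);
    }
  }
  if (NumExpanded == 0)
    return; // nothing to expand, leave the plan untouched
  assert(NewBody.size() > NumExpanded && "loop body must end in a terminator");
  // One increment per expanded induction, inserted before the terminator.
  NewBody.insert(NewBody.end() - 1, NumExpanded,
                 Recipe{"WIDEN-INDUCTION-INC"});
  Plan.LoopBody = std::move(NewBody);
}
```

The point of the design is that passes which run before the expansion only need to match one recipe kind, much like passes that run before pseudo-instruction expansion in a backend.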

This reuses VPWidenPHIRecipe and adds a new VPInstruction for generating a step vector.

I may be duplicating other work or missing something here, this was just the solution I stumbled upon. Open to any other ideas or approaches!

llvmbot · 2024-12-04T13:51:46Z

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

The motivation of this PR is to make #115274 easier to implement. It's similar to the idea in #82021 (but admittedly I didn't notice it until I had already written this!), and should allow us to add EVL support by just passing EVL to the VF operand.

The current difficulty with widening IVs with EVL is that VPWidenIntOrFpInductionRecipe generates its own backedge value. Since it's a VPHeaderPHIRecipe the VF operand must be in the preheader, which means we can't use the EVL since it's defined in the loop body.

The gist of this PR is to take the approach in #114305 and expand VPWidenIntOrFpInductionRecipe into several recipes for the initial value, phi and backedge value just before execution. I.e. this example:

  Live-in ir<%5> = VF

  vector.ph:
  Successor(s): vector loop

  <x1> vector loop: {
    vector.body:
      WIDEN-INDUCTION %i = phi %bc.resume.val, %i.next, ir<1>, ir<%5>
      ...
      EMIT branch-on-count vp<%index.next>, ir<%n.vec>
    No successors
  }

gets expanded to:

VPlan 'Final VPlan for VF={vscale x 1,vscale x 2},UF={1}' {
  Live-in ir<%5> = VF

  vector.ph:
    vp<%0> = WIDEN-INDUCTION-START ir<0>, ir<1>
  Successor(s): vector loop

  <x1> vector loop: {
    vector.body:
      ir<%i> = WIDEN-INDUCTION-PHI vp<%0>, vp<%4>
      ...
      vp<%4> = WIDEN-INDUCTION-INC ir<1>, ir<%5>, ir<%i>
      EMIT branch-on-count vp<%index.next>, ir<%n.vec>
    No successors
  }

This allows us to use a value defined in the loop in the backedge value, and also means we can just reuse the existing backedge fixups in VPlan::execute without having to specially handle it ourselves.

I initially tried just splitting up VPWidenIntOrFpInductionRecipe immediately in createWidenInductionRecipes, but as pointed out in #114305 (comment) it turns out to be a total pain trying to detect these in other places like removeRedundantInductionCasts and unrollHeaderPHIByUF. In a way it's kind of like a pseudo-instruction at the MachineInstr level.

I also tried to avoid the start and increment recipes by expanding them directly into VPInstructions/VPWidenIntrinsicRecipe/VPScalarCastRecipes, but I ran into some difficulties when trying to broadcast a scalar type to use in a widened binary op. (I'm happy to give this another try though!)

I hoped to make this an NFC, but unfortunately some of the splatted values get shuffled about in the preheaders.

I may be duplicating other work here, and this was just the solution I stumbled upon. Open to any other ideas or approaches!

Patch is 84.38 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118638.diff

19 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+6-16)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+151-6)
(modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+11-8)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+67-43)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+79)
(modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/blocks-with-dead-instructions.ll (+18-18)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/mask-index-type.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/pr87378-vpinstruction-or-drop-poison-generating-flags.ll (+3-3)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/pr88802.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+13-13)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cond-reduction.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/create-induction-resume.ll (+5-5)
(modified) llvm/test/Transforms/LoopVectorize/epilog-vectorization-any-of-reductions.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/float-induction.ll (+35-35)
(modified) llvm/test/Transforms/LoopVectorize/induction.ll (+4-4)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop-pred.ll (+18-18)

diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index b801d1863e252c..4e4c4dfd461824 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -1043,18 +1043,12 @@ void VPlan::execute(VPTransformState *State) {
     if (isa<VPWidenPHIRecipe>(&R))
       continue;
 
-    if (isa<VPWidenPointerInductionRecipe>(&R) ||
-        isa<VPWidenIntOrFpInductionRecipe>(&R)) {
-      PHINode *Phi = nullptr;
-      if (isa<VPWidenIntOrFpInductionRecipe>(&R)) {
-        Phi = cast<PHINode>(State->get(R.getVPSingleValue()));
-      } else {
-        auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
-        assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()) &&
-               "recipe generating only scalars should have been replaced");
-        auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi));
-        Phi = cast<PHINode>(GEP->getPointerOperand());
-      }
+    if (isa<VPWidenPointerInductionRecipe>(&R)) {
+      auto *WidenPhi = cast<VPWidenPointerInductionRecipe>(&R);
+      assert(!WidenPhi->onlyScalarsGenerated(State->VF.isScalable()) &&
+             "recipe generating only scalars should have been replaced");
+      auto *GEP = cast<GetElementPtrInst>(State->get(WidenPhi));
+      PHINode *Phi = cast<PHINode>(GEP->getPointerOperand());
 
       Phi->setIncomingBlock(1, VectorLatchBB);
 
@@ -1062,10 +1056,6 @@ void VPlan::execute(VPTransformState *State) {
       // consistent placement of all induction updates.
       Instruction *Inc = cast<Instruction>(Phi->getIncomingValue(1));
       Inc->moveBefore(VectorLatchBB->getTerminator()->getPrevNode());
-
-      // Use the steps for the last part as backedge value for the induction.
-      if (auto *IV = dyn_cast<VPWidenIntOrFpInductionRecipe>(&R))
-        Inc->setOperand(0, State->get(IV->getLastUnrolledPartOperand()));
       continue;
     }
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e1d828f038f9a2..513e973af8fc1a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -2089,7 +2089,9 @@ class VPHeaderPHIRecipe : public VPSingleDefRecipe {
 };
 
 /// A recipe for handling phi nodes of integer and floating-point inductions,
-/// producing their vector values.
+/// producing their vector values. This won't execute any LLVM IR and will get
+/// expanded later into VPWidenIntOrFpInductionInitialRecipe,
+/// VPWidenIntOrFpInductionPHIRecipe and VPWidenIntOrFpInductionBackedgeRecipe.
 class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
   PHINode *IV;
   TruncInst *Trunc;
@@ -2122,9 +2124,10 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
 
   VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionSC)
 
-  /// Generate the vectorized and scalarized versions of the phi node as
-  /// needed by their users.
-  void execute(VPTransformState &State) override;
+  void execute(VPTransformState &State) override {
+    llvm_unreachable("cannot execute this recipe, should be expanded via "
+                     "expandVPWidenIntOrFpInductionRecipe");
+  }
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
   /// Print the recipe.
@@ -2180,10 +2183,152 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
   }
 
   /// Returns the VPValue representing the value of this induction at
-  /// the last unrolled part, if it exists. Returns itself if unrolling did not
+  /// the last unrolled part, if it exists. Returns nullptr if unrolling did not
   /// take place.
   VPValue *getLastUnrolledPartOperand() {
-    return getNumOperands() == 5 ? getOperand(4) : this;
+    return getNumOperands() == 5 ? getOperand(4) : nullptr;
+  }
+};
+
+/// A recipe to compute the initial value for a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionInitialRecipe : public VPSingleDefRecipe {
+  Instruction *IV;
+  const InductionDescriptor &ID;
+
+public:
+  VPWidenIntOrFpInductionInitialRecipe(Instruction *IV, VPValue *Start,
+                                       VPValue *Step,
+                                       const InductionDescriptor &ID)
+      : VPSingleDefRecipe(VPDef::VPWidenIntOrFpInductionStartSC, {Start, Step}),
+        IV(IV), ID(ID) {
+    assert((isa<PHINode>(IV) || isa<TruncInst>(IV)) &&
+           "Expected either an induction phi-node or a truncate of it!");
+  }
+
+  ~VPWidenIntOrFpInductionInitialRecipe() override = default;
+
+  VPWidenIntOrFpInductionInitialRecipe *clone() override {
+    return new VPWidenIntOrFpInductionInitialRecipe(IV, getOperand(0),
+                                                    getOperand(1), ID);
+  }
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionStartSC)
+
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+
+  VPValue *getStartValue() { return getOperand(0); }
+  const VPValue *getStartValue() const { return getOperand(0); }
+
+  VPValue *getStepValue() { return getOperand(1); }
+  const VPValue *getStepValue() const { return getOperand(1); }
+
+  /// Returns the scalar type of the induction.
+  Type *getScalarType() const { return IV->getType(); }
+
+  bool onlyFirstLaneUsed(const VPValue *Op) const override {
+    assert(is_contained(operands(), Op) &&
+           "Op must be an operand of the recipe");
+    return true;
+  }
+};
+
+/// A recipe to generate the PHI of a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionPHIRecipe : public VPHeaderPHIRecipe {
+  Instruction *IV;
+
+public:
+  VPWidenIntOrFpInductionPHIRecipe(Instruction *IV, VPValue *Start)
+      : VPHeaderPHIRecipe(VPDef::VPWidenIntOrFpInductionPHISC, IV, Start),
+        IV(IV) {
+    assert((isa<PHINode>(IV) || isa<TruncInst>(IV)) &&
+           "Expected either an induction phi-node or a truncate of it!");
+  }
+
+  ~VPWidenIntOrFpInductionPHIRecipe() override = default;
+
+  VPWidenIntOrFpInductionPHIRecipe *clone() override {
+    auto *R = new VPWidenIntOrFpInductionPHIRecipe(IV, getOperand(0));
+    R->addOperand(getBackedgeValue());
+    return R;
+  }
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionPHISC)
+
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+};
+
+/// A recipe to compute the backedge value for a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionBackedgeRecipe : public VPSingleDefRecipe {
+  Instruction *IV;
+  const InductionDescriptor &ID;
+
+public:
+  VPWidenIntOrFpInductionBackedgeRecipe(Instruction *IV, VPValue *Step,
+                                        VPValue *VF, VPValue *Prev,
+                                        VPValue *SplatVF,
+                                        const InductionDescriptor &ID)
+      : VPSingleDefRecipe(VPDef::VPWidenIntOrFpInductionIncSC, {Step, VF, Prev}),
+        IV(IV), ID(ID) {
+    assert((isa<PHINode>(IV) || isa<TruncInst>(IV)) &&
+           "Expected either an induction phi-node or a truncate of it!");
+    if (SplatVF)
+      addOperand(SplatVF);
+  }
+
+  ~VPWidenIntOrFpInductionBackedgeRecipe() override = default;
+
+  VPWidenIntOrFpInductionBackedgeRecipe *clone() override {
+    return new VPWidenIntOrFpInductionBackedgeRecipe(
+        IV, getOperand(0), getOperand(1), getOperand(2), getOperand(3), ID);
+  }
+
+  VP_CLASSOF_IMPL(VPDef::VPWidenIntOrFpInductionIncSC)
+
+  void execute(VPTransformState &State) override;
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  /// Print the recipe.
+  void print(raw_ostream &O, const Twine &Indent,
+             VPSlotTracker &SlotTracker) const override;
+#endif
+
+  VPValue *getStepValue() { return getOperand(0); }
+  const VPValue *getStepValue() const { return getOperand(0); }
+
+  VPValue *getVFValue() { return getOperand(1); }
+  const VPValue *getVFValue() const { return getOperand(1); }
+
+  VPValue *getPrevValue() { return getOperand(2); }
+  const VPValue *getPrevValue() const { return getOperand(2); }
+
+  VPValue *getSplatVFValue() {
+    // If the recipe has been unrolled (4 operands), return the VPValue for the
+    // induction increment.
+    return getNumOperands() == 4 ? getOperand(3) : nullptr;
+  }
+
+  /// Returns the scalar type of the induction.
+  Type *getScalarType() const { return IV->getType(); }
+
+  bool onlyFirstLaneUsed(const VPValue *Op) const override {
+    assert(is_contained(operands(), Op) &&
+           "Op must be an operand of the recipe");
+    return Op == getOperand(0) || Op == getOperand(1);
   }
 };
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 969d07b229e469..8a9e64b00850e2 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -214,14 +214,17 @@ Type *VPTypeAnalysis::inferScalarType(const VPValue *V) {
           .Case<VPActiveLaneMaskPHIRecipe, VPCanonicalIVPHIRecipe,
                 VPFirstOrderRecurrencePHIRecipe, VPReductionPHIRecipe,
                 VPWidenPointerInductionRecipe, VPEVLBasedIVPHIRecipe,
-                VPScalarPHIRecipe>([this](const auto *R) {
-            // Handle header phi recipes, except VPWidenIntOrFpInduction
-            // which needs special handling due it being possibly truncated.
-            // TODO: consider inferring/caching type of siblings, e.g.,
-            // backedge value, here and in cases below.
-            return inferScalarType(R->getStartValue());
-          })
-          .Case<VPWidenIntOrFpInductionRecipe, VPDerivedIVRecipe>(
+                VPScalarPHIRecipe, VPWidenIntOrFpInductionPHIRecipe>(
+              [this](const auto *R) {
+                // Handle header phi recipes, except VPWidenIntOrFpInduction
+                // which needs special handling due it being possibly truncated.
+                // TODO: consider inferring/caching type of siblings, e.g.,
+                // backedge value, here and in cases below.
+                return inferScalarType(R->getStartValue());
+              })
+          .Case<VPWidenIntOrFpInductionRecipe, VPDerivedIVRecipe,
+                VPWidenIntOrFpInductionInitialRecipe,
+                VPWidenIntOrFpInductionBackedgeRecipe>(
               [](const auto *R) { return R->getScalarType(); })
           .Case<VPReductionRecipe, VPPredInstPHIRecipe, VPWidenPHIRecipe,
                 VPScalarIVStepsRecipe, VPWidenGEPRecipe, VPVectorPointerRecipe,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index ef5f6e22f82206..8ae1e7382bce38 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1631,47 +1631,73 @@ static Constant *getSignedIntOrFpConstant(Type *Ty, int64_t C) {
                            : ConstantFP::get(Ty, C);
 }
 
-void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {
+void VPWidenIntOrFpInductionInitialRecipe::execute(VPTransformState &State) {
   assert(!State.Lane && "Int or FP induction being replicated.");
 
-  Value *Start = getStartValue()->getLiveInIRValue();
-  const InductionDescriptor &ID = getInductionDescriptor();
-  TruncInst *Trunc = getTruncInst();
+  Value *Start = State.get(getStartValue(), true);
   IRBuilderBase &Builder = State.Builder;
-  assert(IV->getType() == ID.getStartValue()->getType() && "Types must match");
   assert(State.VF.isVector() && "must have vector VF");
 
-  // The value from the original loop to which we are mapping the new induction
-  // variable.
-  Instruction *EntryVal = Trunc ? cast<Instruction>(Trunc) : IV;
-
   // Fast-math-flags propagate from the original induction instruction.
   IRBuilder<>::FastMathFlagGuard FMFG(Builder);
-  if (ID.getInductionBinOp() && isa<FPMathOperator>(ID.getInductionBinOp()))
+  if (isa_and_nonnull<FPMathOperator>(ID.getInductionBinOp()))
     Builder.setFastMathFlags(ID.getInductionBinOp()->getFastMathFlags());
 
   // Now do the actual transformations, and start with fetching the step value.
   Value *Step = State.get(getStepValue(), VPLane(0));
 
-  assert((isa<PHINode>(EntryVal) || isa<TruncInst>(EntryVal)) &&
-         "Expected either an induction phi-node or a truncate of it!");
-
-  // Construct the initial value of the vector IV in the vector loop preheader
-  auto CurrIP = Builder.saveIP();
-  BasicBlock *VectorPH = State.CFG.getPreheaderBBFor(this);
-  Builder.SetInsertPoint(VectorPH->getTerminator());
-  if (isa<TruncInst>(EntryVal)) {
-    assert(Start->getType()->isIntegerTy() &&
-           "Truncation requires an integer type");
-    auto *TruncType = cast<IntegerType>(EntryVal->getType());
-    Step = Builder.CreateTrunc(Step, TruncType);
-    Start = Builder.CreateCast(Instruction::Trunc, Start, TruncType);
-  }
-
+  // Construct the initial value of the vector IV
   Value *Zero = getSignedIntOrFpConstant(Start->getType(), 0);
   Value *SplatStart = Builder.CreateVectorSplat(State.VF, Start);
   Value *SteppedStart = getStepVector(
       SplatStart, Zero, Step, ID.getInductionOpcode(), State.VF, State.Builder);
+  State.set(this, SteppedStart);
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenIntOrFpInductionInitialRecipe::print(
+    raw_ostream &O, const Twine &Indent, VPSlotTracker &SlotTracker) const {
+  O << Indent;
+  printAsOperand(O, SlotTracker);
+  O << " = WIDEN-INDUCTION-START ";
+  printOperands(O, SlotTracker);
+}
+#endif
+
+void VPWidenIntOrFpInductionPHIRecipe::execute(VPTransformState &State) {
+  BasicBlock *VectorPH = State.CFG.getPreheaderBBFor(this);
+
+  Value *Start = State.get(getOperand(0));
+  PHINode *Phi = State.Builder.CreatePHI(Start->getType(), 2, "vec.ind");
+  Phi->addIncoming(Start, VectorPH);
+  Phi->setDebugLoc(IV->getDebugLoc());
+  State.set(this, Phi);
+}
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenIntOrFpInductionPHIRecipe::print(raw_ostream &O,
+                                             const Twine &Indent,
+                                             VPSlotTracker &SlotTracker) const {
+  O << Indent;
+  printAsOperand(O, SlotTracker);
+  O << " = WIDEN-INDUCTION-PHI ";
+  printOperands(O, SlotTracker);
+}
+#endif
+
+void VPWidenIntOrFpInductionBackedgeRecipe::execute(VPTransformState &State) {
+  IRBuilderBase &Builder = State.Builder;
+
+  // Fast-math-flags propagate from the original induction instruction.
+  IRBuilder<>::FastMathFlagGuard FMFG(Builder);
+  if (isa_and_nonnull<FPMathOperator>(ID.getInductionBinOp()))
+    Builder.setFastMathFlags(ID.getInductionBinOp()->getFastMathFlags());
+
+  Value *Step = State.get(getStepValue(), VPLane(0));
+
+  auto CurrIP = Builder.saveIP();
+  BasicBlock *VectorPH = State.CFG.getPreheaderBBFor(this);
+  Builder.SetInsertPoint(VectorPH->getTerminator());
 
   // We create vector phi nodes for both integer and floating-point induction
   // variables. Here, we determine the kind of arithmetic we will perform.
@@ -1706,29 +1732,27 @@ void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) {
   }
 
   Builder.restoreIP(CurrIP);
-
-  // We may need to add the step a number of times, depending on the unroll
-  // factor. The last of those goes into the PHI.
-  PHINode *VecInd = PHINode::Create(SteppedStart->getType(), 2, "vec.ind");
-  VecInd->insertBefore(State.CFG.PrevBB->getFirstInsertionPt());
-  VecInd->setDebugLoc(EntryVal->getDebugLoc());
-  State.set(this, VecInd);
+  Value *PrevVal = State.get(getPrevValue());
 
   Instruction *LastInduction = cast<Instruction>(
-      Builder.CreateBinOp(AddOp, VecInd, SplatVF, "vec.ind.next"));
-  if (isa<TruncInst>(EntryVal))
-    State.addMetadata(LastInduction, EntryVal);
-  LastInduction->setDebugLoc(EntryVal->getDebugLoc());
+      Builder.CreateBinOp(AddOp, PrevVal, SplatVF, "vec.ind.next"));
+  if (isa<TruncInst>(IV))
+    State.addMetadata(LastInduction, IV);
+  LastInduction->setDebugLoc(IV->getDebugLoc());
 
-  VecInd->addIncoming(SteppedStart, VectorPH);
-  // Add induction update using an incorrect block temporarily. The phi node
-  // will be fixed after VPlan execution. Note that at this point the latch
-  // block cannot be used, as it does not exist yet.
-  // TODO: Model increment value in VPlan, by turning the recipe into a
-  // multi-def and a subclass of VPHeaderPHIRecipe.
-  VecInd->addIncoming(LastInduction, VectorPH);
+  State.set(this, LastInduction);
 }
 
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+void VPWidenIntOrFpInductionBackedgeRecipe::print(
+    raw_ostream &O, const Twine &Indent, VPSlotTracker &SlotTracker) const {
+  O << Indent;
+  printAsOperand(O, SlotTracker);
+  O << " = WIDEN-INDUCTION-INC ";
+  printOperands(O, SlotTracker);
+}
+#endif
+
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 void VPWidenIntOrFpInductionRecipe::print(raw_ostream &O, const Twine &Indent,
                                           VPSlotTracker &SlotTracker) const {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index cee83d1015b536..9abf5f28936d53 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1820,12 +1820,91 @@ void VPlanTransforms::createInterleaveGroups(
   }
 }
 
+/// Expand a VPWidenIntOrFpInduction into separate recipes for the initial
+/// value, phi and backedge value. In the following example:
+///
+///  vector.ph:
+///  Successor(s): vector loop
+///
+///  <x1> vector loop: {
+///    vector.body:
+///      WIDEN-INDUCTION %i = phi %bc.resume.val, %i.next, ir<1>, ir<%5>
+///      ...
+///      EMIT branch-on-count vp<%index.next>, ir<%n.vec>
+///    No successors
+///  }
+///
+/// WIDEN-INDUCTION will get expanded to:
+///
+///  vector.ph:
+///    vp<%0> = WIDEN-INDUCTION-START ir<0>, ir<1>
+///  Successor(s): vector loop
+///
+///  <x1> vector loop: {
+///    vector.body:
+///      ir<%i> = WIDEN-INDUCTION-PHI vp<%0>, vp<%4>
+///      ...
+///      vp<%4> = WIDEN-INDUCTION-INC ir<1>, ir<%5>, ir<%i>
+///      EMIT branch-on-count vp<%index.next>, ir<%n.vec>
+///    No successors
+///  }
+static void
+expandVPWidenIntOrFpInduction(VPWidenIntOrFpInductionRecipe *WidenIVR) {
+  VPlan *Plan = WidenIVR->getParent()->getPlan();
+  PHINode *PHI = WidenIVR->getPHINode();
+  VPValue *Start = WidenIVR->getStartValue();
+  VPValue *Step = WidenIVR->getStepValue();
+  VPValue *VF = WidenIVR->getVFValue();
+  const InductionDescriptor &ID = WidenIVR->getInductionDescriptor();
+  TruncInst *Trunc = WidenIVR->getTruncInst();
+
+  // The value from the original loop to which we are mapping the new induction
+  // variable.
+  Instruction *IV = Trunc ? cast<Instruction>(Trunc) : PHI;
+
+  // If the phi is truncated, truncate the start and step values.
+  VPBuilder Builder(Plan->getVectorPreheader());
+  if (isa<TruncInst>(IV)) {
+    assert(Start->getUnderlyingValue()->getType()->isIntegerTy() &&
+           "Truncation requires an integer type");
+    auto *TruncType = cast<IntegerType>(IV->getType());
+    Step = Builder.createScalarCast(Instruction::Trunc, Step, TruncType);
+    Start = Builder.createScalarCast(Instruction::Trunc, Start, TruncType);
+  }
+
+  // Construct the initial value of the vector IV in the vector loop preheader.
+  auto *StartR = new VPWidenIntOrFpInductionInitialRecipe(IV, Start, Step, ID);
+  Plan->getVectorPreheader()->insert(StartR, Builder.getInsertPoint());
+
+  // Create the widened phi of the vector IV.
+  auto *PhiR = new VPWidenIntOrFpInductionPHIRecipe(IV, StartR);
+  PhiR->insertBefore(WidenIVR);
+
+  // Create the backedge value for the vector IV.
+  VPValue *Prev = PhiR;
+  // If unrolled, use the last unrolled part in the increment.
+  if (auto *UnrolledPart = WidenIVR->getLastUnrolledPartOperand())
+    Prev = UnrolledPart;
+  auto *IncR = new VPWidenIntOrFpInductionBackedgeRecipe(
+      IV, Step, VF, Prev, WidenIVR->getSplatVFValue(), ID);
+  VPBasicBlock *ExitingBB = Plan->getVectorLoopRegion()->getExitingBasicBlock();
+  ExitingBB->insert(IncR, ExitingBB->getTer...
[truncated]

lukel97 · 2024-12-04T13:51:52Z

cc @nikolaypanchenko

We can reuse VPWidenPHI in llvm#118638, but it requires us to allow it in the non-native path. We also need to propagate the DebugLoc and use a different name in the generated PHI, so this splits these parts off in case we want it. We lose some debug info in dbg-outer-loop-vect.ll, but I think this is because the underlying phi node didn't have a DebugLoc to begin with. I think the current version is just carrying over the DebugLoc from the previous state.

lukel97 · 2024-12-04T16:17:06Z

llvm/lib/Transforms/Vectorize/VPlan.h

+
+/// A recipe to generate the PHI of a widened IV, expanded from
+/// VPWidenIntOrFpInductionRecipe.
+class VPWidenIntOrFpInductionPHIRecipe : public VPHeaderPHIRecipe {


We could reuse VPWidenPHI for this, as is done in #115274, but we need to enable it on the non-native VPlan path and pass through the DebugLoc, so I've opened a separate PR for that in #118662.

llvm/lib/Transforms/Vectorize/VPlan.cpp

fhahn · 2024-12-05T19:58:23Z

The overall direction aligns well with the current and future VPlan roadmap. As is, the current version adds a bunch of complexity though; I will need some time to see how/if this could be further simplified.

Mel-Chen · 2024-12-09T11:32:24Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

-  // TODO: Model increment value in VPlan, by turning the recipe into a
-  // multi-def and a subclass of VPHeaderPHIRecipe.


Could we first have a separate patch focused specifically on completing this TODO? If I understand correctly, this seems to be the primary issue at hand.

Additionally, I’d like to understand the difference between introducing a new recipe, VPWidenIntOrFpInductionBackedgeRecipe, and combining VPInstruction::Mul and VPInstruction::Add(#82021). What are the key distinctions, advantages, or trade-offs between these two approaches?

Could we first have a separate patch focused specifically on completing this TODO? If I understand correctly, this seems to be the primary issue at hand.

The VPHeaderPHIRecipe part of this todo is already done, but I'm not sure if the part about turning this into a multi-def will fix the issue.

If I'm understanding this correctly, if we had a multi-def recipe we would still be executing the increment/backedge value in the header before the rest of the loop body, and so we wouldn't be able to use the EVL in it.

Additionally, I’d like to understand the difference between introducing a new recipe, VPWidenIntOrFpInductionBackedgeRecipe, and combining VPInstruction::Mul and VPInstruction::Add(#82021).

I actually initially tried to avoid the new recipes and just use VPInstructions, but I struggled with trying to splat two scalar VPInstructions into a vector operand. Now that I realise #82021 does it, I'll give it another try!

I would prefer to avoid adding extra recipes if possible, I don't see any particular advantage to having explicit recipes.

Yes, in general at this late stage there's less advantage from specialized recipes, better to use simpler recipes

arcbbb · 2024-12-10T09:54:40Z

Appreciate the effort! Just wondering - for EVL transform, do you expect to generate vp intrinsics during the backedge recipe execution, or will a separate transformation be needed to convert the backedge recipes into WidenIntrinsicRecipes?

lukel97 · 2024-12-10T10:57:03Z

Appreciate the effort! Just wondering - for EVL transform, do you expect to generate vp intrinsics during the backedge recipe execution, or will a separate transformation be needed to convert the backedge recipes into WidenIntrinsicRecipes?

I'm trying to get rid of the backedge recipe and just replace it with regular VPInstructions + VPWidenRecipes. In that case we could then rerun transformRecipestoEVLRecipes after the expansion?

Although I just want to double check, I was thinking that doing the regular widening for now would still be correct. Can we convert it to VP intrinsics as a separate task after the EVL transform or is it needed for correctness?

As a side note on RISC-V, I wonder if RISCVVLOptimizer.cpp from #108640 might take care of removing the VL toggle for us in the backend as well.

Mel-Chen · 2024-12-10T11:10:02Z

I'm trying to get rid of the backedge recipe and just replace it with regular VPInstructions + VPWidenRecipes. In that case we could then rerun transformRecipestoEVLRecipes after the expansion?

Yes, VPInstructions should be good enough for correctness. Transforming mul/add to vp.mul/vp.add is for performance.

As a side note on RISC-V, I wonder if RISCVVLOptimizer.cpp from #108640 might take care of removing the VL toggle for us in the backend as well.

I have the same thought, but it doesn't seem to be accepted.
#109614 (comment)

npanchen · 2024-12-10T14:54:13Z

Although I just want to double check, I was thinking that doing the regular widening for now would still be correct. Can we convert it to VP intrinsics as a separate task after the EVL transform or is it needed for correctness?

I don't think regular widening of these inductions (i.e. without EVL) will be correct. The main problem is vsetvl's 6.3.2: if the increment is VLMAX, the last two iterations may over-increment.
FYI: QEMU has a special rvv_vl_half_avl option to properly test 6.3.2.

lukel97 · 2024-12-10T15:30:36Z

Although I just want to double check, I was thinking that doing the regular widening for now would still be correct. Can we convert it to VP intrinsics as a separate task after the EVL transform or is it needed for correctness?

I don't think regular widening of these inductions (i.e. without EVL) will be correct. The main problem is vsetvl's 6.3.2 so that if increment is VLMAX, last two iteration may result to overincrement. fyi: QEMU has special rvv_vl_half_avl option to properly test 6.3.2

This will allow us to use EVL as the increment value in the backedge value to deal with vsetvl's 6.3.2, instead of VF, e.g. something like

%phi.next = add <vscale x 2 x i32> %phi.cur, %evl ; %evl comes from @llvm.experimental.get.vector.length

My thinking is that it shouldn't matter if it's a vp.add or a regular widened add since the phi value should be correct across all lanes for the next iteration?
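
The over-increment concern and the EVL-based fix can be sketched with a small standalone C++ model (not LLVM code). It assumes the vector length may be chosen as ceil(AVL/2) whenever VLMAX < AVL < 2*VLMAX, which is the behaviour the 6.3.2 discussion is about; `pickVL` and `firstLanePerIteration` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical VL choice: AVL is the remaining trip count. When
// VLMAX < AVL < 2*VLMAX this model picks ceil(AVL/2) instead of VLMAX.
inline uint64_t pickVL(uint64_t AVL, uint64_t VLMax) {
  if (AVL <= VLMax)
    return AVL;
  if (AVL < 2 * VLMax)
    return (AVL + 1) / 2;
  return VLMax;
}

// Runs the loop over N elements and records lane 0 of the wide IV at the
// start of each vector iteration (start 0, step 1). IncrementByEVL selects
// whether the backedge adds EVL or a fixed VF (= VLMAX).
inline std::vector<uint64_t> firstLanePerIteration(uint64_t N, uint64_t VLMax,
                                                   bool IncrementByEVL) {
  assert(VLMax > 0 && "VLMAX must be non-zero");
  std::vector<uint64_t> Seen;
  uint64_t WideIVLane0 = 0;
  for (uint64_t Done = 0; Done < N;) {
    uint64_t EVL = pickVL(N - Done, VLMax);
    Seen.push_back(WideIVLane0);
    WideIVLane0 += IncrementByEVL ? EVL : VLMax;
    Done += EVL; // only EVL elements were actually processed
  }
  return Seen;
}
```

With N = 6 and VLMAX = 4 the model runs two iterations of 3 elements each. Incrementing by EVL makes the second iteration start at 3, matching the scalar loop, whereas incrementing by VF makes it start at 4.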

npanchen · 2024-12-10T16:38:16Z

This will allow us to use EVL as the increment value in the backedge value to deal with vsetvl's 6.3.2, instead of VF, e.g. something like
My thinking is that it shouldn't matter if it's a vp.add or a regular widened add since the phi value should be correct across all lanes for the next iteration?

I see, I did misunderstand you then. Yes, then it should be correct to use a regular add there. However, it does make more sense to stick with the same approach: either always emit vp-intrinsics out of the vectorizer, or have a minimal subset plus a postprocessing pass after the vectorizer. The current vision is to do the former.
By the way, unary and binary EVL recipes are already supported (#93854), so it should be fine to use that, right?

lukel97 · 2024-12-10T16:41:37Z

However, it does make more sense to stick with same approach: either emit vp-intrinsics always out of vectorizer or to have a minimal subset + postprocessing pass after vectorizer. Current vision is to do former. By the way, unary and binary evl-recipes are already supported #93854, so it should be fine to use that, right ?

Agreed, that seems like a sensible way forwards. And just to echo what @Mel-Chen and @arcbbb are saying, we can do the EVL-recipe conversion as a separate incremental follow up to #115274 (which hopefully in turn can become a follow up to this PR)

lukel97 · 2024-12-12T17:19:30Z

I've reworked this to remove the IV specific initial value and backedge recipes, and instead they're modelled in VPlan during expansion.

I had to add two new recipes to model splats and step vectors in VPlan. I needed the former because although it looks like you can broadcast underlying scalar IR values, I couldn't see a way to broadcast a scalar VPValue with no underlying instruction.
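
As a rough illustration of what the splat and step-vector pieces compute when building the initial value, here is a standalone C++ sketch for the integer case only. The function names `broadcast`, `stepVector` and `wideIVStart` are hypothetical and do not correspond to the recipes in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splat: replicate a scalar into every lane.
inline std::vector<int64_t> broadcast(int64_t Scalar, size_t VF) {
  return std::vector<int64_t>(VF, Scalar);
}

// Step vector: <0, 1, ..., VF-1>.
inline std::vector<int64_t> stepVector(size_t VF) {
  std::vector<int64_t> V(VF);
  for (size_t I = 0; I < VF; ++I)
    V[I] = static_cast<int64_t>(I);
  return V;
}

// Initial wide IV value: splat(Start) + stepvector * splat(Step).
inline std::vector<int64_t> wideIVStart(int64_t Start, int64_t Step,
                                        size_t VF) {
  std::vector<int64_t> Result = broadcast(Start, VF);
  std::vector<int64_t> Steps = broadcast(Step, VF);
  std::vector<int64_t> Idx = stepVector(VF);
  for (size_t I = 0; I < VF; ++I)
    Result[I] += Idx[I] * Steps[I];
  return Result;
}
```

For example, a start of 10 with step 3 at VF = 4 gives the lanes 10, 13, 16, 19.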

We could also get rid of the PHI recipe if we relax the native-path restriction on VPWidenPHIRecipe, see #118662

There's some extra diff now because this doesn't emit an identity add when building the initial value. I've also opened a separate PR to split that off: #119668

fhahn · 2024-12-12T17:23:43Z

I've reworked this to remove the IV specific initial value and backedge recipes, and instead they're modelled in VPlan during expansion.

I had to add two new recipes to model splats and step vectors in VPlan. I needed the former because although it looks like you can broadcast underlying scalar IR values, I couldn't see a way to broadcast a scalar VPValue with no underlying instruction.

For the induction step, there's already code that materializes it (for unrolling); #119284 generalizes it to a VPInstruction opcode. Would that help?

lukel97 · 2024-12-12T17:36:07Z

For the induction step, there's already code that materializes it (for unrolling); #119284 generalizes it to a VPInstruction opcode. Would that help?

I think I can reuse that here, that would take care of the casting part. And I guess we would get the benefit of the constant step optimization too. I'll rebase this PR on top of it

I think we would need to expand the VPWidenIntOrFpInductionRecipes first in convertToConcreteRecipes so that the WideIVSteps get a chance to be expanded afterwards?

…uctionRecipe Split off from llvm#118638, this adds a new VPInstruction for integer step vectors (0,1,2,...), so that we can eventually model all the separate parts of VPWidenIntOrFpInductionRecipe in VPlan. The type of the element is specified through a sentinel value as is done in llvm#119284. This is then used by VPWidenIntOrFpInductionRecipe, where we add it just before execution in convertToConcreteRecipes. We need a dummy placeholder operand so we have somewhere to pass it, but this should go away when llvm#118638 lands.

This allows a different IR name for the generated phi to be used. This is split off from llvm#118638 and helps remove some of the diffs in it.

…uctionRecipe (#129508) Split off from #118638, this adds VPInstruction::StepVector, which generates integer step vectors (0,1,2,...,VF-1). This is a step towards eventually modelling all the separate parts of VPWidenIntOrFpInductionRecipe in VPlan. This is then used by VPWidenIntOrFpInductionRecipe, where we materialize it just before unrolling so the operands stay in a fixed position. The need for a separate operand in VPWidenIntOrFpInductionRecipe, as well as the need to update it in optimizeVectorInductionWidthForTCAndVFUF, should be removed with #118638 when everything is expanded in convertToConcreteRecipes.

github-actions · 2025-05-08T16:16:46Z

✅ With the latest revision this PR passed the C/C++ code formatter.

lukel97 · 2025-05-08T16:21:13Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+          Builder.createScalarCast(Instruction::CastOps::Trunc, VF, StepTy, DL);
+
+    Inc = Builder.createNaryOp(MulOp, {Step, VF}, FMFs);
+    Inc = Builder.createNaryOp(VPInstruction::Broadcast, Inc);


I'm reusing VPInstruction::Broadcast here to splat the vectors, hope that's ok. I also experimented with moving materializeBroadcasts to after convertToConcreteRecipes to see if we could avoid the need for the explicit broadcasts here, but it looks like it only handles live-ins and VPValues in the entry block at the moment.
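
For readers less familiar with the expansion, the backedge value built in the hunk above can be pictured with the following standalone C++ sketch (integer case only, not LLVM code). It mirrors the order in the snippet: multiply the scalar step by VF first, then broadcast the product. `wideIVIncrement` and `wideIVNext` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Backedge increment: the scalar product Step * VF, splatted across all lanes.
inline std::vector<int64_t> wideIVIncrement(int64_t Step, size_t VF) {
  int64_t Inc = Step * static_cast<int64_t>(VF); // scalar multiply first
  return std::vector<int64_t>(VF, Inc);          // then broadcast
}

// One trip around the loop: next = phi + increment, lane by lane.
inline std::vector<int64_t> wideIVNext(const std::vector<int64_t> &Phi,
                                       int64_t Step) {
  assert(!Phi.empty() && "wide IV must have at least one lane");
  std::vector<int64_t> Inc = wideIVIncrement(Step, Phi.size());
  std::vector<int64_t> Next(Phi.size());
  for (size_t I = 0; I < Phi.size(); ++I)
    Next[I] = Phi[I] + Inc[I];
  return Next;
}
```

Because the multiply happens on scalars, only the single product needs an explicit broadcast, which is where the `VPInstruction::Broadcast` in the hunk comes in.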

lukel97 · 2025-05-08T16:23:10Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

@@ -966,6 +966,7 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
  case VPInstruction::BranchOnCount:
  case VPInstruction::BranchOnCond:
  case VPInstruction::ResumePhi:
+  case VPInstruction::Broadcast:


This was needed since we're now using VPInstruction::Broadcasts for arbitrary VPValues

lukel97 · 2025-05-08T16:24:23Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

-  switch (getOpcode()) {
-  case Instruction::ZExt:
-  case Instruction::Trunc: {
+  if (isScalarCast()) {


Needed because one of the recipes expanded to is Instruction::UIToFP. Happy to split this off to an NFC if wanted.

lukel97 · 2025-05-08T16:25:30Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -2762,8 +2762,7 @@ LoopVectorizationCostModel::getVectorIntrinsicCost(CallInst *CI,

 void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State) {
  // Fix widened non-induction PHIs by setting up the PHI operands.
-  if (EnableVPlanNativePath)
-    fixNonInductionPHIs(State);
+  fixNonInductionPHIs(State);


Needed because we're now using VPWidenPHI on the non-vplan-native path. Happy to split this off into an NFC as well if wanted

fhahn · 2025-05-12T15:19:02Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -1874,12 +1874,13 @@ class VPWidenInductionRecipe : public VPHeaderPHIRecipe {
 };

 /// A recipe for handling phi nodes of integer and floating-point inductions,
-/// producing their vector values.
+/// producing their vector values. This won't execute any LLVM IR and will get


Might be good to adjust to something like the below, trying to be consistent with the existing terminology?

Suggested change

-/// producing their vector values. This won't execute any LLVM IR and will get
+/// producing their vector values. This is an abstract recipe and must be converted to concrete recipes before executing.

fhahn · 2025-05-12T15:19:33Z

llvm/lib/Transforms/Vectorize/VPlan.h

@@ -2026,7 +2018,7 @@ class VPWidenPHIRecipe : public VPSingleDefRecipe, public VPPhiAccessors {
 public:
  /// Create a new VPWidenPHIRecipe for \p Phi with start value \p Start and
  /// debug location \p DL.
-  VPWidenPHIRecipe(PHINode *Phi, VPValue *Start = nullptr, DebugLoc DL = {},
+  VPWidenPHIRecipe(Instruction *Phi, VPValue *Start = nullptr, DebugLoc DL = {},


Why is this needed?

fhahn · 2025-05-12T15:20:27Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -2390,6 +2380,129 @@ void VPlanTransforms::createInterleaveGroups(
  }
 }

+/// Expand a VPWidenIntOrFpInduction into executable recipes. for the initial


Suggested change

-/// Expand a VPWidenIntOrFpInduction into executable recipes. for the initial
+/// Expand a VPWidenIntOrFpInduction into executable recipes, for the initial

fhahn · 2025-05-12T15:20:41Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

@@ -2390,6 +2380,129 @@ void VPlanTransforms::createInterleaveGroups(
  }
 }

+/// Expand a VPWidenIntOrFpInduction into executable recipes. for the initial
+/// value, phi and backedge value. In the followng example:


Suggested change

-/// value, phi and backedge value. In the followng example:
+/// value, phi and backedge value. In the following example:

fhahn · 2025-05-12T15:21:05Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+///
+///  vector.ph:
+///    ...
+///    vp<%induction> = ...


Suggested change

-///    vp<%induction> = ...
+///    vp<%induction.start> = ...

fhahn · 2025-05-12T15:21:16Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+///  vector.ph:
+///    ...
+///    vp<%induction> = ...
+///    vp<%inc> = ...


Suggested change

-///    vp<%inc> = ...
+///    vp<%induction.increment> = ...

fhahn · 2025-05-12T15:22:26Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  // If the phi is truncated, truncate the start and step values.
+  VPBuilder Builder(Plan->getVectorPreheader());
+  if (isa<TruncInst>(IV)) {
+    assert(Start->getUnderlyingValue()->getType()->isIntegerTy() &&


Can we use VPlan type inference instead of the underlying value here?

fhahn · 2025-05-12T15:23:30Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+  VPValue *SplatStart = Builder.createNaryOp(VPInstruction::Broadcast, Start);
+  VPValue *SplatStep = Builder.createNaryOp(VPInstruction::Broadcast, Step);


Could materializeBroadcasts take care of this? I assume there's currently a phase-ordering issue?

Yeah, materializeBroadcasts runs before convertToConcreteRecipes, but it also only works on live-ins and values defined in the entry block. The increment is calculated in the loop and it needs to be broadcast too.

fhahn · 2025-05-12T15:24:52Z

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

+    // Move VPWidenPointerInductionRecipes to the back of the phis
+    // since it may insert non-phi instructions in place, which will
+    // interfere with other header phis if they come after.
+    //
+    // TODO: Expand out VPWidenPointerInductionRecipe into multiple
+    // recipes here and remove this


Is this needed? Could we keep track of the current firstNonPhi insert pos instead and use this?

I don't think firstNonPhi is enough, since any VPWidenIntOrFpInductionRecipe that gets executed after a VPWidenPointerInductionRecipe will end up generating something like:

%pointer.phi = phi ptr [ null, %vector.ph ], [ %ptr.ind, %vector.ph ]
%ptr.ind = getelementptr i8, ptr %pointer.phi, i64 16
%vector.gep = getelementptr i8, ptr %pointer.phi, <4 x i64> <i64 0, i64 4, i64 8, i64 12>
%vec.ind = phi <4 x i32> ; VPWidenIntOrFpInductionRecipe's VPWidenPHI

Alternatively, we could tweak VPWidenPHI to always insert its phi at firstNonPhi during execution?

It looks like fixing up the location in VPWidenPHI actually removes some test diffs, so I've gone ahead and added that in ec5fe59
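
The placement fix discussed in this thread can be pictured with a small standalone C++ model (not the code from ec5fe59). A basic block is a list of instructions, and a new phi is inserted at the first non-phi position so that it never lands after the non-phi instructions emitted by an earlier recipe. `Inst`, `insertPhiAtFirstNonPhi` and `names` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction inside a basic block.
struct Inst {
  std::string Name;
  bool IsPhi;
};

// Insert a new phi just before the first non-phi instruction, keeping all
// phis grouped at the top of the block.
inline void insertPhiAtFirstNonPhi(std::vector<Inst> &Block,
                                   const std::string &Name) {
  auto FirstNonPhi =
      std::find_if(Block.begin(), Block.end(),
                   [](const Inst &I) { return !I.IsPhi; });
  Block.insert(FirstNonPhi, Inst{Name, true});
}

// Helper: the instruction names in block order.
inline std::vector<std::string> names(const std::vector<Inst> &Block) {
  std::vector<std::string> Out;
  for (const Inst &I : Block)
    Out.push_back(I.Name);
  return Out;
}
```

Starting from a block holding `pointer.phi`, `ptr.ind` and `vector.gep`, inserting `vec.ind` this way places it directly after `pointer.phi` rather than after the two getelementptrs.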

fhahn · 2025-05-12T15:25:44Z

...orms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-remove-loop-region.ll

 ; VF4-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; VF4-NEXT:    [[VEC_IND_NEXT]] = add <4 x i8> [[VEC_IND]], splat (i8 4)


Can we keep the wide IV increment before the canonical IV increment, to reduce the test diff?

lukel97 requested review from fhahn, arcbbb, npanchen, alexey-bataev, Mel-Chen and ayalz December 4, 2024 13:51

llvmbot added vectorizers llvm:transforms labels Dec 4, 2024

lukel97 requested a review from preames December 4, 2024 13:55

lukel97 mentioned this pull request Dec 4, 2024

[VPlan] Allow VPWidenPHI in non-native path. NFC #118662

Closed

lukel97 commented Dec 4, 2024

lukel97 changed the title ~~[LV] Expand VPWidenIntOrFpInductionRecipe into separate recipes~~ [VPlan] Expand VPWidenIntOrFpInductionRecipe into separate recipes Dec 4, 2024

alexey-bataev reviewed Dec 5, 2024

Mel-Chen reviewed Dec 9, 2024

lukel97 force-pushed the loop-vectorize/split-VPWidenIntOrFpInductionRecipe branch from cd020fd to 4247eaa December 12, 2024 17:07

lukel97 mentioned this pull request Dec 13, 2024

[VPlan] Add opcode to create step for wide inductions. #119284

Merged

Merge remote-tracking branch 'origin/main' into loop-vectorize/split-VPWidenIntOrFpInductionRecipe

5457030

fhahn reviewed Mar 3, 2025

Mel-Chen reviewed Mar 3, 2025

lukel97 mentioned this pull request Mar 3, 2025

[VPlan] Add VPInstruction::StepVector and use it in VPWidenIntOrFpInductionRecipe #129508

Merged

lukel97 mentioned this pull request Mar 3, 2025

[VPlan] Add Name argument to VPWidenPHIRecipe. NFC #129527

Merged

Update vplan test to include preheader, remove references to old recipes, remove stray .

68b6b66

lukel97 mentioned this pull request Mar 4, 2025

[VPlan] Move FOR splice cost into VPInstruction::FirstOrderRecurrenceSplice #129645

Merged

lukel97 added a commit that referenced this pull request Mar 4, 2025

[VPlan] Add Name argument to VPWidenPHIRecipe. NFC (#129527)

47fb9c4

This allows a different IR name for the generated phi to be used. This is split off from #118638 and helps remove some of the diffs in it.

arcbbb mentioned this pull request Mar 21, 2025

[LV, VPlan] Check if plan is compatible to EVL transform #92092

Merged

lukel97 mentioned this pull request Mar 27, 2025

[VPlan] Introduce VPInstructionWithType, use instead of VPScalarCast(NFC) #129706

Merged

lukel97 added 2 commits May 8, 2025 19:21

Merge branch 'main' of github.com:llvm/llvm-project into loop-vectorize/split-VPWidenIntOrFpInductionRecipe

8257a57

Update now that llvm#129508 is landed

0f8b4f6

* Remove temp stepvector operand from VPWidenIntOrFpInductionRecipe
* Use VPInstruction::Broadcast

clang-format

7fc4859

lukel97 commented May 8, 2025

fhahn reviewed May 13, 2025

lukel97 added 2 commits May 14, 2025 15:22

Fix typo, use TypeInfo in assert

61bd641

Don't move VPWidenPointerInductionRecipe, instead fixup location of phi in VPWidenPHI::execute

ec5fe59
