[LoopPeel] Implement initial peeling off the last loop iteration. #139551

fhahn · 2025-05-12T13:58:18Z

Generalize countToEliminateCompares to also consider peeling off the last iteration if it eliminates a compare.

At the moment, codegen for peeling off the last iteration is quite restrictive and callers have to make sure that the exit condition can be adjusted when peeling and that the loop executes at least 2 iterations.

Both will be relaxed in follow-ups.

llvmbot · 2025-05-12T13:58:52Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Florian Hahn (fhahn)

Changes

Generalize countToEliminateCompares to also consider peeling off the last iteration if it eliminates a compare.

At the moment, codegen for peeling off the last iteration is quite restrictive and callers have to make sure that the exit condition can be adjusted when peeling and that the loop executes at least 2 iterations.

Both will be relaxed in follow-ups.

Patch is 34.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139551.diff

6 Files Affected:

(modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+3)
(modified) llvm/include/llvm/Transforms/Utils/LoopPeel.h (+7-4)
(modified) llvm/lib/Transforms/Scalar/LoopFuse.cpp (+2-1)
(modified) llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp (+2-1)
(modified) llvm/lib/Transforms/Utils/LoopPeel.cpp (+254-110)
(modified) llvm/test/Transforms/LoopUnroll/peel-last-iteration.ll (+97-28)

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 3f639138d8b75..1aed98e8f50db 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -680,6 +680,9 @@ class TargetTransformInfo {
     /// If the value is true the peeling cost model can decide to peel only
     /// some iterations and in this case it will set this to false.
     bool PeelProfiledIterations;
+
+    /// Peel off the last PeelCount loop iterations.
+    bool PeelLast;
   };
 
   /// Get target-customized preferences for the generic loop peeling
diff --git a/llvm/include/llvm/Transforms/Utils/LoopPeel.h b/llvm/include/llvm/Transforms/Utils/LoopPeel.h
index 0b78700ca71bb..f7babaf036768 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopPeel.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopPeel.h
@@ -22,10 +22,13 @@ namespace llvm {
 bool canPeel(const Loop *L);
 
 /// VMap is the value-map that maps instructions from the original loop to
-/// instructions in the last peeled-off iteration.
-bool peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI, ScalarEvolution *SE,
-              DominatorTree &DT, AssumptionCache *AC, bool PreserveLCSSA,
-              ValueToValueMapTy &VMap);
+/// instructions in the last peeled-off iteration. If \p PeelLast is true, peel
+/// off the last \p PeelCount iterations from \p L. In that case, the caller has
+/// to make sure that the exit condition can be adjusted when peeling and that
+/// the loop executes at least 2 iterations.
+bool peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
+              ScalarEvolution *SE, DominatorTree &DT, AssumptionCache *AC,
+              bool PreserveLCSSA, ValueToValueMapTy &VMap);
 
 TargetTransformInfo::PeelingPreferences
 gatherPeelingPreferences(Loop *L, ScalarEvolution &SE,
diff --git a/llvm/lib/Transforms/Scalar/LoopFuse.cpp b/llvm/lib/Transforms/Scalar/LoopFuse.cpp
index 5bba3016ba4a1..d6bd92d520e28 100644
--- a/llvm/lib/Transforms/Scalar/LoopFuse.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopFuse.cpp
@@ -790,7 +790,8 @@ struct LoopFuser {
                       << " iterations of the first loop. \n");
 
     ValueToValueMapTy VMap;
-    FC0.Peeled = peelLoop(FC0.L, PeelCount, &LI, &SE, DT, &AC, true, VMap);
+    FC0.Peeled =
+        peelLoop(FC0.L, PeelCount, false, &LI, &SE, DT, &AC, true, VMap);
     if (FC0.Peeled) {
       LLVM_DEBUG(dbgs() << "Done Peeling\n");
 
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
index d84b74dd0eecc..0b9fee5727c6f 100644
--- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
@@ -1314,7 +1314,8 @@ tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,
     });
 
     ValueToValueMapTy VMap;
-    if (peelLoop(L, PP.PeelCount, LI, &SE, DT, &AC, PreserveLCSSA, VMap)) {
+    if (peelLoop(L, PP.PeelCount, PP.PeelLast, LI, &SE, DT, &AC, PreserveLCSSA,
+                 VMap)) {
       simplifyLoopAfterUnroll(L, true, LI, &SE, &DT, &AC, &TTI, nullptr);
       // If the loop was peeled, we already "used up" the profile information
       // we had, so we don't want to unroll or peel again.
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index f6ace9c4e5d2f..f01c3948b87ed 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -49,6 +49,7 @@ using namespace llvm::PatternMatch;
 #define DEBUG_TYPE "loop-peel"
 
 STATISTIC(NumPeeled, "Number of loops peeled");
+STATISTIC(NumPeeledEnd, "Number of loops peeled from end");
 
 static cl::opt<unsigned> UnrollPeelCount(
     "unroll-peel-count", cl::Hidden,
@@ -325,19 +326,66 @@ static unsigned peelToTurnInvariantLoadsDerefencebale(Loop &L,
   return 0;
 }
 
+/// Returns true if the last iteration can be peeled off and the condition (Pred
+/// LeftAR, RightSCEV) is known at the last iteration and the inverse condition
+/// is known at the second-to-last. This function also has to make sure the loop
+/// exit condition can be adjusted when peeling and that the loop executes at
+/// least 2 iterations.
+static bool canPeelLastIteration(Loop &L, const SCEVAddRecExpr *LeftAR,
+                                 const SCEV *RightSCEV, ScalarEvolution &SE) {
+  const SCEV *BTC = SE.getBackedgeTakenCount(&L);
+  Value *Inc;
+  CmpPredicate Pred;
+  BasicBlock *Succ1;
+  BasicBlock *Succ2;
+  // The loop must execute at least 2 iterations to guarantee that peeled
+  // iteration executes.
+  // TODO: Add checks during codegen.
+  if (isa<SCEVCouldNotCompute>(BTC) ||
+      !SE.isKnownPredicate(CmpInst::ICMP_UGT, BTC, SE.getOne(BTC->getType())))
+    return false;
+
+  // Check if the exit condition of the loop can be adjusted by the peeling
+  // codegen. For now, it must
+  // * exit via the latch,
+  // * the exit condition must be a NE/EQ compare of an induction with step
+  // of 1.
+  BasicBlock *Latch = L.getLoopLatch();
+  if (Latch != L.getExitingBlock() ||
+      !match(Latch->getTerminator(),
+             m_Br(m_ICmp(Pred, m_Value(Inc), m_Value()), m_BasicBlock(Succ1),
+                  m_BasicBlock(Succ2))) ||
+      !((Pred == CmpInst::ICMP_EQ && Succ2 == L.getHeader()) ||
+        (Pred == CmpInst::ICMP_NE && Succ1 == L.getHeader())) ||
+      !isa<SCEVAddRecExpr>(SE.getSCEV(Inc)) ||
+      !cast<SCEVAddRecExpr>(SE.getSCEV(Inc))->getStepRecurrence(SE)->isOne())
+    return false;
+
+  const SCEV *ValAtLastIter =
+      SE.applyLoopGuards(LeftAR->evaluateAtIteration(BTC, SE), &L);
+  const SCEV *ValAtSecondToLastIter = LeftAR->evaluateAtIteration(
+      SE.getMinusSCEV(BTC, SE.getOne(BTC->getType())), SE);
+
+  return SE.isKnownPredicate(ICmpInst::getInversePredicate(Pred), ValAtLastIter,
+                             SE.applyLoopGuards(RightSCEV, &L)) &&
+         SE.isKnownPredicate(Pred, ValAtSecondToLastIter, RightSCEV);
+}
+
 // Return the number of iterations to peel off that make conditions in the
-// body true/false. For example, if we peel 2 iterations off the loop below,
-// the condition i < 2 can be evaluated at compile time.
+// body true/false. Positive return values indicate the iterations to peel of
+// from the front and negative return values indicate the number of iterations
+// from the back after removing the sign. For example, if we peel 2 iterations
+// off the loop below, the condition i < 2 can be evaluated at compile time.
 //  for (i = 0; i < n; i++)
 //    if (i < 2)
 //      ..
 //    else
 //      ..
 //   }
-static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
-                                         ScalarEvolution &SE) {
+static int countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
+                                    ScalarEvolution &SE) {
   assert(L.isLoopSimplifyForm() && "Loop needs to be in loop simplify form");
-  unsigned DesiredPeelCount = 0;
+  int DesiredPeelCount = 0;
 
   // Do not peel the entire loop.
   const SCEV *BE = SE.getConstantMaxBackedgeTakenCount(&L);
@@ -349,9 +397,9 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
   // return true if inversed condition become known before reaching the
   // MaxPeelCount limit.
   auto PeelWhilePredicateIsKnown =
-      [&](unsigned &PeelCount, const SCEV *&IterVal, const SCEV *BoundSCEV,
+      [&](int &PeelCount, const SCEV *&IterVal, const SCEV *BoundSCEV,
           const SCEV *Step, ICmpInst::Predicate Pred) {
-        while (PeelCount < MaxPeelCount &&
+        while (unsigned(std::abs(PeelCount)) < MaxPeelCount &&
                SE.isKnownPredicate(Pred, IterVal, BoundSCEV)) {
           IterVal = SE.getAddExpr(IterVal, Step);
           ++PeelCount;
@@ -408,7 +456,7 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
 
     // Check if extending the current DesiredPeelCount lets us evaluate Pred
     // or !Pred in the loop body statically.
-    unsigned NewPeelCount = DesiredPeelCount;
+    int NewPeelCount = DesiredPeelCount;
 
     const SCEV *IterVal = LeftAR->evaluateAtIteration(
         SE.getConstant(LeftSCEV->getType(), NewPeelCount), SE);
@@ -421,8 +469,11 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
 
     const SCEV *Step = LeftAR->getStepRecurrence(SE);
     if (!PeelWhilePredicateIsKnown(NewPeelCount, IterVal, RightSCEV, Step,
-                                   Pred))
+                                   Pred)) {
+      if (canPeelLastIteration(L, LeftAR, RightSCEV, SE))
+        DesiredPeelCount = -1;
       return;
+    }
 
     // However, for equality comparisons, that isn't always sufficient to
     // eliminate the comparsion in loop body, we may need to peel one more
@@ -433,7 +484,7 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
                              RightSCEV) &&
         !SE.isKnownPredicate(Pred, IterVal, RightSCEV) &&
         SE.isKnownPredicate(Pred, NextIterVal, RightSCEV)) {
-      if (NewPeelCount >= MaxPeelCount)
+      if (unsigned(std::abs(NewPeelCount)) >= MaxPeelCount)
         return; // Need to peel one more iteration, but can't. Give up.
       ++NewPeelCount; // Great!
     }
@@ -472,7 +523,7 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
     // Check that AddRec is not wrapping.
     if (!(IsSigned ? AddRec->hasNoSignedWrap() : AddRec->hasNoUnsignedWrap()))
       return;
-    unsigned NewPeelCount = DesiredPeelCount;
+    int NewPeelCount = DesiredPeelCount;
     const SCEV *IterVal = AddRec->evaluateAtIteration(
         SE.getConstant(AddRec->getType(), NewPeelCount), SE);
     if (!PeelWhilePredicateIsKnown(NewPeelCount, IterVal, BoundSCEV, Step,
@@ -593,8 +644,9 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
       DesiredPeelCount = std::max(DesiredPeelCount, *NumPeels);
   }
 
-  DesiredPeelCount = std::max(DesiredPeelCount,
-                              countToEliminateCompares(*L, MaxPeelCount, SE));
+  int CountToEliminateCmps = countToEliminateCompares(*L, MaxPeelCount, SE);
+  DesiredPeelCount =
+      std::max(DesiredPeelCount, unsigned(std::abs(CountToEliminateCmps)));
 
   if (DesiredPeelCount == 0)
     DesiredPeelCount = peelToTurnInvariantLoadsDerefencebale(*L, DT, AC);
@@ -609,6 +661,9 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
                         << " some Phis into invariants.\n");
       PP.PeelCount = DesiredPeelCount;
       PP.PeelProfiledIterations = false;
+      PP.PeelLast =
+          DesiredPeelCount == unsigned(std::abs(CountToEliminateCmps)) &&
+          CountToEliminateCmps < 0;
       return;
     }
   }
@@ -733,6 +788,7 @@ static void initBranchWeights(DenseMap<Instruction *, WeightInfo> &WeightInfos,
 /// InsertBot.
 /// \param IterNumber The serial number of the iteration currently being
 /// peeled off.
+/// \param PeelLast Peel off the last iterations from \p L.
 /// \param ExitEdges The exit edges of the original loop.
 /// \param[out] NewBlocks A list of the blocks in the newly created clone
 /// \param[out] VMap The value map between the loop and the new clone.
@@ -740,7 +796,8 @@ static void initBranchWeights(DenseMap<Instruction *, WeightInfo> &WeightInfos,
 /// \param LVMap A value-map that maps instructions from the original loop to
 /// instructions in the last peeled-off iteration.
 static void cloneLoopBlocks(
-    Loop *L, unsigned IterNumber, BasicBlock *InsertTop, BasicBlock *InsertBot,
+    Loop *L, unsigned IterNumber, bool PeelLast, BasicBlock *InsertTop,
+    BasicBlock *InsertBot,
     SmallVectorImpl<std::pair<BasicBlock *, BasicBlock *>> &ExitEdges,
     SmallVectorImpl<BasicBlock *> &NewBlocks, LoopBlocksDFS &LoopBlocks,
     ValueToValueMapTy &VMap, ValueToValueMapTy &LVMap, DominatorTree *DT,
@@ -804,16 +861,26 @@ static void cloneLoopBlocks(
 
   // Similarly, for the latch:
   // The original exiting edge is still hooked up to the loop exit.
-  // The backedge now goes to the "bottom", which is either the loop's real
-  // header (for the last peeled iteration) or the copied header of the next
-  // iteration (for every other iteration)
   BasicBlock *NewLatch = cast<BasicBlock>(VMap[Latch]);
-  auto *LatchTerm = cast<Instruction>(NewLatch->getTerminator());
-  for (unsigned idx = 0, e = LatchTerm->getNumSuccessors(); idx < e; ++idx)
-    if (LatchTerm->getSuccessor(idx) == Header) {
-      LatchTerm->setSuccessor(idx, InsertBot);
-      break;
+  if (PeelLast) {
+    // This is the last iteration and we definitely will go to the exit. Just
+    // set both successors to InsertBot and let the branch be simplified later.
+    assert(IterNumber == 0 && "Only peeling a single iteration implemented.");
+    auto *LatchTerm = cast<BranchInst>(NewLatch->getTerminator());
+    LatchTerm->setSuccessor(0, InsertBot);
+    LatchTerm->setSuccessor(1, InsertBot);
+  } else {
+    auto *LatchTerm = cast<Instruction>(NewLatch->getTerminator());
+    // The backedge now goes to the "bottom", which is either the loop's real
+    // header (for the last peeled iteration) or the copied header of the next
+    // iteration (for every other iteration)
+    for (unsigned idx = 0, e = LatchTerm->getNumSuccessors(); idx < e; ++idx) {
+      if (LatchTerm->getSuccessor(idx) == Header) {
+        LatchTerm->setSuccessor(idx, InsertBot);
+        break;
+      }
     }
+  }
   if (DT)
     DT->changeImmediateDominator(InsertBot, NewLatch);
 
@@ -821,23 +888,33 @@ static void cloneLoopBlocks(
   // that pick an incoming value from either the preheader, or the previous
   // loop iteration. Since this copy is no longer part of the loop, we
   // resolve this statically:
-  // For the first iteration, we use the value from the preheader directly.
-  // For any other iteration, we replace the phi with the value generated by
-  // the immediately preceding clone of the loop body (which represents
-  // the previous iteration).
-  for (BasicBlock::iterator I = Header->begin(); isa<PHINode>(I); ++I) {
-    PHINode *NewPHI = cast<PHINode>(VMap[&*I]);
-    if (IterNumber == 0) {
-      VMap[&*I] = NewPHI->getIncomingValueForBlock(PreHeader);
-    } else {
-      Value *LatchVal = NewPHI->getIncomingValueForBlock(Latch);
-      Instruction *LatchInst = dyn_cast<Instruction>(LatchVal);
-      if (LatchInst && L->contains(LatchInst))
-        VMap[&*I] = LVMap[LatchInst];
-      else
-        VMap[&*I] = LatchVal;
+  if (PeelLast) {
+    // For the last iteration, we use the value from the latch of the original
+    // loop directly.
+    for (BasicBlock::iterator I = Header->begin(); isa<PHINode>(I); ++I) {
+      PHINode *NewPHI = cast<PHINode>(VMap[&*I]);
+      VMap[&*I] = NewPHI->getIncomingValueForBlock(Latch);
+      NewPHI->eraseFromParent();
+    }
+  } else {
+    // For the first iteration, we use the value from the preheader directly.
+    // For any other iteration, we replace the phi with the value generated by
+    // the immediately preceding clone of the loop body (which represents
+    // the previous iteration).
+    for (BasicBlock::iterator I = Header->begin(); isa<PHINode>(I); ++I) {
+      PHINode *NewPHI = cast<PHINode>(VMap[&*I]);
+      if (IterNumber == 0) {
+        VMap[&*I] = NewPHI->getIncomingValueForBlock(PreHeader);
+      } else {
+        Value *LatchVal = NewPHI->getIncomingValueForBlock(Latch);
+        Instruction *LatchInst = dyn_cast<Instruction>(LatchVal);
+        if (LatchInst && L->contains(LatchInst))
+          VMap[&*I] = LVMap[LatchInst];
+        else
+          VMap[&*I] = LatchVal;
+      }
+      NewPHI->eraseFromParent();
     }
-    NewPHI->eraseFromParent();
   }
 
   // Fix up the outgoing values - we need to add a value for the iteration
@@ -905,11 +982,13 @@ llvm::gatherPeelingPreferences(Loop *L, ScalarEvolution &SE,
 /// this provides a benefit, since the peeled off iterations, which account
 /// for the bulk of dynamic execution, can be further simplified by scalar
 /// optimizations.
-bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
+bool llvm::peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
                     ScalarEvolution *SE, DominatorTree &DT, AssumptionCache *AC,
                     bool PreserveLCSSA, ValueToValueMapTy &LVMap) {
   assert(PeelCount > 0 && "Attempt to peel out zero iterations?");
   assert(canPeel(L) && "Attempt to peel a loop which is not peelable?");
+  assert((!PeelLast || PeelCount == 1) &&
+         "can only peel off a single iteration from the end for now");
 
   LoopBlocksDFS LoopBlocks(L);
   LoopBlocks.perform(LI);
@@ -944,60 +1023,99 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
 
   Function *F = Header->getParent();
 
-  // Set up all the necessary basic blocks. It is convenient to split the
-  // preheader into 3 parts - two blocks to anchor the peeled copy of the loop
-  // body, and a new preheader for the "real" loop.
-
-  // Peeling the first iteration transforms.
-  //
-  // PreHeader:
-  // ...
-  // Header:
-  //   LoopBody
-  //   If (cond) goto Header
-  // Exit:
-  //
-  // into
-  //
-  // InsertTop:
-  //   LoopBody
-  //   If (!cond) goto Exit
-  // InsertBot:
-  // NewPreHeader:
-  // ...
-  // Header:
-  //  LoopBody
-  //  If (cond) goto Header
-  // Exit:
-  //
-  // Each following iteration will split the current bottom anchor in two,
-  // and put the new copy of the loop body between these two blocks. That is,
-  // after peeling another iteration from the example above, we'll split
-  // InsertBot, and get:
-  //
-  // InsertTop:
-  //   LoopBody
-  //   If (!cond) goto Exit
-  // InsertBot:
-  //   LoopBody
-  //   If (!cond) goto Exit
-  // InsertBot.next:
-  // NewPreHeader:
-  // ...
-  // Header:
-  //  LoopBody
-  //  If (cond) goto Header
-  // Exit:
-
-  BasicBlock *InsertTop = SplitEdge(PreHeader, Header, &DT, LI);
-  BasicBlock *InsertBot =
-      SplitBlock(InsertTop, InsertTop->getTerminator(), &DT, LI);
-  BasicBlock *NewPreHeader =
-      SplitBlock(InsertBot, InsertBot->getTerminator(), &DT, LI);
-
-  InsertTop->setName(Header->getName() + ".peel.begin");
-  InsertBot->setName(Header->getName() + ".peel.next");
-  NewPreHeader->setName(PreHeader->getName() + ".peel.newph");
+  // Set up all the necessary basic blocks.
+  BasicBlock *InsertTop;
+  BasicBlock *InsertBot;
+  BasicBlock *NewPreHeader;
+  DenseMap<Instruction *, Value *> ExitValues;
+  if (PeelLast) {
+    // It is convenient to split the single exit block from the latch the
+    // into 3 parts - two blocks to anchor the peeled copy of the loop body,
+    // and a new final  exit block.
+
+    // Peeling the last iteration transforms.
+    //
+    // PreHeader:
+    // ...
+    // Header:
+    //   LoopBody
+    //   If (cond) goto Header
+    // Exit:
+    //
+    // into
+    //
+    // Header:
+    //  LoopBody
+    //  If (cond) goto Header
+    // InsertTop:
+    //   LoopBody
+    //   If (!cond) goto InsertBot
+    // InsertBot:
+    // Exit:
+    // ...
+    BasicBlock *Exit = L->getExitBlock();
+    for (PHINode &P : Exit->phis())
+      ExitValues[&P] = P.getIncomingValueForBlock(Latch);
+
+    InsertTop = SplitEdge(Latch, Exit, &DT, LI);
+    InsertBot = SplitBlock(InsertTop, InsertTop->getTerminator(), &DT, LI);
+
+    InsertTop->setName(Exit->getName() + ".peel.begin");
+    InsertBot->setName(Exit->getName() + ".peel.next");
+  } else {
+    // It is convenient to split the preheader into 3 parts - two blocks to
+    // anchor the peeled copy of the loop body, and a new preheader for the
+    // "real" loop.
+
+    // Peeling the first iteration transforms.
+    //
+    // PreHeader:
+    // ...
+    // Header:
+    //   LoopBody
+    //   If (cond) goto Header
+    // Exit:
+    //
+    // into
+    //
+    // InsertTop:
+    //   LoopBody
+    //   If (!cond) goto Exit
+    // InsertBot:
+    // NewPreHeader:
+    // ...
+    // Header:
+    //  LoopBody
+    //  If (cond) goto Header
+    // Exit:
+    //
+    // Each following iteration will split the current bottom anchor in two,
+    // and put the new copy of the loop body between these two blocks. That
+    // is, after peeling another iteration from the example above, we'll
+    // split InsertBot, and get:
+    //
+    // InsertTop:
+    //   LoopBody
+    //   If (!cond) goto Exit
+    // InsertBot:
+    //   ...
[truncated]

efriedma-quic · 2025-05-12T18:39:50Z

llvm/include/llvm/Transforms/Utils/LoopPeel.h

-              DominatorTree &DT, AssumptionCache *AC, bool PreserveLCSSA,
-              ValueToValueMapTy &VMap);
+/// instructions in the last peeled-off iteration. If \p PeelLast is true, peel
+/// off the last \p PeelCount iterations from \p L. In that case, the caller has


Probably should say what happens if PeelLast is false.

Is "at least 2 iterations" still correct if PeelCount isn't 1?

Probably should say what happens if PeelLast is false.
updated, thanks

Is "at least 2 iterations" still correct if PeelCount isn't 1?

PeelCount for now must be 1 if PeelLast is true, documented, thanks

efriedma-quic · 2025-05-12T18:41:56Z

llvm/lib/Transforms/Utils/LoopPeel.cpp

+/// is known at the second-to-last. This function also has to make sure the loop
+/// exit condition can be adjusted when peeling and that the loop executes at
+/// least 2 iterations.
+static bool canPeelLastIteration(Loop &L, const SCEVAddRecExpr *LeftAR,


Should this be a public API, like canPeel?

Split off the legality checks and use it in comments and assert, thanks

efriedma-quic · 2025-05-12T18:46:07Z

llvm/lib/Transforms/Utils/LoopPeel.cpp

 //  for (i = 0; i < n; i++)
 //    if (i < 2)
 //      ..
 //    else
 //      ..
 //   }
-static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
-                                         ScalarEvolution &SE) {
+static int countToEliminateCompares(Loop &L, unsigned MaxPeelCount,


Not sure I like the use of "int" here...

Is it possible to end up in a situation where we want to peel both the beginning and the end? What happens in that case?

Currently we only support either first or last. If both are possible, first should be preferred. I could track 2 separate peel counts to start with if preffered?

It's probably not that important to support both kinds of peeling on the same loop. But it seems simpler to understand if we always have separate peel-beginning and peel-end counts.

Split up thanks!

Generalize countToEliminateCompares to also consider peeling off the last iteration if it eliminates a compare. At the moment, codegen for peeling off the last iteration is quite restrictive and callers have to make sure that the exit condition can be adjusted when peeling and that the loop executes at least 2 iterations. Both will be relaxed in follow-ups.

preames

This looks like a very useful improvement, thank you for working on this.

preames · 2025-05-13T14:50:51Z

llvm/lib/Transforms/Utils/LoopPeel.cpp

@@ -439,6 +496,7 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
    }

    DesiredPeelCount = std::max(DesiredPeelCount, NewPeelCount);
+    DesiredPeelCountLast = std::max(DesiredPeelCountLast, NewPeelCount);
  };

  auto ComputePeelCountMinMax = [&](MinMaxIntrinsic *MinMax) {


As a follow up, you can handle the minmax case too.

preames · 2025-05-13T14:52:25Z

llvm/lib/Transforms/Utils/LoopPeel.cpp

+                        << " some Phis into invariants.\n");
+      PP.PeelCount = DesiredPeelCountLast;
+      PP.PeelProfiledIterations = false;
+      PP.PeelLast = true;


As a follow up, it would be nice if we could peel both first and last iteration off a single loop if profitable.

fhahn · 2025-05-13T20:01:53Z

This looks like a very useful improvement, thank you for working on this.

The initial version is still quite restricted, but I am planning on making various improvements in follow-ups, including the ones you mentiond (thanks!) and removing the minimum iteration restriction via a runtime check and better adjusting of the exit condition.

fhahn requested review from nikic, preames, annamthomas and efriedma-quic May 12, 2025 13:58

llvmbot added llvm:analysis llvm:transforms labels May 12, 2025

efriedma-quic reviewed May 12, 2025

View reviewed changes

fhahn force-pushed the peel-from-end branch from da8c8c6 to e3f7762 Compare May 12, 2025 21:23

fhahn added 3 commits May 13, 2025 12:01

!fixup address latest comments, thanks

4083a70

!fixup update to return pair of peel counts

95416ac

fhahn force-pushed the peel-from-end branch from e3f7762 to 95416ac Compare May 13, 2025 12:48

preames reviewed May 13, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LoopPeel] Implement initial peeling off the last loop iteration. #139551

[LoopPeel] Implement initial peeling off the last loop iteration. #139551

fhahn commented May 12, 2025

llvmbot commented May 12, 2025 •

edited

Loading

efriedma-quic May 12, 2025

fhahn May 12, 2025

efriedma-quic May 12, 2025

fhahn May 12, 2025

efriedma-quic May 12, 2025

fhahn May 12, 2025

efriedma-quic May 12, 2025

fhahn May 13, 2025

preames left a comment

preames May 13, 2025

preames May 13, 2025

fhahn commented May 13, 2025

[LoopPeel] Implement initial peeling off the last loop iteration. #139551

Are you sure you want to change the base?

[LoopPeel] Implement initial peeling off the last loop iteration. #139551

Conversation

fhahn commented May 12, 2025

llvmbot commented May 12, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

preames left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fhahn commented May 13, 2025

llvmbot commented May 12, 2025 •

edited

Loading