Skip to content

[LoopPeel] Implement initial peeling off the last loop iteration. #139551

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented May 12, 2025

Generalize countToEliminateCompares to also consider peeling off the last iteration if it eliminates a compare.

At the moment, codegen for peeling off the last iteration is quite restrictive and callers have to make sure that the exit condition can be adjusted when peeling and that the loop executes at least 2 iterations.

Both will be relaxed in follow-ups.

@llvmbot
Copy link
Member

llvmbot commented May 12, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Florian Hahn (fhahn)

Changes

Generalize countToEliminateCompares to also consider peeling off the last iteration if it eliminates a compare.

At the moment, codegen for peeling off the last iteration is quite restrictive and callers have to make sure that the exit condition can be adjusted when peeling and that the loop executes at least 2 iterations.

Both will be relaxed in follow-ups.


Patch is 34.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139551.diff

6 Files Affected:

  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+3)
  • (modified) llvm/include/llvm/Transforms/Utils/LoopPeel.h (+7-4)
  • (modified) llvm/lib/Transforms/Scalar/LoopFuse.cpp (+2-1)
  • (modified) llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp (+2-1)
  • (modified) llvm/lib/Transforms/Utils/LoopPeel.cpp (+254-110)
  • (modified) llvm/test/Transforms/LoopUnroll/peel-last-iteration.ll (+97-28)
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 3f639138d8b75..1aed98e8f50db 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -680,6 +680,9 @@ class TargetTransformInfo {
     /// If the value is true the peeling cost model can decide to peel only
     /// some iterations and in this case it will set this to false.
     bool PeelProfiledIterations;
+
+    /// Peel off the last PeelCount loop iterations.
+    bool PeelLast;
   };
 
   /// Get target-customized preferences for the generic loop peeling
diff --git a/llvm/include/llvm/Transforms/Utils/LoopPeel.h b/llvm/include/llvm/Transforms/Utils/LoopPeel.h
index 0b78700ca71bb..f7babaf036768 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopPeel.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopPeel.h
@@ -22,10 +22,13 @@ namespace llvm {
 bool canPeel(const Loop *L);
 
 /// VMap is the value-map that maps instructions from the original loop to
-/// instructions in the last peeled-off iteration.
-bool peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI, ScalarEvolution *SE,
-              DominatorTree &DT, AssumptionCache *AC, bool PreserveLCSSA,
-              ValueToValueMapTy &VMap);
+/// instructions in the last peeled-off iteration. If \p PeelLast is true, peel
+/// off the last \p PeelCount iterations from \p L. In that case, the caller has
+/// to make sure that the exit condition can be adjusted when peeling and that
+/// the loop executes at least 2 iterations.
+bool peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
+              ScalarEvolution *SE, DominatorTree &DT, AssumptionCache *AC,
+              bool PreserveLCSSA, ValueToValueMapTy &VMap);
 
 TargetTransformInfo::PeelingPreferences
 gatherPeelingPreferences(Loop *L, ScalarEvolution &SE,
diff --git a/llvm/lib/Transforms/Scalar/LoopFuse.cpp b/llvm/lib/Transforms/Scalar/LoopFuse.cpp
index 5bba3016ba4a1..d6bd92d520e28 100644
--- a/llvm/lib/Transforms/Scalar/LoopFuse.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopFuse.cpp
@@ -790,7 +790,8 @@ struct LoopFuser {
                       << " iterations of the first loop. \n");
 
     ValueToValueMapTy VMap;
-    FC0.Peeled = peelLoop(FC0.L, PeelCount, &LI, &SE, DT, &AC, true, VMap);
+    FC0.Peeled =
+        peelLoop(FC0.L, PeelCount, false, &LI, &SE, DT, &AC, true, VMap);
     if (FC0.Peeled) {
       LLVM_DEBUG(dbgs() << "Done Peeling\n");
 
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
index d84b74dd0eecc..0b9fee5727c6f 100644
--- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
@@ -1314,7 +1314,8 @@ tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,
     });
 
     ValueToValueMapTy VMap;
-    if (peelLoop(L, PP.PeelCount, LI, &SE, DT, &AC, PreserveLCSSA, VMap)) {
+    if (peelLoop(L, PP.PeelCount, PP.PeelLast, LI, &SE, DT, &AC, PreserveLCSSA,
+                 VMap)) {
       simplifyLoopAfterUnroll(L, true, LI, &SE, &DT, &AC, &TTI, nullptr);
       // If the loop was peeled, we already "used up" the profile information
       // we had, so we don't want to unroll or peel again.
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index f6ace9c4e5d2f..f01c3948b87ed 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -49,6 +49,7 @@ using namespace llvm::PatternMatch;
 #define DEBUG_TYPE "loop-peel"
 
 STATISTIC(NumPeeled, "Number of loops peeled");
+STATISTIC(NumPeeledEnd, "Number of loops peeled from end");
 
 static cl::opt<unsigned> UnrollPeelCount(
     "unroll-peel-count", cl::Hidden,
@@ -325,19 +326,66 @@ static unsigned peelToTurnInvariantLoadsDerefencebale(Loop &L,
   return 0;
 }
 
+/// Returns true if the last iteration can be peeled off and the condition (Pred
+/// LeftAR, RightSCEV) is known at the last iteration and the inverse condition
+/// is known at the second-to-last. This function also has to make sure the loop
+/// exit condition can be adjusted when peeling and that the loop executes at
+/// least 2 iterations.
+static bool canPeelLastIteration(Loop &L, const SCEVAddRecExpr *LeftAR,
+                                 const SCEV *RightSCEV, ScalarEvolution &SE) {
+  const SCEV *BTC = SE.getBackedgeTakenCount(&L);
+  Value *Inc;
+  CmpPredicate Pred;
+  BasicBlock *Succ1;
+  BasicBlock *Succ2;
+  // The loop must execute at least 2 iterations to guarantee that peeled
+  // iteration executes.
+  // TODO: Add checks during codegen.
+  if (isa<SCEVCouldNotCompute>(BTC) ||
+      !SE.isKnownPredicate(CmpInst::ICMP_UGT, BTC, SE.getOne(BTC->getType())))
+    return false;
+
+  // Check if the exit condition of the loop can be adjusted by the peeling
+  // codegen. For now, it must
+  // * exit via the latch,
+  // * the exit condition must be a NE/EQ compare of an induction with step
+  // of 1.
+  BasicBlock *Latch = L.getLoopLatch();
+  if (Latch != L.getExitingBlock() ||
+      !match(Latch->getTerminator(),
+             m_Br(m_ICmp(Pred, m_Value(Inc), m_Value()), m_BasicBlock(Succ1),
+                  m_BasicBlock(Succ2))) ||
+      !((Pred == CmpInst::ICMP_EQ && Succ2 == L.getHeader()) ||
+        (Pred == CmpInst::ICMP_NE && Succ1 == L.getHeader())) ||
+      !isa<SCEVAddRecExpr>(SE.getSCEV(Inc)) ||
+      !cast<SCEVAddRecExpr>(SE.getSCEV(Inc))->getStepRecurrence(SE)->isOne())
+    return false;
+
+  const SCEV *ValAtLastIter =
+      SE.applyLoopGuards(LeftAR->evaluateAtIteration(BTC, SE), &L);
+  const SCEV *ValAtSecondToLastIter = LeftAR->evaluateAtIteration(
+      SE.getMinusSCEV(BTC, SE.getOne(BTC->getType())), SE);
+
+  return SE.isKnownPredicate(ICmpInst::getInversePredicate(Pred), ValAtLastIter,
+                             SE.applyLoopGuards(RightSCEV, &L)) &&
+         SE.isKnownPredicate(Pred, ValAtSecondToLastIter, RightSCEV);
+}
+
 // Return the number of iterations to peel off that make conditions in the
-// body true/false. For example, if we peel 2 iterations off the loop below,
-// the condition i < 2 can be evaluated at compile time.
+// body true/false. Positive return values indicate the iterations to peel of
+// from the front and negative return values indicate the number of iterations
+// from the back after removing the sign. For example, if we peel 2 iterations
+// off the loop below, the condition i < 2 can be evaluated at compile time.
 //  for (i = 0; i < n; i++)
 //    if (i < 2)
 //      ..
 //    else
 //      ..
 //   }
-static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
-                                         ScalarEvolution &SE) {
+static int countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
+                                    ScalarEvolution &SE) {
   assert(L.isLoopSimplifyForm() && "Loop needs to be in loop simplify form");
-  unsigned DesiredPeelCount = 0;
+  int DesiredPeelCount = 0;
 
   // Do not peel the entire loop.
   const SCEV *BE = SE.getConstantMaxBackedgeTakenCount(&L);
@@ -349,9 +397,9 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
   // return true if inversed condition become known before reaching the
   // MaxPeelCount limit.
   auto PeelWhilePredicateIsKnown =
-      [&](unsigned &PeelCount, const SCEV *&IterVal, const SCEV *BoundSCEV,
+      [&](int &PeelCount, const SCEV *&IterVal, const SCEV *BoundSCEV,
           const SCEV *Step, ICmpInst::Predicate Pred) {
-        while (PeelCount < MaxPeelCount &&
+        while (unsigned(std::abs(PeelCount)) < MaxPeelCount &&
                SE.isKnownPredicate(Pred, IterVal, BoundSCEV)) {
           IterVal = SE.getAddExpr(IterVal, Step);
           ++PeelCount;
@@ -408,7 +456,7 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
 
     // Check if extending the current DesiredPeelCount lets us evaluate Pred
     // or !Pred in the loop body statically.
-    unsigned NewPeelCount = DesiredPeelCount;
+    int NewPeelCount = DesiredPeelCount;
 
     const SCEV *IterVal = LeftAR->evaluateAtIteration(
         SE.getConstant(LeftSCEV->getType(), NewPeelCount), SE);
@@ -421,8 +469,11 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
 
     const SCEV *Step = LeftAR->getStepRecurrence(SE);
     if (!PeelWhilePredicateIsKnown(NewPeelCount, IterVal, RightSCEV, Step,
-                                   Pred))
+                                   Pred)) {
+      if (canPeelLastIteration(L, LeftAR, RightSCEV, SE))
+        DesiredPeelCount = -1;
       return;
+    }
 
     // However, for equality comparisons, that isn't always sufficient to
     // eliminate the comparsion in loop body, we may need to peel one more
@@ -433,7 +484,7 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
                              RightSCEV) &&
         !SE.isKnownPredicate(Pred, IterVal, RightSCEV) &&
         SE.isKnownPredicate(Pred, NextIterVal, RightSCEV)) {
-      if (NewPeelCount >= MaxPeelCount)
+      if (unsigned(std::abs(NewPeelCount)) >= MaxPeelCount)
         return; // Need to peel one more iteration, but can't. Give up.
       ++NewPeelCount; // Great!
     }
@@ -472,7 +523,7 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
     // Check that AddRec is not wrapping.
     if (!(IsSigned ? AddRec->hasNoSignedWrap() : AddRec->hasNoUnsignedWrap()))
       return;
-    unsigned NewPeelCount = DesiredPeelCount;
+    int NewPeelCount = DesiredPeelCount;
     const SCEV *IterVal = AddRec->evaluateAtIteration(
         SE.getConstant(AddRec->getType(), NewPeelCount), SE);
     if (!PeelWhilePredicateIsKnown(NewPeelCount, IterVal, BoundSCEV, Step,
@@ -593,8 +644,9 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
       DesiredPeelCount = std::max(DesiredPeelCount, *NumPeels);
   }
 
-  DesiredPeelCount = std::max(DesiredPeelCount,
-                              countToEliminateCompares(*L, MaxPeelCount, SE));
+  int CountToEliminateCmps = countToEliminateCompares(*L, MaxPeelCount, SE);
+  DesiredPeelCount =
+      std::max(DesiredPeelCount, unsigned(std::abs(CountToEliminateCmps)));
 
   if (DesiredPeelCount == 0)
     DesiredPeelCount = peelToTurnInvariantLoadsDerefencebale(*L, DT, AC);
@@ -609,6 +661,9 @@ void llvm::computePeelCount(Loop *L, unsigned LoopSize,
                         << " some Phis into invariants.\n");
       PP.PeelCount = DesiredPeelCount;
       PP.PeelProfiledIterations = false;
+      PP.PeelLast =
+          DesiredPeelCount == unsigned(std::abs(CountToEliminateCmps)) &&
+          CountToEliminateCmps < 0;
       return;
     }
   }
@@ -733,6 +788,7 @@ static void initBranchWeights(DenseMap<Instruction *, WeightInfo> &WeightInfos,
 /// InsertBot.
 /// \param IterNumber The serial number of the iteration currently being
 /// peeled off.
+/// \param PeelLast Peel off the last iterations from \p L.
 /// \param ExitEdges The exit edges of the original loop.
 /// \param[out] NewBlocks A list of the blocks in the newly created clone
 /// \param[out] VMap The value map between the loop and the new clone.
@@ -740,7 +796,8 @@ static void initBranchWeights(DenseMap<Instruction *, WeightInfo> &WeightInfos,
 /// \param LVMap A value-map that maps instructions from the original loop to
 /// instructions in the last peeled-off iteration.
 static void cloneLoopBlocks(
-    Loop *L, unsigned IterNumber, BasicBlock *InsertTop, BasicBlock *InsertBot,
+    Loop *L, unsigned IterNumber, bool PeelLast, BasicBlock *InsertTop,
+    BasicBlock *InsertBot,
     SmallVectorImpl<std::pair<BasicBlock *, BasicBlock *>> &ExitEdges,
     SmallVectorImpl<BasicBlock *> &NewBlocks, LoopBlocksDFS &LoopBlocks,
     ValueToValueMapTy &VMap, ValueToValueMapTy &LVMap, DominatorTree *DT,
@@ -804,16 +861,26 @@ static void cloneLoopBlocks(
 
   // Similarly, for the latch:
   // The original exiting edge is still hooked up to the loop exit.
-  // The backedge now goes to the "bottom", which is either the loop's real
-  // header (for the last peeled iteration) or the copied header of the next
-  // iteration (for every other iteration)
   BasicBlock *NewLatch = cast<BasicBlock>(VMap[Latch]);
-  auto *LatchTerm = cast<Instruction>(NewLatch->getTerminator());
-  for (unsigned idx = 0, e = LatchTerm->getNumSuccessors(); idx < e; ++idx)
-    if (LatchTerm->getSuccessor(idx) == Header) {
-      LatchTerm->setSuccessor(idx, InsertBot);
-      break;
+  if (PeelLast) {
+    // This is the last iteration and we definitely will go to the exit. Just
+    // set both successors to InsertBot and let the branch be simplified later.
+    assert(IterNumber == 0 && "Only peeling a single iteration implemented.");
+    auto *LatchTerm = cast<BranchInst>(NewLatch->getTerminator());
+    LatchTerm->setSuccessor(0, InsertBot);
+    LatchTerm->setSuccessor(1, InsertBot);
+  } else {
+    auto *LatchTerm = cast<Instruction>(NewLatch->getTerminator());
+    // The backedge now goes to the "bottom", which is either the loop's real
+    // header (for the last peeled iteration) or the copied header of the next
+    // iteration (for every other iteration)
+    for (unsigned idx = 0, e = LatchTerm->getNumSuccessors(); idx < e; ++idx) {
+      if (LatchTerm->getSuccessor(idx) == Header) {
+        LatchTerm->setSuccessor(idx, InsertBot);
+        break;
+      }
     }
+  }
   if (DT)
     DT->changeImmediateDominator(InsertBot, NewLatch);
 
@@ -821,23 +888,33 @@ static void cloneLoopBlocks(
   // that pick an incoming value from either the preheader, or the previous
   // loop iteration. Since this copy is no longer part of the loop, we
   // resolve this statically:
-  // For the first iteration, we use the value from the preheader directly.
-  // For any other iteration, we replace the phi with the value generated by
-  // the immediately preceding clone of the loop body (which represents
-  // the previous iteration).
-  for (BasicBlock::iterator I = Header->begin(); isa<PHINode>(I); ++I) {
-    PHINode *NewPHI = cast<PHINode>(VMap[&*I]);
-    if (IterNumber == 0) {
-      VMap[&*I] = NewPHI->getIncomingValueForBlock(PreHeader);
-    } else {
-      Value *LatchVal = NewPHI->getIncomingValueForBlock(Latch);
-      Instruction *LatchInst = dyn_cast<Instruction>(LatchVal);
-      if (LatchInst && L->contains(LatchInst))
-        VMap[&*I] = LVMap[LatchInst];
-      else
-        VMap[&*I] = LatchVal;
+  if (PeelLast) {
+    // For the last iteration, we use the value from the latch of the original
+    // loop directly.
+    for (BasicBlock::iterator I = Header->begin(); isa<PHINode>(I); ++I) {
+      PHINode *NewPHI = cast<PHINode>(VMap[&*I]);
+      VMap[&*I] = NewPHI->getIncomingValueForBlock(Latch);
+      NewPHI->eraseFromParent();
+    }
+  } else {
+    // For the first iteration, we use the value from the preheader directly.
+    // For any other iteration, we replace the phi with the value generated by
+    // the immediately preceding clone of the loop body (which represents
+    // the previous iteration).
+    for (BasicBlock::iterator I = Header->begin(); isa<PHINode>(I); ++I) {
+      PHINode *NewPHI = cast<PHINode>(VMap[&*I]);
+      if (IterNumber == 0) {
+        VMap[&*I] = NewPHI->getIncomingValueForBlock(PreHeader);
+      } else {
+        Value *LatchVal = NewPHI->getIncomingValueForBlock(Latch);
+        Instruction *LatchInst = dyn_cast<Instruction>(LatchVal);
+        if (LatchInst && L->contains(LatchInst))
+          VMap[&*I] = LVMap[LatchInst];
+        else
+          VMap[&*I] = LatchVal;
+      }
+      NewPHI->eraseFromParent();
     }
-    NewPHI->eraseFromParent();
   }
 
   // Fix up the outgoing values - we need to add a value for the iteration
@@ -905,11 +982,13 @@ llvm::gatherPeelingPreferences(Loop *L, ScalarEvolution &SE,
 /// this provides a benefit, since the peeled off iterations, which account
 /// for the bulk of dynamic execution, can be further simplified by scalar
 /// optimizations.
-bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
+bool llvm::peelLoop(Loop *L, unsigned PeelCount, bool PeelLast, LoopInfo *LI,
                     ScalarEvolution *SE, DominatorTree &DT, AssumptionCache *AC,
                     bool PreserveLCSSA, ValueToValueMapTy &LVMap) {
   assert(PeelCount > 0 && "Attempt to peel out zero iterations?");
   assert(canPeel(L) && "Attempt to peel a loop which is not peelable?");
+  assert((!PeelLast || PeelCount == 1) &&
+         "can only peel off a single iteration from the end for now");
 
   LoopBlocksDFS LoopBlocks(L);
   LoopBlocks.perform(LI);
@@ -944,60 +1023,99 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
 
   Function *F = Header->getParent();
 
-  // Set up all the necessary basic blocks. It is convenient to split the
-  // preheader into 3 parts - two blocks to anchor the peeled copy of the loop
-  // body, and a new preheader for the "real" loop.
-
-  // Peeling the first iteration transforms.
-  //
-  // PreHeader:
-  // ...
-  // Header:
-  //   LoopBody
-  //   If (cond) goto Header
-  // Exit:
-  //
-  // into
-  //
-  // InsertTop:
-  //   LoopBody
-  //   If (!cond) goto Exit
-  // InsertBot:
-  // NewPreHeader:
-  // ...
-  // Header:
-  //  LoopBody
-  //  If (cond) goto Header
-  // Exit:
-  //
-  // Each following iteration will split the current bottom anchor in two,
-  // and put the new copy of the loop body between these two blocks. That is,
-  // after peeling another iteration from the example above, we'll split
-  // InsertBot, and get:
-  //
-  // InsertTop:
-  //   LoopBody
-  //   If (!cond) goto Exit
-  // InsertBot:
-  //   LoopBody
-  //   If (!cond) goto Exit
-  // InsertBot.next:
-  // NewPreHeader:
-  // ...
-  // Header:
-  //  LoopBody
-  //  If (cond) goto Header
-  // Exit:
-
-  BasicBlock *InsertTop = SplitEdge(PreHeader, Header, &DT, LI);
-  BasicBlock *InsertBot =
-      SplitBlock(InsertTop, InsertTop->getTerminator(), &DT, LI);
-  BasicBlock *NewPreHeader =
-      SplitBlock(InsertBot, InsertBot->getTerminator(), &DT, LI);
-
-  InsertTop->setName(Header->getName() + ".peel.begin");
-  InsertBot->setName(Header->getName() + ".peel.next");
-  NewPreHeader->setName(PreHeader->getName() + ".peel.newph");
+  // Set up all the necessary basic blocks.
+  BasicBlock *InsertTop;
+  BasicBlock *InsertBot;
+  BasicBlock *NewPreHeader;
+  DenseMap<Instruction *, Value *> ExitValues;
+  if (PeelLast) {
+    // It is convenient to split the single exit block from the latch the
+    // into 3 parts - two blocks to anchor the peeled copy of the loop body,
+    // and a new final  exit block.
+
+    // Peeling the last iteration transforms.
+    //
+    // PreHeader:
+    // ...
+    // Header:
+    //   LoopBody
+    //   If (cond) goto Header
+    // Exit:
+    //
+    // into
+    //
+    // Header:
+    //  LoopBody
+    //  If (cond) goto Header
+    // InsertTop:
+    //   LoopBody
+    //   If (!cond) goto InsertBot
+    // InsertBot:
+    // Exit:
+    // ...
+    BasicBlock *Exit = L->getExitBlock();
+    for (PHINode &P : Exit->phis())
+      ExitValues[&P] = P.getIncomingValueForBlock(Latch);
+
+    InsertTop = SplitEdge(Latch, Exit, &DT, LI);
+    InsertBot = SplitBlock(InsertTop, InsertTop->getTerminator(), &DT, LI);
+
+    InsertTop->setName(Exit->getName() + ".peel.begin");
+    InsertBot->setName(Exit->getName() + ".peel.next");
+  } else {
+    // It is convenient to split the preheader into 3 parts - two blocks to
+    // anchor the peeled copy of the loop body, and a new preheader for the
+    // "real" loop.
+
+    // Peeling the first iteration transforms.
+    //
+    // PreHeader:
+    // ...
+    // Header:
+    //   LoopBody
+    //   If (cond) goto Header
+    // Exit:
+    //
+    // into
+    //
+    // InsertTop:
+    //   LoopBody
+    //   If (!cond) goto Exit
+    // InsertBot:
+    // NewPreHeader:
+    // ...
+    // Header:
+    //  LoopBody
+    //  If (cond) goto Header
+    // Exit:
+    //
+    // Each following iteration will split the current bottom anchor in two,
+    // and put the new copy of the loop body between these two blocks. That
+    // is, after peeling another iteration from the example above, we'll
+    // split InsertBot, and get:
+    //
+    // InsertTop:
+    //   LoopBody
+    //   If (!cond) goto Exit
+    // InsertBot:
+    //   ...
[truncated]

DominatorTree &DT, AssumptionCache *AC, bool PreserveLCSSA,
ValueToValueMapTy &VMap);
/// instructions in the last peeled-off iteration. If \p PeelLast is true, peel
/// off the last \p PeelCount iterations from \p L. In that case, the caller has
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should say what happens if PeelLast is false.

Is "at least 2 iterations" still correct if PeelCount isn't 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should say what happens if PeelLast is false.
updated, thanks

Is "at least 2 iterations" still correct if PeelCount isn't 1?

PeelCount for now must be 1 if PeelLast is true, documented, thanks

/// is known at the second-to-last. This function also has to make sure the loop
/// exit condition can be adjusted when peeling and that the loop executes at
/// least 2 iterations.
static bool canPeelLastIteration(Loop &L, const SCEVAddRecExpr *LeftAR,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a public API, like canPeel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split off the legality checks and use it in comments and assert, thanks

// for (i = 0; i < n; i++)
// if (i < 2)
// ..
// else
// ..
// }
static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
ScalarEvolution &SE) {
static int countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I like the use of "int" here...

Is it possible to end up in a situation where we want to peel both the beginning and the end? What happens in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we only support either first or last. If both are possible, first should be preferred. I could track 2 separate peel counts to start with if preffered?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably not that important to support both kinds of peeling on the same loop. But it seems simpler to understand if we always have separate peel-beginning and peel-end counts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split up thanks!

fhahn added 3 commits May 13, 2025 12:01
Generalize countToEliminateCompares to also consider peeling off the
last iteration if it eliminates a compare. At the moment, codegen for
peeling off the last iteration is quite restrictive and callers have to
make sure that the exit condition can be adjusted when peeling and that
the loop executes at least 2 iterations. Both will be relaxed in
follow-ups.
Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a very useful improvement, thank you for working on this.

@@ -439,6 +496,7 @@ static unsigned countToEliminateCompares(Loop &L, unsigned MaxPeelCount,
}

DesiredPeelCount = std::max(DesiredPeelCount, NewPeelCount);
DesiredPeelCountLast = std::max(DesiredPeelCountLast, NewPeelCount);
};

auto ComputePeelCountMinMax = [&](MinMaxIntrinsic *MinMax) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow up, you can handle the minmax case too.

<< " some Phis into invariants.\n");
PP.PeelCount = DesiredPeelCountLast;
PP.PeelProfiledIterations = false;
PP.PeelLast = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow up, it would be nice if we could peel both first and last iteration off a single loop if profitable.

@fhahn
Copy link
Contributor Author

fhahn commented May 13, 2025

This looks like a very useful improvement, thank you for working on this.

The initial version is still quite restricted, but I am planning on making various improvements in follow-ups, including the ones you mentiond (thanks!) and removing the minimum iteration restriction via a runtime check and better adjusting of the exit condition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants