Skip to content

[LoopVersioningLICM] Only mark pointers with generated checks as noalias #135168

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

john-brawn-arm
Copy link
Collaborator

@john-brawn-arm john-brawn-arm commented Apr 10, 2025

Currently when we version a loop all loads and stores have the noalias metadata added to them. If there were some pointers that could not be analysed, and thus we could not generate runtime aliasing checks for, then we should not mark loads and stores using these pointers as noalias.

This is done by getting rid of setNoAliasToLoop and instead using annotateLoopWithNoAlias, as that already correctly handles partial alias information. This does result in slightly different aliasing metadata being generated, but it looks like it's more precise.

Currently this doesn't result in any change to the transforms that LoopVersioningLICM does, as LoopAccessAnalysis discards all results if it couldn't analyse every pointer leading to no loop versioning happening, but an upcoming patch will change that and we need this first otherwise we incorrectly mark some pointers as noalias even when they aren't.

Currently when we version a loop all loads and stores have the noalias
metadata added to them. If there were some pointers that could not be
analyzed, and thus we could not generate runtime aliasing checks for,
then we should not mark loads and stores using these pointers as
noalias.

Currently this does nothing, as LoopAccessAnalysis discards all
results if it couldn't analyse every pointer leading to no loop
ersioning happening, but an upcoming patch will change that and we
need this first otherwise we incorrectly mark some pointers as noalias
even when they aren't.
@llvmbot
Copy link
Member

llvmbot commented Apr 10, 2025

@llvm/pr-subscribers-llvm-transforms

Author: John Brawn (john-brawn-arm)

Changes

Currently when we version a loop all loads and stores have the noalias metadata added to them. If there were some pointers that could not be analysed, and thus we could not generate runtime aliasing checks for, then we should not mark loads and stores using these pointers as noalias.

Currently this does nothing, as LoopAccessAnalysis discards all results if it couldn't analyse every pointer leading to no loop versioning happening, but an upcoming patch will change that and we need this first otherwise we incorrectly mark some pointers as noalias even when they aren't.


Full diff: https://github.com/llvm/llvm-project/pull/135168.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp (+20-10)
  • (added) llvm/test/Transforms/LoopVersioningLICM/load-from-unknown-address.ll (+307)
diff --git a/llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp b/llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp
index 6e91c4fa6e230..74b424655a9fa 100644
--- a/llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopVersioningLICM.cpp
@@ -344,6 +344,13 @@ bool LoopVersioningLICM::instructionSafeForVersioning(Instruction *I) {
     }
     LoadAndStoreCounter++;
     Value *Ptr = St->getPointerOperand();
+    // Don't allow stores that we don't have runtime checks for, as we won't be
+    // able to mark them noalias meaning they would prevent any code motion.
+    auto &Pointers = LAI->getRuntimePointerChecking()->Pointers;
+    if (!any_of(Pointers, [&](auto &P) { return P.PointerValue == Ptr; })) {
+      LLVM_DEBUG(dbgs() << "    Found a store without a runtime check.\n");
+      return false;
+    }
     // Check loop invariant.
     if (SE->isLoopInvariant(SE->getSCEV(Ptr), CurLoop))
       InvariantCounter++;
@@ -361,6 +368,13 @@ bool LoopVersioningLICM::legalLoopInstructions() {
   InvariantCounter = 0;
   IsReadOnlyLoop = true;
   using namespace ore;
+  // Get LoopAccessInfo from current loop via the proxy.
+  LAI = &LAIs.getInfo(*CurLoop);
+  // Check LoopAccessInfo for need of runtime check.
+  if (LAI->getRuntimePointerChecking()->getChecks().empty()) {
+    LLVM_DEBUG(dbgs() << "    LAA: Runtime check not found !!\n");
+    return false;
+  }
   // Iterate over loop blocks and instructions of each block and check
   // instruction safety.
   for (auto *Block : CurLoop->getBlocks())
@@ -374,13 +388,6 @@ bool LoopVersioningLICM::legalLoopInstructions() {
         return false;
       }
     }
-  // Get LoopAccessInfo from current loop via the proxy.
-  LAI = &LAIs.getInfo(*CurLoop);
-  // Check LoopAccessInfo for need of runtime check.
-  if (LAI->getRuntimePointerChecking()->getChecks().empty()) {
-    LLVM_DEBUG(dbgs() << "    LAA: Runtime check not found !!\n");
-    return false;
-  }
   // Number of runtime-checks should be less then RuntimeMemoryCheckThreshold
   if (LAI->getNumRuntimePointerChecks() >
       VectorizerParams::RuntimeMemoryCheckThreshold) {
@@ -515,13 +522,16 @@ void LoopVersioningLICM::setNoAliasToLoop(Loop *VerLoop) {
   StringRef Name = "LVAliasScope";
   MDNode *NewScope = MDB.createAnonymousAliasScope(NewDomain, Name);
   SmallVector<Metadata *, 4> Scopes{NewScope}, NoAliases{NewScope};
+  auto &Pointers = LAI->getRuntimePointerChecking()->Pointers;
+
   // Iterate over each instruction of loop.
   // set no-alias for all load & store instructions.
   for (auto *Block : CurLoop->getBlocks()) {
     for (auto &Inst : *Block) {
-      // Only interested in instruction that may modify or read memory.
-      if (!Inst.mayReadFromMemory() && !Inst.mayWriteToMemory())
-        continue;
+      // We can only add noalias to pointers that we've inserted checks for
+      Value *V = getLoadStorePointerOperand(&Inst);
+      if (!V || !any_of(Pointers, [&](auto &P) { return P.PointerValue == V; }))
+	continue;
       // Set no-alias for current instruction.
       Inst.setMetadata(
           LLVMContext::MD_noalias,
diff --git a/llvm/test/Transforms/LoopVersioningLICM/load-from-unknown-address.ll b/llvm/test/Transforms/LoopVersioningLICM/load-from-unknown-address.ll
new file mode 100644
index 0000000000000..e9b2954039198
--- /dev/null
+++ b/llvm/test/Transforms/LoopVersioningLICM/load-from-unknown-address.ll
@@ -0,0 +1,307 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5
+; RUN: opt < %s -S -passes='function(loop-versioning-licm,loop-mssa(licm))' | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
+
+; In these tests we have a loop where we can calculate the bounds of some memory
+; accesses but not others.
+
+; Load from a gep whose bounds can't be calculated as the offset is loaded from memory
+; FIXME: Not knowing the bounds of the gep shouldn't stop us from hoisting the load of rval
+define void @gep_loaded_offset(ptr %p, ptr %q, ptr %r, i32 %n) {
+; CHECK-LABEL: define void @gep_loaded_offset(
+; CHECK-SAME: ptr [[P:%.*]], ptr [[Q:%.*]], ptr [[R:%.*]], i32 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[WHILE_BODY:.*]]
+; CHECK:       [[WHILE_BODY]]:
+; CHECK-NEXT:    [[N_ADDR:%.*]] = phi i32 [ [[DEC:%.*]], %[[WHILE_BODY]] ], [ [[N]], %[[ENTRY]] ]
+; CHECK-NEXT:    [[P_ADDR:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], %[[WHILE_BODY]] ], [ [[P]], %[[ENTRY]] ]
+; CHECK-NEXT:    [[DEC]] = add nsw i32 [[N_ADDR]], -1
+; CHECK-NEXT:    [[RVAL:%.*]] = load i64, ptr [[R]], align 4
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[Q]], i64 [[RVAL]]
+; CHECK-NEXT:    [[VAL:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[INCDEC_PTR]] = getelementptr inbounds nuw i8, ptr [[P_ADDR]], i64 4
+; CHECK-NEXT:    store i32 [[VAL]], ptr [[P_ADDR]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT:%.*]] = icmp eq i32 [[DEC]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT]], label %[[WHILE_END:.*]], label %[[WHILE_BODY]]
+; CHECK:       [[WHILE_END]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %while.body
+
+while.body:
+  %n.addr = phi i32 [ %dec, %while.body ], [ %n, %entry ]
+  %p.addr = phi ptr [ %incdec.ptr, %while.body ], [ %p, %entry ]
+  %dec = add nsw i32 %n.addr, -1
+  %rval = load i64, ptr %r, align 4
+  %arrayidx = getelementptr inbounds i32, ptr %q, i64 %rval
+  %val = load i32, ptr %arrayidx, align 4
+  %incdec.ptr = getelementptr inbounds nuw i8, ptr %p.addr, i64 4
+  store i32 %val, ptr %p.addr, align 4
+  %tobool.not = icmp eq i32 %dec, 0
+  br i1 %tobool.not, label %while.end, label %while.body
+
+while.end:
+  ret void
+}
+
+; As above but with a store to the loaded address. This should prevent the loop
+; from being versioned, as we wouldn't be able to do any code motion.
+define void @gep_loaded_offset_with_store(ptr %p, ptr %q, ptr %r, i32 %n) {
+; CHECK-LABEL: define void @gep_loaded_offset_with_store(
+; CHECK-SAME: ptr [[P:%.*]], ptr [[Q:%.*]], ptr [[R:%.*]], i32 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[WHILE_BODY:.*]]
+; CHECK:       [[WHILE_BODY]]:
+; CHECK-NEXT:    [[N_ADDR:%.*]] = phi i32 [ [[DEC:%.*]], %[[WHILE_BODY]] ], [ [[N]], %[[ENTRY]] ]
+; CHECK-NEXT:    [[P_ADDR:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], %[[WHILE_BODY]] ], [ [[P]], %[[ENTRY]] ]
+; CHECK-NEXT:    [[DEC]] = add nsw i32 [[N_ADDR]], -1
+; CHECK-NEXT:    [[RVAL:%.*]] = load i64, ptr [[R]], align 4
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[Q]], i64 [[RVAL]]
+; CHECK-NEXT:    [[VAL:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    store i32 0, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[INCDEC_PTR]] = getelementptr inbounds nuw i8, ptr [[P_ADDR]], i64 4
+; CHECK-NEXT:    store i32 [[VAL]], ptr [[P_ADDR]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT:%.*]] = icmp eq i32 [[DEC]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT]], label %[[WHILE_END:.*]], label %[[WHILE_BODY]]
+; CHECK:       [[WHILE_END]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %while.body
+
+while.body:
+  %n.addr = phi i32 [ %dec, %while.body ], [ %n, %entry ]
+  %p.addr = phi ptr [ %incdec.ptr, %while.body ], [ %p, %entry ]
+  %dec = add nsw i32 %n.addr, -1
+  %rval = load i64, ptr %r, align 4
+  %arrayidx = getelementptr inbounds i32, ptr %q, i64 %rval
+  %val = load i32, ptr %arrayidx, align 4
+  store i32 0, ptr %arrayidx, align 4
+  %incdec.ptr = getelementptr inbounds nuw i8, ptr %p.addr, i64 4
+  store i32 %val, ptr %p.addr, align 4
+  %tobool.not = icmp eq i32 %dec, 0
+  br i1 %tobool.not, label %while.end, label %while.body
+
+while.end:
+  ret void
+}
+
+; Load from a gep whose bounds can't be calculated as the pointer is loaded from memory
+; FIXME: Not knowing the bounds of the gep shouldn't stop us from hoisting the load of rval
+define void @gep_loaded_base(ptr %p, ptr %q, ptr %r, i32 %n) {
+; CHECK-LABEL: define void @gep_loaded_base(
+; CHECK-SAME: ptr [[P:%.*]], ptr [[Q:%.*]], ptr [[R:%.*]], i32 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[WHILE_BODY:.*]]
+; CHECK:       [[WHILE_BODY]]:
+; CHECK-NEXT:    [[N_ADDR:%.*]] = phi i32 [ [[DEC:%.*]], %[[WHILE_BODY]] ], [ [[N]], %[[ENTRY]] ]
+; CHECK-NEXT:    [[P_ADDR:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], %[[WHILE_BODY]] ], [ [[P]], %[[ENTRY]] ]
+; CHECK-NEXT:    [[DEC]] = add nsw i32 [[N_ADDR]], -1
+; CHECK-NEXT:    [[RVAL:%.*]] = load ptr, ptr [[R]], align 4
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[RVAL]], i64 0
+; CHECK-NEXT:    [[VAL:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[INCDEC_PTR]] = getelementptr inbounds nuw i8, ptr [[P_ADDR]], i64 4
+; CHECK-NEXT:    store i32 [[VAL]], ptr [[P_ADDR]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT:%.*]] = icmp eq i32 [[DEC]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT]], label %[[WHILE_END:.*]], label %[[WHILE_BODY]]
+; CHECK:       [[WHILE_END]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %while.body
+
+while.body:
+  %n.addr = phi i32 [ %dec, %while.body ], [ %n, %entry ]
+  %p.addr = phi ptr [ %incdec.ptr, %while.body ], [ %p, %entry ]
+  %dec = add nsw i32 %n.addr, -1
+  %rval = load ptr, ptr %r, align 4
+  %arrayidx = getelementptr inbounds i32, ptr %rval, i64 0
+  %val = load i32, ptr %arrayidx, align 4
+  %incdec.ptr = getelementptr inbounds nuw i8, ptr %p.addr, i64 4
+  store i32 %val, ptr %p.addr, align 4
+  %tobool.not = icmp eq i32 %dec, 0
+  br i1 %tobool.not, label %while.end, label %while.body
+
+while.end:
+  ret void
+}
+
+; Load from a gep with an offset that scalar evolution can't describe
+; FIXME: Not knowing the bounds of the gep shouldn't stop us from hoisting the load of qval
+define void @gep_strange_offset(ptr %p, ptr %q, ptr %r, i32 %n) {
+; CHECK-LABEL: define void @gep_strange_offset(
+; CHECK-SAME: ptr [[P:%.*]], ptr [[Q:%.*]], ptr [[R:%.*]], i32 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[WHILE_BODY:.*]]
+; CHECK:       [[WHILE_BODY]]:
+; CHECK-NEXT:    [[N_ADDR:%.*]] = phi i32 [ [[DEC:%.*]], %[[WHILE_BODY]] ], [ [[N]], %[[ENTRY]] ]
+; CHECK-NEXT:    [[P_ADDR:%.*]] = phi ptr [ [[INCDEC_PTR:%.*]], %[[WHILE_BODY]] ], [ [[P]], %[[ENTRY]] ]
+; CHECK-NEXT:    [[DEC]] = add nsw i32 [[N_ADDR]], -1
+; CHECK-NEXT:    [[QVAL:%.*]] = load i32, ptr [[Q]], align 4
+; CHECK-NEXT:    [[REM:%.*]] = srem i32 [[DEC]], 2
+; CHECK-NEXT:    [[IDXPROM:%.*]] = sext i32 [[REM]] to i64
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[R]], i64 [[IDXPROM]]
+; CHECK-NEXT:    [[VAL:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[ADD:%.*]] = add nsw i32 [[VAL]], [[QVAL]]
+; CHECK-NEXT:    [[INCDEC_PTR]] = getelementptr inbounds nuw i8, ptr [[P_ADDR]], i64 4
+; CHECK-NEXT:    store i32 [[ADD]], ptr [[P_ADDR]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT:%.*]] = icmp eq i32 [[DEC]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT]], label %[[WHILE_END:.*]], label %[[WHILE_BODY]]
+; CHECK:       [[WHILE_END]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %while.body
+
+while.body:
+  %n.addr = phi i32 [ %dec, %while.body ], [ %n, %entry ]
+  %p.addr = phi ptr [ %incdec.ptr, %while.body ], [ %p, %entry ]
+  %dec = add nsw i32 %n.addr, -1
+  %qval = load i32, ptr %q, align 4
+  %rem = srem i32 %dec, 2
+  %idxprom = sext i32 %rem to i64
+  %arrayidx = getelementptr inbounds i32, ptr %r, i64 %idxprom
+  %val = load i32, ptr %arrayidx, align 4
+  %add = add nsw i32 %val, %qval
+  %incdec.ptr = getelementptr inbounds nuw i8, ptr %p.addr, i64 4
+  store i32 %add, ptr %p.addr, align 4
+  %tobool.not = icmp eq i32 %dec, 0
+  br i1 %tobool.not, label %while.end, label %while.body
+
+while.end:
+  ret void
+}
+
+; A memcpy-like loop where the source address is loaded from a pointer
+; FIXME: We should be able to hoist the load of the source address pointer
+define void @memcpy_load_src(ptr %dst, ptr %src, i32 %n) {
+; CHECK-LABEL: define void @memcpy_load_src(
+; CHECK-SAME: ptr [[DST:%.*]], ptr [[SRC:%.*]], i32 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[WHILE_BODY:.*]]
+; CHECK:       [[WHILE_BODY]]:
+; CHECK-NEXT:    [[N_VAL:%.*]] = phi i32 [ [[DEC:%.*]], %[[WHILE_BODY]] ], [ [[N]], %[[ENTRY]] ]
+; CHECK-NEXT:    [[DST_VAL:%.*]] = phi ptr [ [[DST_VAL_NEXT:%.*]], %[[WHILE_BODY]] ], [ [[DST]], %[[ENTRY]] ]
+; CHECK-NEXT:    [[DEC]] = add nsw i32 [[N_VAL]], -1
+; CHECK-NEXT:    [[SRC_VAL:%.*]] = load ptr, ptr [[SRC]], align 8
+; CHECK-NEXT:    [[SRC_VAL_NEXT:%.*]] = getelementptr inbounds nuw i8, ptr [[SRC_VAL]], i64 1
+; CHECK-NEXT:    [[DST_VAL_NEXT]] = getelementptr inbounds nuw i8, ptr [[DST_VAL]], i64 1
+; CHECK-NEXT:    store ptr [[SRC_VAL_NEXT]], ptr [[SRC]], align 8
+; CHECK-NEXT:    [[VAL:%.*]] = load i8, ptr [[SRC_VAL]], align 1
+; CHECK-NEXT:    store i8 [[VAL]], ptr [[DST_VAL]], align 1
+; CHECK-NEXT:    [[TOBOOL_NOT:%.*]] = icmp eq i32 [[DEC]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT]], label %[[WHILE_END:.*]], label %[[WHILE_BODY]]
+; CHECK:       [[WHILE_END]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %while.body
+
+while.body:
+  %n_val = phi i32 [ %dec, %while.body ], [ %n, %entry ]
+  %dst_val = phi ptr [ %dst_val.next, %while.body ], [ %dst, %entry ]
+  %dec = add nsw i32 %n_val, -1
+  %src_val = load ptr, ptr %src, align 8
+  %src_val.next = getelementptr inbounds nuw i8, ptr %src_val, i64 1
+  %dst_val.next = getelementptr inbounds nuw i8, ptr %dst_val, i64 1
+  store ptr %src_val.next, ptr %src, align 8
+  %val = load i8, ptr %src_val, align 1
+  store i8 %val, ptr %dst_val, align 1
+  %tobool.not = icmp eq i32 %dec, 0
+  br i1 %tobool.not, label %while.end, label %while.body
+
+while.end:
+  ret void
+}
+
+; A memcpy-like loop where the destination address is loaded from a pointer
+; FIXME: We could hoist the load of the destination address, but doing the
+; bounds check of the store through that pointer itself requires using the
+; hoisted load.
+define void @memcpy_load_dst(ptr %dst, ptr %src, i32 %n) {
+; CHECK-LABEL: define void @memcpy_load_dst(
+; CHECK-SAME: ptr [[DST:%.*]], ptr [[SRC:%.*]], i32 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[WHILE_BODY:.*]]
+; CHECK:       [[WHILE_BODY]]:
+; CHECK-NEXT:    [[N_VAL:%.*]] = phi i32 [ [[DEC:%.*]], %[[WHILE_BODY]] ], [ [[N]], %[[ENTRY]] ]
+; CHECK-NEXT:    [[SRC_VAL:%.*]] = phi ptr [ [[SRC_VAL_NEXT:%.*]], %[[WHILE_BODY]] ], [ [[SRC]], %[[ENTRY]] ]
+; CHECK-NEXT:    [[DEC]] = add nsw i32 [[N_VAL]], -1
+; CHECK-NEXT:    [[DST_VAL:%.*]] = load ptr, ptr [[DST]], align 8
+; CHECK-NEXT:    [[SRC_VAL_NEXT]] = getelementptr inbounds nuw i8, ptr [[SRC_VAL]], i64 1
+; CHECK-NEXT:    [[DST_VAL_NEXT:%.*]] = getelementptr inbounds nuw i8, ptr [[DST_VAL]], i64 1
+; CHECK-NEXT:    store ptr [[DST_VAL_NEXT]], ptr [[DST]], align 8
+; CHECK-NEXT:    [[VAL:%.*]] = load i8, ptr [[SRC_VAL]], align 1
+; CHECK-NEXT:    store i8 [[VAL]], ptr [[DST_VAL]], align 1
+; CHECK-NEXT:    [[TOBOOL_NOT:%.*]] = icmp eq i32 [[DEC]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT]], label %[[WHILE_END:.*]], label %[[WHILE_BODY]]
+; CHECK:       [[WHILE_END]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %while.body
+
+while.body:
+  %n_val = phi i32 [ %dec, %while.body ], [ %n, %entry ]
+  %src_val = phi ptr [ %src_val.next, %while.body ], [ %src, %entry ]
+  %dec = add nsw i32 %n_val, -1
+  %dst_val = load ptr, ptr %dst, align 8
+  %src_val.next = getelementptr inbounds nuw i8, ptr %src_val, i64 1
+  %dst_val.next = getelementptr inbounds nuw i8, ptr %dst_val, i64 1
+  store ptr %dst_val.next, ptr %dst, align 8
+  %val = load i8, ptr %src_val, align 1
+  store i8 %val, ptr %dst_val, align 1
+  %tobool.not = icmp eq i32 %dec, 0
+  br i1 %tobool.not, label %while.end, label %while.body
+
+while.end:
+  ret void
+}
+
+; A memcpy-like loop where both the source and destination pointers are loaded from pointers
+; FIXME: We could hoist the loads of both addresses, but doing the bounds check
+; of the store through the destination address itself requires using the hoisted
+; load.
+define void @memcpy_load_src_dst(ptr %dst, ptr %src, i32 %n) {
+; CHECK-LABEL: define void @memcpy_load_src_dst(
+; CHECK-SAME: ptr [[DST:%.*]], ptr [[SRC:%.*]], i32 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br label %[[WHILE_BODY:.*]]
+; CHECK:       [[WHILE_BODY]]:
+; CHECK-NEXT:    [[N_VAL:%.*]] = phi i32 [ [[DEC:%.*]], %[[WHILE_BODY]] ], [ [[N]], %[[ENTRY]] ]
+; CHECK-NEXT:    [[DEC]] = add nsw i32 [[N_VAL]], -1
+; CHECK-NEXT:    [[SRC_VAL:%.*]] = load ptr, ptr [[SRC]], align 8
+; CHECK-NEXT:    [[DST_VAL:%.*]] = load ptr, ptr [[DST]], align 8
+; CHECK-NEXT:    [[SRC_VAL_NEXT:%.*]] = getelementptr inbounds nuw i8, ptr [[SRC_VAL]], i64 1
+; CHECK-NEXT:    [[DST_VAL_NEXT:%.*]] = getelementptr inbounds nuw i8, ptr [[DST_VAL]], i64 1
+; CHECK-NEXT:    store ptr [[SRC_VAL_NEXT]], ptr [[SRC]], align 8
+; CHECK-NEXT:    store ptr [[DST_VAL_NEXT]], ptr [[DST]], align 8
+; CHECK-NEXT:    [[VAL:%.*]] = load i8, ptr [[SRC_VAL]], align 1
+; CHECK-NEXT:    store i8 [[VAL]], ptr [[DST_VAL]], align 1
+; CHECK-NEXT:    [[TOBOOL_NOT:%.*]] = icmp eq i32 [[DEC]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT]], label %[[WHILE_END:.*]], label %[[WHILE_BODY]]
+; CHECK:       [[WHILE_END]]:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %while.body
+
+while.body:
+  %n_val = phi i32 [ %dec, %while.body ], [ %n, %entry ]
+  %dec = add nsw i32 %n_val, -1
+  %src_val = load ptr, ptr %src, align 8
+  %dst_val = load ptr, ptr %dst, align 8
+  %src_val.next = getelementptr inbounds nuw i8, ptr %src_val, i64 1
+  %dst_val.next = getelementptr inbounds nuw i8, ptr %dst_val, i64 1
+  store ptr %src_val.next, ptr %src, align 8
+  store ptr %dst_val.next, ptr %dst, align 8
+  %val = load i8, ptr %src_val, align 1
+  store i8 %val, ptr %dst_val, align 1
+  %tobool.not = icmp eq i32 %dec, 0
+  br i1 %tobool.not, label %while.end, label %while.body
+
+while.end:
+  ret void
+}

Copy link

github-actions bot commented Apr 10, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@john-brawn-arm
Copy link
Collaborator Author

The "upcoming patch" I mention that will be adjusting LoopAccessAnalysis is john-brawn-arm@5609bbd

@john-brawn-arm
Copy link
Collaborator Author

Ping.

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-brawn-arm
Copy link
Collaborator Author

Would it help to use https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Transforms/Utils/LoopVersioning.h#L100 here?

Yes, it looks like that would let us remove LoopVersioningLICM::setNoAliasToLoop. The alias metadata that's generated is slightly different, though it looks like it's more precise - rather than generating a single alias.scope that the loads/stores don't alias, we instead have two alias.scope which don't alias each other.

@john-brawn-arm
Copy link
Collaborator Author

Ping

if (!any_of(Pointers, [&](auto &P) { return P.PointerValue == Ptr; })) {
LLVM_DEBUG(dbgs() << " Found a store without a runtime check.\n");
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check needed for this patch, or for the future LAA change you have in mind?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to do this change in this patch, as it's part of handling the case where only some pointers have had checks generated for them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but doesn't LVer.annotateLoopWithNoAlias() automatically take care of this? It should only add annotations to instructions it has RT checks for

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is not intended as a legality check (maybe it shouldn't be in instructionSafeForVersioning...) but as a profitability check, on the reasoning that missing noalias metadata on a store will prevent LICM of accesses.

But thinking about this more carefully, I'm not sure this heuristic really makes sense. A loop may well have perfectly analyzable stores that will not interfere with LICM of other accesses (say if the store is to a non-escaped alloca).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a profitability check. An example of where we need this is the function gep_loaded_offset_with_store in the added load-from-unknown-address.ll test. Without this check we version the loop, but LICM can't do anything and we end up with two identical copies of the loop. We could possibly have a more precise check here, but I think that's something to look at in the future not in this patch.

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see this work out

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I'm concerned that this is now unconditionally computing LAA, but as this pass is not enabled by default and unlikely to be enabled I guess this is fine for now.

@john-brawn-arm john-brawn-arm merged commit 7d867c6 into llvm:main May 12, 2025
11 checks passed
@john-brawn-arm john-brawn-arm deleted the loop_versioning_licm_partial_noalias branch May 12, 2025 09:15
@llvm-ci
Copy link
Collaborator

llvm-ci commented May 12, 2025

LLVM Buildbot has detected a new failure on builder lldb-arm-ubuntu running on linaro-lldb-arm-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/15860

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: tools/lldb-dap/console/TestDAP_console.py (1175 of 3039)
PASS: lldb-api :: tools/lldb-dap/exception/TestDAP_exception.py (1176 of 3039)
PASS: lldb-api :: tools/lldb-dap/databreakpoint/TestDAP_setDataBreakpoints.py (1177 of 3039)
UNSUPPORTED: lldb-api :: tools/lldb-dap/exception/objc/TestDAP_exception_objc.py (1178 of 3039)
PASS: lldb-api :: tools/lldb-dap/exception/cpp/TestDAP_exception_cpp.py (1179 of 3039)
UNSUPPORTED: lldb-api :: tools/lldb-dap/extendedStackTrace/TestDAP_extendedStackTrace.py (1180 of 3039)
PASS: lldb-api :: tools/lldb-dap/instruction-breakpoint/TestDAP_instruction_breakpoint.py (1181 of 3039)
PASS: lldb-api :: tools/lldb-dap/disconnect/TestDAP_disconnect.py (1182 of 3039)
PASS: lldb-api :: tools/lldb-dap/io/TestDAP_io.py (1183 of 3039)
PASS: lldb-api :: tools/lldb-dap/locations/TestDAP_locations.py (1184 of 3039)
FAIL: lldb-api :: tools/lldb-dap/module-event/TestDAP_module_event.py (1185 of 3039)
******************** TEST 'lldb-api :: tools/lldb-dap/module-event/TestDAP_module_event.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --arch armv8l --build-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/tools/lldb-dap/module-event -p TestDAP_module_event.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision 7d867c6d094cb1b7d98c5aab983558254fbd0fdc)
  clang revision 7d867c6d094cb1b7d98c5aab983558254fbd0fdc
  llvm revision 7d867c6d094cb1b7d98c5aab983558254fbd0fdc
Skipping the following test categories: ['libc++', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
========= DEBUG ADAPTER PROTOCOL LOGS =========
1747042634.538522005 --> (stdin/stdout) {"command":"initialize","type":"request","arguments":{"adapterID":"lldb-native","clientID":"vscode","columnsStartAt1":true,"linesStartAt1":true,"locale":"en-us","pathFormat":"path","supportsRunInTerminalRequest":true,"supportsVariablePaging":true,"supportsVariableType":true,"supportsStartDebuggingRequest":true,"supportsProgressReporting":true,"$__lldb_sourceInitFile":false},"seq":1}
1747042634.543543339 <-- (stdin/stdout) {"body":{"$__lldb_version":"lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision 7d867c6d094cb1b7d98c5aab983558254fbd0fdc)\n  clang revision 7d867c6d094cb1b7d98c5aab983558254fbd0fdc\n  llvm revision 7d867c6d094cb1b7d98c5aab983558254fbd0fdc","completionTriggerCharacters":["."," ","\t"],"exceptionBreakpointFilters":[{"default":false,"filter":"cpp_catch","label":"C++ Catch"},{"default":false,"filter":"cpp_throw","label":"C++ Throw"},{"default":false,"filter":"objc_catch","label":"Objective-C Catch"},{"default":false,"filter":"objc_throw","label":"Objective-C Throw"}],"supportTerminateDebuggee":true,"supportsBreakpointLocationsRequest":true,"supportsCancelRequest":true,"supportsCompletionsRequest":true,"supportsConditionalBreakpoints":true,"supportsConfigurationDoneRequest":true,"supportsDataBreakpoints":true,"supportsDelayedStackTraceLoading":true,"supportsDisassembleRequest":true,"supportsEvaluateForHovers":true,"supportsExceptionInfoRequest":true,"supportsExceptionOptions":true,"supportsFunctionBreakpoints":true,"supportsHitConditionalBreakpoints":true,"supportsInstructionBreakpoints":true,"supportsLogPoints":true,"supportsModulesRequest":true,"supportsReadMemoryRequest":true,"supportsRestartRequest":true,"supportsSetVariable":true,"supportsStepInTargetsRequest":true,"supportsSteppingGranularity":true,"supportsValueFormattingOptions":true},"command":"initialize","request_seq":1,"seq":0,"success":true,"type":"response"}
1747042634.543655157 <-- (stdin/stdout) {"event":"initialized","seq":0,"type":"event"}
1747042634.544250011 --> (stdin/stdout) {"command":"configurationDone","type":"request","arguments":{},"seq":2}
1747042634.544427633 <-- (stdin/stdout) {"command":"configurationDone","request_seq":2,"seq":0,"success":true,"type":"response"}
1747042634.544709444 --> (stdin/stdout) {"command":"launch","type":"request","arguments":{"program":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/module-event/TestDAP_module_event.test_module_event/a.out","stopOnEntry":true,"initCommands":["settings clear --all","settings set symbols.enable-external-lookup false","settings set target.inherit-tcc true","settings set target.disable-aslr false","settings set target.detach-on-error false","settings set target.auto-apply-fixits false","settings set plugin.process.gdb-remote.packet-timeout 60","settings set symbols.clang-modules-cache-path \"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api\"","settings set use-color false","settings set show-statusline false"],"disableASLR":false,"enableAutoVariableSummaries":false,"enableSyntheticChildDebugging":false,"displayExtendedBacktrace":false},"seq":3}
1747042634.545340300 <-- (stdin/stdout) {"body":{"category":"console","output":"Running initCommands:\n"},"event":"output","seq":0,"type":"event"}
1747042634.545396566 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings clear --all\n"},"event":"output","seq":0,"type":"event"}
1747042634.545415163 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set symbols.enable-external-lookup false\n"},"event":"output","seq":0,"type":"event"}
1747042634.545428753 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set target.inherit-tcc true\n"},"event":"output","seq":0,"type":"event"}
1747042634.545441628 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set target.disable-aslr false\n"},"event":"output","seq":0,"type":"event"}
1747042634.545454741 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set target.detach-on-error false\n"},"event":"output","seq":0,"type":"event"}
1747042634.545493841 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set target.auto-apply-fixits false\n"},"event":"output","seq":0,"type":"event"}
1747042634.545506001 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set plugin.process.gdb-remote.packet-timeout 60\n"},"event":"output","seq":0,"type":"event"}
1747042634.545520306 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set symbols.clang-modules-cache-path \"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api\"\n"},"event":"output","seq":0,"type":"event"}
1747042634.545532465 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set use-color false\n"},"event":"output","seq":0,"type":"event"}
1747042634.545543671 <-- (stdin/stdout) {"body":{"category":"console","output":"(lldb) settings set show-statusline false\n"},"event":"output","seq":0,"type":"event"}
1747042634.974434614 <-- (stdin/stdout) {"command":"launch","request_seq":3,"seq":0,"success":true,"type":"response"}
1747042634.974518538 <-- (stdin/stdout) {"body":{"module":{"addressRange":"4156297216","debugInfoSize":"983.3KB","id":"0D794E6C-AF7E-D8CB-B9BA-E385B4F8753F-5A793D65","name":"ld-linux-armhf.so.3","path":"/usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3","symbolFilePath":"/usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3","symbolStatus":"Symbols loaded."},"reason":"new"},"event":"module","seq":0,"type":"event"}
1747042634.974704981 <-- (stdin/stdout) {"body":{"isLocalProcess":true,"name":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/module-event/TestDAP_module_event.test_module_event/a.out","startMethod":"launch","systemProcessId":1302576},"event":"process","seq":0,"type":"event"}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants