[AMDGPU] Move kernarg preload logic to separate pass #130434

Merged
merged 3 commits into from
May 12, 2025

kerbowa
Member

@kerbowa kerbowa commented Mar 8, 2025

Moves kernarg preload logic to its own module pass. Cloned function
declarations are removed when preloading hidden arguments. The inreg
attribute is now added in this pass instead of AMDGPUAttributor. The
rest of the logic is copied from AMDGPULowerKernelArguments, which now
only checks whether an argument is marked inreg to avoid replacing
direct uses of preloaded arguments. This change requires test updates to
remove inreg from lit tests with kernels that don't actually want
preloading.
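
To illustrate the hinting rule that moves into the new pass, here is a minimal standalone sketch modeled on the loop in the removed `addPreloadKernArgHint` and on the new `inreg` check in AMDGPULowerKernelArguments. It deliberately avoids LLVM types: `ArgModel`, `addPreloadHints`, `countPreloadedPrefix`, and `skipsLowering` are hypothetical names invented for this illustration, and the real code operates on `llvm::Argument`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel argument; not an LLVM type.
struct ArgModel {
  bool ByRef = false;
  bool Nest = false;
  bool InReg = false;
};

// Marks leading arguments inreg until either the limit is reached or an
// argument with an incompatible attribute (byref or nest) is seen.
inline void addPreloadHints(std::vector<ArgModel> &Args, unsigned PreloadCount,
                            unsigned MaxUserSGPRs) {
  const unsigned Limit = std::min(PreloadCount, MaxUserSGPRs);
  for (std::size_t I = 0; I < Args.size() && I < Limit; ++I) {
    if (Args[I].ByRef || Args[I].Nest)
      break;
    Args[I].InReg = true;
  }
}

// Counts the inreg prefix and checks that no inreg argument follows a
// non-inreg one, since preloaded kernel arguments must be sequential.
inline unsigned countPreloadedPrefix(const std::vector<ArgModel> &Args) {
  unsigned N = 0;
  while (N < Args.size() && Args[N].InReg)
    ++N;
  for (std::size_t I = N; I < Args.size(); ++I)
    assert(!Args[I].InReg && "preloaded arguments must be sequential");
  return N;
}

// Models the simplified check in the lowering pass: unused arguments and
// inreg (preloaded) arguments are skipped instead of being rewritten.
inline bool skipsLowering(const ArgModel &A, bool HasUses) {
  return !HasUses || A.InReg;
}
```

With a preload count of 0, which is the default of `amdgpu-kernarg-preload-count` in the diff, no argument is marked, so the lowering pass treats every used argument as before.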

@kerbowa kerbowa requested review from arsenm and shiltian March 8, 2025 19:35
@kerbowa kerbowa marked this pull request as ready for review March 8, 2025 19:35
@llvmbot
Member

llvmbot commented Mar 8, 2025

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Austin Kerbow (kerbowa)

Patch is 74.77 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/130434.diff

19 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+15)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp (-21)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp (+2-254)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def (+1)
  • (added) llvm/lib/Target/AMDGPU/AMDGPUPreloadKernelArguments.cpp (+358)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+8)
  • (modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.raw.ptr.buffer.atomic.fadd-with-ret.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointers-memcpy.ll (+6-27)
  • (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+4)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/preload-implicit-kernargs-IR-lowering.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/preload-implicit-kernargs-debug-info.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/preload-kernargs-IR-lowering.ll (+13-31)
  • (removed) llvm/test/CodeGen/AMDGPU/preload-kernargs-inreg-hints.ll (-263)
  • (modified) llvm/test/CodeGen/AMDGPU/preload-kernargs.ll (+3-7)
  • (modified) llvm/test/CodeGen/AMDGPU/wwm-reserved.ll (+4-4)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 57297288eecb4..4c26f148310e2 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -65,6 +65,7 @@ ModulePass *createAMDGPULowerBufferFatPointersPass();
 FunctionPass *createSIModeRegisterPass();
 FunctionPass *createGCNPreRAOptimizationsLegacyPass();
 FunctionPass *createAMDGPUPreloadKernArgPrologLegacyPass();
+ModulePass *createAMDGPUPreloadKernelArgumentsLegacyPass(const TargetMachine *);
 
 struct AMDGPUSimplifyLibCallsPass : PassInfoMixin<AMDGPUSimplifyLibCallsPass> {
   AMDGPUSimplifyLibCallsPass() {}
@@ -234,6 +235,9 @@ extern char &GCNRegPressurePrinterID;
 void initializeAMDGPUPreloadKernArgPrologLegacyPass(PassRegistry &);
 extern char &AMDGPUPreloadKernArgPrologLegacyID;
 
+void initializeAMDGPUPreloadKernelArgumentsLegacyPass(PassRegistry &);
+extern char &AMDGPUPreloadKernelArgumentsLegacyID;
+
 // Passes common to R600 and SI
 FunctionPass *createAMDGPUPromoteAlloca();
 void initializeAMDGPUPromoteAllocaPass(PassRegistry&);
@@ -345,6 +349,17 @@ class AMDGPUAttributorPass : public PassInfoMixin<AMDGPUAttributorPass> {
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
 };
 
+class AMDGPUPreloadKernelArgumentsPass
+    : public PassInfoMixin<AMDGPUPreloadKernelArgumentsPass> {
+  const AMDGPUTargetMachine &TM;
+
+public:
+  explicit AMDGPUPreloadKernelArgumentsPass(const AMDGPUTargetMachine &TM)
+      : TM(TM) {}
+
+  PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+};
+
 class AMDGPUAnnotateUniformValuesPass
     : public PassInfoMixin<AMDGPUAnnotateUniformValuesPass> {
 public:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index cfff66fa07f98..bbfa88a3fe872 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -28,10 +28,6 @@ void initializeCycleInfoWrapperPassPass(PassRegistry &);
 
 using namespace llvm;
 
-static cl::opt<unsigned> KernargPreloadCount(
-    "amdgpu-kernarg-preload-count",
-    cl::desc("How many kernel arguments to preload onto SGPRs"), cl::init(0));
-
 static cl::opt<unsigned> IndirectCallSpecializationThreshold(
     "amdgpu-indirect-call-specialization-threshold",
     cl::desc(
@@ -1319,21 +1315,6 @@ struct AAAMDGPUNoAGPR
 
 const char AAAMDGPUNoAGPR::ID = 0;
 
-static void addPreloadKernArgHint(Function &F, TargetMachine &TM) {
-  const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
-  for (unsigned I = 0;
-       I < F.arg_size() &&
-       I < std::min(KernargPreloadCount.getValue(), ST.getMaxNumUserSGPRs());
-       ++I) {
-    Argument &Arg = *F.getArg(I);
-    // Check for incompatible attributes.
-    if (Arg.hasByRefAttr() || Arg.hasNestAttr())
-      break;
-
-    Arg.addAttr(Attribute::InReg);
-  }
-}
-
 static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
                     AMDGPUAttributorOptions Options) {
   SetVector<Function *> Functions;
@@ -1383,8 +1364,6 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
     if (!AMDGPU::isEntryFunctionCC(CC)) {
       A.getOrCreateAAFor<AAAMDFlatWorkGroupSize>(IRPosition::function(*F));
       A.getOrCreateAAFor<AAAMDWavesPerEU>(IRPosition::function(*F));
-    } else if (CC == CallingConv::AMDGPU_KERNEL) {
-      addPreloadKernArgHint(*F, TM);
     }
 
     for (auto &I : instructions(F)) {
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp
index 09412d1b0f1cc..d2dd6869f1070 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp
@@ -27,231 +27,6 @@ using namespace llvm;
 
 namespace {
 
-class PreloadKernelArgInfo {
-private:
-  Function &F;
-  const GCNSubtarget &ST;
-  unsigned NumFreeUserSGPRs;
-
-  enum HiddenArg : unsigned {
-    HIDDEN_BLOCK_COUNT_X,
-    HIDDEN_BLOCK_COUNT_Y,
-    HIDDEN_BLOCK_COUNT_Z,
-    HIDDEN_GROUP_SIZE_X,
-    HIDDEN_GROUP_SIZE_Y,
-    HIDDEN_GROUP_SIZE_Z,
-    HIDDEN_REMAINDER_X,
-    HIDDEN_REMAINDER_Y,
-    HIDDEN_REMAINDER_Z,
-    END_HIDDEN_ARGS
-  };
-
-  // Stores information about a specific hidden argument.
-  struct HiddenArgInfo {
-    // Offset in bytes from the location in the kernearg segment pointed to by
-    // the implicitarg pointer.
-    uint8_t Offset;
-    // The size of the hidden argument in bytes.
-    uint8_t Size;
-    // The name of the hidden argument in the kernel signature.
-    const char *Name;
-  };
-
-  static constexpr HiddenArgInfo HiddenArgs[END_HIDDEN_ARGS] = {
-      {0, 4, "_hidden_block_count_x"}, {4, 4, "_hidden_block_count_y"},
-      {8, 4, "_hidden_block_count_z"}, {12, 2, "_hidden_group_size_x"},
-      {14, 2, "_hidden_group_size_y"}, {16, 2, "_hidden_group_size_z"},
-      {18, 2, "_hidden_remainder_x"},  {20, 2, "_hidden_remainder_y"},
-      {22, 2, "_hidden_remainder_z"}};
-
-  static HiddenArg getHiddenArgFromOffset(unsigned Offset) {
-    for (unsigned I = 0; I < END_HIDDEN_ARGS; ++I)
-      if (HiddenArgs[I].Offset == Offset)
-        return static_cast<HiddenArg>(I);
-
-    return END_HIDDEN_ARGS;
-  }
-
-  static Type *getHiddenArgType(LLVMContext &Ctx, HiddenArg HA) {
-    if (HA < END_HIDDEN_ARGS)
-      return Type::getIntNTy(Ctx, HiddenArgs[HA].Size * 8);
-
-    llvm_unreachable("Unexpected hidden argument.");
-  }
-
-  static const char *getHiddenArgName(HiddenArg HA) {
-    if (HA < END_HIDDEN_ARGS) {
-      return HiddenArgs[HA].Name;
-    }
-    llvm_unreachable("Unexpected hidden argument.");
-  }
-
-  // Clones the function after adding implicit arguments to the argument list
-  // and returns the new updated function. Preloaded implicit arguments are
-  // added up to and including the last one that will be preloaded, indicated by
-  // LastPreloadIndex. Currently preloading is only performed on the totality of
-  // sequential data from the kernarg segment including implicit (hidden)
-  // arguments. This means that all arguments up to the last preloaded argument
-  // will also be preloaded even if that data is unused.
-  Function *cloneFunctionWithPreloadImplicitArgs(unsigned LastPreloadIndex) {
-    FunctionType *FT = F.getFunctionType();
-    LLVMContext &Ctx = F.getParent()->getContext();
-    SmallVector<Type *, 16> FTypes(FT->param_begin(), FT->param_end());
-    for (unsigned I = 0; I <= LastPreloadIndex; ++I)
-      FTypes.push_back(getHiddenArgType(Ctx, HiddenArg(I)));
-
-    FunctionType *NFT =
-        FunctionType::get(FT->getReturnType(), FTypes, FT->isVarArg());
-    Function *NF =
-        Function::Create(NFT, F.getLinkage(), F.getAddressSpace(), F.getName());
-
-    NF->copyAttributesFrom(&F);
-    NF->copyMetadata(&F, 0);
-    NF->setIsNewDbgInfoFormat(F.IsNewDbgInfoFormat);
-
-    F.getParent()->getFunctionList().insert(F.getIterator(), NF);
-    NF->takeName(&F);
-    NF->splice(NF->begin(), &F);
-
-    Function::arg_iterator NFArg = NF->arg_begin();
-    for (Argument &Arg : F.args()) {
-      Arg.replaceAllUsesWith(&*NFArg);
-      NFArg->takeName(&Arg);
-      ++NFArg;
-    }
-
-    AttrBuilder AB(Ctx);
-    AB.addAttribute(Attribute::InReg);
-    AB.addAttribute("amdgpu-hidden-argument");
-    AttributeList AL = NF->getAttributes();
-    for (unsigned I = 0; I <= LastPreloadIndex; ++I) {
-      AL = AL.addParamAttributes(Ctx, NFArg->getArgNo(), AB);
-      NFArg++->setName(getHiddenArgName(HiddenArg(I)));
-    }
-
-    NF->setAttributes(AL);
-    F.replaceAllUsesWith(NF);
-    F.setCallingConv(CallingConv::C);
-    F.clearMetadata();
-
-    return NF;
-  }
-
-public:
-  PreloadKernelArgInfo(Function &F, const GCNSubtarget &ST) : F(F), ST(ST) {
-    setInitialFreeUserSGPRsCount();
-  }
-
-  // Returns the maximum number of user SGPRs that we have available to preload
-  // arguments.
-  void setInitialFreeUserSGPRsCount() {
-    GCNUserSGPRUsageInfo UserSGPRInfo(F, ST);
-    NumFreeUserSGPRs = UserSGPRInfo.getNumFreeUserSGPRs();
-  }
-
-  bool tryAllocPreloadSGPRs(unsigned AllocSize, uint64_t ArgOffset,
-                            uint64_t LastExplicitArgOffset) {
-    //  Check if this argument may be loaded into the same register as the
-    //  previous argument.
-    if (ArgOffset - LastExplicitArgOffset < 4 &&
-        !isAligned(Align(4), ArgOffset))
-      return true;
-
-    // Pad SGPRs for kernarg alignment.
-    ArgOffset = alignDown(ArgOffset, 4);
-    unsigned Padding = ArgOffset - LastExplicitArgOffset;
-    unsigned PaddingSGPRs = alignTo(Padding, 4) / 4;
-    unsigned NumPreloadSGPRs = alignTo(AllocSize, 4) / 4;
-    if (NumPreloadSGPRs + PaddingSGPRs > NumFreeUserSGPRs)
-      return false;
-
-    NumFreeUserSGPRs -= (NumPreloadSGPRs + PaddingSGPRs);
-    return true;
-  }
-
-  // Try to allocate SGPRs to preload implicit kernel arguments.
-  void tryAllocImplicitArgPreloadSGPRs(uint64_t ImplicitArgsBaseOffset,
-                                       uint64_t LastExplicitArgOffset,
-                                       IRBuilder<> &Builder) {
-    Function *ImplicitArgPtr = Intrinsic::getDeclarationIfExists(
-        F.getParent(), Intrinsic::amdgcn_implicitarg_ptr);
-    if (!ImplicitArgPtr)
-      return;
-
-    const DataLayout &DL = F.getParent()->getDataLayout();
-    // Pair is the load and the load offset.
-    SmallVector<std::pair<LoadInst *, unsigned>, 4> ImplicitArgLoads;
-    for (auto *U : ImplicitArgPtr->users()) {
-      Instruction *CI = dyn_cast<Instruction>(U);
-      if (!CI || CI->getParent()->getParent() != &F)
-        continue;
-
-      for (auto *U : CI->users()) {
-        int64_t Offset = 0;
-        auto *Load = dyn_cast<LoadInst>(U); // Load from ImplicitArgPtr?
-        if (!Load) {
-          if (GetPointerBaseWithConstantOffset(U, Offset, DL) != CI)
-            continue;
-
-          Load = dyn_cast<LoadInst>(*U->user_begin()); // Load from GEP?
-        }
-
-        if (!Load || !Load->isSimple())
-          continue;
-
-        // FIXME: Expand to handle 64-bit implicit args and large merged loads.
-        LLVMContext &Ctx = F.getParent()->getContext();
-        Type *LoadTy = Load->getType();
-        HiddenArg HA = getHiddenArgFromOffset(Offset);
-        if (HA == END_HIDDEN_ARGS || LoadTy != getHiddenArgType(Ctx, HA))
-          continue;
-
-        ImplicitArgLoads.push_back(std::make_pair(Load, Offset));
-      }
-    }
-
-    if (ImplicitArgLoads.empty())
-      return;
-
-    // Allocate loads in order of offset. We need to be sure that the implicit
-    // argument can actually be preloaded.
-    std::sort(ImplicitArgLoads.begin(), ImplicitArgLoads.end(), less_second());
-
-    // If we fail to preload any implicit argument we know we don't have SGPRs
-    // to preload any subsequent ones with larger offsets. Find the first
-    // argument that we cannot preload.
-    auto *PreloadEnd = std::find_if(
-        ImplicitArgLoads.begin(), ImplicitArgLoads.end(),
-        [&](const std::pair<LoadInst *, unsigned> &Load) {
-          unsigned LoadSize = DL.getTypeStoreSize(Load.first->getType());
-          unsigned LoadOffset = Load.second;
-          if (!tryAllocPreloadSGPRs(LoadSize,
-                                    LoadOffset + ImplicitArgsBaseOffset,
-                                    LastExplicitArgOffset))
-            return true;
-
-          LastExplicitArgOffset =
-              ImplicitArgsBaseOffset + LoadOffset + LoadSize;
-          return false;
-        });
-
-    if (PreloadEnd == ImplicitArgLoads.begin())
-      return;
-
-    unsigned LastHiddenArgIndex = getHiddenArgFromOffset(PreloadEnd[-1].second);
-    Function *NF = cloneFunctionWithPreloadImplicitArgs(LastHiddenArgIndex);
-    assert(NF);
-    for (const auto *I = ImplicitArgLoads.begin(); I != PreloadEnd; ++I) {
-      LoadInst *LoadInst = I->first;
-      unsigned LoadOffset = I->second;
-      unsigned HiddenArgIndex = getHiddenArgFromOffset(LoadOffset);
-      unsigned Index = NF->arg_size() - LastHiddenArgIndex + HiddenArgIndex - 1;
-      Argument *Arg = NF->getArg(Index);
-      LoadInst->replaceAllUsesWith(Arg);
-    }
-  }
-};
-
 class AMDGPULowerKernelArguments : public FunctionPass {
 public:
   static char ID;
@@ -311,10 +86,6 @@ static bool lowerKernelArguments(Function &F, const TargetMachine &TM) {
       Attribute::getWithDereferenceableBytes(Ctx, TotalKernArgSize));
 
   uint64_t ExplicitArgOffset = 0;
-  // Preloaded kernel arguments must be sequential.
-  bool InPreloadSequence = true;
-  PreloadKernelArgInfo PreloadInfo(F, ST);
-
   for (Argument &Arg : F.args()) {
     const bool IsByRef = Arg.hasByRefAttr();
     Type *ArgTy = IsByRef ? Arg.getParamByRefType() : Arg.getType();
@@ -325,25 +96,10 @@ static bool lowerKernelArguments(Function &F, const TargetMachine &TM) {
     uint64_t AllocSize = DL.getTypeAllocSize(ArgTy);
 
     uint64_t EltOffset = alignTo(ExplicitArgOffset, ABITypeAlign) + BaseOffset;
-    uint64_t LastExplicitArgOffset = ExplicitArgOffset;
     ExplicitArgOffset = alignTo(ExplicitArgOffset, ABITypeAlign) + AllocSize;
 
-    // Guard against the situation where hidden arguments have already been
-    // lowered and added to the kernel function signiture, i.e. in a situation
-    // where this pass has run twice.
-    if (Arg.hasAttribute("amdgpu-hidden-argument"))
-      break;
-
-    // Try to preload this argument into user SGPRs.
-    if (Arg.hasInRegAttr() && InPreloadSequence && ST.hasKernargPreload() &&
-        !Arg.getType()->isAggregateType())
-      if (PreloadInfo.tryAllocPreloadSGPRs(AllocSize, EltOffset,
-                                           LastExplicitArgOffset))
-        continue;
-
-    InPreloadSequence = false;
-
-    if (Arg.use_empty())
+    // Skip inreg arguments which should be preloaded.
+    if (Arg.use_empty() || Arg.hasInRegAttr())
       continue;
 
     // If this is byval, the loads are already explicit in the function. We just
@@ -483,14 +239,6 @@ static bool lowerKernelArguments(Function &F, const TargetMachine &TM) {
   KernArgSegment->addRetAttr(
       Attribute::getWithAlignment(Ctx, std::max(KernArgBaseAlign, MaxAlign)));
 
-  if (InPreloadSequence) {
-    uint64_t ImplicitArgsBaseOffset =
-        alignTo(ExplicitArgOffset, ST.getAlignmentForImplicitArgPtr()) +
-        BaseOffset;
-    PreloadInfo.tryAllocImplicitArgPreloadSGPRs(ImplicitArgsBaseOffset,
-                                                ExplicitArgOffset, Builder);
-  }
-
   return true;
 }
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 1050855176c04..ca9a4877907c8 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -29,6 +29,7 @@ MODULE_PASS("amdgpu-printf-runtime-binding", AMDGPUPrintfRuntimeBindingPass())
 MODULE_PASS("amdgpu-remove-incompatible-functions", AMDGPURemoveIncompatibleFunctionsPass(*this))
 MODULE_PASS("amdgpu-sw-lower-lds", AMDGPUSwLowerLDSPass(*this))
 MODULE_PASS("amdgpu-unify-metadata", AMDGPUUnifyMetadataPass())
+MODULE_PASS("amdgpu-preload-kernel-arguments", AMDGPUPreloadKernelArgumentsPass(*this))
 #undef MODULE_PASS
 
 #ifndef MODULE_PASS_WITH_PARAMS
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPreloadKernelArguments.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPreloadKernelArguments.cpp
new file mode 100644
index 0000000000000..2b2f649836ebe
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPreloadKernelArguments.cpp
@@ -0,0 +1,358 @@
+//===- AMDGPUPreloadKernelArguments.cpp - Preload Kernel Arguments --------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file This pass preloads kernel arguments into user_data SGPRs before kernel
+/// execution begins. The number of registers available for preloading depends
+/// on the number of free user SGPRs, up to the hardware's maximum limit.
+/// Implicit arguments enabled in the kernel descriptor are allocated first,
+/// followed by SGPRs used for preloaded kernel arguments. (Reference:
+/// https://llvm.org/docs/AMDGPUUsage.html#initial-kernel-execution-state)
+/// Additionally, hidden kernel arguments may be preloaded, in which case they
+/// are appended to the kernel signature after explicit arguments. Preloaded
+/// arguments will be marked with `inreg`.
+//
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "AMDGPUTargetMachine.h"
+#include "llvm/Analysis/ValueTracking.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicsAMDGPU.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/IR/Verifier.h"
+#include "llvm/Pass.h"
+
+#define DEBUG_TYPE "amdgpu-preload-kernel-arguments"
+
+using namespace llvm;
+
+static cl::opt<unsigned> KernargPreloadCount(
+    "amdgpu-kernarg-preload-count",
+    cl::desc("How many kernel arguments to preload onto SGPRs"), cl::init(0));
+
+namespace {
+
+class AMDGPUPreloadKernelArgumentsLegacy : public ModulePass {
+  const AMDGPUTargetMachine *TM;
+
+public:
+  static char ID;
+  explicit AMDGPUPreloadKernelArgumentsLegacy(
+      const AMDGPUTargetMachine *TM = nullptr);
+
+  StringRef getPassName() const override {
+    return "AMDGPU Preload Kernel Arguments";
+  }
+
+  bool runOnModule(Module &M) override;
+};
+
+class PreloadKernelArgInfo {
+private:
+  Function &F;
+  const GCNSubtarget &ST;
+  unsigned NumFreeUserSGPRs;
+
+  enum HiddenArg : unsigned {
+    HIDDEN_BLOCK_COUNT_X,
+    HIDDEN_BLOCK_COUNT_Y,
+    HIDDEN_BLOCK_COUNT_Z,
+    HIDDEN_GROUP_SIZE_X,
+    HIDDEN_GROUP_SIZE_Y,
+    HIDDEN_GROUP_SIZE_Z,
+    HIDDEN_REMAINDER_X,
+    HIDDEN_REMAINDER_Y,
+    HIDDEN_REMAINDER_Z,
+    END_HIDDEN_ARGS
+  };
+
+  // Stores information about a specific hidden argument.
+  struct HiddenArgInfo {
+    // Offset in bytes from the location in the kernearg segment pointed to by
+    // the implicitarg pointer.
+    uint8_t Offset;
+    // The size of the hidden argument in bytes.
+    uint8_t Size;
+    // The name of the hidden argument in the kernel signature.
+    const char *Name;
+  };
+
+  static constexpr HiddenArgInfo HiddenArgs[END_HIDDEN_ARGS] = {
+      {0, 4, "_hidden_block_count_x"}, {4, 4, "_hidden_block_count_y"},
+      {8, 4, "_hidden_block_count_z"}, {12, 2, "_hidden_group_size_x"},
+      {14, 2, "_hidden_group_size_y"}, {16, 2, "_hidden_group_size_z"},
+      {18, 2, "_hidden_remainder_x"},  {20, 2, "_hidden_remainder_y"},
+      {22, 2, "_hidden_remainder_z"}};
+
+  static HiddenArg getHiddenArgFromOffset(unsigned Offset) {
+    for (unsigned I = 0; I < END_HIDDEN_ARGS; ++I)
+      if (HiddenArgs[I].Offset == Offset)
+        return static_cast<HiddenArg>(I);
+
+    return END_HIDDEN_ARGS;
+  }
+
+  static Type *getHiddenArgType(LLVMContext &Ctx, HiddenArg HA) {
+    if (HA < END_HIDDEN_ARGS)
+      return Type::getIntNTy(Ctx, HiddenArgs[HA].Size * 8);
+
+    llvm_unreachable("Unexpected hidden argument.");
+  }
+
+  static const char *getHiddenArgName(HiddenArg HA) {
+    if (HA < END_HIDDEN_ARGS) {
+      return HiddenArgs[HA].Name;
+    }
+    llvm_unreachable("Unexpected hidden argument.");
+  }
+
+  // Clones the function after adding implicit arguments to the argument list
+  // and returns the new updated function. Preloaded implicit arguments are
+  // added up to and including the last one that will be preloaded, indicated by
+  // LastPreloadIndex. Currently preloading is only performed on the totality of
+  // sequential data from the kernarg segment including implicit (hidden)
+  // arguments. This means that all arguments up to the last preloaded argument
+  // will also be preloaded even if that data is unused.
+  Function *cloneFunctionWithPreloadImplicitArgs(unsigned LastPreloadIndex) {
+    FunctionType *FT = F.getFunctionType();
+    LLVMContext &Ctx = F.getParent()->getContext();
+    SmallVector<Type *, 16> FTypes(FT->param_begin(), FT->param_end());
+    for (unsigned I = 0; I <= LastPreloadIndex; ++I)
+      FTypes.push_back(getHiddenArgType(Ctx, HiddenArg(I)));
+
+    FunctionType *NFT =
+        FunctionType::get(FT->getReturnType(), FTypes, FT->isVarArg());
+    Function *NF =
+        Functio...
[truncated]
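
The SGPR accounting in `tryAllocPreloadSGPRs` (visible in the hunk removed from AMDGPULowerKernelArguments.cpp) is easier to follow with concrete numbers. The following is a standalone sketch of that arithmetic, not the pass itself: `SGPRBudget`, `tryAlloc`, `alignDown4`, and `alignTo4` are hypothetical names, and the 4-byte helpers stand in for LLVM's `alignDown`/`alignTo`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone model of the user-SGPR budget used for preloading.
struct SGPRBudget {
  unsigned NumFreeUserSGPRs;

  static uint64_t alignDown4(uint64_t V) { return V & ~uint64_t(3); }
  static uint64_t alignTo4(uint64_t V) { return (V + 3) & ~uint64_t(3); }

  // AllocSize and offsets are in bytes; each SGPR holds one 4-byte dword.
  // LastArgEnd is the end offset of the previous argument.
  bool tryAlloc(unsigned AllocSize, uint64_t ArgOffset, uint64_t LastArgEnd) {
    assert(ArgOffset >= LastArgEnd && "arguments are laid out in order");

    // An argument that starts mid-dword right after the previous one shares
    // the previous argument's SGPR, so nothing new is allocated.
    if (ArgOffset - LastArgEnd < 4 && ArgOffset % 4 != 0)
      return true;

    // Account for alignment padding between the two arguments.
    ArgOffset = alignDown4(ArgOffset);
    unsigned Padding = static_cast<unsigned>(ArgOffset - LastArgEnd);
    unsigned PaddingSGPRs = static_cast<unsigned>(alignTo4(Padding) / 4);
    unsigned NumPreloadSGPRs = static_cast<unsigned>(alignTo4(AllocSize) / 4);
    if (NumPreloadSGPRs + PaddingSGPRs > NumFreeUserSGPRs)
      return false;

    NumFreeUserSGPRs -= NumPreloadSGPRs + PaddingSGPRs;
    return true;
  }
};
```

For example, with four free SGPRs, an `i32` at offset 0, two `i16` values at offsets 4 and 6, and an `i64` at offset 8 all fit, because the second `i16` shares a register with the first. A further `i32` at offset 16 is rejected, and preloading stops there.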

@kerbowa kerbowa force-pushed the users/kerbowa/preload-kernarg-pass branch from 805e24c to a23a8e4 Compare March 10, 2025 04:33
@kerbowa kerbowa requested a review from krzysz00 April 7, 2025 15:23
@kerbowa
Member Author

kerbowa commented Apr 7, 2025

If anyone has opinions about the best approach for this please let me know because we need to do something. There are three PRs up that are achieving the same end, but I don't want to just rebase/update all three. At different points I've seen interest in each approach based on discussion in the PRs.

  1. [AMDGPU] Move kernarg preload logic to separate pass #130434
  2. [AMDGPU] Move kernarg preload logic to AMDGPU Attributor #123547
  3. [AMDGPU] Make AMDGPULowerKernelArguments a module pass #112790

@arsenm @shiltian @krzysz00

@shiltian
Contributor

shiltian commented Apr 7, 2025

I think this is the best way to handle it. Moving it to the attributor isn't acceptable to me, at least based on what we have right now. I'd need to clearly see what benefits we get from running it iteratively, and how it would actually work if we were to make it part of the attributor.

@kerbowa kerbowa force-pushed the users/kerbowa/preload-kernarg-pass branch from a23a8e4 to 2e0ce8a Compare May 1, 2025 21:59
@kerbowa kerbowa requested a review from shiltian May 2, 2025 03:16
Comment on lines 110 to 112
if (HA < END_HIDDEN_ARGS) {
return HiddenArgs[HA].Name;
}
Collaborator

Suggested change
-    if (HA < END_HIDDEN_ARGS) {
-      return HiddenArgs[HA].Name;
-    }
+    if (HA < END_HIDDEN_ARGS)
+      return HiddenArgs[HA].Name;
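
For context on the helper under review, the hidden-argument table and its lookups can be reproduced outside LLVM. This is a sketch under assumptions: the enum and table values are copied from the diff, but `getHiddenArgBitWidth` and `getClonedArgIndex` are hypothetical helpers written for this illustration (the pass itself builds an LLVM integer type and computes the index inline), and `assert` replaces `llvm_unreachable`.

```cpp
#include <cassert>
#include <cstdint>

enum HiddenArg : unsigned {
  HIDDEN_BLOCK_COUNT_X,
  HIDDEN_BLOCK_COUNT_Y,
  HIDDEN_BLOCK_COUNT_Z,
  HIDDEN_GROUP_SIZE_X,
  HIDDEN_GROUP_SIZE_Y,
  HIDDEN_GROUP_SIZE_Z,
  HIDDEN_REMAINDER_X,
  HIDDEN_REMAINDER_Y,
  HIDDEN_REMAINDER_Z,
  END_HIDDEN_ARGS
};

struct HiddenArgInfo {
  uint8_t Offset; // Byte offset from the implicitarg pointer.
  uint8_t Size;   // Size in bytes.
  const char *Name;
};

constexpr HiddenArgInfo HiddenArgs[END_HIDDEN_ARGS] = {
    {0, 4, "_hidden_block_count_x"}, {4, 4, "_hidden_block_count_y"},
    {8, 4, "_hidden_block_count_z"}, {12, 2, "_hidden_group_size_x"},
    {14, 2, "_hidden_group_size_y"}, {16, 2, "_hidden_group_size_z"},
    {18, 2, "_hidden_remainder_x"},  {20, 2, "_hidden_remainder_y"},
    {22, 2, "_hidden_remainder_z"}};

// Returns END_HIDDEN_ARGS when no hidden argument starts at Offset.
inline HiddenArg getHiddenArgFromOffset(unsigned Offset) {
  for (unsigned I = 0; I < END_HIDDEN_ARGS; ++I)
    if (HiddenArgs[I].Offset == Offset)
      return static_cast<HiddenArg>(I);
  return END_HIDDEN_ARGS;
}

inline const char *getHiddenArgName(HiddenArg HA) {
  assert(HA < END_HIDDEN_ARGS && "Unexpected hidden argument.");
  return HiddenArgs[HA].Name;
}

// Hypothetical helper: bit width of the integer type used for the argument.
inline unsigned getHiddenArgBitWidth(HiddenArg HA) {
  assert(HA < END_HIDDEN_ARGS && "Unexpected hidden argument.");
  return HiddenArgs[HA].Size * 8u;
}

// Hypothetical helper: position of a hidden argument in the cloned kernel,
// using the same expression as the diff. Hidden arguments 0..LastHiddenArgIndex
// are appended after the explicit arguments.
inline unsigned getClonedArgIndex(unsigned NewArgCount,
                                  unsigned LastHiddenArgIndex,
                                  unsigned HiddenArgIndex) {
  return NewArgCount - LastHiddenArgIndex + HiddenArgIndex - 1;
}
```

As a worked case, a kernel with two explicit arguments whose last preloaded hidden argument is `HIDDEN_GROUP_SIZE_X` (index 3) is cloned with six arguments, and the index expression places the hidden arguments at positions 2 through 5.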

@@ -1317,6 +1318,10 @@ void AMDGPUPassConfig::addIRPasses() {
}

void AMDGPUPassConfig::addCodeGenPrepare() {
if (TM->getTargetTriple().getArch() == Triple::amdgcn &&
Collaborator

I don't know why we need the TargetTriple check here. If the execution reaches here, the compilation is already for amdgcn triple and not for r600. Anyways, it is unrelated to this PR.

Collaborator

Realized that it is the common base class for both amdgcn and r600.

Contributor

We have isAMDGCN() so better to use that.

@cdevadas
Collaborator

cdevadas commented May 8, 2025

See if @shiltian has any concerns before merging it.

Comment on lines +36 to +38
static cl::opt<unsigned> KernargPreloadCount(
"amdgpu-kernarg-preload-count",
cl::desc("How many kernel arguments to preload onto SGPRs"), cl::init(0));
Contributor

Can make this a pass parameter in a follow up

namespace {

class AMDGPUPreloadKernelArgumentsLegacy : public ModulePass {
const AMDGPUTargetMachine *TM;
Contributor

Suggested change
-  const AMDGPUTargetMachine *TM;
+  const GCNTargetMachine *TM;

public:
static char ID;
explicit AMDGPUPreloadKernelArgumentsLegacy(
const AMDGPUTargetMachine *TM = nullptr);
Contributor

Suggested change
-      const AMDGPUTargetMachine *TM = nullptr);
+      const GCNTargetMachine *TM = nullptr);

Comment on lines +1173 to +1174
; GCN-O3-NEXT: AMDGPU Preload Kernel Arguments
; GCN-O3-NEXT: FunctionPass Manager
Contributor

Can this be moved up with the other module passes, or is there some benefit to running the other function passes first?

Contributor

@shiltian shiltian left a comment

LGTM thanks!

@@ -523,6 +525,8 @@
; GCN-O1-OPTS-NEXT: Scalarize Masked Memory Intrinsics
; GCN-O1-OPTS-NEXT: Expand reduction intrinsics
; GCN-O1-OPTS-NEXT: Early CSE
; GCN-O1-OPTS-NEXT: AMDGPU Preload Kernel Arguments
Contributor

Quick side question: does this pass need to run before the main CodeGenPrepare?

If not ... I have a plan to move AMDGPULowerBufferFatPointers after LoadStoreVectorizer, and, given the desire for buffer pointer alignment reform, might want to move kernel argument lowering after that so we can see function arguments as themselves.

Not exactly relevant or blocking, just thought I'd check on this

Contributor

I'd expect this to benefit from maximally canonical IR and run a bit earlier

Member Author

The pass mostly wants to run before function attributes are finalized that may influence the number of available user SGPRs. It also needs to be before LowerKernelArguments. Beyond that there may be some small benefit to lowering the hidden arguments earlier, but only if it doesn't run into the first issue.

kerbowa added 3 commits May 11, 2025 17:16
@kerbowa kerbowa force-pushed the users/kerbowa/preload-kernarg-pass branch from e1e85fa to bc78db2 Compare May 12, 2025 03:08
@kerbowa kerbowa merged commit 2c9a46c into main May 12, 2025
11 checks passed
@kerbowa kerbowa deleted the users/kerbowa/preload-kernarg-pass branch May 12, 2025 04:18
@llvm-ci
Collaborator

llvm-ci commented May 12, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-android running on sanitizer-buildbot-android while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/186/builds/8942

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[       OK ] AddressSanitizer.AtoiAndFriendsOOBTest (2290 ms)
[ RUN      ] AddressSanitizer.HasFeatureAddressSanitizerTest
[       OK ] AddressSanitizer.HasFeatureAddressSanitizerTest (0 ms)
[ RUN      ] AddressSanitizer.CallocReturnsZeroMem
[       OK ] AddressSanitizer.CallocReturnsZeroMem (13 ms)
[ DISABLED ] AddressSanitizer.DISABLED_TSDTest
[ RUN      ] AddressSanitizer.IgnoreTest
[       OK ] AddressSanitizer.IgnoreTest (0 ms)
[ RUN      ] AddressSanitizer.SignalTest
[       OK ] AddressSanitizer.SignalTest (183 ms)
[ RUN      ] AddressSanitizer.ReallocTest
[       OK ] AddressSanitizer.ReallocTest (50 ms)
[ RUN      ] AddressSanitizer.WrongFreeTest
[       OK ] AddressSanitizer.WrongFreeTest (115 ms)
[ RUN      ] AddressSanitizer.LongJmpTest
[       OK ] AddressSanitizer.LongJmpTest (0 ms)
[ RUN      ] AddressSanitizer.ThreadStackReuseTest
[       OK ] AddressSanitizer.ThreadStackReuseTest (8 ms)
[ DISABLED ] AddressSanitizer.DISABLED_MemIntrinsicUnalignedAccessTest
[ DISABLED ] AddressSanitizer.DISABLED_LargeFunctionSymbolizeTest
[ DISABLED ] AddressSanitizer.DISABLED_MallocFreeUnwindAndSymbolizeTest
[ RUN      ] AddressSanitizer.UseThenFreeThenUseTest
[       OK ] AddressSanitizer.UseThenFreeThenUseTest (114 ms)
[ RUN      ] AddressSanitizer.FileNameInGlobalReportTest
[       OK ] AddressSanitizer.FileNameInGlobalReportTest (118 ms)
[ DISABLED ] AddressSanitizer.DISABLED_StressStackReuseAndExceptionsTest
[ RUN      ] AddressSanitizer.MlockTest
[       OK ] AddressSanitizer.MlockTest (0 ms)
[ DISABLED ] AddressSanitizer.DISABLED_DemoThreadedTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoStackTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoThreadStackTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFLowIn
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFLowLeft
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFLowRight
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFHigh
[ DISABLED ] AddressSanitizer.DISABLED_DemoOOM
[ DISABLED ] AddressSanitizer.DISABLED_DemoDoubleFreeTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoNullDerefTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoFunctionStaticTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoTooMuchMemoryTest
[ RUN      ] AddressSanitizer.LongDoubleNegativeTest
[       OK ] AddressSanitizer.LongDoubleNegativeTest (0 ms)
[----------] 19 tests from AddressSanitizer (28019 ms total)

[----------] Global test environment tear-down
[==========] 22 tests from 2 test suites ran. (28033 ms total)
[  PASSED  ] 22 tests.

  YOU HAVE 1 DISABLED TEST

Step 34 (run instrumented asan tests [aarch64/bluejay-userdebug/TQ3A.230805.001]) failure: run instrumented asan tests [aarch64/bluejay-userdebug/TQ3A.230805.001] (failure)
...
[ RUN      ] AddressSanitizer.HasFeatureAddressSanitizerTest
[       OK ] AddressSanitizer.HasFeatureAddressSanitizerTest (0 ms)
[ RUN      ] AddressSanitizer.CallocReturnsZeroMem
[       OK ] AddressSanitizer.CallocReturnsZeroMem (13 ms)
[ DISABLED ] AddressSanitizer.DISABLED_TSDTest
[ RUN      ] AddressSanitizer.IgnoreTest
[       OK ] AddressSanitizer.IgnoreTest (0 ms)
[ RUN      ] AddressSanitizer.SignalTest
[       OK ] AddressSanitizer.SignalTest (183 ms)
[ RUN      ] AddressSanitizer.ReallocTest
[       OK ] AddressSanitizer.ReallocTest (50 ms)
[ RUN      ] AddressSanitizer.WrongFreeTest
[       OK ] AddressSanitizer.WrongFreeTest (115 ms)
[ RUN      ] AddressSanitizer.LongJmpTest
[       OK ] AddressSanitizer.LongJmpTest (0 ms)
[ RUN      ] AddressSanitizer.ThreadStackReuseTest
[       OK ] AddressSanitizer.ThreadStackReuseTest (8 ms)
[ DISABLED ] AddressSanitizer.DISABLED_MemIntrinsicUnalignedAccessTest
[ DISABLED ] AddressSanitizer.DISABLED_LargeFunctionSymbolizeTest
[ DISABLED ] AddressSanitizer.DISABLED_MallocFreeUnwindAndSymbolizeTest
[ RUN      ] AddressSanitizer.UseThenFreeThenUseTest
[       OK ] AddressSanitizer.UseThenFreeThenUseTest (114 ms)
[ RUN      ] AddressSanitizer.FileNameInGlobalReportTest
[       OK ] AddressSanitizer.FileNameInGlobalReportTest (118 ms)
[ DISABLED ] AddressSanitizer.DISABLED_StressStackReuseAndExceptionsTest
[ RUN      ] AddressSanitizer.MlockTest
[       OK ] AddressSanitizer.MlockTest (0 ms)
[ DISABLED ] AddressSanitizer.DISABLED_DemoThreadedTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoStackTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoThreadStackTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFLowIn
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFLowLeft
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFLowRight
[ DISABLED ] AddressSanitizer.DISABLED_DemoUAFHigh
[ DISABLED ] AddressSanitizer.DISABLED_DemoOOM
[ DISABLED ] AddressSanitizer.DISABLED_DemoDoubleFreeTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoNullDerefTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoFunctionStaticTest
[ DISABLED ] AddressSanitizer.DISABLED_DemoTooMuchMemoryTest
[ RUN      ] AddressSanitizer.LongDoubleNegativeTest
[       OK ] AddressSanitizer.LongDoubleNegativeTest (0 ms)
[----------] 19 tests from AddressSanitizer (28019 ms total)

[----------] Global test environment tear-down
[==========] 22 tests from 2 test suites ran. (28033 ms total)
[  PASSED  ] 22 tests.

  YOU HAVE 1 DISABLED TEST
program finished with exit code 0
elapsedTime=2442.027961

@llvm-ci
Collaborator

llvm-ci commented May 12, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-gcc-ubuntu-no-asserts running on doug-worker-6 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/202/builds/1187

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'AddressSanitizer-x86_64-linux :: TestCases/asan_lsan_deadlock.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/buildbot/buildbot-root/gcc-no-asserts/build/./bin/clang  --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only  -m64  -O0 /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp -o /home/buildbot/buildbot-root/gcc-no-asserts/build/runtimes/runtimes-bins/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Output/asan_lsan_deadlock.cpp.tmp # RUN: at line 4
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O0 /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp -o /home/buildbot/buildbot-root/gcc-no-asserts/build/runtimes/runtimes-bins/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Output/asan_lsan_deadlock.cpp.tmp
env ASAN_OPTIONS=detect_leaks=1 not  /home/buildbot/buildbot-root/gcc-no-asserts/build/runtimes/runtimes-bins/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Output/asan_lsan_deadlock.cpp.tmp 2>&1 | FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp # RUN: at line 5
+ env ASAN_OPTIONS=detect_leaks=1 not /home/buildbot/buildbot-root/gcc-no-asserts/build/runtimes/runtimes-bins/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Output/asan_lsan_deadlock.cpp.tmp
+ FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp
/home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp:58:12: error: CHECK: expected string not found in input
 // CHECK: SUMMARY: AddressSanitizer: stack-buffer-overflow
           ^
<stdin>:1:1: note: scanning from here
=================================================================
^
<stdin>:2:10: note: possible intended match here
==1654962==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7b36cbc00034 at pc 0x5f4326b64fd0 bp 0x7b36c9dfdce0 sp 0x7b36c9dfdcd8
         ^

Input file: <stdin>
Check file: /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: =================================================================
check:58'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            2: ==1654962==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7b36cbc00034 at pc 0x5f4326b64fd0 bp 0x7b36c9dfdce0 sp 0x7b36c9dfdcd8
check:58'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:58'1              ?                                                                                                                                    possible intended match
            3: WRITE of size 4 at 0x7b36cbc00034 thread T2
check:58'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--

********************


@llvm-ci
Collaborator

llvm-ci commented May 12, 2025

LLVM Buildbot has detected a new failure on builder clang-ppc64-aix running on aix-ppc64 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/64/builds/3528

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'lit :: timeout-hang.py' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 13
not env -u FILECHECK_OPTS "/home/llvm/llvm-external-buildbots/workers/env/bin/python3.11" /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/timeout-hang/run-nonexistent.txt  --timeout=1 --param external=0 | "/home/llvm/llvm-external-buildbots/workers/env/bin/python3.11" /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/timeout-hang.py 1
# executed command: not env -u FILECHECK_OPTS /home/llvm/llvm-external-buildbots/workers/env/bin/python3.11 /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit.py -j1 --order=lexical Inputs/timeout-hang/run-nonexistent.txt --timeout=1 --param external=0
# .---command stderr------------
# | lit.py: /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 1 seconds was requested on the command line. Forcing timeout to be 1 seconds.
# `-----------------------------
# executed command: /home/llvm/llvm-external-buildbots/workers/env/bin/python3.11 /home/llvm/llvm-external-buildbots/workers/aix-ppc64/clang-ppc64-aix/build/utils/lit/tests/timeout-hang.py 1
# .---command stdout------------
# | Testing took as long or longer than timeout
# `-----------------------------
# error: command failed with exit status: 1

--

********************


7 participants