Skip to content

[HIP] change default offload archs #139281

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

yxsamliu
Copy link
Collaborator

@yxsamliu yxsamliu commented May 9, 2025

Currently, HIP uses gfx906 as the default offload
arch, which only works on systems with gfx906.

For non-interactive uses, this is less of a concern since they all set explicit offload arch's for supported GPU's. The only use of default offload arch is when the clang wrapper hipcc is used as a C++ compiler
during compiler detection of cmake, where the default offload arch is used to compile a C++ program as HIP program and executed. However since there is no
kernel, the default offload arch just works.

However, gfx906 as default offload arch is very
inconvenient for interactive users since in most cases they would like to compile for the GPU on the system.

With this patch, if AMD GPU's are detected on the system, they will be used as default offload arch's for HIP. Otherwise, if amd-llvm-spirv is found and executable, amdgcnspirv will be used as the default offload arch, since it works for all AMD GPU's supporting HIP.
Otherwise, the original default offload arch is used, which is gfx906.

Currently, HIP uses gfx906 as the default offload
arch, which only works on systems with gfx906.

For non-interactive uses, this is less of a concern
since they all set explicit offload archs for supported
GPU's. The only use of default offload arch is when
the clang wrapper hipcc is used as a C++ compiler
during compiler detection of cmake, where the default
offload arch is used to compile a C++ program as HIP
program and executed. However since there is no
kernel, the default offload arch just works.

However, gfx906 as default offload arch is very
inconenient for interactive users since in most cases
they would like to compile for the GPU on the system.

With this patch, if AMD GPU's are detected on the system,
they will be used as default offload archs for HIP.
Otherwise, if amd-llvm-spirv is found and executable,
amdgcnspirv will be used as the default offload arch,
since it works for all AMD GPU's supporting HIP.
Otherwise, the original default offload arch is used,
which is gfx906.
@yxsamliu yxsamliu requested review from arsenm, Artem-B, jhuber6 and Copilot May 9, 2025 15:40
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels May 9, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates HIP’s default offload architecture selection to use the detected AMD GPU’s when available, falling back to spirv-based offloading if needed. Key changes include:

  • Updating the test check for HIP default offload architectures.
  • Adding the new getHIPDefaultOffloadArchs function in ToolChain.cpp to determine the default offload arch.
  • Modifying Driver.cpp and ToolChain.h to utilize the new function.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
clang/test/Driver/hip-default-gpu-arch.hip Updated test CHECK pattern to validate the new triple and target-cpu arguments.
clang/lib/Driver/ToolChain.cpp Introduced getHIPDefaultOffloadArchs to select the default HIP offload architecture.
clang/lib/Driver/Driver.cpp Updated HIP offload arch insertion to rely on the new helper function.
clang/include/clang/Driver/ToolChain.h Added the function prototype and accompanying documentation for HIP default arch.
Comments suppressed due to low confidence (1)

clang/lib/Driver/ToolChain.cpp:1596

  • [nitpick] Consider renaming 'Prog' to a more descriptive identifier like 'spirvCompilerPath' to improve code readability and clarity.
auto Prog = GetProgramPath("amd-llvm-spirv");

@llvmbot
Copy link
Member

llvmbot commented May 9, 2025

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

Currently, HIP uses gfx906 as the default offload
arch, which only works on systems with gfx906.

For non-interactive uses, this is less of a concern since they all set explicit offload arch's for supported GPU's. The only use of default offload arch is when the clang wrapper hipcc is used as a C++ compiler
during compiler detection of cmake, where the default offload arch is used to compile a C++ program as HIP program and executed. However since there is no
kernel, the default offload arch just works.

However, gfx906 as default offload arch is very
inconvenient for interactive users since in most cases they would like to compile for the GPU on the system.

With this patch, if AMD GPU's are detected on the system, they will be used as default offload arch's for HIP. Otherwise, if amd-llvm-spirv is found and executable, amdgcnspirv will be used as the default offload arch, since it works for all AMD GPU's supporting HIP.
Otherwise, the original default offload arch is used, which is gfx906.


Full diff: https://github.com/llvm/llvm-project/pull/139281.diff

4 Files Affected:

  • (modified) clang/include/clang/Driver/ToolChain.h (+4)
  • (modified) clang/lib/Driver/Driver.cpp (+5-1)
  • (modified) clang/lib/Driver/ToolChain.cpp (+26)
  • (modified) clang/test/Driver/hip-default-gpu-arch.hip (+1-1)
diff --git a/clang/include/clang/Driver/ToolChain.h b/clang/include/clang/Driver/ToolChain.h
index 58edf2b3887b0..46771c450ef71 100644
--- a/clang/include/clang/Driver/ToolChain.h
+++ b/clang/include/clang/Driver/ToolChain.h
@@ -780,6 +780,10 @@ class ToolChain {
   virtual Expected<SmallVector<std::string>>
   getSystemGPUArchs(const llvm::opt::ArgList &Args) const;
 
+  /// getHIPDefaultOffloadArchs - Get the default offload arch's for HIP.
+  virtual SmallVector<StringRef>
+  getHIPDefaultOffloadArchs(const llvm::opt::ArgList &Args) const;
+
   /// addProfileRTLibs - When -fprofile-instr-profile is specified, try to pass
   /// a suitable profile runtime library to the linker.
   virtual void addProfileRTLibs(const llvm::opt::ArgList &Args,
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index e844f0d6d5400..d746c8c77a99d 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -3503,6 +3503,9 @@ class OffloadingActionBuilder final {
             GpuArchList.push_back(OffloadArch::AMDGCNSPIRV);
           else
             GpuArchList.push_back(OffloadArch::Generic);
+        } else if (AssociatedOffloadKind == Action::OFK_HIP) {
+          for (auto A : ToolChains.front()->getHIPDefaultOffloadArchs(Args))
+            GpuArchList.push_back(A.data());
         } else {
           GpuArchList.push_back(DefaultOffloadArch);
         }
@@ -4825,7 +4828,8 @@ Driver::getOffloadArchs(Compilation &C, const llvm::opt::DerivedArgList &Args,
     if (Kind == Action::OFK_Cuda) {
       Archs.insert(OffloadArchToString(OffloadArch::CudaDefault));
     } else if (Kind == Action::OFK_HIP) {
-      Archs.insert(OffloadArchToString(OffloadArch::HIPDefault));
+      for (auto A : TC->getHIPDefaultOffloadArchs(Args))
+        Archs.insert(A);
     } else if (Kind == Action::OFK_SYCL) {
       Archs.insert(StringRef());
     } else if (Kind == Action::OFK_OpenMP) {
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 3c52abb0ab78e..2b6db476118c4 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -1574,6 +1574,32 @@ ToolChain::getSystemGPUArchs(const llvm::opt::ArgList &Args) const {
   return SmallVector<std::string>();
 }
 
+SmallVector<StringRef>
+ToolChain::getHIPDefaultOffloadArchs(const llvm::opt::ArgList &Args) const {
+  if (getTriple().isSPIRV()) {
+    if (getTriple().getVendor() == llvm::Triple::AMD)
+      return {OffloadArchToString(OffloadArch::AMDGCNSPIRV)};
+    return {OffloadArchToString(OffloadArch::Generic)};
+  }
+
+  if (!getTriple().isAMDGPU())
+    return {};
+
+  SmallVector<StringRef> GpuArchList;
+  auto GPUsOrErr = getSystemGPUArchs(Args);
+  if (GPUsOrErr) {
+    for (auto &G : *GPUsOrErr)
+      GpuArchList.push_back(Args.MakeArgString(G));
+    return GpuArchList;
+  }
+  llvm::consumeError(GPUsOrErr.takeError());
+  auto Prog = GetProgramPath("amd-llvm-spirv");
+  if (!Prog.empty() && llvm::sys::fs::can_execute(Prog))
+    return {OffloadArchToString(OffloadArch::AMDGCNSPIRV)};
+
+  return {OffloadArchToString(OffloadArch::HIPDefault)};
+}
+
 SanitizerMask ToolChain::getSupportedSanitizers() const {
   // Return sanitizers which don't require runtime support and are not
   // platform dependent.
diff --git a/clang/test/Driver/hip-default-gpu-arch.hip b/clang/test/Driver/hip-default-gpu-arch.hip
index d55a3ea151f9a..b68e738ccc345 100644
--- a/clang/test/Driver/hip-default-gpu-arch.hip
+++ b/clang/test/Driver/hip-default-gpu-arch.hip
@@ -1,3 +1,3 @@
 // RUN: %clang -### -nogpulib -nogpuinc -c %s 2>&1 | FileCheck %s
 
-// CHECK: {{.*}}clang{{.*}}"-target-cpu" "gfx906"
+// CHECK: {{.*}}clang{{.*}}"-triple" "{{amdgcn|spirv64}}-amd-amdhsa"{{.*}} "-target-cpu" "{{amdgcnspirv|gfx.*}}"

@yxsamliu yxsamliu requested review from AlexVlx and b-sumner May 9, 2025 15:42
@Artem-B
Copy link
Member

Artem-B commented May 9, 2025

@jhuber6 do you think can we use native instead? I think it would be a somewhat better option here.
If we have to choose a GPU variant by default, we may as well choose the actual GPU, rather than a conditional choice between generic SPIR-V or an old GPU, which has the disadvantage of not being the right choice for anyone, and depending on another moving part (amd-llvm-spirv)

@jhuber6
Copy link
Contributor

jhuber6 commented May 9, 2025

@jhuber6 do you think can we use native instead? I think it would be a somewhat better option here. If we have to choose a GPU variant by default, we may as well choose the actual GPU, rather than a conditional choice between generic SPIR-V or an old GPU, which has the disadvantage of not being the right choice for anyone, and depending on another moving part (amd-llvm-spirv)

OpenMP already defaults to native if you don't specify anything. Having the actual toolchain variant on commandline arguments makes me a little uncertain. Long term I'd prefer people do something like --offload-targets=spirv64-amd-amdhsa or something and that just defaults to the proper toolchain.

Copy link
Contributor

@cgmb cgmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yxsamliu! To add a bit more context, this was my original email that raised the topic:

I wonder if we should consider revisiting the decision to default clang to gfx906 1. I came across how that decision was made shortly after it was implemented, and at the time I thought it was silly not to default to native. The decision was made to change the default from gfx803 to gfx906 because the former wasn’t supported, but AMD dropped support for the latter shortly thereafter. Using the same criteria as that previous decision, we should be changing the default again to something newer.

However, the default of "some random GPU architecture that changes every few releases" is just plain bad. If a user tries to run a program that was built for gfx906 only because nothing else was specified, the most likely result will be that the program doesn’t work on their machine. I’m not sure what the current behaviour of the HIP Runtime is, but historically, running a program built for a different GPU than your own would result in a crash that might not even include an error message.

My workstations all have gfx906 GPUs, but this is arguably worse! It means that when I work on build scripts, if I make a mistake and fail to specify the target architecture somewhere, the built result will still work on my machine! This hides errors in my build scripts, which then become a problem when I share those scripts with others working on different hardware. Even for my own work, I wouldn’t want to depend on gfx906 being the default target. My scripts would all break when I update clang to some future version.

I would suggest that we should either (a) change the default GPU target to native and make the failure to detect the user’s GPU into a hard compiler error, or (b) change the default GPU target to SPIR-V so that it works on every machine.

@Artem-B
Copy link
Member

Artem-B commented May 9, 2025

@cgmb

I would suggest that we should either (a) change the default GPU target to native and make the failure to detect the user’s GPU into a hard compiler error, or (b) change the default GPU target to SPIR-V so that it works on every machine.

The thing is that the sensible GPU target default does not exist for the GPUs in principle, IMO. We're just arguing about the least terrible variant to choose.

  • any fixed GPU version -- wrong for most users. It kind of forces everyone to explicitly specify what they need. Or they presumably don't care.
  • native -- correct for many users doing local builds, wrong for the production builds that are often done on machines w/o GPU, or with some GPU that's irrelevant for the build.
  • generic one-target-suits-all. Kind of functional, but likely a suboptimal choice for everyone.

I am personally biased towards the first variant, as it's consistently and predictably bad for everyone. Other options tend to create an illusion of "just working", but are prone to creating unwanted surprises.

spir-v would probably be an OK alternative. It's not great (probably leases some performance on the table vs the native build), but the binary will presumably work on other machines.

@cgmb
Copy link
Contributor

cgmb commented May 13, 2025

I’m not sure what the current behaviour of the HIP Runtime is

Here's a quick test with the in-box packages on Ubuntu 24.04 with my gfx1103 laptop:

$ apt-get -y install hipcc
$ cat <<'EOF' > main.hip
#include <stdio.h>
#include <stdlib.h>
#include <hip/hip_runtime.h>

#define CHECK_HIP(expr) do {              \
  hipError_t result = (expr);             \
  if (result != hipSuccess) {             \
    fprintf(stderr, "%s:%d: %s (%d)\n",   \
      __FILE__, __LINE__,                 \
      hipGetErrorString(result), result); \
    exit(EXIT_FAILURE);                   \
  }                                       \
} while(0)

__global__ void sq_arr(float *arr, int n) {
  int tid = blockDim.x*blockIdx.x + threadIdx.x;
  if (tid < n) {
    arr[tid] = arr[tid] * arr[tid];
  }
}

int main() {
  enum { N = 5 };
  float hArr[N] = { 1, 2, 3, 4, 5 };
  float *dArr;
  CHECK_HIP(hipMalloc(&dArr, sizeof(float) * N));
  CHECK_HIP(hipMemcpy(dArr, hArr, sizeof(float) * N, hipMemcpyHostToDevice));
  sq_arr<<<dim3(1), dim3(32,1,1), 0, 0>>>(dArr, N);
  CHECK_HIP(hipMemcpy(hArr, dArr, sizeof(float) * N, hipMemcpyDeviceToHost));
  for (int i = 0; i < N; ++i) {
    printf("%f\n", hArr[i]);
  }
  CHECK_HIP(hipFree(dArr));
  return 0;
}
EOF
$ /usr/bin/clang++-17 --hip-link -x hip main.hip
$ ./a.out 
1.000000
2.000000
3.000000
4.000000
5.000000

It seems that the sample program ran without crashing, but the kernel function was never actually executed. You can catch an "invalid device function" error if you call HIP_CHECK(hipGetLastError()) at the start of main.

  • any fixed GPU version -- wrong for most users. It kind of forces everyone to explicitly specify what they need. Or they presumably don't care.

My concern is that it doesn't force everyone to specify what they need. If there's no reasonable default, I'd be fine with forcing everyone to explicitly specify what they need. A compiler error would make it easier to notice when a file is not getting --offload-arch passed to it.

With that said, I understand that might be troublesome for users that legitimately don't care about the architecture of the artifact produced. Those users are probably mostly compiler developers or build tools checking that the compiler works. Still, having a useful default for that group is the main reason why I endorsed SPIR-V as a default over having no default.

I am personally biased towards the first variant, as it's consistently and predictably bad for everyone. Other options tend to create an illusion of "just working", but are prone to creating unwanted surprises.

That is exactly my argument in favour of SPIR-V. At the moment, when you compile and run a program without specifying your offload architecture, you are unlikely to notice a problem until you encounter surprising behaviour at runtime. SPIR-V is not ideal for anyone, but it's a default that we can set today and never change again.

@jhuber6
Copy link
Contributor

jhuber6 commented May 13, 2025

Honestly an error if unspecified would probably be better than what we do now.

@arsenm
Copy link
Contributor

arsenm commented May 13, 2025

spir-v would probably be an OK alternative. It's not great (probably leases some performance on the table vs the native build), but the binary will presumably work on other machines.

I'd probably go with just error and emit nothing, but I'm sure that will break someone's build somewhere

@arsenm
Copy link
Contributor

arsenm commented May 13, 2025

It seems that the sample program ran without crashing, but the kernel function was never actually executed. You can catch an "invalid device function" error if you call HIP_CHECK(hipGetLastError()) at the start of main.

Lack of proper error checking in the host APIs or host API usage isn't really a problem we solve here. Anything like a system call should be always checked for failure.

@yxsamliu
Copy link
Collaborator Author

The main obstacle of letting clang emit error when --offload-arch is not specified is HIP apps using hipcc as CMAKE_CXX_COMPILER. hipcc adds -xhip by default for .cpp programs. This is a known and long existing issue.

Another option is to have multiple --offload-arch options by default, which covers gfx9 generic, gfx10 generic, gfx11 generic, and gfx12 generic. This should make the program work for most of GPU's.

@AlexVlx
Copy link
Contributor

AlexVlx commented May 13, 2025

The main obstacle of letting clang emit error when --offload-arch is not specified is HIP apps using hipcc as CMAKE_CXX_COMPILER. hipcc adds -xhip by default for .cpp programs. This is a known and long existing issue.

Another option is to have multiple --offload-arch options by default, which covers gfx9 generic, gfx10 generic, gfx11 generic, and gfx12 generic. This should make the program work for most of GPU's.

So perhaps we can fork this linguistically? I.e.:

  • the HIP language should default to SPIRV (which will work everywhere; additionally, I think that some of the concerns in this thread re: performance / capability are misplaced, considering we're not using generic SPIRV, but that's a different kettle of fish);
  • everything else just errors out if --offload-arch / -mcpu are not set.
    Trying to add multiple / all offload-archs is just committing to being on an endless treadmill. Part of why we added SPIRV support is to deal with this.

@jhuber6
Copy link
Contributor

jhuber6 commented May 13, 2025

The main obstacle of letting clang emit error when --offload-arch is not specified is HIP apps using hipcc as CMAKE_CXX_COMPILER. hipcc adds -xhip by default for .cpp programs. This is a known and long existing issue.
Another option is to have multiple --offload-arch options by default, which covers gfx9 generic, gfx10 generic, gfx11 generic, and gfx12 generic. This should make the program work for most of GPU's.

Seems silly to do that when we have the .hip extension, but I guess it's a convenience that we can't really turn off at this stage.

So perhaps we can fork this linguistically? I.e.:

* the HIP language should default to SPIRV (which will work everywhere; additionally, I think that some of the concerns in this thread re: performance / capability are misplaced, considering we're not using generic SPIRV, but that's a different kettle of fish);

* everything else just errors out if --offload-arch / -mcpu are not set.
  Trying to add multiple / all offload-archs is just committing to being on an endless treadmill. Part of why we added SPIRV support is to deal with this.

We still rely on external Khronos tools for SPIR-V right? I don't think we can shift that to the default until it's all LLVM.

@AlexVlx
Copy link
Contributor

AlexVlx commented May 13, 2025

The main obstacle of letting clang emit error when --offload-arch is not specified is HIP apps using hipcc as CMAKE_CXX_COMPILER. hipcc adds -xhip by default for .cpp programs. This is a known and long existing issue.
Another option is to have multiple --offload-arch options by default, which covers gfx9 generic, gfx10 generic, gfx11 generic, and gfx12 generic. This should make the program work for most of GPU's.

Seems silly to do that when we have the .hip extension, but I guess it's a convenience that we can't really turn off at this stage.

So perhaps we can fork this linguistically? I.e.:

* the HIP language should default to SPIRV (which will work everywhere; additionally, I think that some of the concerns in this thread re: performance / capability are misplaced, considering we're not using generic SPIRV, but that's a different kettle of fish);

* everything else just errors out if --offload-arch / -mcpu are not set.
  Trying to add multiple / all offload-archs is just committing to being on an endless treadmill. Part of why we added SPIRV support is to deal with this.

We still rely on external Khronos tools for SPIR-V right? I don't think we can shift that to the default until it's all LLVM.

Sure, however we are actually switching to the BE within the next couple of months, so it would be useful to incorporate that into planning / what we do here. It'd probably be preferable to have one default switch and then stick with that.

@yxsamliu
Copy link
Collaborator Author

So we will wait until amdgcnspirv uses SPRIV backend by default, then switch HIP default offload arch to amdgcnspirv. That sounds a reasonable solution to me.

@AlexVlx
Copy link
Contributor

AlexVlx commented May 13, 2025

I think that in general we also need to decide on what happens when you pick an amdgcn— triple. IMHO for that case we should probably error out if no mcpu is provided, since there’s no reasonable default, except for “all”, but that would be incredibly disruptive.

@yxsamliu
Copy link
Collaborator Author

I think that in general we also need to decide on what happens when you pick an amdgcn— triple. IMHO for that case we should probably error out if no mcpu is provided, since there’s no reasonable default, except for “all”, but that would be incredibly disruptive.

That happens for device libs. In that case the default processor is "generic", and clang emits LLVM IR not assuming any target processor. I think for emitting LLVM IR we should allow that. We can emit error if it emits assembly or objects.

@jhuber6
Copy link
Contributor

jhuber6 commented May 13, 2025

I think that in general we also need to decide on what happens when you pick an amdgcn— triple. IMHO for that case we should probably error out if no mcpu is provided, since there’s no reasonable default, except for “all”, but that would be incredibly disruptive.

That happens for device libs. In that case the default processor is "generic", and clang emits LLVM IR not assuming any target processor. I think for emitting LLVM IR we should allow that. We can emit error if it emits assembly or objects.

Also used for all my libraries. Right now we default to like gfx700 or something if you go to the backend, could make that an error.

@arsenm
Copy link
Contributor

arsenm commented May 13, 2025

That happens for device libs. In that case the default processor is "generic", and clang emits LLVM IR not assuming any target processor. I think for emitting LLVM IR we should allow that. We can emit error if it emits assembly or objects.

Which is a huge hack and we should probably not allow. We should probably hard error on codegen, there's no such thing as the generic target. It's a never ending source of bug reports

@jhuber6
Copy link
Contributor

jhuber6 commented May 13, 2025

That happens for device libs. In that case the default processor is "generic", and clang emits LLVM IR not assuming any target processor. I think for emitting LLVM IR we should allow that. We can emit error if it emits assembly or objects.

Which is a huge hack and we should probably not allow. We should probably hard error on codegen, there's no such thing as the generic target. It's a never ending source of bug reports

It's just the AMDGCN target without any +features, right? The only issue I was aware of was assuming w64 when unspecified but you fixed that previously.

@arsenm
Copy link
Contributor

arsenm commented May 13, 2025

It's just the AMDGCN target without any +features, right? The only issue I was aware of was assuming w64 when unspecified but you fixed that previously.

Almost, but it's problematic in several ways. The problems multiply once you start adding in manually specified target_features attributes

@jhuber6
Copy link
Contributor

jhuber6 commented May 13, 2025

It's just the AMDGCN target without any +features, right? The only issue I was aware of was assuming w64 when unspecified but you fixed that previously.

Almost, but it's problematic in several ways. The problems multiply once you start adding in manually specified target_features attributes

I don't do that anywhere so probably why I haven't encountered those issues (so far). Though I wonder how the proposed AMDGCN version of __nvvm_reflect Alex is working on will interact with that as well, since I think the SPIR-V target would share similar issues if we're emitting builtins that require certain target features?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants