-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[HIP] change default offload archs #139281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Currently, HIP uses gfx906 as the default offload arch, which only works on systems with gfx906. For non-interactive uses, this is less of a concern since they all set explicit offload archs for supported GPU's. The only use of default offload arch is when the clang wrapper hipcc is used as a C++ compiler during compiler detection of cmake, where the default offload arch is used to compile a C++ program as HIP program and executed. However since there is no kernel, the default offload arch just works. However, gfx906 as default offload arch is very inconenient for interactive users since in most cases they would like to compile for the GPU on the system. With this patch, if AMD GPU's are detected on the system, they will be used as default offload archs for HIP. Otherwise, if amd-llvm-spirv is found and executable, amdgcnspirv will be used as the default offload arch, since it works for all AMD GPU's supporting HIP. Otherwise, the original default offload arch is used, which is gfx906.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates HIP’s default offload architecture selection to use the detected AMD GPU’s when available, falling back to spirv-based offloading if needed. Key changes include:
- Updating the test check for HIP default offload architectures.
- Adding the new getHIPDefaultOffloadArchs function in ToolChain.cpp to determine the default offload arch.
- Modifying Driver.cpp and ToolChain.h to utilize the new function.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
clang/test/Driver/hip-default-gpu-arch.hip | Updated test CHECK pattern to validate the new triple and target-cpu arguments. |
clang/lib/Driver/ToolChain.cpp | Introduced getHIPDefaultOffloadArchs to select the default HIP offload architecture. |
clang/lib/Driver/Driver.cpp | Updated HIP offload arch insertion to rely on the new helper function. |
clang/include/clang/Driver/ToolChain.h | Added the function prototype and accompanying documentation for HIP default arch. |
Comments suppressed due to low confidence (1)
clang/lib/Driver/ToolChain.cpp:1596
- [nitpick] Consider renaming 'Prog' to a more descriptive identifier like 'spirvCompilerPath' to improve code readability and clarity.
auto Prog = GetProgramPath("amd-llvm-spirv");
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang Author: Yaxun (Sam) Liu (yxsamliu) ChangesCurrently, HIP uses gfx906 as the default offload For non-interactive uses, this is less of a concern since they all set explicit offload arch's for supported GPU's. The only use of default offload arch is when the clang wrapper hipcc is used as a C++ compiler However, gfx906 as default offload arch is very With this patch, if AMD GPU's are detected on the system, they will be used as default offload arch's for HIP. Otherwise, if Full diff: https://github.com/llvm/llvm-project/pull/139281.diff 4 Files Affected:
diff --git a/clang/include/clang/Driver/ToolChain.h b/clang/include/clang/Driver/ToolChain.h
index 58edf2b3887b0..46771c450ef71 100644
--- a/clang/include/clang/Driver/ToolChain.h
+++ b/clang/include/clang/Driver/ToolChain.h
@@ -780,6 +780,10 @@ class ToolChain {
virtual Expected<SmallVector<std::string>>
getSystemGPUArchs(const llvm::opt::ArgList &Args) const;
+ /// getHIPDefaultOffloadArchs - Get the default offload arch's for HIP.
+ virtual SmallVector<StringRef>
+ getHIPDefaultOffloadArchs(const llvm::opt::ArgList &Args) const;
+
/// addProfileRTLibs - When -fprofile-instr-profile is specified, try to pass
/// a suitable profile runtime library to the linker.
virtual void addProfileRTLibs(const llvm::opt::ArgList &Args,
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index e844f0d6d5400..d746c8c77a99d 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -3503,6 +3503,9 @@ class OffloadingActionBuilder final {
GpuArchList.push_back(OffloadArch::AMDGCNSPIRV);
else
GpuArchList.push_back(OffloadArch::Generic);
+ } else if (AssociatedOffloadKind == Action::OFK_HIP) {
+ for (auto A : ToolChains.front()->getHIPDefaultOffloadArchs(Args))
+ GpuArchList.push_back(A.data());
} else {
GpuArchList.push_back(DefaultOffloadArch);
}
@@ -4825,7 +4828,8 @@ Driver::getOffloadArchs(Compilation &C, const llvm::opt::DerivedArgList &Args,
if (Kind == Action::OFK_Cuda) {
Archs.insert(OffloadArchToString(OffloadArch::CudaDefault));
} else if (Kind == Action::OFK_HIP) {
- Archs.insert(OffloadArchToString(OffloadArch::HIPDefault));
+ for (auto A : TC->getHIPDefaultOffloadArchs(Args))
+ Archs.insert(A);
} else if (Kind == Action::OFK_SYCL) {
Archs.insert(StringRef());
} else if (Kind == Action::OFK_OpenMP) {
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 3c52abb0ab78e..2b6db476118c4 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -1574,6 +1574,32 @@ ToolChain::getSystemGPUArchs(const llvm::opt::ArgList &Args) const {
return SmallVector<std::string>();
}
+SmallVector<StringRef>
+ToolChain::getHIPDefaultOffloadArchs(const llvm::opt::ArgList &Args) const {
+ if (getTriple().isSPIRV()) {
+ if (getTriple().getVendor() == llvm::Triple::AMD)
+ return {OffloadArchToString(OffloadArch::AMDGCNSPIRV)};
+ return {OffloadArchToString(OffloadArch::Generic)};
+ }
+
+ if (!getTriple().isAMDGPU())
+ return {};
+
+ SmallVector<StringRef> GpuArchList;
+ auto GPUsOrErr = getSystemGPUArchs(Args);
+ if (GPUsOrErr) {
+ for (auto &G : *GPUsOrErr)
+ GpuArchList.push_back(Args.MakeArgString(G));
+ return GpuArchList;
+ }
+ llvm::consumeError(GPUsOrErr.takeError());
+ auto Prog = GetProgramPath("amd-llvm-spirv");
+ if (!Prog.empty() && llvm::sys::fs::can_execute(Prog))
+ return {OffloadArchToString(OffloadArch::AMDGCNSPIRV)};
+
+ return {OffloadArchToString(OffloadArch::HIPDefault)};
+}
+
SanitizerMask ToolChain::getSupportedSanitizers() const {
// Return sanitizers which don't require runtime support and are not
// platform dependent.
diff --git a/clang/test/Driver/hip-default-gpu-arch.hip b/clang/test/Driver/hip-default-gpu-arch.hip
index d55a3ea151f9a..b68e738ccc345 100644
--- a/clang/test/Driver/hip-default-gpu-arch.hip
+++ b/clang/test/Driver/hip-default-gpu-arch.hip
@@ -1,3 +1,3 @@
// RUN: %clang -### -nogpulib -nogpuinc -c %s 2>&1 | FileCheck %s
-// CHECK: {{.*}}clang{{.*}}"-target-cpu" "gfx906"
+// CHECK: {{.*}}clang{{.*}}"-triple" "{{amdgcn|spirv64}}-amd-amdhsa"{{.*}} "-target-cpu" "{{amdgcnspirv|gfx.*}}"
|
@jhuber6 do you think can we use |
OpenMP already defaults to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yxsamliu! To add a bit more context, this was my original email that raised the topic:
I wonder if we should consider revisiting the decision to default clang to gfx906 1. I came across how that decision was made shortly after it was implemented, and at the time I thought it was silly not to default to
native
. The decision was made to change the default from gfx803 to gfx906 because the former wasn’t supported, but AMD dropped support for the latter shortly thereafter. Using the same criteria as that previous decision, we should be changing the default again to something newer.However, the default of "some random GPU architecture that changes every few releases" is just plain bad. If a user tries to run a program that was built for gfx906 only because nothing else was specified, the most likely result will be that the program doesn’t work on their machine. I’m not sure what the current behaviour of the HIP Runtime is, but historically, running a program built for a different GPU than your own would result in a crash that might not even include an error message.
My workstations all have gfx906 GPUs, but this is arguably worse! It means that when I work on build scripts, if I make a mistake and fail to specify the target architecture somewhere, the built result will still work on my machine! This hides errors in my build scripts, which then become a problem when I share those scripts with others working on different hardware. Even for my own work, I wouldn’t want to depend on gfx906 being the default target. My scripts would all break when I update clang to some future version.
I would suggest that we should either (a) change the default GPU target to
native
and make the failure to detect the user’s GPU into a hard compiler error, or (b) change the default GPU target to SPIR-V so that it works on every machine.
The thing is that the sensible GPU target default does not exist for the GPUs in principle, IMO. We're just arguing about the least terrible variant to choose.
I am personally biased towards the first variant, as it's consistently and predictably bad for everyone. Other options tend to create an illusion of "just working", but are prone to creating unwanted surprises. spir-v would probably be an OK alternative. It's not great (probably leases some performance on the table vs the native build), but the binary will presumably work on other machines. |
Here's a quick test with the in-box packages on Ubuntu 24.04 with my gfx1103 laptop:
It seems that the sample program ran without crashing, but the kernel function was never actually executed. You can catch an "invalid device function" error if you call
My concern is that it doesn't force everyone to specify what they need. If there's no reasonable default, I'd be fine with forcing everyone to explicitly specify what they need. A compiler error would make it easier to notice when a file is not getting With that said, I understand that might be troublesome for users that legitimately don't care about the architecture of the artifact produced. Those users are probably mostly compiler developers or build tools checking that the compiler works. Still, having a useful default for that group is the main reason why I endorsed SPIR-V as a default over having no default.
That is exactly my argument in favour of SPIR-V. At the moment, when you compile and run a program without specifying your offload architecture, you are unlikely to notice a problem until you encounter surprising behaviour at runtime. SPIR-V is not ideal for anyone, but it's a default that we can set today and never change again. |
Honestly an error if unspecified would probably be better than what we do now. |
I'd probably go with just error and emit nothing, but I'm sure that will break someone's build somewhere |
Lack of proper error checking in the host APIs or host API usage isn't really a problem we solve here. Anything like a system call should be always checked for failure. |
The main obstacle of letting clang emit error when Another option is to have multiple |
So perhaps we can fork this linguistically? I.e.:
|
Seems silly to do that when we have the
We still rely on external Khronos tools for SPIR-V right? I don't think we can shift that to the default until it's all LLVM. |
Sure, however we are actually switching to the BE within the next couple of months, so it would be useful to incorporate that into planning / what we do here. It'd probably be preferable to have one default switch and then stick with that. |
So we will wait until amdgcnspirv uses SPRIV backend by default, then switch HIP default offload arch to amdgcnspirv. That sounds a reasonable solution to me. |
I think that in general we also need to decide on what happens when you pick an amdgcn— triple. IMHO for that case we should probably error out if no mcpu is provided, since there’s no reasonable default, except for “all”, but that would be incredibly disruptive. |
That happens for device libs. In that case the default processor is "generic", and clang emits LLVM IR not assuming any target processor. I think for emitting LLVM IR we should allow that. We can emit error if it emits assembly or objects. |
Also used for all my libraries. Right now we default to like gfx700 or something if you go to the backend, could make that an error. |
Which is a huge hack and we should probably not allow. We should probably hard error on codegen, there's no such thing as the generic target. It's a never ending source of bug reports |
It's just the AMDGCN target without any |
Almost, but it's problematic in several ways. The problems multiply once you start adding in manually specified target_features attributes |
I don't do that anywhere so probably why I haven't encountered those issues (so far). Though I wonder how the proposed AMDGCN version of |
Currently, HIP uses gfx906 as the default offload
arch, which only works on systems with gfx906.
For non-interactive uses, this is less of a concern since they all set explicit offload arch's for supported GPU's. The only use of default offload arch is when the clang wrapper hipcc is used as a C++ compiler
during compiler detection of cmake, where the default offload arch is used to compile a C++ program as HIP program and executed. However since there is no
kernel, the default offload arch just works.
However, gfx906 as default offload arch is very
inconvenient for interactive users since in most cases they would like to compile for the GPU on the system.
With this patch, if AMD GPU's are detected on the system, they will be used as default offload arch's for HIP. Otherwise, if
amd-llvm-spirv
is found and executable,amdgcnspirv
will be used as the default offload arch, since it works for all AMD GPU's supporting HIP.Otherwise, the original default offload arch is used, which is gfx906.