[Offload] Do not load images from the same descriptor on the same device #139147

Merged
merged 1 commit into llvm:main on May 9, 2025

Conversation

jhuber6
Contributor

@jhuber6 jhuber6 commented May 8, 2025

Summary:
Right now we generally assume that we have one image per device. The
binary descriptor represents a single 'compilation'. This means that,
when used through the OpenMP interface, each image contains the same
code built for a different architecture. This is problematic when
several of those images are compatible with the same device (as with
sm_80 and sm_89, or the generic GFX ISAs), because the same code is then
loaded multiple times. This patch is the quick and dirty solution: we
simply prevent this from happening at all. We use the first matching
image we find, which might not be optimal, but it should be better than
the alternative. Note that this does not affect shared library loads,
as the check is per binary descriptor, not per device.
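
A minimal sketch of the intended behaviour, not the libomptarget code: `Image`, `isCompatible`, and `registerDescriptor` are hypothetical names, and the compatibility rule (an sm_80 image also runs on an sm_89 device) is an assumption made only for illustration. The set of used devices is local to one descriptor, so a shared library with its own descriptor still gets its own image loaded.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a device image; not the libomptarget type.
struct Image {
  std::string Arch;
};

// Hypothetical compatibility check. Assumed for illustration only:
// an image runs on a device of the same arch, and sm_80 code also
// runs on an sm_89 device.
bool isCompatible(const std::string &ImageArch, const std::string &DeviceArch) {
  if (ImageArch == DeviceArch)
    return true;
  return ImageArch == "sm_80" && DeviceArch == "sm_89";
}

// Models registering one binary descriptor. Returns the architectures of
// the images that were actually loaded, in load order.
std::vector<std::string>
registerDescriptor(const std::vector<Image> &Images,
                   const std::vector<std::string> &DeviceArchs) {
  // Local to this call, so the bookkeeping is per descriptor.
  std::unordered_set<int32_t> UsedDevices;
  std::vector<std::string> Loaded;
  for (const Image &Img : Images) {
    for (int32_t Id = 0; Id < static_cast<int32_t>(DeviceArchs.size()); ++Id) {
      if (!isCompatible(Img.Arch, DeviceArchs[Id]))
        continue;
      // First compatible image wins; later ones are skipped as duplicates.
      if (!UsedDevices.insert(Id).second)
        continue;
      Loaded.push_back(Img.Arch);
    }
  }
  return Loaded;
}
```

Under these assumptions, a descriptor holding sm_80 and sm_89 images loads only the sm_80 image on an sm_89 device, because it comes first. Calling `registerDescriptor` again for a second descriptor starts from an empty set, so that descriptor's image is loaded as well.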

@llvmbot
Member

llvmbot commented May 8, 2025

@llvm/pr-subscribers-offload

Author: Joseph Huber (jhuber6)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/139147.diff

1 Files Affected:

  • (modified) offload/libomptarget/PluginManager.cpp (+12)
diff --git a/offload/libomptarget/PluginManager.cpp b/offload/libomptarget/PluginManager.cpp
index d6d529a207587..712458b4de8dd 100644
--- a/offload/libomptarget/PluginManager.cpp
+++ b/offload/libomptarget/PluginManager.cpp
@@ -202,6 +202,7 @@ void PluginManager::registerLib(__tgt_bin_desc *Desc) {
     PM->addDeviceImage(*Desc, Desc->DeviceImages[i]);
 
   // Register the images with the RTLs that understand them, if any.
+  llvm::DenseMap<GenericPluginTy *, llvm::DenseSet<int32_t>> UsedDevices;
   for (DeviceImageTy &DI : PM->deviceImages()) {
     // Obtain the image and information that was previously extracted.
     __tgt_device_image *Img = &DI.getExecutableImage();
@@ -232,6 +233,17 @@ void PluginManager::registerLib(__tgt_bin_desc *Desc) {
         if (!initializeDevice(R, DeviceId))
           continue;
 
+        // We only want a single matching image to be registered for each binary
+        // descriptor. This prevents multiple of the same image from being
+        // registered for the same device in the case that they are mutually
+        // compatible, such as sm_80 and sm_89.
+        if (!UsedDevices[&R].insert(DeviceId).second) {
+          DP("Image " DPxMOD
+             " is a duplicate, not loaded on RTL %s device %d!\n",
+             DPxPTR(Img->ImageStart), R.getName(), DeviceId);
+          continue;
+        }
+
         // Initialize (if necessary) translation table for this library.
         PM->TrlTblMtx.lock();
         if (!PM->HostEntriesBeginToTransTable.count(Desc->HostEntriesBegin)) {
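
The check in the patch relies on `insert(...).second` being true only for a newly inserted element. The sketch below shows the same idiom with standard containers in place of `llvm::DenseMap` and `llvm::DenseSet`; `Plugin`, `UsedDevicesTy`, and `claimDevice` are hypothetical names, not part of the patch.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for GenericPluginTy.
struct Plugin {};

// One set of claimed device IDs per plugin, keyed by plugin address.
using UsedDevicesTy =
    std::unordered_map<const Plugin *, std::unordered_set<int32_t>>;

// Returns true the first time a (plugin, device) pair is seen and false on
// every later attempt.
bool claimDevice(UsedDevicesTy &Used, const Plugin &P, int32_t DeviceId) {
  // operator[] creates an empty set on first use of a plugin. insert()
  // returns a pair whose second member is true only if the ID was new.
  return Used[&P].insert(DeviceId).second;
}
```

Keying by plugin matters because device IDs are only unique within one plugin: device 0 of one plugin and device 0 of another are different devices and must each be allowed one image.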

Contributor

@dpalermo dpalermo left a comment

Verified that this fixes the problem I was seeing with --offload-arch=gfx942,gfx9-4-generic too. Thanks!

@jhuber6 jhuber6 merged commit d60eeda into llvm:main May 9, 2025
9 checks passed
3 participants