[Offload] Remove handling for COV4 binaries from offload/ #131033

jhuber6 · 2025-03-12T21:23:57Z

Summary:
We moved from cov4 to cov5 a long time ago, and it guards simplifying
some front end code, so we should be able to move up with this.

llvmbot · 2025-03-12T21:24:29Z

@llvm/pr-subscribers-offload

@llvm/pr-subscribers-backend-amdgpu

Author: Joseph Huber (jhuber6)

Changes

Summary:
We moved from cov4 to cov5 a long time ago, and it guards simplifying
some front end code, so we should be able to move up with this.

Full diff: https://github.com/llvm/llvm-project/pull/131033.diff

3 Files Affected:

(modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+11-16)
(modified) offload/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h (+1-8)
(modified) offload/plugins-nextgen/common/src/Utils/ELF.cpp (+2-3)

diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index e83d38a14f77f..f64d05744f204 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -576,8 +576,7 @@ struct AMDGPUKernelTy : public GenericKernelTy {
   /// Get the HSA kernel object representing the kernel function.
   uint64_t getKernelObject() const { return KernelObject; }
 
-  /// Get the size of implicitargs based on the code object version
-  /// @return 56 for cov4 and 256 for cov5
+  /// Get the size of implicitargs based on the code object version.
   uint32_t getImplicitArgsSize() const { return ImplicitArgsSize; }
 
   /// Indicates whether or not we need to set up our own private segment size.
@@ -3386,20 +3385,16 @@ Error AMDGPUKernelTy::launchImpl(GenericDeviceTy &GenericDevice,
   if (auto Err = AMDGPUDevice.getStream(AsyncInfoWrapper, Stream))
     return Err;
 
-  // Only COV5 implicitargs needs to be set. COV4 implicitargs are not used.
-  if (ImplArgs &&
-      getImplicitArgsSize() == sizeof(hsa_utils::AMDGPUImplicitArgsTy)) {
-    ImplArgs->BlockCountX = NumBlocks[0];
-    ImplArgs->BlockCountY = NumBlocks[1];
-    ImplArgs->BlockCountZ = NumBlocks[2];
-    ImplArgs->GroupSizeX = NumThreads[0];
-    ImplArgs->GroupSizeY = NumThreads[1];
-    ImplArgs->GroupSizeZ = NumThreads[2];
-    ImplArgs->GridDims = NumBlocks[2] * NumThreads[2] > 1
-                             ? 3
-                             : 1 + (NumBlocks[1] * NumThreads[1] != 1);
-    ImplArgs->DynamicLdsSize = KernelArgs.DynCGroupMem;
-  }
+  ImplArgs->BlockCountX = NumBlocks[0];
+  ImplArgs->BlockCountY = NumBlocks[1];
+  ImplArgs->BlockCountZ = NumBlocks[2];
+  ImplArgs->GroupSizeX = NumThreads[0];
+  ImplArgs->GroupSizeY = NumThreads[1];
+  ImplArgs->GroupSizeZ = NumThreads[2];
+  ImplArgs->GridDims = NumBlocks[2] * NumThreads[2] > 1
+                           ? 3
+                           : 1 + (NumBlocks[1] * NumThreads[1] != 1);
+  ImplArgs->DynamicLdsSize = KernelArgs.DynCGroupMem;
 
   // Push the kernel launch into the stream.
   return Stream->pushKernelLaunch(*this, AllArgs, NumThreads, NumBlocks,
diff --git a/offload/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h b/offload/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h
index 43be4e8edeba4..609ead942dbb3 100644
--- a/offload/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h
+++ b/offload/plugins-nextgen/amdgpu/utils/UtilitiesRTL.h
@@ -40,17 +40,10 @@ struct AMDGPUImplicitArgsTy {
   uint8_t Unused2[132]; // 132 byte offset.
 };
 
-// Dummy struct for COV4 implicitargs.
-struct AMDGPUImplicitArgsTyCOV4 {
-  uint8_t Unused[56];
-};
-
 /// Returns the size in bytes of the implicit arguments of AMDGPU kernels.
 /// `Version` is the ELF ABI version, e.g. COV5.
 inline uint32_t getImplicitArgsSize(uint16_t Version) {
-  return Version < ELF::ELFABIVERSION_AMDGPU_HSA_V5
-             ? sizeof(AMDGPUImplicitArgsTyCOV4)
-             : sizeof(AMDGPUImplicitArgsTy);
+  return sizeof(AMDGPUImplicitArgsTy);
 }
 
 /// Reads the AMDGPU specific metadata from the ELF file and propagates the
diff --git a/offload/plugins-nextgen/common/src/Utils/ELF.cpp b/offload/plugins-nextgen/common/src/Utils/ELF.cpp
index 44d1c737e2efb..b33101b99aa10 100644
--- a/offload/plugins-nextgen/common/src/Utils/ELF.cpp
+++ b/offload/plugins-nextgen/common/src/Utils/ELF.cpp
@@ -65,10 +65,9 @@ checkMachineImpl(const object::ELFObjectFile<ELFT> &ELFObj, uint16_t EMachine) {
   if (Header.e_machine == EM_AMDGPU) {
     if (Header.e_ident[EI_OSABI] != ELFOSABI_AMDGPU_HSA)
       return createError("Invalid AMD OS/ABI, must be AMDGPU_HSA");
-    if (Header.e_ident[EI_ABIVERSION] != ELFABIVERSION_AMDGPU_HSA_V4 &&
-        Header.e_ident[EI_ABIVERSION] != ELFABIVERSION_AMDGPU_HSA_V5 &&
+    if (Header.e_ident[EI_ABIVERSION] != ELFABIVERSION_AMDGPU_HSA_V5 &&
         Header.e_ident[EI_ABIVERSION] != ELFABIVERSION_AMDGPU_HSA_V6)
-      return createError("Invalid AMD ABI version, must be version 4 or above");
+      return createError("Invalid AMD ABI version, must be version 5 or above");
     if ((Header.e_flags & EF_AMDGPU_MACH) < EF_AMDGPU_MACH_AMDGCN_GFX700 ||
         (Header.e_flags & EF_AMDGPU_MACH) >
             EF_AMDGPU_MACH_AMDGCN_GFX9_4_GENERIC)
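
The `GridDims` expression in the `launchImpl` hunk above packs a three-way decision into a single ternary. A minimal sketch of the same arithmetic, pulled out into a hypothetical helper, may make the cases easier to see. `computeGridDims` is not part of the patch, and the `uint32_t[3]` parameter types are an assumption about how `NumBlocks` and `NumThreads` are declared.

```cpp
#include <cstdint>

// Hypothetical helper (not in the patch) mirroring the GridDims expression.
// NumBlocks and NumThreads hold the X, Y and Z extents of the launch.
constexpr uint16_t computeGridDims(const uint32_t (&NumBlocks)[3],
                                   const uint32_t (&NumThreads)[3]) {
  // A Z extent above one means a 3D launch. Otherwise the launch is 2D
  // when the Y extent is not exactly one, and 1D when it is.
  return NumBlocks[2] * NumThreads[2] > 1
             ? 3
             : 1 + (NumBlocks[1] * NumThreads[1] != 1);
}

// Worked cases, checked at compile time.
constexpr uint32_t Blocks1D[3] = {64, 1, 1}, Threads1D[3] = {256, 1, 1};
constexpr uint32_t Blocks2D[3] = {8, 8, 1}, Threads2D[3] = {16, 16, 1};
constexpr uint32_t Blocks3D[3] = {4, 4, 4}, Threads3D[3] = {8, 8, 8};
static_assert(computeGridDims(Blocks1D, Threads1D) == 1, "1D launch");
static_assert(computeGridDims(Blocks2D, Threads2D) == 2, "2D launch");
static_assert(computeGridDims(Blocks3D, Threads3D) == 3, "3D launch");
```

Note that only the products of block count and thread count matter, so a launch with one block of several threads in Y still counts as 2D.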

shiltian · 2025-03-12T21:27:21Z

How about downstream? Are we not gonna support it anymore as well?

arsenm · 2025-03-13T00:52:36Z

I know of a few users that have not moved to v5 yet

jhuber6 · 2025-03-13T01:28:46Z

> I know of a few users that have not moved to v5 yet

This is for offload/ so I'm not sure if it's the same. I'm mostly interested in getting rid of that __oclc_abi_version that we declare for every single compilation, since its only use is to be compatible between COV4 and COV5 and I'm pretty sure that offload/ is the only user of that.

shiltian

So we are okay that OpenMP will not support COV4 downstream? For upstream I'm totally fine.

offload/plugins-nextgen/amdgpu/src/rtl.cpp

JonChesterfield · 2025-03-18T21:22:42Z

I don't think it's a great move to drop v4 from offload while some llvm users haven't been able to move to v5 yet but in the scheme of things it probably doesn't do much harm

jhuber6 · 2025-03-18T21:24:54Z

> I don't think it's a great move to drop v4 from offload while some llvm users haven't been able to move to v5 yet but in the scheme of things it probably doesn't do much harm

I'm not aware of anyone stuck on 4 since Triton switched, and this is only for OpenMP anyway.

Summary: When we were first porting to COV5, this led to some ABI issues due to a change in how we looked up the work group size. Bitcode libraries relied on the builtins to emit code, but this was changed between versions. This prevented the bitcode libraries, like OpenMP or libc, from being used for both COV4 and COV5. The solution was to have this 'none' functionality, which effectively emitted code that branched off of a global to resolve to either version. This isn't a great solution because it forced every TU to have this variable in it. The patch in llvm#131033 removed support for COV4 from OpenMP, which was the only consumer of this functionality. Other users like HIP and OpenCL did not use this because they linked the ROCm Device Library directly, which has its own handling (the name was borrowed from it, after all). So, now that we don't need to worry about backward compatibility with COV4, we can remove this special handling. Users can still emit COV4 code; this simply removes the special handling used to make the OpenMP device runtime bitcode version agnostic.
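
As an illustration of the "branched off of a global" approach described above, here is a host-side sketch, not the code clang actually emitted. It assumes the global holds 400 for COV4 and 500 or more for COV5 and later, and that the 16-bit work-group size X lives at byte 12 of the implicit arguments (COV5) or byte 4 of the dispatch packet (COV4). Those values, and every name in the block, are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed encoding of the version global: 400 = COV4, 500 = COV5.
constexpr uint32_t AssumedCOV5 = 500;
// Assumed byte offsets of the 16-bit work-group size X field.
constexpr std::size_t AssumedImplicitArgOffset = 12;   // COV5 lookup
constexpr std::size_t AssumedDispatchPacketOffset = 4; // COV4 lookup

// Read a 16-bit value from a raw buffer without alignment assumptions.
inline uint16_t loadU16(const uint8_t *Base, std::size_t Offset) {
  uint16_t Value;
  std::memcpy(&Value, Base + Offset, sizeof(Value));
  return Value;
}

// Hypothetical model of version-agnostic code: the same function serves
// both ABIs by branching on a value that is only known at link time.
inline uint16_t workGroupSizeX(uint32_t AbiVersion,
                               const uint8_t *ImplicitArgs,
                               const uint8_t *DispatchPacket) {
  assert(ImplicitArgs && DispatchPacket && "both buffers must be valid");
  return AbiVersion >= AssumedCOV5
             ? loadU16(ImplicitArgs, AssumedImplicitArgOffset)
             : loadU16(DispatchPacket, AssumedDispatchPacketOffset);
}
```

Once only COV5 and later are supported, the branch and the global become dead weight, which is the simplification the follow-up patch is after.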

VeeEM · 2025-03-26T10:29:16Z

Some OpenMP offload programs compiled with flang seem to crash after this change. I see crashes at offload/plugins-nextgen/amdgpu/src/rtl.cpp:3389 because ImplArgs is 0.

#132982

saiislam · 2025-03-26T10:54:52Z

> Some OpenMP offload programs compiled with flang seem to crash after this change. I see crashes at offload/plugins-nextgen/amdgpu/src/rtl.cpp:3389 because ImplArgs is 0.
>
> #132982

I am also seeing the same error in multiple test cases. The condition `ArgsSize == LaunchParams.Size + getImplicitArgsSize()` is false in these cases, so ImplArgs stays null. We should guard access to ImplArgs.
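
A minimal sketch of what such a guard could look like, not necessarily the fix that landed. `ImplicitArgsSketch` is a trimmed, hypothetical stand-in for `hsa_utils::AMDGPUImplicitArgsTy` that keeps only the fields the patch writes and omits the padding, so its layout does not match the real struct. `fillImplicitArgs` is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed stand-in for the implicit-argument struct.
// Padding fields are omitted, so offsets differ from the real layout.
struct ImplicitArgsSketch {
  uint32_t BlockCountX = 0, BlockCountY = 0, BlockCountZ = 0;
  uint16_t GroupSizeX = 0, GroupSizeY = 0, GroupSizeZ = 0;
  uint16_t GridDims = 0;
  uint32_t DynamicLdsSize = 0;
};

// Fill the implicit arguments only when the pointer is valid.
// Returns false, touching nothing, when ImplArgs is null.
inline bool fillImplicitArgs(ImplicitArgsSketch *ImplArgs,
                             const uint32_t (&NumBlocks)[3],
                             const uint32_t (&NumThreads)[3],
                             uint32_t DynCGroupMem) {
  if (!ImplArgs)
    return false;
  // Group sizes are stored in 16-bit fields in this sketch.
  assert(NumThreads[0] <= 0xFFFF && NumThreads[1] <= 0xFFFF &&
         NumThreads[2] <= 0xFFFF && "group size must fit in 16 bits");
  ImplArgs->BlockCountX = NumBlocks[0];
  ImplArgs->BlockCountY = NumBlocks[1];
  ImplArgs->BlockCountZ = NumBlocks[2];
  ImplArgs->GroupSizeX = static_cast<uint16_t>(NumThreads[0]);
  ImplArgs->GroupSizeY = static_cast<uint16_t>(NumThreads[1]);
  ImplArgs->GroupSizeZ = static_cast<uint16_t>(NumThreads[2]);
  ImplArgs->GridDims = static_cast<uint16_t>(
      NumBlocks[2] * NumThreads[2] > 1
          ? 3
          : 1 + (NumBlocks[1] * NumThreads[1] != 1));
  ImplArgs->DynamicLdsSize = DynCGroupMem;
  return true;
}
```

The null check plays the role of the `ImplArgs &&` half of the condition that the merged diff removed; the size comparison against the COV5 struct is no longer needed once COV4 is gone.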

[Offload] Remove handling for COV4 binaries from offload/

8e665fc

jhuber6 requested review from arsenm, jdoerfert, JonChesterfield, ronlieb, saiislam and shiltian March 12, 2025 21:23

llvmbot added backend:AMDGPU offload labels Mar 12, 2025

shiltian approved these changes Mar 13, 2025

arsenm reviewed Mar 13, 2025

Update rtl.cpp

ffa7e96

arsenm approved these changes Mar 19, 2025

jhuber6 merged commit 25bf4e2 into llvm:main Mar 24, 2025
9 checks passed

jhuber6 deleted the cov4 branch March 24, 2025 23:58

jhuber6 mentioned this pull request Mar 25, 2025

[Clang][AMDGPU] Remove special handling for COV4 libraries #132870

Merged

VeeEM mentioned this pull request Mar 26, 2025

[flang][openmp] target construct segfaults with amdgpu #132982

Closed
