[Clang][AMDGPU] Remove special handling for COV4 libraries #132870

jhuber6 · 2025-03-25T03:40:27Z

Summary:
When we were first porting to COV5, this lead to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.

This isn't a great solution because it forced every TU to have this
variable in it. The patch in
#131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).

So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.

llvmbot · 2025-03-25T03:41:00Z

@llvm/pr-subscribers-flang-fir-hlfir
@llvm/pr-subscribers-flang-driver
@llvm/pr-subscribers-libcxx
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Joseph Huber (jhuber6)

Changes

Summary:
When we were first porting to COV5, this lead to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.

This isn't a great solution because it forced every TU to have this
variable in it. The patch in
#131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).

So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.

Patch is 28.75 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132870.diff

14 Files Affected:

(modified) clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp (+8-47)
(modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (-36)
(modified) clang/test/CodeGen/amdgpu-abi-version.c (+13-12)
(modified) clang/test/CodeGen/amdgpu-address-spaces.cpp (+7-1)
(removed) clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu (-133)
(modified) clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu (-34)
(modified) clang/test/CodeGenCXX/dynamic-cast-address-space.cpp (-1)
(modified) clang/test/CodeGenHIP/default-attributes.hip (-1)
(modified) clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl (-1)
(modified) clang/test/OpenMP/amdgcn_target_global_constructor.cpp (-1)
(modified) compiler-rt/cmake/builtin-config-ix.cmake (-1)
(modified) compiler-rt/lib/builtins/CMakeLists.txt (-6)
(modified) libc/cmake/modules/LLVMLibCCompileOptionRules.cmake (-2)
(modified) offload/DeviceRTL/CMakeLists.txt (+1-1)

diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index f94917c905081..025f0726e099f 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -62,62 +62,23 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
 /// Emit code based on Code Object ABI version.
 /// COV_4    : Emit code to use dispatch ptr
 /// COV_5+   : Emit code to use implicitarg ptr
-/// COV_NONE : Emit code to load a global variable "__oclc_ABI_version"
-///            and use its value for COV_4 or COV_5+ approach. It is used for
-///            compiling device libraries in an ABI-agnostic way.
-///
-/// Note: "__oclc_ABI_version" is supposed to be emitted and intialized by
-///       clang during compilation of user code.
 Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
   llvm::LoadInst *LD;
 
   auto Cov = CGF.getTarget().getTargetOpts().CodeObjectVersion;
 
-  if (Cov == CodeObjectVersionKind::COV_None) {
-    StringRef Name = "__oclc_ABI_version";
-    auto *ABIVersionC = CGF.CGM.getModule().getNamedGlobal(Name);
-    if (!ABIVersionC)
-      ABIVersionC = new llvm::GlobalVariable(
-          CGF.CGM.getModule(), CGF.Int32Ty, false,
-          llvm::GlobalValue::ExternalLinkage, nullptr, Name, nullptr,
-          llvm::GlobalVariable::NotThreadLocal,
-          CGF.CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant));
-
-    // This load will be eliminated by the IPSCCP because it is constant
-    // weak_odr without externally_initialized. Either changing it to weak or
-    // adding externally_initialized will keep the load.
-    Value *ABIVersion = CGF.Builder.CreateAlignedLoad(CGF.Int32Ty, ABIVersionC,
-                                                      CGF.CGM.getIntAlign());
-
-    Value *IsCOV5 = CGF.Builder.CreateICmpSGE(
-        ABIVersion,
-        llvm::ConstantInt::get(CGF.Int32Ty, CodeObjectVersionKind::COV_5));
-
+  Value *GEP = nullptr;
+  if (Cov >= CodeObjectVersionKind::COV_5) {
     // Indexing the implicit kernarg segment.
-    Value *ImplicitGEP = CGF.Builder.CreateConstGEP1_32(
+    GEP = CGF.Builder.CreateConstGEP1_32(
         CGF.Int8Ty, EmitAMDGPUImplicitArgPtr(CGF), 12 + Index * 2);
-
-    // Indexing the HSA kernel_dispatch_packet struct.
-    Value *DispatchGEP = CGF.Builder.CreateConstGEP1_32(
-        CGF.Int8Ty, EmitAMDGPUDispatchPtr(CGF), 4 + Index * 2);
-
-    auto Result = CGF.Builder.CreateSelect(IsCOV5, ImplicitGEP, DispatchGEP);
-    LD = CGF.Builder.CreateLoad(
-        Address(Result, CGF.Int16Ty, CharUnits::fromQuantity(2)));
   } else {
-    Value *GEP = nullptr;
-    if (Cov >= CodeObjectVersionKind::COV_5) {
-      // Indexing the implicit kernarg segment.
-      GEP = CGF.Builder.CreateConstGEP1_32(
-          CGF.Int8Ty, EmitAMDGPUImplicitArgPtr(CGF), 12 + Index * 2);
-    } else {
-      // Indexing the HSA kernel_dispatch_packet struct.
-      GEP = CGF.Builder.CreateConstGEP1_32(
-          CGF.Int8Ty, EmitAMDGPUDispatchPtr(CGF), 4 + Index * 2);
-    }
-    LD = CGF.Builder.CreateLoad(
-        Address(GEP, CGF.Int16Ty, CharUnits::fromQuantity(2)));
+    // Indexing the HSA kernel_dispatch_packet struct.
+    GEP = CGF.Builder.CreateConstGEP1_32(CGF.Int8Ty, EmitAMDGPUDispatchPtr(CGF),
+                                         4 + Index * 2);
   }
+  LD = CGF.Builder.CreateLoad(
+      Address(GEP, CGF.Int16Ty, CharUnits::fromQuantity(2)));
 
   llvm::MDBuilder MDHelper(CGF.getLLVMContext());
   llvm::MDNode *RNode = MDHelper.createRange(APInt(16, 1),
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index e3c8243cbb0b5..db2a2c5740646 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -305,8 +305,6 @@ class AMDGPUTargetCodeGenInfo : public TargetCodeGenInfo {
   void setFunctionDeclAttributes(const FunctionDecl *FD, llvm::Function *F,
                                  CodeGenModule &CGM) const;
 
-  void emitTargetGlobals(CodeGen::CodeGenModule &CGM) const override;
-
   void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
                            CodeGen::CodeGenModule &M) const override;
   unsigned getOpenCLKernelCallingConv() const override;
@@ -414,40 +412,6 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
   }
 }
 
-/// Emits control constants used to change per-architecture behaviour in the
-/// AMDGPU ROCm device libraries.
-void AMDGPUTargetCodeGenInfo::emitTargetGlobals(
-    CodeGen::CodeGenModule &CGM) const {
-  StringRef Name = "__oclc_ABI_version";
-  llvm::GlobalVariable *OriginalGV = CGM.getModule().getNamedGlobal(Name);
-  if (OriginalGV && !llvm::GlobalVariable::isExternalLinkage(OriginalGV->getLinkage()))
-    return;
-
-  if (CGM.getTarget().getTargetOpts().CodeObjectVersion ==
-      llvm::CodeObjectVersionKind::COV_None)
-    return;
-
-  auto *Type = llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), 32);
-  llvm::Constant *COV = llvm::ConstantInt::get(
-      Type, CGM.getTarget().getTargetOpts().CodeObjectVersion);
-
-  // It needs to be constant weak_odr without externally_initialized so that
-  // the load instuction can be eliminated by the IPSCCP.
-  auto *GV = new llvm::GlobalVariable(
-      CGM.getModule(), Type, true, llvm::GlobalValue::WeakODRLinkage, COV, Name,
-      nullptr, llvm::GlobalValue::ThreadLocalMode::NotThreadLocal,
-      CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant));
-  GV->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Local);
-  GV->setVisibility(llvm::GlobalValue::VisibilityTypes::HiddenVisibility);
-
-  // Replace any external references to this variable with the new global.
-  if (OriginalGV) {
-    OriginalGV->replaceAllUsesWith(GV);
-    GV->takeName(OriginalGV);
-    OriginalGV->eraseFromParent();
-  }
-}
-
 void AMDGPUTargetCodeGenInfo::setTargetAttributes(
     const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {
   if (requiresAMDGPUProtectedVisibility(D, GV)) {
diff --git a/clang/test/CodeGen/amdgpu-abi-version.c b/clang/test/CodeGen/amdgpu-abi-version.c
index 4e5ad87655f23..66990ab76283e 100644
--- a/clang/test/CodeGen/amdgpu-abi-version.c
+++ b/clang/test/CodeGen/amdgpu-abi-version.c
@@ -1,23 +1,24 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --version 3
 // RUN: %clang_cc1 -cc1 -triple amdgcn-amd-amdhsa -emit-llvm -mcode-object-version=none %s -o - | FileCheck %s
 
-//.
-// CHECK: @__oclc_ABI_version = external addrspace(4) global i32
-//.
 // CHECK-LABEL: define dso_local i32 @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
-// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(4) @__oclc_ABI_version, align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = icmp sge i32 [[TMP0]], 500
-// CHECK-NEXT:    [[TMP2:%.*]] = call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// CHECK-NEXT:    [[TMP3:%.*]] = getelementptr i8, ptr addrspace(4) [[TMP2]], i32 12
-// CHECK-NEXT:    [[TMP4:%.*]] = call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// CHECK-NEXT:    [[TMP5:%.*]] = getelementptr i8, ptr addrspace(4) [[TMP4]], i32 4
-// CHECK-NEXT:    [[TMP6:%.*]] = select i1 [[TMP1]], ptr addrspace(4) [[TMP3]], ptr addrspace(4) [[TMP5]]
-// CHECK-NEXT:    [[TMP7:%.*]] = load i16, ptr addrspace(4) [[TMP6]], align 2, !range [[RNG2:![0-9]+]], !invariant.load !3, !noundef !3
-// CHECK-NEXT:    [[CONV:%.*]] = zext i16 [[TMP7]] to i32
+// CHECK-NEXT:    [[TMP0:%.*]] = call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// CHECK-NEXT:    [[TMP1:%.*]] = getelementptr i8, ptr addrspace(4) [[TMP0]], i32 4
+// CHECK-NEXT:    [[TMP2:%.*]] = load i16, ptr addrspace(4) [[TMP1]], align 2, !range [[RNG2:![0-9]+]], !invariant.load [[META3:![0-9]+]], !noundef [[META3]]
+// CHECK-NEXT:    [[CONV:%.*]] = zext i16 [[TMP2]] to i32
 // CHECK-NEXT:    ret i32 [[CONV]]
 //
 int foo() { return __builtin_amdgcn_workgroup_size_x(); }
+//.
+// CHECK: attributes #[[ATTR0]] = { convergent noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+// CHECK: [[RNG2]] = !{i16 1, i16 1025}
+// CHECK: [[META3]] = !{}
+//.
diff --git a/clang/test/CodeGen/amdgpu-address-spaces.cpp b/clang/test/CodeGen/amdgpu-address-spaces.cpp
index b121b559f58dc..1d8668a7f0917 100644
--- a/clang/test/CodeGen/amdgpu-address-spaces.cpp
+++ b/clang/test/CodeGen/amdgpu-address-spaces.cpp
@@ -29,7 +29,6 @@ int [[clang::address_space(999)]] bbb = 1234;
 // CHECK: @u = addrspace(5) global i32 undef, align 4
 // CHECK: @aaa = addrspace(6) global i32 1000, align 4
 // CHECK: @bbb = addrspace(999) global i32 1234, align 4
-// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
 //.
 // CHECK-LABEL: define dso_local amdgpu_kernel void @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
@@ -60,3 +59,10 @@ extern "C" [[clang::amdgpu_kernel]] void foo() {
   aaa = 0;
   bbb = 0;
 }
+//.
+// CHECK: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 600}
+// CHECK: [[META1:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META2:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
deleted file mode 100644
index cb467886c016c..0000000000000
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
+++ /dev/null
@@ -1,133 +0,0 @@
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN:   -mcode-object-version=4 -DUSER -x hip -o %t_4.bc %s
-
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN:   -mcode-object-version=5 -DUSER -x hip -o %t_5.bc %s
-
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN:   -mcode-object-version=6 -DUSER -x hip -o %t_6.bc %s
-
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN:   -mcode-object-version=none -DDEVICELIB -x hip -o %t_0.bc %s
-
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
-// RUN:   %t_4.bc -mlink-builtin-bitcode %t_0.bc -o - |\
-// RUN:   FileCheck -check-prefix=LINKED4 %s
-
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
-// RUN:   %t_5.bc -mlink-builtin-bitcode %t_0.bc -o - |\
-// RUN:   FileCheck -check-prefix=LINKED5 %s
-
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
-// RUN:   %t_6.bc -mlink-builtin-bitcode %t_0.bc -o - |\
-// RUN:   FileCheck -check-prefix=LINKED6 %s
-
-#include "Inputs/cuda.h"
-
-// LINKED4: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 400
-// LINKED4-LABEL: bar
-// LINKED4-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED4-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED4: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED4: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// LINKED4: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED4: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// LINKED4: select i1 false, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// LINKED4: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED4-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED4-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED4: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED4: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
-// LINKED4: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED4: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
-// LINKED4: select i1 false, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
-// LINKED4: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED4-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED4-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED4: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED4: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
-// LINKED4: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED4: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
-// LINKED4: select i1 false, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
-// LINKED4: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// LINKED4: "amdhsa_code_object_version", i32 400
-
-// LINKED5: __oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500
-// LINKED5-LABEL: bar
-// LINKED5-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED5-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED5: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED5: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// LINKED5: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED5: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// LINKED5: select i1 true, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED5-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED5-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED5: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED5: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
-// LINKED5: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED5: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
-// LINKED5: select i1 true, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
-// LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED5-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED5-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED5: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED5: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
-// LINKED5: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED5: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
-// LINKED5: select i1 true, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
-// LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// LINKED5: "amdhsa_code_object_version", i32 500
-
-// LINKED6: __oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
-// LINKED6-LABEL: bar
-// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED6: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED6: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED6: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
-// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED6: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
-// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
-// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED6: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
-// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED6: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
-// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
-// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// LINKED6: "amdhsa_code_object_version", i32 600
-
-#ifdef DEVICELIB
-__device__ void bar(int *x, int *y, int *z)
-{
-  *x = __builtin_amdgcn_workgroup_size_x();
-  *y = __builtin_amdgcn_workgroup_size_y();
-  *z = __builtin_amdgcn_workgroup_size_z();
-}
-#endif
-
-#ifdef USER
-__device__ void bar(int *x, int *y, int *z);
-__device__ void foo()
-{
-  int *x, *y, *z;
-  bar(x, y, z);
-}
-#endif
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index f42b69f492ff8..2d3730cdfc8c9 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -2,7 +2,6 @@
 // RUN:     -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=PRECOV5 %s
 
-
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
 // RUN:     -fcuda-is-device -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=COV5 %s
@@ -11,10 +10,6 @@
 // RUN:     -fcuda-is-device -mcode-object-version=6 -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=COV5 %s
 
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -mcode-object-version=none -emit-llvm -o - -x hip %s \
-// RUN:     | FileCheck -check-prefix=COVNONE %s
-
 #include "Inputs/cuda.h"
 
 // PRECOV5-LABEL: test_get_workgroup_size
@@ -35,35 +30,6 @@
 // COV5: getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
 // COV5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
 
-
-// COVNONE-LABEL: test_get_workgroup_size
-// COVNONE: load i32, ptr addrspace(4) @__oclc_ABI_version
-// COVNONE: [[ABI5_X:%.*]] = icmp sge i32 %{{.*}}, 500
-// COVNONE: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// COVNONE: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// COVNONE: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// COVNONE: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// COVNONE: select i1 [[ABI5_X]], ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// COVNONE: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// COVNONE: load i32, ptr a...
[truncated]

llvmbot · 2025-03-25T03:41:01Z

@llvm/pr-subscribers-offload

Author: Joseph Huber (jhuber6)

Changes

Summary:
When we were first porting to COV5, this lead to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.

This isn't a great solution because it forced every TU to have this
variable in it. The patch in
#131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).

So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.

Patch is 28.75 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132870.diff

14 Files Affected:

(modified) clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp (+8-47)
(modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (-36)
(modified) clang/test/CodeGen/amdgpu-abi-version.c (+13-12)
(modified) clang/test/CodeGen/amdgpu-address-spaces.cpp (+7-1)
(removed) clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu (-133)
(modified) clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu (-34)
(modified) clang/test/CodeGenCXX/dynamic-cast-address-space.cpp (-1)
(modified) clang/test/CodeGenHIP/default-attributes.hip (-1)
(modified) clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl (-1)
(modified) clang/test/OpenMP/amdgcn_target_global_constructor.cpp (-1)
(modified) compiler-rt/cmake/builtin-config-ix.cmake (-1)
(modified) compiler-rt/lib/builtins/CMakeLists.txt (-6)
(modified) libc/cmake/modules/LLVMLibCCompileOptionRules.cmake (-2)
(modified) offload/DeviceRTL/CMakeLists.txt (+1-1)

diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index f94917c905081..025f0726e099f 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -62,62 +62,23 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
 /// Emit code based on Code Object ABI version.
 /// COV_4    : Emit code to use dispatch ptr
 /// COV_5+   : Emit code to use implicitarg ptr
-/// COV_NONE : Emit code to load a global variable "__oclc_ABI_version"
-///            and use its value for COV_4 or COV_5+ approach. It is used for
-///            compiling device libraries in an ABI-agnostic way.
-///
-/// Note: "__oclc_ABI_version" is supposed to be emitted and intialized by
-///       clang during compilation of user code.
 Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
   llvm::LoadInst *LD;
 
   auto Cov = CGF.getTarget().getTargetOpts().CodeObjectVersion;
 
-  if (Cov == CodeObjectVersionKind::COV_None) {
-    StringRef Name = "__oclc_ABI_version";
-    auto *ABIVersionC = CGF.CGM.getModule().getNamedGlobal(Name);
-    if (!ABIVersionC)
-      ABIVersionC = new llvm::GlobalVariable(
-          CGF.CGM.getModule(), CGF.Int32Ty, false,
-          llvm::GlobalValue::ExternalLinkage, nullptr, Name, nullptr,
-          llvm::GlobalVariable::NotThreadLocal,
-          CGF.CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant));
-
-    // This load will be eliminated by the IPSCCP because it is constant
-    // weak_odr without externally_initialized. Either changing it to weak or
-    // adding externally_initialized will keep the load.
-    Value *ABIVersion = CGF.Builder.CreateAlignedLoad(CGF.Int32Ty, ABIVersionC,
-                                                      CGF.CGM.getIntAlign());
-
-    Value *IsCOV5 = CGF.Builder.CreateICmpSGE(
-        ABIVersion,
-        llvm::ConstantInt::get(CGF.Int32Ty, CodeObjectVersionKind::COV_5));
-
+  Value *GEP = nullptr;
+  if (Cov >= CodeObjectVersionKind::COV_5) {
     // Indexing the implicit kernarg segment.
-    Value *ImplicitGEP = CGF.Builder.CreateConstGEP1_32(
+    GEP = CGF.Builder.CreateConstGEP1_32(
         CGF.Int8Ty, EmitAMDGPUImplicitArgPtr(CGF), 12 + Index * 2);
-
-    // Indexing the HSA kernel_dispatch_packet struct.
-    Value *DispatchGEP = CGF.Builder.CreateConstGEP1_32(
-        CGF.Int8Ty, EmitAMDGPUDispatchPtr(CGF), 4 + Index * 2);
-
-    auto Result = CGF.Builder.CreateSelect(IsCOV5, ImplicitGEP, DispatchGEP);
-    LD = CGF.Builder.CreateLoad(
-        Address(Result, CGF.Int16Ty, CharUnits::fromQuantity(2)));
   } else {
-    Value *GEP = nullptr;
-    if (Cov >= CodeObjectVersionKind::COV_5) {
-      // Indexing the implicit kernarg segment.
-      GEP = CGF.Builder.CreateConstGEP1_32(
-          CGF.Int8Ty, EmitAMDGPUImplicitArgPtr(CGF), 12 + Index * 2);
-    } else {
-      // Indexing the HSA kernel_dispatch_packet struct.
-      GEP = CGF.Builder.CreateConstGEP1_32(
-          CGF.Int8Ty, EmitAMDGPUDispatchPtr(CGF), 4 + Index * 2);
-    }
-    LD = CGF.Builder.CreateLoad(
-        Address(GEP, CGF.Int16Ty, CharUnits::fromQuantity(2)));
+    // Indexing the HSA kernel_dispatch_packet struct.
+    GEP = CGF.Builder.CreateConstGEP1_32(CGF.Int8Ty, EmitAMDGPUDispatchPtr(CGF),
+                                         4 + Index * 2);
   }
+  LD = CGF.Builder.CreateLoad(
+      Address(GEP, CGF.Int16Ty, CharUnits::fromQuantity(2)));
 
   llvm::MDBuilder MDHelper(CGF.getLLVMContext());
   llvm::MDNode *RNode = MDHelper.createRange(APInt(16, 1),
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index e3c8243cbb0b5..db2a2c5740646 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -305,8 +305,6 @@ class AMDGPUTargetCodeGenInfo : public TargetCodeGenInfo {
   void setFunctionDeclAttributes(const FunctionDecl *FD, llvm::Function *F,
                                  CodeGenModule &CGM) const;
 
-  void emitTargetGlobals(CodeGen::CodeGenModule &CGM) const override;
-
   void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
                            CodeGen::CodeGenModule &M) const override;
   unsigned getOpenCLKernelCallingConv() const override;
@@ -414,40 +412,6 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
   }
 }
 
-/// Emits control constants used to change per-architecture behaviour in the
-/// AMDGPU ROCm device libraries.
-void AMDGPUTargetCodeGenInfo::emitTargetGlobals(
-    CodeGen::CodeGenModule &CGM) const {
-  StringRef Name = "__oclc_ABI_version";
-  llvm::GlobalVariable *OriginalGV = CGM.getModule().getNamedGlobal(Name);
-  if (OriginalGV && !llvm::GlobalVariable::isExternalLinkage(OriginalGV->getLinkage()))
-    return;
-
-  if (CGM.getTarget().getTargetOpts().CodeObjectVersion ==
-      llvm::CodeObjectVersionKind::COV_None)
-    return;
-
-  auto *Type = llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), 32);
-  llvm::Constant *COV = llvm::ConstantInt::get(
-      Type, CGM.getTarget().getTargetOpts().CodeObjectVersion);
-
-  // It needs to be constant weak_odr without externally_initialized so that
-  // the load instuction can be eliminated by the IPSCCP.
-  auto *GV = new llvm::GlobalVariable(
-      CGM.getModule(), Type, true, llvm::GlobalValue::WeakODRLinkage, COV, Name,
-      nullptr, llvm::GlobalValue::ThreadLocalMode::NotThreadLocal,
-      CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant));
-  GV->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Local);
-  GV->setVisibility(llvm::GlobalValue::VisibilityTypes::HiddenVisibility);
-
-  // Replace any external references to this variable with the new global.
-  if (OriginalGV) {
-    OriginalGV->replaceAllUsesWith(GV);
-    GV->takeName(OriginalGV);
-    OriginalGV->eraseFromParent();
-  }
-}
-
 void AMDGPUTargetCodeGenInfo::setTargetAttributes(
     const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {
   if (requiresAMDGPUProtectedVisibility(D, GV)) {
diff --git a/clang/test/CodeGen/amdgpu-abi-version.c b/clang/test/CodeGen/amdgpu-abi-version.c
index 4e5ad87655f23..66990ab76283e 100644
--- a/clang/test/CodeGen/amdgpu-abi-version.c
+++ b/clang/test/CodeGen/amdgpu-abi-version.c
@@ -1,23 +1,24 @@
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --version 3
 // RUN: %clang_cc1 -cc1 -triple amdgcn-amd-amdhsa -emit-llvm -mcode-object-version=none %s -o - | FileCheck %s
 
-//.
-// CHECK: @__oclc_ABI_version = external addrspace(4) global i32
-//.
 // CHECK-LABEL: define dso_local i32 @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
 // CHECK-NEXT:    [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
-// CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr addrspace(4) @__oclc_ABI_version, align 4
-// CHECK-NEXT:    [[TMP1:%.*]] = icmp sge i32 [[TMP0]], 500
-// CHECK-NEXT:    [[TMP2:%.*]] = call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// CHECK-NEXT:    [[TMP3:%.*]] = getelementptr i8, ptr addrspace(4) [[TMP2]], i32 12
-// CHECK-NEXT:    [[TMP4:%.*]] = call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// CHECK-NEXT:    [[TMP5:%.*]] = getelementptr i8, ptr addrspace(4) [[TMP4]], i32 4
-// CHECK-NEXT:    [[TMP6:%.*]] = select i1 [[TMP1]], ptr addrspace(4) [[TMP3]], ptr addrspace(4) [[TMP5]]
-// CHECK-NEXT:    [[TMP7:%.*]] = load i16, ptr addrspace(4) [[TMP6]], align 2, !range [[RNG2:![0-9]+]], !invariant.load !3, !noundef !3
-// CHECK-NEXT:    [[CONV:%.*]] = zext i16 [[TMP7]] to i32
+// CHECK-NEXT:    [[TMP0:%.*]] = call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// CHECK-NEXT:    [[TMP1:%.*]] = getelementptr i8, ptr addrspace(4) [[TMP0]], i32 4
+// CHECK-NEXT:    [[TMP2:%.*]] = load i16, ptr addrspace(4) [[TMP1]], align 2, !range [[RNG2:![0-9]+]], !invariant.load [[META3:![0-9]+]], !noundef [[META3]]
+// CHECK-NEXT:    [[CONV:%.*]] = zext i16 [[TMP2]] to i32
 // CHECK-NEXT:    ret i32 [[CONV]]
 //
 int foo() { return __builtin_amdgcn_workgroup_size_x(); }
+//.
+// CHECK: attributes #[[ATTR0]] = { convergent noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+// CHECK: [[RNG2]] = !{i16 1, i16 1025}
+// CHECK: [[META3]] = !{}
+//.
diff --git a/clang/test/CodeGen/amdgpu-address-spaces.cpp b/clang/test/CodeGen/amdgpu-address-spaces.cpp
index b121b559f58dc..1d8668a7f0917 100644
--- a/clang/test/CodeGen/amdgpu-address-spaces.cpp
+++ b/clang/test/CodeGen/amdgpu-address-spaces.cpp
@@ -29,7 +29,6 @@ int [[clang::address_space(999)]] bbb = 1234;
 // CHECK: @u = addrspace(5) global i32 undef, align 4
 // CHECK: @aaa = addrspace(6) global i32 1000, align 4
 // CHECK: @bbb = addrspace(999) global i32 1234, align 4
-// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
 //.
 // CHECK-LABEL: define dso_local amdgpu_kernel void @foo(
 // CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
@@ -60,3 +59,10 @@ extern "C" [[clang::amdgpu_kernel]] void foo() {
   aaa = 0;
   bbb = 0;
 }
+//.
+// CHECK: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 600}
+// CHECK: [[META1:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META2:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
deleted file mode 100644
index cb467886c016c..0000000000000
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
+++ /dev/null
@@ -1,133 +0,0 @@
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN:   -mcode-object-version=4 -DUSER -x hip -o %t_4.bc %s
-
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN:   -mcode-object-version=5 -DUSER -x hip -o %t_5.bc %s
-
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN:   -mcode-object-version=6 -DUSER -x hip -o %t_6.bc %s
-
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN:   -mcode-object-version=none -DDEVICELIB -x hip -o %t_0.bc %s
-
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
-// RUN:   %t_4.bc -mlink-builtin-bitcode %t_0.bc -o - |\
-// RUN:   FileCheck -check-prefix=LINKED4 %s
-
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
-// RUN:   %t_5.bc -mlink-builtin-bitcode %t_0.bc -o - |\
-// RUN:   FileCheck -check-prefix=LINKED5 %s
-
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
-// RUN:   %t_6.bc -mlink-builtin-bitcode %t_0.bc -o - |\
-// RUN:   FileCheck -check-prefix=LINKED6 %s
-
-#include "Inputs/cuda.h"
-
-// LINKED4: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 400
-// LINKED4-LABEL: bar
-// LINKED4-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED4-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED4: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED4: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// LINKED4: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED4: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// LINKED4: select i1 false, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// LINKED4: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED4-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED4-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED4: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED4: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
-// LINKED4: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED4: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
-// LINKED4: select i1 false, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
-// LINKED4: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED4-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED4-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED4: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED4: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
-// LINKED4: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED4: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
-// LINKED4: select i1 false, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
-// LINKED4: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// LINKED4: "amdhsa_code_object_version", i32 400
-
-// LINKED5: __oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500
-// LINKED5-LABEL: bar
-// LINKED5-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED5-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED5: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED5: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// LINKED5: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED5: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// LINKED5: select i1 true, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED5-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED5-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED5: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED5: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
-// LINKED5: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED5: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
-// LINKED5: select i1 true, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
-// LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED5-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED5-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED5: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED5: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
-// LINKED5: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED5: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
-// LINKED5: select i1 true, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
-// LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// LINKED5: "amdhsa_code_object_version", i32 500
-
-// LINKED6: __oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
-// LINKED6-LABEL: bar
-// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED6: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED6: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED6: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
-// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED6: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
-// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
-// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED6: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
-// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED6: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
-// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
-// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// LINKED6: "amdhsa_code_object_version", i32 600
-
-#ifdef DEVICELIB
-__device__ void bar(int *x, int *y, int *z)
-{
-  *x = __builtin_amdgcn_workgroup_size_x();
-  *y = __builtin_amdgcn_workgroup_size_y();
-  *z = __builtin_amdgcn_workgroup_size_z();
-}
-#endif
-
-#ifdef USER
-__device__ void bar(int *x, int *y, int *z);
-__device__ void foo()
-{
-  int *x, *y, *z;
-  bar(x, y, z);
-}
-#endif
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index f42b69f492ff8..2d3730cdfc8c9 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -2,7 +2,6 @@
 // RUN:     -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=PRECOV5 %s
 
-
 // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
 // RUN:     -fcuda-is-device -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=COV5 %s
@@ -11,10 +10,6 @@
 // RUN:     -fcuda-is-device -mcode-object-version=6 -emit-llvm -o - -x hip %s \
 // RUN:     | FileCheck -check-prefix=COV5 %s
 
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN:     -fcuda-is-device -mcode-object-version=none -emit-llvm -o - -x hip %s \
-// RUN:     | FileCheck -check-prefix=COVNONE %s
-
 #include "Inputs/cuda.h"
 
 // PRECOV5-LABEL: test_get_workgroup_size
@@ -35,35 +30,6 @@
 // COV5: getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
 // COV5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
 
-
-// COVNONE-LABEL: test_get_workgroup_size
-// COVNONE: load i32, ptr addrspace(4) @__oclc_ABI_version
-// COVNONE: [[ABI5_X:%.*]] = icmp sge i32 %{{.*}}, 500
-// COVNONE: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// COVNONE: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// COVNONE: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// COVNONE: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// COVNONE: select i1 [[ABI5_X]], ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// COVNONE: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// COVNONE: load i32, ptr a...
[truncated]

shiltian · 2025-03-25T03:58:45Z

clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp

@@ -62,62 +62,23 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
 /// Emit code based on Code Object ABI version.
 /// COV_4    : Emit code to use dispatch ptr


this as well

arsenm

I'm confused about where this leaves the device library build. It's still using the none code object version flag, and IIRC it had a mix of its own reimplementation of this plus use of some of the builtins.

jhuber6 · 2025-03-25T13:04:51Z

I'm confused about where this leaves the device library build. It's still using the none code object version flag, and IIRC it had a mix of its own reimplementation of this plus use of some of the builtins.

Hm, you're probably right that the ROCm DL was relying on the split codegen but didn't use the global. I could leave that in and just remove emitting the global every time.

ldionne

Waiving reviewers/libcxx code review.

michaelrj-google

LGTM from the libc side

saiislam

Thanks, Joseph!
LGTM.

Summary: When we were first porting to COV5, this lead to some ABI issues due to a change in how we looked up the work group size. Bitcode libraries relied on the builtins to emit code, but this was changed between versions. This prevented the bitcode libraries, like OpenMP or libc, from being used for both COV4 and COV5. The solution was to have this 'none' functionality which effectively emitted code that branched off of a global to resolve to either version. This isn't a great solution because it forced every TU to have this variable in it. The patch in COV4 from OpenMP, which was the only consumer of this functionality. Other users like HIP and OpenCL did not use this because they linked the ROCm Device Library directly which has its own handling (The name was borrowed from it after all). So, now that we don't need to worry about backward compatibility with COV4, we can remove this special handling. Users can still emit COV4 code, this simply removes the special handling used to make the OpenMP device runtime bitcode version agnostic. Author: Joseph Huber PR: llvm#132870

Summary: When we were first porting to COV5, this lead to some ABI issues due to a change in how we looked up the work group size. Bitcode libraries relied on the builtins to emit code, but this was changed between versions. This prevented the bitcode libraries, like OpenMP or libc, from being used for both COV4 and COV5. The solution was to have this 'none' functionality which effectively emitted code that branched off of a global to resolve to either version. This isn't a great solution because it forced every TU to have this variable in it. The patch in llvm#131033 removed support for COV4 from OpenMP, which was the only consumer of this functionality. Other users like HIP and OpenCL did not use this because they linked the ROCm Device Library directly which has its own handling (The name was borrowed from it after all). So, now that we don't need to worry about backward compatibility with COV4, we can remove this special handling. Users can still emit COV4 code, this simply removes the special handling used to make the OpenMP device runtime bitcode version agnostic.

llvm-ci · 2025-03-28T14:09:27Z

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-fast running on sanitizer-buildbot3 while building clang,compiler-rt,flang,libc,libcxx,offload at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/9909

Here is the relevant piece of the build log for the reference

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 89782 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70
FAIL: Clang :: Interpreter/inline-virtual.cpp (19841 of 89782)
******************** TEST 'Clang :: Interpreter/inline-virtual.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
cat /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation      | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp # RUN: at line 6
+ cat /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
JIT session error: In graph incr_module_23-jitted-objectbuffer, section .text.startup: relocation target "_ZN1AD2Ev" at address 0x769f65a2f040 is out of range of Delta32 fixup at 0x729f6510f02d (<anonymous block> @ 0x729f6510f010 + 0x1d)
error: Failed to materialize symbols: { (main, { $.incr_module_23.__inits.0, __orc_init_func.incr_module_23, a2 }) }
error: Failed to materialize symbols: { (main, { __orc_init_func.incr_module_23 }) }
/home/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp:26:11: error: CHECK: expected string not found in input
// CHECK: ~A(2)
          ^
<stdin>:1:262: note: scanning from here
clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1)
                                                                                                                                                                                                                                                                     ^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1) 
check:26                                                                                                                                                                                                                                                                          X error: no match found
          2: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl>  
check:26     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------
Step 10 (stage2/asan_ubsan check) failure: stage2/asan_ubsan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 89782 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70
FAIL: Clang :: Interpreter/inline-virtual.cpp (19841 of 89782)
******************** TEST 'Clang :: Interpreter/inline-virtual.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
cat /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation      | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp # RUN: at line 6
+ cat /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
JIT session error: In graph incr_module_23-jitted-objectbuffer, section .text.startup: relocation target "_ZN1AD2Ev" at address 0x769f65a2f040 is out of range of Delta32 fixup at 0x729f6510f02d (<anonymous block> @ 0x729f6510f010 + 0x1d)
error: Failed to materialize symbols: { (main, { $.incr_module_23.__inits.0, __orc_init_func.incr_module_23, a2 }) }
error: Failed to materialize symbols: { (main, { __orc_init_func.incr_module_23 }) }
/home/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp:26:11: error: CHECK: expected string not found in input
// CHECK: ~A(2)
          ^
<stdin>:1:262: note: scanning from here
clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1)
                                                                                                                                                                                                                                                                     ^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1) 
check:26                                                                                                                                                                                                                                                                          X error: no match found
          2: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl>  
check:26     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
Slowest Tests:
--------------------------------------------------------------------------

…lvm#132870)" This reverts commit 772173f.

…aries (llvm#132870)"" This reverts commit be5d122.

jhuber6 requested review from arsenm, carlobertolli, JonChesterfield, jplehr, ronlieb, saiislam and yxsamliu March 25, 2025 03:40

llvmbot added clang Clang issues not falling into any other category compiler-rt backend:AMDGPU clang:codegen IR generation bugs: mangling, exceptions, etc. compiler-rt:builtins libc clang:openmp OpenMP related changes to Clang offload labels Mar 25, 2025

shiltian reviewed Mar 25, 2025

View reviewed changes

arsenm reviewed Mar 25, 2025

View reviewed changes

jhuber6 force-pushed the RemoveCOVHandling branch 2 times, most recently from 93fc26d to 09c8fbf Compare March 25, 2025 13:59

jhuber6 requested a review from a team as a code owner March 25, 2025 13:59

llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Mar 25, 2025

ldionne approved these changes Mar 25, 2025

View reviewed changes

michaelrj-google approved these changes Mar 25, 2025

View reviewed changes

jhuber6 force-pushed the RemoveCOVHandling branch from 09c8fbf to 3fe8e18 Compare March 27, 2025 12:36

llvmbot added flang:driver flang Flang issues not falling into any other category labels Mar 27, 2025

saiislam approved these changes Mar 28, 2025

View reviewed changes

jhuber6 force-pushed the RemoveCOVHandling branch from 3fe8e18 to e2bd8c7 Compare March 28, 2025 12:28

llvmbot added the flang:fir-hlfir label Mar 28, 2025

jhuber6 merged commit 772173f into llvm:main Mar 28, 2025
20 of 29 checks passed

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 29, 2025

Revert "[Clang][AMDGPU] Remove special handling for COV4 libraries (l…

be5d122

…lvm#132870)" This reverts commit 772173f.

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 9, 2025

Revert "Revert "[Clang][AMDGPU] Remove special handling for COV4 libr…

6cd42cd

…aries (llvm#132870)"" This reverts commit be5d122.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Clang][AMDGPU] Remove special handling for COV4 libraries #132870

[Clang][AMDGPU] Remove special handling for COV4 libraries #132870

Uh oh!

jhuber6 commented Mar 25, 2025

Uh oh!

llvmbot commented Mar 25, 2025 •

edited

Loading

Uh oh!

llvmbot commented Mar 25, 2025

Uh oh!

shiltian Mar 25, 2025

Uh oh!

arsenm left a comment

Uh oh!

jhuber6 commented Mar 25, 2025

Uh oh!

ldionne left a comment

Uh oh!

michaelrj-google left a comment

Uh oh!

saiislam left a comment

Uh oh!

Uh oh!

llvm-ci commented Mar 28, 2025

Uh oh!

Uh oh!

		@@ -62,62 +62,23 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
		/// Emit code based on Code Object ABI version.
		/// COV_4 : Emit code to use dispatch ptr

[Clang][AMDGPU] Remove special handling for COV4 libraries #132870

[Clang][AMDGPU] Remove special handling for COV4 libraries #132870

Uh oh!

Conversation

jhuber6 commented Mar 25, 2025

Uh oh!

llvmbot commented Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Mar 25, 2025

Uh oh!

shiltian Mar 25, 2025

Choose a reason for hiding this comment

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

jhuber6 commented Mar 25, 2025

Uh oh!

ldionne left a comment

Choose a reason for hiding this comment

Uh oh!

michaelrj-google left a comment

Choose a reason for hiding this comment

Uh oh!

saiislam left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvm-ci commented Mar 28, 2025

Uh oh!

Uh oh!

llvmbot commented Mar 25, 2025 •

edited

Loading