-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[Clang][AMDGPU] Remove special handling for COV4 libraries #132870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-flang-fir-hlfir @llvm/pr-subscribers-clang-codegen Author: Joseph Huber (jhuber6) ChangesSummary: This isn't a great solution because it forced every TU to have this So, now that we don't need to worry about backward compatibility with Patch is 28.75 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132870.diff 14 Files Affected:
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index f94917c905081..025f0726e099f 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -62,62 +62,23 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
/// Emit code based on Code Object ABI version.
/// COV_4 : Emit code to use dispatch ptr
/// COV_5+ : Emit code to use implicitarg ptr
-/// COV_NONE : Emit code to load a global variable "__oclc_ABI_version"
-/// and use its value for COV_4 or COV_5+ approach. It is used for
-/// compiling device libraries in an ABI-agnostic way.
-///
-/// Note: "__oclc_ABI_version" is supposed to be emitted and intialized by
-/// clang during compilation of user code.
Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
llvm::LoadInst *LD;
auto Cov = CGF.getTarget().getTargetOpts().CodeObjectVersion;
- if (Cov == CodeObjectVersionKind::COV_None) {
- StringRef Name = "__oclc_ABI_version";
- auto *ABIVersionC = CGF.CGM.getModule().getNamedGlobal(Name);
- if (!ABIVersionC)
- ABIVersionC = new llvm::GlobalVariable(
- CGF.CGM.getModule(), CGF.Int32Ty, false,
- llvm::GlobalValue::ExternalLinkage, nullptr, Name, nullptr,
- llvm::GlobalVariable::NotThreadLocal,
- CGF.CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant));
-
- // This load will be eliminated by the IPSCCP because it is constant
- // weak_odr without externally_initialized. Either changing it to weak or
- // adding externally_initialized will keep the load.
- Value *ABIVersion = CGF.Builder.CreateAlignedLoad(CGF.Int32Ty, ABIVersionC,
- CGF.CGM.getIntAlign());
-
- Value *IsCOV5 = CGF.Builder.CreateICmpSGE(
- ABIVersion,
- llvm::ConstantInt::get(CGF.Int32Ty, CodeObjectVersionKind::COV_5));
-
+ Value *GEP = nullptr;
+ if (Cov >= CodeObjectVersionKind::COV_5) {
// Indexing the implicit kernarg segment.
- Value *ImplicitGEP = CGF.Builder.CreateConstGEP1_32(
+ GEP = CGF.Builder.CreateConstGEP1_32(
CGF.Int8Ty, EmitAMDGPUImplicitArgPtr(CGF), 12 + Index * 2);
-
- // Indexing the HSA kernel_dispatch_packet struct.
- Value *DispatchGEP = CGF.Builder.CreateConstGEP1_32(
- CGF.Int8Ty, EmitAMDGPUDispatchPtr(CGF), 4 + Index * 2);
-
- auto Result = CGF.Builder.CreateSelect(IsCOV5, ImplicitGEP, DispatchGEP);
- LD = CGF.Builder.CreateLoad(
- Address(Result, CGF.Int16Ty, CharUnits::fromQuantity(2)));
} else {
- Value *GEP = nullptr;
- if (Cov >= CodeObjectVersionKind::COV_5) {
- // Indexing the implicit kernarg segment.
- GEP = CGF.Builder.CreateConstGEP1_32(
- CGF.Int8Ty, EmitAMDGPUImplicitArgPtr(CGF), 12 + Index * 2);
- } else {
- // Indexing the HSA kernel_dispatch_packet struct.
- GEP = CGF.Builder.CreateConstGEP1_32(
- CGF.Int8Ty, EmitAMDGPUDispatchPtr(CGF), 4 + Index * 2);
- }
- LD = CGF.Builder.CreateLoad(
- Address(GEP, CGF.Int16Ty, CharUnits::fromQuantity(2)));
+ // Indexing the HSA kernel_dispatch_packet struct.
+ GEP = CGF.Builder.CreateConstGEP1_32(CGF.Int8Ty, EmitAMDGPUDispatchPtr(CGF),
+ 4 + Index * 2);
}
+ LD = CGF.Builder.CreateLoad(
+ Address(GEP, CGF.Int16Ty, CharUnits::fromQuantity(2)));
llvm::MDBuilder MDHelper(CGF.getLLVMContext());
llvm::MDNode *RNode = MDHelper.createRange(APInt(16, 1),
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index e3c8243cbb0b5..db2a2c5740646 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -305,8 +305,6 @@ class AMDGPUTargetCodeGenInfo : public TargetCodeGenInfo {
void setFunctionDeclAttributes(const FunctionDecl *FD, llvm::Function *F,
CodeGenModule &CGM) const;
- void emitTargetGlobals(CodeGen::CodeGenModule &CGM) const override;
-
void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
CodeGen::CodeGenModule &M) const override;
unsigned getOpenCLKernelCallingConv() const override;
@@ -414,40 +412,6 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
}
}
-/// Emits control constants used to change per-architecture behaviour in the
-/// AMDGPU ROCm device libraries.
-void AMDGPUTargetCodeGenInfo::emitTargetGlobals(
- CodeGen::CodeGenModule &CGM) const {
- StringRef Name = "__oclc_ABI_version";
- llvm::GlobalVariable *OriginalGV = CGM.getModule().getNamedGlobal(Name);
- if (OriginalGV && !llvm::GlobalVariable::isExternalLinkage(OriginalGV->getLinkage()))
- return;
-
- if (CGM.getTarget().getTargetOpts().CodeObjectVersion ==
- llvm::CodeObjectVersionKind::COV_None)
- return;
-
- auto *Type = llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), 32);
- llvm::Constant *COV = llvm::ConstantInt::get(
- Type, CGM.getTarget().getTargetOpts().CodeObjectVersion);
-
- // It needs to be constant weak_odr without externally_initialized so that
- // the load instuction can be eliminated by the IPSCCP.
- auto *GV = new llvm::GlobalVariable(
- CGM.getModule(), Type, true, llvm::GlobalValue::WeakODRLinkage, COV, Name,
- nullptr, llvm::GlobalValue::ThreadLocalMode::NotThreadLocal,
- CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant));
- GV->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Local);
- GV->setVisibility(llvm::GlobalValue::VisibilityTypes::HiddenVisibility);
-
- // Replace any external references to this variable with the new global.
- if (OriginalGV) {
- OriginalGV->replaceAllUsesWith(GV);
- GV->takeName(OriginalGV);
- OriginalGV->eraseFromParent();
- }
-}
-
void AMDGPUTargetCodeGenInfo::setTargetAttributes(
const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {
if (requiresAMDGPUProtectedVisibility(D, GV)) {
diff --git a/clang/test/CodeGen/amdgpu-abi-version.c b/clang/test/CodeGen/amdgpu-abi-version.c
index 4e5ad87655f23..66990ab76283e 100644
--- a/clang/test/CodeGen/amdgpu-abi-version.c
+++ b/clang/test/CodeGen/amdgpu-abi-version.c
@@ -1,23 +1,24 @@
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --version 3
// RUN: %clang_cc1 -cc1 -triple amdgcn-amd-amdhsa -emit-llvm -mcode-object-version=none %s -o - | FileCheck %s
-//.
-// CHECK: @__oclc_ABI_version = external addrspace(4) global i32
-//.
// CHECK-LABEL: define dso_local i32 @foo(
// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
// CHECK-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
-// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr addrspace(4) @__oclc_ABI_version, align 4
-// CHECK-NEXT: [[TMP1:%.*]] = icmp sge i32 [[TMP0]], 500
-// CHECK-NEXT: [[TMP2:%.*]] = call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// CHECK-NEXT: [[TMP3:%.*]] = getelementptr i8, ptr addrspace(4) [[TMP2]], i32 12
-// CHECK-NEXT: [[TMP4:%.*]] = call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// CHECK-NEXT: [[TMP5:%.*]] = getelementptr i8, ptr addrspace(4) [[TMP4]], i32 4
-// CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP1]], ptr addrspace(4) [[TMP3]], ptr addrspace(4) [[TMP5]]
-// CHECK-NEXT: [[TMP7:%.*]] = load i16, ptr addrspace(4) [[TMP6]], align 2, !range [[RNG2:![0-9]+]], !invariant.load !3, !noundef !3
-// CHECK-NEXT: [[CONV:%.*]] = zext i16 [[TMP7]] to i32
+// CHECK-NEXT: [[TMP0:%.*]] = call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// CHECK-NEXT: [[TMP1:%.*]] = getelementptr i8, ptr addrspace(4) [[TMP0]], i32 4
+// CHECK-NEXT: [[TMP2:%.*]] = load i16, ptr addrspace(4) [[TMP1]], align 2, !range [[RNG2:![0-9]+]], !invariant.load [[META3:![0-9]+]], !noundef [[META3]]
+// CHECK-NEXT: [[CONV:%.*]] = zext i16 [[TMP2]] to i32
// CHECK-NEXT: ret i32 [[CONV]]
//
int foo() { return __builtin_amdgcn_workgroup_size_x(); }
+//.
+// CHECK: attributes #[[ATTR0]] = { convergent noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+// CHECK: [[RNG2]] = !{i16 1, i16 1025}
+// CHECK: [[META3]] = !{}
+//.
diff --git a/clang/test/CodeGen/amdgpu-address-spaces.cpp b/clang/test/CodeGen/amdgpu-address-spaces.cpp
index b121b559f58dc..1d8668a7f0917 100644
--- a/clang/test/CodeGen/amdgpu-address-spaces.cpp
+++ b/clang/test/CodeGen/amdgpu-address-spaces.cpp
@@ -29,7 +29,6 @@ int [[clang::address_space(999)]] bbb = 1234;
// CHECK: @u = addrspace(5) global i32 undef, align 4
// CHECK: @aaa = addrspace(6) global i32 1000, align 4
// CHECK: @bbb = addrspace(999) global i32 1234, align 4
-// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
//.
// CHECK-LABEL: define dso_local amdgpu_kernel void @foo(
// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
@@ -60,3 +59,10 @@ extern "C" [[clang::amdgpu_kernel]] void foo() {
aaa = 0;
bbb = 0;
}
+//.
+// CHECK: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 600}
+// CHECK: [[META1:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META2:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
deleted file mode 100644
index cb467886c016c..0000000000000
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
+++ /dev/null
@@ -1,133 +0,0 @@
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN: -mcode-object-version=4 -DUSER -x hip -o %t_4.bc %s
-
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN: -mcode-object-version=5 -DUSER -x hip -o %t_5.bc %s
-
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN: -mcode-object-version=6 -DUSER -x hip -o %t_6.bc %s
-
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN: -mcode-object-version=none -DDEVICELIB -x hip -o %t_0.bc %s
-
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
-// RUN: %t_4.bc -mlink-builtin-bitcode %t_0.bc -o - |\
-// RUN: FileCheck -check-prefix=LINKED4 %s
-
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
-// RUN: %t_5.bc -mlink-builtin-bitcode %t_0.bc -o - |\
-// RUN: FileCheck -check-prefix=LINKED5 %s
-
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
-// RUN: %t_6.bc -mlink-builtin-bitcode %t_0.bc -o - |\
-// RUN: FileCheck -check-prefix=LINKED6 %s
-
-#include "Inputs/cuda.h"
-
-// LINKED4: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 400
-// LINKED4-LABEL: bar
-// LINKED4-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED4-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED4: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED4: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// LINKED4: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED4: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// LINKED4: select i1 false, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// LINKED4: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED4-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED4-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED4: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED4: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
-// LINKED4: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED4: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
-// LINKED4: select i1 false, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
-// LINKED4: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED4-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED4-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED4: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED4: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
-// LINKED4: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED4: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
-// LINKED4: select i1 false, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
-// LINKED4: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// LINKED4: "amdhsa_code_object_version", i32 400
-
-// LINKED5: __oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500
-// LINKED5-LABEL: bar
-// LINKED5-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED5-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED5: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED5: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// LINKED5: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED5: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// LINKED5: select i1 true, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED5-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED5-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED5: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED5: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
-// LINKED5: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED5: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
-// LINKED5: select i1 true, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
-// LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED5-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED5-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED5: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED5: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
-// LINKED5: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED5: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
-// LINKED5: select i1 true, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
-// LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// LINKED5: "amdhsa_code_object_version", i32 500
-
-// LINKED6: __oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
-// LINKED6-LABEL: bar
-// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED6: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED6: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED6: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
-// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED6: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
-// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
-// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED6: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
-// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED6: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
-// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
-// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// LINKED6: "amdhsa_code_object_version", i32 600
-
-#ifdef DEVICELIB
-__device__ void bar(int *x, int *y, int *z)
-{
- *x = __builtin_amdgcn_workgroup_size_x();
- *y = __builtin_amdgcn_workgroup_size_y();
- *z = __builtin_amdgcn_workgroup_size_z();
-}
-#endif
-
-#ifdef USER
-__device__ void bar(int *x, int *y, int *z);
-__device__ void foo()
-{
- int *x, *y, *z;
- bar(x, y, z);
-}
-#endif
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index f42b69f492ff8..2d3730cdfc8c9 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -2,7 +2,6 @@
// RUN: -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s \
// RUN: | FileCheck -check-prefix=PRECOV5 %s
-
// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \
// RUN: | FileCheck -check-prefix=COV5 %s
@@ -11,10 +10,6 @@
// RUN: -fcuda-is-device -mcode-object-version=6 -emit-llvm -o - -x hip %s \
// RUN: | FileCheck -check-prefix=COV5 %s
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN: -fcuda-is-device -mcode-object-version=none -emit-llvm -o - -x hip %s \
-// RUN: | FileCheck -check-prefix=COVNONE %s
-
#include "Inputs/cuda.h"
// PRECOV5-LABEL: test_get_workgroup_size
@@ -35,35 +30,6 @@
// COV5: getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
// COV5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// COVNONE-LABEL: test_get_workgroup_size
-// COVNONE: load i32, ptr addrspace(4) @__oclc_ABI_version
-// COVNONE: [[ABI5_X:%.*]] = icmp sge i32 %{{.*}}, 500
-// COVNONE: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// COVNONE: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// COVNONE: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// COVNONE: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// COVNONE: select i1 [[ABI5_X]], ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// COVNONE: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// COVNONE: load i32, ptr a...
[truncated]
|
@llvm/pr-subscribers-offload Author: Joseph Huber (jhuber6) ChangesSummary: This isn't a great solution because it forced every TU to have this So, now that we don't need to worry about backward compatibility with Patch is 28.75 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132870.diff 14 Files Affected:
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index f94917c905081..025f0726e099f 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -62,62 +62,23 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) {
/// Emit code based on Code Object ABI version.
/// COV_4 : Emit code to use dispatch ptr
/// COV_5+ : Emit code to use implicitarg ptr
-/// COV_NONE : Emit code to load a global variable "__oclc_ABI_version"
-/// and use its value for COV_4 or COV_5+ approach. It is used for
-/// compiling device libraries in an ABI-agnostic way.
-///
-/// Note: "__oclc_ABI_version" is supposed to be emitted and intialized by
-/// clang during compilation of user code.
Value *EmitAMDGPUWorkGroupSize(CodeGenFunction &CGF, unsigned Index) {
llvm::LoadInst *LD;
auto Cov = CGF.getTarget().getTargetOpts().CodeObjectVersion;
- if (Cov == CodeObjectVersionKind::COV_None) {
- StringRef Name = "__oclc_ABI_version";
- auto *ABIVersionC = CGF.CGM.getModule().getNamedGlobal(Name);
- if (!ABIVersionC)
- ABIVersionC = new llvm::GlobalVariable(
- CGF.CGM.getModule(), CGF.Int32Ty, false,
- llvm::GlobalValue::ExternalLinkage, nullptr, Name, nullptr,
- llvm::GlobalVariable::NotThreadLocal,
- CGF.CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant));
-
- // This load will be eliminated by the IPSCCP because it is constant
- // weak_odr without externally_initialized. Either changing it to weak or
- // adding externally_initialized will keep the load.
- Value *ABIVersion = CGF.Builder.CreateAlignedLoad(CGF.Int32Ty, ABIVersionC,
- CGF.CGM.getIntAlign());
-
- Value *IsCOV5 = CGF.Builder.CreateICmpSGE(
- ABIVersion,
- llvm::ConstantInt::get(CGF.Int32Ty, CodeObjectVersionKind::COV_5));
-
+ Value *GEP = nullptr;
+ if (Cov >= CodeObjectVersionKind::COV_5) {
// Indexing the implicit kernarg segment.
- Value *ImplicitGEP = CGF.Builder.CreateConstGEP1_32(
+ GEP = CGF.Builder.CreateConstGEP1_32(
CGF.Int8Ty, EmitAMDGPUImplicitArgPtr(CGF), 12 + Index * 2);
-
- // Indexing the HSA kernel_dispatch_packet struct.
- Value *DispatchGEP = CGF.Builder.CreateConstGEP1_32(
- CGF.Int8Ty, EmitAMDGPUDispatchPtr(CGF), 4 + Index * 2);
-
- auto Result = CGF.Builder.CreateSelect(IsCOV5, ImplicitGEP, DispatchGEP);
- LD = CGF.Builder.CreateLoad(
- Address(Result, CGF.Int16Ty, CharUnits::fromQuantity(2)));
} else {
- Value *GEP = nullptr;
- if (Cov >= CodeObjectVersionKind::COV_5) {
- // Indexing the implicit kernarg segment.
- GEP = CGF.Builder.CreateConstGEP1_32(
- CGF.Int8Ty, EmitAMDGPUImplicitArgPtr(CGF), 12 + Index * 2);
- } else {
- // Indexing the HSA kernel_dispatch_packet struct.
- GEP = CGF.Builder.CreateConstGEP1_32(
- CGF.Int8Ty, EmitAMDGPUDispatchPtr(CGF), 4 + Index * 2);
- }
- LD = CGF.Builder.CreateLoad(
- Address(GEP, CGF.Int16Ty, CharUnits::fromQuantity(2)));
+ // Indexing the HSA kernel_dispatch_packet struct.
+ GEP = CGF.Builder.CreateConstGEP1_32(CGF.Int8Ty, EmitAMDGPUDispatchPtr(CGF),
+ 4 + Index * 2);
}
+ LD = CGF.Builder.CreateLoad(
+ Address(GEP, CGF.Int16Ty, CharUnits::fromQuantity(2)));
llvm::MDBuilder MDHelper(CGF.getLLVMContext());
llvm::MDNode *RNode = MDHelper.createRange(APInt(16, 1),
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index e3c8243cbb0b5..db2a2c5740646 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -305,8 +305,6 @@ class AMDGPUTargetCodeGenInfo : public TargetCodeGenInfo {
void setFunctionDeclAttributes(const FunctionDecl *FD, llvm::Function *F,
CodeGenModule &CGM) const;
- void emitTargetGlobals(CodeGen::CodeGenModule &CGM) const override;
-
void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
CodeGen::CodeGenModule &M) const override;
unsigned getOpenCLKernelCallingConv() const override;
@@ -414,40 +412,6 @@ void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
}
}
-/// Emits control constants used to change per-architecture behaviour in the
-/// AMDGPU ROCm device libraries.
-void AMDGPUTargetCodeGenInfo::emitTargetGlobals(
- CodeGen::CodeGenModule &CGM) const {
- StringRef Name = "__oclc_ABI_version";
- llvm::GlobalVariable *OriginalGV = CGM.getModule().getNamedGlobal(Name);
- if (OriginalGV && !llvm::GlobalVariable::isExternalLinkage(OriginalGV->getLinkage()))
- return;
-
- if (CGM.getTarget().getTargetOpts().CodeObjectVersion ==
- llvm::CodeObjectVersionKind::COV_None)
- return;
-
- auto *Type = llvm::IntegerType::getIntNTy(CGM.getModule().getContext(), 32);
- llvm::Constant *COV = llvm::ConstantInt::get(
- Type, CGM.getTarget().getTargetOpts().CodeObjectVersion);
-
- // It needs to be constant weak_odr without externally_initialized so that
- // the load instuction can be eliminated by the IPSCCP.
- auto *GV = new llvm::GlobalVariable(
- CGM.getModule(), Type, true, llvm::GlobalValue::WeakODRLinkage, COV, Name,
- nullptr, llvm::GlobalValue::ThreadLocalMode::NotThreadLocal,
- CGM.getContext().getTargetAddressSpace(LangAS::opencl_constant));
- GV->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Local);
- GV->setVisibility(llvm::GlobalValue::VisibilityTypes::HiddenVisibility);
-
- // Replace any external references to this variable with the new global.
- if (OriginalGV) {
- OriginalGV->replaceAllUsesWith(GV);
- GV->takeName(OriginalGV);
- OriginalGV->eraseFromParent();
- }
-}
-
void AMDGPUTargetCodeGenInfo::setTargetAttributes(
const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {
if (requiresAMDGPUProtectedVisibility(D, GV)) {
diff --git a/clang/test/CodeGen/amdgpu-abi-version.c b/clang/test/CodeGen/amdgpu-abi-version.c
index 4e5ad87655f23..66990ab76283e 100644
--- a/clang/test/CodeGen/amdgpu-abi-version.c
+++ b/clang/test/CodeGen/amdgpu-abi-version.c
@@ -1,23 +1,24 @@
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --version 3
// RUN: %clang_cc1 -cc1 -triple amdgcn-amd-amdhsa -emit-llvm -mcode-object-version=none %s -o - | FileCheck %s
-//.
-// CHECK: @__oclc_ABI_version = external addrspace(4) global i32
-//.
// CHECK-LABEL: define dso_local i32 @foo(
// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[RETVAL:%.*]] = alloca i32, align 4, addrspace(5)
// CHECK-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
-// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr addrspace(4) @__oclc_ABI_version, align 4
-// CHECK-NEXT: [[TMP1:%.*]] = icmp sge i32 [[TMP0]], 500
-// CHECK-NEXT: [[TMP2:%.*]] = call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// CHECK-NEXT: [[TMP3:%.*]] = getelementptr i8, ptr addrspace(4) [[TMP2]], i32 12
-// CHECK-NEXT: [[TMP4:%.*]] = call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// CHECK-NEXT: [[TMP5:%.*]] = getelementptr i8, ptr addrspace(4) [[TMP4]], i32 4
-// CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP1]], ptr addrspace(4) [[TMP3]], ptr addrspace(4) [[TMP5]]
-// CHECK-NEXT: [[TMP7:%.*]] = load i16, ptr addrspace(4) [[TMP6]], align 2, !range [[RNG2:![0-9]+]], !invariant.load !3, !noundef !3
-// CHECK-NEXT: [[CONV:%.*]] = zext i16 [[TMP7]] to i32
+// CHECK-NEXT: [[TMP0:%.*]] = call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// CHECK-NEXT: [[TMP1:%.*]] = getelementptr i8, ptr addrspace(4) [[TMP0]], i32 4
+// CHECK-NEXT: [[TMP2:%.*]] = load i16, ptr addrspace(4) [[TMP1]], align 2, !range [[RNG2:![0-9]+]], !invariant.load [[META3:![0-9]+]], !noundef [[META3]]
+// CHECK-NEXT: [[CONV:%.*]] = zext i16 [[TMP2]] to i32
// CHECK-NEXT: ret i32 [[CONV]]
//
int foo() { return __builtin_amdgcn_workgroup_size_x(); }
+//.
+// CHECK: attributes #[[ATTR0]] = { convergent noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+// CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META1:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+// CHECK: [[RNG2]] = !{i16 1, i16 1025}
+// CHECK: [[META3]] = !{}
+//.
diff --git a/clang/test/CodeGen/amdgpu-address-spaces.cpp b/clang/test/CodeGen/amdgpu-address-spaces.cpp
index b121b559f58dc..1d8668a7f0917 100644
--- a/clang/test/CodeGen/amdgpu-address-spaces.cpp
+++ b/clang/test/CodeGen/amdgpu-address-spaces.cpp
@@ -29,7 +29,6 @@ int [[clang::address_space(999)]] bbb = 1234;
// CHECK: @u = addrspace(5) global i32 undef, align 4
// CHECK: @aaa = addrspace(6) global i32 1000, align 4
// CHECK: @bbb = addrspace(999) global i32 1234, align 4
-// CHECK: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
//.
// CHECK-LABEL: define dso_local amdgpu_kernel void @foo(
// CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
@@ -60,3 +59,10 @@ extern "C" [[clang::amdgpu_kernel]] void foo() {
aaa = 0;
bbb = 0;
}
+//.
+// CHECK: attributes #[[ATTR0]] = { convergent mustprogress noinline nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+//.
+// CHECK: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 600}
+// CHECK: [[META1:![0-9]+]] = !{i32 1, !"wchar_size", i32 4}
+// CHECK: [[META2:![0-9]+]] = !{!"{{.*}}clang version {{.*}}"}
+//.
diff --git a/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu b/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
deleted file mode 100644
index cb467886c016c..0000000000000
--- a/clang/test/CodeGenCUDA/amdgpu-code-object-version-linking.cu
+++ /dev/null
@@ -1,133 +0,0 @@
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN: -mcode-object-version=4 -DUSER -x hip -o %t_4.bc %s
-
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN: -mcode-object-version=5 -DUSER -x hip -o %t_5.bc %s
-
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN: -mcode-object-version=6 -DUSER -x hip -o %t_6.bc %s
-
-// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm-bc \
-// RUN: -mcode-object-version=none -DDEVICELIB -x hip -o %t_0.bc %s
-
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
-// RUN: %t_4.bc -mlink-builtin-bitcode %t_0.bc -o - |\
-// RUN: FileCheck -check-prefix=LINKED4 %s
-
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
-// RUN: %t_5.bc -mlink-builtin-bitcode %t_0.bc -o - |\
-// RUN: FileCheck -check-prefix=LINKED5 %s
-
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -O3 \
-// RUN: %t_6.bc -mlink-builtin-bitcode %t_0.bc -o - |\
-// RUN: FileCheck -check-prefix=LINKED6 %s
-
-#include "Inputs/cuda.h"
-
-// LINKED4: @__oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 400
-// LINKED4-LABEL: bar
-// LINKED4-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED4-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED4: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED4: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// LINKED4: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED4: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// LINKED4: select i1 false, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// LINKED4: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED4-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED4-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED4: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED4: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
-// LINKED4: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED4: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
-// LINKED4: select i1 false, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
-// LINKED4: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED4-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED4-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED4: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED4: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
-// LINKED4: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED4: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
-// LINKED4: select i1 false, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
-// LINKED4: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// LINKED4: "amdhsa_code_object_version", i32 400
-
-// LINKED5: __oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 500
-// LINKED5-LABEL: bar
-// LINKED5-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED5-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED5: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED5: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// LINKED5: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED5: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// LINKED5: select i1 true, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED5-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED5-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED5: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED5: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
-// LINKED5: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED5: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
-// LINKED5: select i1 true, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
-// LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED5-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED5-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED5: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED5: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
-// LINKED5: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED5: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
-// LINKED5: select i1 true, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
-// LINKED5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// LINKED5: "amdhsa_code_object_version", i32 500
-
-// LINKED6: __oclc_ABI_version = weak_odr hidden local_unnamed_addr addrspace(4) constant i32 600
-// LINKED6-LABEL: bar
-// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED6: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED6: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED6: [[GEP_5_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 14
-// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED6: [[GEP_4_Y:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 6
-// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_Y]], ptr addrspace(4) [[GEP_4_Y]]
-// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// LINKED6-NOT: load i32, ptr addrspacecast (ptr addrspace(4) @__oclc_ABI_version to ptr), align {{.*}}
-// LINKED6-NOT: icmp sge i32 %{{.*}}, 500
-// LINKED6: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// LINKED6: [[GEP_5_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
-// LINKED6: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// LINKED6: [[GEP_4_Z:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 8
-// LINKED6: select i1 true, ptr addrspace(4) [[GEP_5_Z]], ptr addrspace(4) [[GEP_4_Z]]
-// LINKED6: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-// LINKED6: "amdhsa_code_object_version", i32 600
-
-#ifdef DEVICELIB
-__device__ void bar(int *x, int *y, int *z)
-{
- *x = __builtin_amdgcn_workgroup_size_x();
- *y = __builtin_amdgcn_workgroup_size_y();
- *z = __builtin_amdgcn_workgroup_size_z();
-}
-#endif
-
-#ifdef USER
-__device__ void bar(int *x, int *y, int *z);
-__device__ void foo()
-{
- int *x, *y, *z;
- bar(x, y, z);
-}
-#endif
diff --git a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
index f42b69f492ff8..2d3730cdfc8c9 100644
--- a/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
+++ b/clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu
@@ -2,7 +2,6 @@
// RUN: -fcuda-is-device -mcode-object-version=4 -emit-llvm -o - -x hip %s \
// RUN: | FileCheck -check-prefix=PRECOV5 %s
-
// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
// RUN: -fcuda-is-device -emit-llvm -o - -x hip %s \
// RUN: | FileCheck -check-prefix=COV5 %s
@@ -11,10 +10,6 @@
// RUN: -fcuda-is-device -mcode-object-version=6 -emit-llvm -o - -x hip %s \
// RUN: | FileCheck -check-prefix=COV5 %s
-// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa \
-// RUN: -fcuda-is-device -mcode-object-version=none -emit-llvm -o - -x hip %s \
-// RUN: | FileCheck -check-prefix=COVNONE %s
-
#include "Inputs/cuda.h"
// PRECOV5-LABEL: test_get_workgroup_size
@@ -35,35 +30,6 @@
// COV5: getelementptr i8, ptr addrspace(4) %{{.*}}, i32 16
// COV5: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// COVNONE-LABEL: test_get_workgroup_size
-// COVNONE: load i32, ptr addrspace(4) @__oclc_ABI_version
-// COVNONE: [[ABI5_X:%.*]] = icmp sge i32 %{{.*}}, 500
-// COVNONE: call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
-// COVNONE: [[GEP_5_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 12
-// COVNONE: call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
-// COVNONE: [[GEP_4_X:%.*]] = getelementptr i8, ptr addrspace(4) %{{.*}}, i32 4
-// COVNONE: select i1 [[ABI5_X]], ptr addrspace(4) [[GEP_5_X]], ptr addrspace(4) [[GEP_4_X]]
-// COVNONE: load i16, ptr addrspace(4) %{{.*}}, align 2, !range [[$WS_RANGE:![0-9]*]], !invariant.load{{.*}}, !noundef
-
-// COVNONE: load i32, ptr a...
[truncated]
|
@@ -62,62 +62,23 @@ Value *EmitAMDGPUImplicitArgPtr(CodeGenFunction &CGF) { | |||
/// Emit code based on Code Object ABI version. | |||
/// COV_4 : Emit code to use dispatch ptr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused about where this leaves the device library build. It's still using the none code object version flag, and IIRC it had a mix of its own reimplementation of this plus use of some of the builtins.
Hm, you're probably right that the ROCm DL was relying on the split codegen but didn't use the global. I could leave that in and just remove emitting the global every time. |
93fc26d
to
09c8fbf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Waiving reviewers/libcxx
code review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from the libc side
09c8fbf
to
3fe8e18
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Joseph!
LGTM.
Summary: When we were first porting to COV5, this lead to some ABI issues due to a change in how we looked up the work group size. Bitcode libraries relied on the builtins to emit code, but this was changed between versions. This prevented the bitcode libraries, like OpenMP or libc, from being used for both COV4 and COV5. The solution was to have this 'none' functionality which effectively emitted code that branched off of a global to resolve to either version. This isn't a great solution because it forced every TU to have this variable in it. The patch in COV4 from OpenMP, which was the only consumer of this functionality. Other users like HIP and OpenCL did not use this because they linked the ROCm Device Library directly which has its own handling (The name was borrowed from it after all). So, now that we don't need to worry about backward compatibility with COV4, we can remove this special handling. Users can still emit COV4 code, this simply removes the special handling used to make the OpenMP device runtime bitcode version agnostic. Author: Joseph Huber PR: llvm#132870
Summary: When we were first porting to COV5, this lead to some ABI issues due to a change in how we looked up the work group size. Bitcode libraries relied on the builtins to emit code, but this was changed between versions. This prevented the bitcode libraries, like OpenMP or libc, from being used for both COV4 and COV5. The solution was to have this 'none' functionality which effectively emitted code that branched off of a global to resolve to either version. This isn't a great solution because it forced every TU to have this variable in it. The patch in llvm#131033 removed support for COV4 from OpenMP, which was the only consumer of this functionality. Other users like HIP and OpenCL did not use this because they linked the ROCm Device Library directly which has its own handling (The name was borrowed from it after all). So, now that we don't need to worry about backward compatibility with COV4, we can remove this special handling. Users can still emit COV4 code, this simply removes the special handling used to make the OpenMP device runtime bitcode version agnostic.
3fe8e18
to
e2bd8c7
Compare
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/9909 Here is the relevant piece of the build log for the reference
|
…lvm#132870)" This reverts commit 772173f.
…aries (llvm#132870)"" This reverts commit be5d122.
Summary:
When we were first porting to COV5, this lead to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.
This isn't a great solution because it forced every TU to have this
variable in it. The patch in
#131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).
So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.