-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[clang][SPIR-V] Add support for AMDGCN flavoured SPIRV #89796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-spir-v @llvm/pr-subscribers-backend-amdgpu Author: Alex Voicu (AlexVlx) ChangesThis change seeks to add support for vendor flavoured SPIRV - more specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that carries some extra bits of information that are only usable by AMDGCN targets, forfeiting absolute genericity to obtain greater expressiveness for target features:
Existing AMDGCN tests are extended to cover this new target. It is currently dormant / will require some additional changes, but I thought I'd rather put it up for review to get feedback as early as possible. I will note that an alternative option is to place this under AMDGPU, but that seems slightly less natural, since this is still SPIRV, albeit relaxed in terms of preconditions & constrained in terms of postconditions, and only guaranteed to be usable on AMDGCN targets (it is still possible to obtain pristine portable SPIRV through usage of the flavoured target, though). Patch is 73.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/89796.diff 22 Files Affected:
diff --git a/clang/lib/Basic/Targets.cpp b/clang/lib/Basic/Targets.cpp
index e3283510c6aac7..04a13e3385d1f6 100644
--- a/clang/lib/Basic/Targets.cpp
+++ b/clang/lib/Basic/Targets.cpp
@@ -673,8 +673,12 @@ std::unique_ptr<TargetInfo> AllocateTarget(const llvm::Triple &Triple,
}
case llvm::Triple::spirv64: {
if (os != llvm::Triple::UnknownOS ||
- Triple.getEnvironment() != llvm::Triple::UnknownEnvironment)
+ Triple.getEnvironment() != llvm::Triple::UnknownEnvironment) {
+ if (os == llvm::Triple::OSType::AMDHSA)
+ return std::make_unique<SPIRV64AMDGCNTargetInfo>(Triple, Opts);
+
return nullptr;
+ }
return std::make_unique<SPIRV64TargetInfo>(Triple, Opts);
}
case llvm::Triple::wasm32:
diff --git a/clang/lib/Basic/Targets/SPIR.cpp b/clang/lib/Basic/Targets/SPIR.cpp
index dc920177d3a910..d7d232ac9484f8 100644
--- a/clang/lib/Basic/Targets/SPIR.cpp
+++ b/clang/lib/Basic/Targets/SPIR.cpp
@@ -12,6 +12,8 @@
#include "SPIR.h"
#include "Targets.h"
+#include "clang/Basic/Builtins.h"
+#include "clang/Basic/TargetBuiltins.h"
using namespace clang;
using namespace clang::targets;
@@ -54,3 +56,289 @@ void SPIRV64TargetInfo::getTargetDefines(const LangOptions &Opts,
BaseSPIRVTargetInfo::getTargetDefines(Opts, Builder);
DefineStd(Builder, "SPIRV64", Opts);
}
+
+static constexpr Builtin::Info BuiltinInfo[] = {
+#define BUILTIN(ID, TYPE, ATTRS) \
+ {#ID, TYPE, ATTRS, nullptr, HeaderDesc::NO_HEADER, ALL_LANGUAGES},
+#define TARGET_BUILTIN(ID, TYPE, ATTRS, FEATURE) \
+ {#ID, TYPE, ATTRS, FEATURE, HeaderDesc::NO_HEADER, ALL_LANGUAGES},
+#include "clang/Basic/BuiltinsAMDGPU.def"
+};
+
+namespace {
+const char *AMDGPUGCCRegNames[] = {
+ "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8",
+ "v9", "v10", "v11", "v12", "v13", "v14", "v15", "v16", "v17",
+ "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25", "v26",
+ "v27", "v28", "v29", "v30", "v31", "v32", "v33", "v34", "v35",
+ "v36", "v37", "v38", "v39", "v40", "v41", "v42", "v43", "v44",
+ "v45", "v46", "v47", "v48", "v49", "v50", "v51", "v52", "v53",
+ "v54", "v55", "v56", "v57", "v58", "v59", "v60", "v61", "v62",
+ "v63", "v64", "v65", "v66", "v67", "v68", "v69", "v70", "v71",
+ "v72", "v73", "v74", "v75", "v76", "v77", "v78", "v79", "v80",
+ "v81", "v82", "v83", "v84", "v85", "v86", "v87", "v88", "v89",
+ "v90", "v91", "v92", "v93", "v94", "v95", "v96", "v97", "v98",
+ "v99", "v100", "v101", "v102", "v103", "v104", "v105", "v106", "v107",
+ "v108", "v109", "v110", "v111", "v112", "v113", "v114", "v115", "v116",
+ "v117", "v118", "v119", "v120", "v121", "v122", "v123", "v124", "v125",
+ "v126", "v127", "v128", "v129", "v130", "v131", "v132", "v133", "v134",
+ "v135", "v136", "v137", "v138", "v139", "v140", "v141", "v142", "v143",
+ "v144", "v145", "v146", "v147", "v148", "v149", "v150", "v151", "v152",
+ "v153", "v154", "v155", "v156", "v157", "v158", "v159", "v160", "v161",
+ "v162", "v163", "v164", "v165", "v166", "v167", "v168", "v169", "v170",
+ "v171", "v172", "v173", "v174", "v175", "v176", "v177", "v178", "v179",
+ "v180", "v181", "v182", "v183", "v184", "v185", "v186", "v187", "v188",
+ "v189", "v190", "v191", "v192", "v193", "v194", "v195", "v196", "v197",
+ "v198", "v199", "v200", "v201", "v202", "v203", "v204", "v205", "v206",
+ "v207", "v208", "v209", "v210", "v211", "v212", "v213", "v214", "v215",
+ "v216", "v217", "v218", "v219", "v220", "v221", "v222", "v223", "v224",
+ "v225", "v226", "v227", "v228", "v229", "v230", "v231", "v232", "v233",
+ "v234", "v235", "v236", "v237", "v238", "v239", "v240", "v241", "v242",
+ "v243", "v244", "v245", "v246", "v247", "v248", "v249", "v250", "v251",
+ "v252", "v253", "v254", "v255", "s0", "s1", "s2", "s3", "s4",
+ "s5", "s6", "s7", "s8", "s9", "s10", "s11", "s12", "s13",
+ "s14", "s15", "s16", "s17", "s18", "s19", "s20", "s21", "s22",
+ "s23", "s24", "s25", "s26", "s27", "s28", "s29", "s30", "s31",
+ "s32", "s33", "s34", "s35", "s36", "s37", "s38", "s39", "s40",
+ "s41", "s42", "s43", "s44", "s45", "s46", "s47", "s48", "s49",
+ "s50", "s51", "s52", "s53", "s54", "s55", "s56", "s57", "s58",
+ "s59", "s60", "s61", "s62", "s63", "s64", "s65", "s66", "s67",
+ "s68", "s69", "s70", "s71", "s72", "s73", "s74", "s75", "s76",
+ "s77", "s78", "s79", "s80", "s81", "s82", "s83", "s84", "s85",
+ "s86", "s87", "s88", "s89", "s90", "s91", "s92", "s93", "s94",
+ "s95", "s96", "s97", "s98", "s99", "s100", "s101", "s102", "s103",
+ "s104", "s105", "s106", "s107", "s108", "s109", "s110", "s111", "s112",
+ "s113", "s114", "s115", "s116", "s117", "s118", "s119", "s120", "s121",
+ "s122", "s123", "s124", "s125", "s126", "s127", "exec", "vcc", "scc",
+ "m0", "flat_scratch", "exec_lo", "exec_hi", "vcc_lo", "vcc_hi",
+ "flat_scratch_lo", "flat_scratch_hi",
+ "a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8",
+ "a9", "a10", "a11", "a12", "a13", "a14", "a15", "a16", "a17",
+ "a18", "a19", "a20", "a21", "a22", "a23", "a24", "a25", "a26",
+ "a27", "a28", "a29", "a30", "a31", "a32", "a33", "a34", "a35",
+ "a36", "a37", "a38", "a39", "a40", "a41", "a42", "a43", "a44",
+ "a45", "a46", "a47", "a48", "a49", "a50", "a51", "a52", "a53",
+ "a54", "a55", "a56", "a57", "a58", "a59", "a60", "a61", "a62",
+ "a63", "a64", "a65", "a66", "a67", "a68", "a69", "a70", "a71",
+ "a72", "a73", "a74", "a75", "a76", "a77", "a78", "a79", "a80",
+ "a81", "a82", "a83", "a84", "a85", "a86", "a87", "a88", "a89",
+ "a90", "a91", "a92", "a93", "a94", "a95", "a96", "a97", "a98",
+ "a99", "a100", "a101", "a102", "a103", "a104", "a105", "a106", "a107",
+ "a108", "a109", "a110", "a111", "a112", "a113", "a114", "a115", "a116",
+ "a117", "a118", "a119", "a120", "a121", "a122", "a123", "a124", "a125",
+ "a126", "a127", "a128", "a129", "a130", "a131", "a132", "a133", "a134",
+ "a135", "a136", "a137", "a138", "a139", "a140", "a141", "a142", "a143",
+ "a144", "a145", "a146", "a147", "a148", "a149", "a150", "a151", "a152",
+ "a153", "a154", "a155", "a156", "a157", "a158", "a159", "a160", "a161",
+ "a162", "a163", "a164", "a165", "a166", "a167", "a168", "a169", "a170",
+ "a171", "a172", "a173", "a174", "a175", "a176", "a177", "a178", "a179",
+ "a180", "a181", "a182", "a183", "a184", "a185", "a186", "a187", "a188",
+ "a189", "a190", "a191", "a192", "a193", "a194", "a195", "a196", "a197",
+ "a198", "a199", "a200", "a201", "a202", "a203", "a204", "a205", "a206",
+ "a207", "a208", "a209", "a210", "a211", "a212", "a213", "a214", "a215",
+ "a216", "a217", "a218", "a219", "a220", "a221", "a222", "a223", "a224",
+ "a225", "a226", "a227", "a228", "a229", "a230", "a231", "a232", "a233",
+ "a234", "a235", "a236", "a237", "a238", "a239", "a240", "a241", "a242",
+ "a243", "a244", "a245", "a246", "a247", "a248", "a249", "a250", "a251",
+ "a252", "a253", "a254", "a255"
+};
+
+} // anonymous namespace
+
+ArrayRef<const char *> SPIRV64AMDGCNTargetInfo::getGCCRegNames() const {
+ return llvm::ArrayRef(AMDGPUGCCRegNames);
+}
+
+bool SPIRV64AMDGCNTargetInfo::initFeatureMap(
+ llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags, StringRef,
+ const std::vector<std::string> &FeatureVec) const {
+ // This represents the union of all AMDGCN features.
+ Features["atomic-ds-pk-add-16-insts"] = true;
+ Features["atomic-flat-pk-add-16-insts"] = true;
+ Features["atomic-buffer-global-pk-add-f16-insts"] = true;
+ Features["atomic-global-pk-add-bf16-inst"] = true;
+ Features["atomic-fadd-rtn-insts"] = true;
+ Features["ci-insts"] = true;
+ Features["dot1-insts"] = true;
+ Features["dot2-insts"] = true;
+ Features["dot3-insts"] = true;
+ Features["dot4-insts"] = true;
+ Features["dot5-insts"] = true;
+ Features["dot7-insts"] = true;
+ Features["dot8-insts"] = true;
+ Features["dot9-insts"] = true;
+ Features["dot10-insts"] = true;
+ Features["dot11-insts"] = true;
+ Features["dl-insts"] = true;
+ Features["16-bit-insts"] = true;
+ Features["dpp"] = true;
+ Features["gfx8-insts"] = true;
+ Features["gfx9-insts"] = true;
+ Features["gfx90a-insts"] = true;
+ Features["gfx940-insts"] = true;
+ Features["gfx10-insts"] = true;
+ Features["gfx10-3-insts"] = true;
+ Features["gfx11-insts"] = true;
+ Features["gfx12-insts"] = true;
+ Features["image-insts"] = true;
+ Features["fp8-conversion-insts"] = true;
+ Features["s-memrealtime"] = true;
+ Features["s-memtime-inst"] = true;
+ Features["gws"] = true;
+ Features["fp8-insts"] = true;
+ Features["fp8-conversion-insts"] = true;
+ Features["atomic-ds-pk-add-16-insts"] = true;
+ Features["mai-insts"] = true;
+
+ return TargetInfo::initFeatureMap(Features, Diags, {}, FeatureVec);
+}
+
+bool SPIRV64AMDGCNTargetInfo::validateAsmConstraint(
+ const char *&Name, TargetInfo::ConstraintInfo &Info) const {
+ // This is a 1:1 copy of AMDGPUTargetInfo::validateAsmConstraint()
+ static const ::llvm::StringSet<> SpecialRegs({
+ "exec", "vcc", "flat_scratch", "m0", "scc", "tba", "tma",
+ "flat_scratch_lo", "flat_scratch_hi", "vcc_lo", "vcc_hi", "exec_lo",
+ "exec_hi", "tma_lo", "tma_hi", "tba_lo", "tba_hi",
+ });
+
+ switch (*Name) {
+ case 'I':
+ Info.setRequiresImmediate(-16, 64);
+ return true;
+ case 'J':
+ Info.setRequiresImmediate(-32768, 32767);
+ return true;
+ case 'A':
+ case 'B':
+ case 'C':
+ Info.setRequiresImmediate();
+ return true;
+ default:
+ break;
+ }
+
+ StringRef S(Name);
+
+ if (S == "DA" || S == "DB") {
+ Name++;
+ Info.setRequiresImmediate();
+ return true;
+ }
+
+ bool HasLeftParen = S.consume_front("{");
+ if (S.empty())
+ return false;
+ if (S.front() != 'v' && S.front() != 's' && S.front() != 'a') {
+ if (!HasLeftParen)
+ return false;
+ auto E = S.find('}');
+ if (!SpecialRegs.count(S.substr(0, E)))
+ return false;
+ S = S.drop_front(E + 1);
+ if (!S.empty())
+ return false;
+ // Found {S} where S is a special register.
+ Info.setAllowsRegister();
+ Name = S.data() - 1;
+ return true;
+ }
+ S = S.drop_front();
+ if (!HasLeftParen) {
+ if (!S.empty())
+ return false;
+ // Found s, v or a.
+ Info.setAllowsRegister();
+ Name = S.data() - 1;
+ return true;
+ }
+ bool HasLeftBracket = S.consume_front("[");
+ unsigned long long N;
+ if (S.empty() || consumeUnsignedInteger(S, 10, N))
+ return false;
+ if (S.consume_front(":")) {
+ if (!HasLeftBracket)
+ return false;
+ unsigned long long M;
+ if (consumeUnsignedInteger(S, 10, M) || N >= M)
+ return false;
+ }
+ if (HasLeftBracket) {
+ if (!S.consume_front("]"))
+ return false;
+ }
+ if (!S.consume_front("}"))
+ return false;
+ if (!S.empty())
+ return false;
+ // Found {vn}, {sn}, {an}, {v[n]}, {s[n]}, {a[n]}, {v[n:m]}, {s[n:m]}
+ // or {a[n:m]}.
+ Info.setAllowsRegister();
+ Name = S.data() - 1;
+ return true;
+}
+
+std::string
+SPIRV64AMDGCNTargetInfo::convertConstraint(const char *&Constraint) const {
+ // This is a 1:1 copy of AMDGPUTargetInfo::convertConstraint()
+ StringRef S(Constraint);
+ if (S == "DA" || S == "DB") {
+ return std::string("^") + std::string(Constraint++, 2);
+ }
+
+ const char *Begin = Constraint;
+ TargetInfo::ConstraintInfo Info("", "");
+ if (validateAsmConstraint(Constraint, Info))
+ return std::string(Begin).substr(0, Constraint - Begin + 1);
+
+ Constraint = Begin;
+ return std::string(1, *Constraint);
+}
+
+ArrayRef<Builtin::Info> SPIRV64AMDGCNTargetInfo::getTargetBuiltins() const {
+ return llvm::ArrayRef(BuiltinInfo,
+ clang::AMDGPU::LastTSBuiltin - Builtin::FirstTSBuiltin);
+}
+
+void SPIRV64AMDGCNTargetInfo::getTargetDefines(const LangOptions &Opts,
+ MacroBuilder &Builder) const {
+ BaseSPIRVTargetInfo::getTargetDefines(Opts, Builder);
+ DefineStd(Builder, "SPIRV64", Opts);
+
+ Builder.defineMacro("__AMD__");
+ Builder.defineMacro("__AMDGPU__");
+ Builder.defineMacro("__AMDGCN__");
+}
+
+void SPIRV64AMDGCNTargetInfo::setAuxTarget(const TargetInfo *Aux) {
+ // This is a 1:1 copy of AMDGPUTargetInfo::setAuxTarget()
+ assert(HalfFormat == Aux->HalfFormat);
+ assert(FloatFormat == Aux->FloatFormat);
+ assert(DoubleFormat == Aux->DoubleFormat);
+
+ // On x86_64 long double is 80-bit extended precision format, which is
+ // not supported by AMDGPU. 128-bit floating point format is also not
+ // supported by AMDGPU. Therefore keep its own format for these two types.
+ auto SaveLongDoubleFormat = LongDoubleFormat;
+ auto SaveFloat128Format = Float128Format;
+ auto SaveLongDoubleWidth = LongDoubleWidth;
+ auto SaveLongDoubleAlign = LongDoubleAlign;
+ copyAuxTarget(Aux);
+ LongDoubleFormat = SaveLongDoubleFormat;
+ Float128Format = SaveFloat128Format;
+ LongDoubleWidth = SaveLongDoubleWidth;
+ LongDoubleAlign = SaveLongDoubleAlign;
+ // For certain builtin types support on the host target, claim they are
+ // supported to pass the compilation of the host code during the device-side
+ // compilation.
+ // FIXME: As the side effect, we also accept `__float128` uses in the device
+ // code. To reject these builtin types supported in the host target but not in
+ // the device target, one approach would support `device_builtin` attribute
+ // so that we could tell the device builtin types from the host ones. This
+ // also solves the different representations of the same builtin type, such
+ // as `size_t` in the MSVC environment.
+ if (Aux->hasFloat128Type()) {
+ HasFloat128 = true;
+ Float128Format = DoubleFormat;
+ }
+}
diff --git a/clang/lib/Basic/Targets/SPIR.h b/clang/lib/Basic/Targets/SPIR.h
index 44265445ff004b..6b605979c9ab1d 100644
--- a/clang/lib/Basic/Targets/SPIR.h
+++ b/clang/lib/Basic/Targets/SPIR.h
@@ -364,6 +364,57 @@ class LLVM_LIBRARY_VISIBILITY SPIRV64TargetInfo : public BaseSPIRVTargetInfo {
MacroBuilder &Builder) const override;
};
+class LLVM_LIBRARY_VISIBILITY SPIRV64AMDGCNTargetInfo
+ : public BaseSPIRVTargetInfo {
+public:
+ SPIRV64AMDGCNTargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)
+ : BaseSPIRVTargetInfo(Triple, Opts) {
+ assert(Triple.getArch() == llvm::Triple::spirv64 &&
+ "Invalid architecture for 64-bit AMDGCN SPIR-V.");
+ assert(Triple.getVendor() == llvm::Triple::VendorType::AMD &&
+ "64-bit AMDGCN SPIR-V target must use AMD vendor");
+ assert(getTriple().getOS() == llvm::Triple::OSType::AMDHSA &&
+ "64-bit AMDGCN SPIR-V target must use AMDHSA OS");
+ assert(getTriple().getEnvironment() == llvm::Triple::UnknownEnvironment &&
+ "64-bit SPIR-V target must use unknown environment type");
+ PointerWidth = PointerAlign = 64;
+ SizeType = TargetInfo::UnsignedLong;
+ PtrDiffType = IntPtrType = TargetInfo::SignedLong;
+
+ resetDataLayout("e-i64:64-v16:16-v24:32-v32:32-v48:64-"
+ "v96:128-v192:256-v256:256-v512:512-v1024:1024-G1-P4-A0");
+
+ BFloat16Width = BFloat16Align = 16;
+ BFloat16Format = &llvm::APFloat::BFloat();
+
+ HasLegalHalfType = true;
+ HasFloat16 = true;
+ HalfArgsAndReturns = true;
+ }
+
+ bool hasBFloat16Type() const override { return true; }
+
+ ArrayRef<const char *> getGCCRegNames() const override;
+
+ bool initFeatureMap(llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags,
+ StringRef,
+ const std::vector<std::string> &) const override;
+
+ bool validateAsmConstraint(const char *&Name,
+ TargetInfo::ConstraintInfo &Info) const override;
+
+ std::string convertConstraint(const char *&Constraint) const override;
+
+ ArrayRef<Builtin::Info> getTargetBuiltins() const override;
+
+ void getTargetDefines(const LangOptions &Opts,
+ MacroBuilder &Builder) const override;
+
+ void setAuxTarget(const TargetInfo *Aux) override;
+
+ bool hasInt128Type() const override { return TargetInfo::hasInt128Type(); }
+};
+
} // namespace targets
} // namespace clang
#endif // LLVM_CLANG_LIB_BASIC_TARGETS_SPIR_H
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 7e5f2edfc732cc..db64b0a436a095 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -6083,6 +6083,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
StringRef Prefix =
llvm::Triple::getArchTypePrefix(getTarget().getTriple().getArch());
if (!Prefix.empty()) {
+ if (Prefix == "spv" &&
+ getTarget().getTriple().getOS() == llvm::Triple::OSType::AMDHSA)
+ Prefix = "amdgcn";
IntrinsicID = Intrinsic::getIntrinsicForClangBuiltin(Prefix.data(), Name);
// NOTE we don't need to perform a compatibility flag check here since the
// intrinsics are declared in Builtins*.def via LANGBUILTIN which filter the
@@ -6254,6 +6257,10 @@ static Value *EmitTargetArchBuiltinExpr(CodeGenFunction *CGF,
case llvm::Triple::riscv32:
case llvm::Triple::riscv64:
return CGF->EmitRISCVBuiltinExpr(BuiltinID, E, ReturnValue);
+ case llvm::Triple::spirv64:
+ if (CGF->getTarget().getTriple().getOS() != llvm::Triple::OSType::AMDHSA)
+ return nullptr;
+ return CGF->EmitAMDGPUBuiltinExpr(BuiltinID, E);
default:
return nullptr;
}
diff --git a/clang/test/CodeGen/target-data.c b/clang/test/CodeGen/target-data.c
index c184f314f68f80..1d40b8fe46063d 100644
--- a/clang/test/CodeGen/target-data.c
+++ b/clang/test/CodeGen/target-data.c
@@ -268,3 +268,7 @@
// RUN: %clang_cc1 -triple ve -o - -emit-llvm %s | \
// RUN: FileCheck %s -check-prefix=VE
// VE: target datalayout = "e-m:e-i64:64-n32:64-S128-v64:64:64-v128:64:64-v256:64:64-v512:64:64-v1024:64:64-v2048:64:64-v4096:64:64-v8192:64:64-v16384:64:64"
+
+// RUN: %clang_cc1 -triple spirv64-amd -o - -emit-llvm %s | \
+// RUN: FileCheck %s -check-prefix=SPIR64
+// AMDGPUSPIRV64: target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-G1-P4-A0"
diff --git a/clang/test/CodeGenCUDA/builtins-spirv-amdgcn.cu b/clang/test/CodeGenCUDA/builtins-spirv-amdgcn.cu
new file mode 100644
index 00000000000000..8dbb8c538ddc16
--- /dev/null
+++ b/clang/test/CodeGenCUDA/builtins-spirv-amdgcn.cu
@@ -0,0 +1,294 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -x hip \
+// RUN: -aux-triple x86_64-unknown-linux-gnu -fcuda-is-device -emit-llvm %s \
+// RUN: -o - | FileCheck %s
+
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -x hip \
+// RUN: -aux-triple x86_64-pc-windows-msvc -fcuda-is-device -emit-llvm %s \
+// RUN: -o - | FileCheck %s
+
+#include "Inputs/cuda.h"
+
+// CHECK-LABEL: @_Z16use_dispatch_ptrPi(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[OUT:%.*]] = alloca ptr addrspace(4), align 8
+// CHECK-NEXT: [[OUT_ADDR:%.*]] = alloca ptr addrspace(4), align 8
+// CHECK-NEXT: [[DISPATCH_PTR:%.*]] = alloca ptr addrspace(4), align 8
+// CHECK-NEXT: [[OUT_ASCAST:%.*]] = addrspacecast ptr [[OUT]] to ptr addrspace(4)
+// CHECK-NEXT: [[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr [[OUT_ADDR]] to ptr addrspace(4)
+// CHECK-NEXT: [[DISPATCH_PTR_ASCAST:%.*]] = addrspacecast ptr [[DISPATCH_PTR]] to ptr addrspace(4)
+// CHECK-NEXT: [[TMP0:%.*]] = addrspacecast ptr addrspace(1) [[OUT_COERCE:%.*]] to ptr addrspace(4)
+// CHECK-NEXT: store ptr addrspace(4) [[TMP0]], ptr addrspace(4) [[OUT_ASCAST]], align 8
+// CHECK-NEXT: [[OUT1:%.*]] = load ptr addrspace(4), ptr addrspace(4) [[OUT_ASCAST]], align 8
+// CHECK-NEXT: store ptr addrspace(4) [[OUT1]], ptr addrspace(4) [[OUT_ADDR_ASCAST]], align 8
+// CHECK-NEXT: [[TMP1:%.*]] = call align 4 dereferenceable(64) addrspace(4) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// CHECK-NEXT: store ptr addrspace(4) [[TMP1]], ptr addrspace(4) [[DISPATCH_PTR_ASCAST]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = load ptr addrspace(4), ptr addrspace(4) [[DISPATCH_PTR_ASCAST]], align 8
+// CHECK-NEXT: [[TMP3:%.*]] = load i32, ptr addrspace(4) [[TMP2]], align 4
+// CHECK-NEXT: [[TMP4:%.*]] = load ptr addrspace(4), ptr addrspace(4) [[OUT_ADDR_ASCAST]], align 8
+// CHECK-NEXT: store i32 [[TMP3]], ptr addrspace(4) [[TMP4]], align 4
+// CHECK-NEXT: ret void
+//
+__global__ void use_dispatch_ptr(int* out) {
+ const int* dispatch_ptr = (co...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
clang/lib/Basic/Targets/SPIR.cpp
Outdated
"v225", "v226", "v227", "v228", "v229", "v230", "v231", "v232", "v233", | ||
"v234", "v235", "v236", "v237", "v238", "v239", "v240", "v241", "v242", | ||
"v243", "v244", "v245", "v246", "v247", "v248", "v249", "v250", "v251", | ||
"v252", "v253", "v254", "v255", "s0", "s1", "s2", "s3", "s4", | ||
"s5", "s6", "s7", "s8", "s9", "s10", "s11", "s12", "s13", | ||
"s14", "s15", "s16", "s17", "s18", "s19", "s20", "s21", "s22", | ||
"s23", "s24", "s25", "s26", "s27", "s28", "s29", "s30", "s31", | ||
"s32", "s33", "s34", "s35", "s36", "s37", "s38", "s39", "s40", | ||
"s41", "s42", "s43", "s44", "s45", "s46", "s47", "s48", "s49", | ||
"s50", "s51", "s52", "s53", "s54", "s55", "s56", "s57", "s58", | ||
"s59", "s60", "s61", "s62", "s63", "s64", "s65", "s66", "s67", | ||
"s68", "s69", "s70", "s71", "s72", "s73", "s74", "s75", "s76", | ||
"s77", "s78", "s79", "s80", "s81", "s82", "s83", "s84", "s85", | ||
"s86", "s87", "s88", "s89", "s90", "s91", "s92", "s93", "s94", | ||
"s95", "s96", "s97", "s98", "s99", "s100", "s101", "s102", "s103", | ||
"s104", "s105", "s106", "s107", "s108", "s109", "s110", "s111", "s112", | ||
"s113", "s114", "s115", "s116", "s117", "s118", "s119", "s120", "s121", | ||
"s122", "s123", "s124", "s125", "s126", "s127", "exec", "vcc", "scc", | ||
"m0", "flat_scratch", "exec_lo", "exec_hi", "vcc_lo", "vcc_hi", | ||
"flat_scratch_lo", "flat_scratch_hi", | ||
"a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", | ||
"a9", "a10", "a11", "a12", "a13", "a14", "a15", "a16", "a17", | ||
"a18", "a19", "a20", "a21", "a22", "a23", "a24", "a25", "a26", | ||
"a27", "a28", "a29", "a30", "a31", "a32", "a33", "a34", "a35", | ||
"a36", "a37", "a38", "a39", "a40", "a41", "a42", "a43", "a44", | ||
"a45", "a46", "a47", "a48", "a49", "a50", "a51", "a52", "a53", | ||
"a54", "a55", "a56", "a57", "a58", "a59", "a60", "a61", "a62", | ||
"a63", "a64", "a65", "a66", "a67", "a68", "a69", "a70", "a71", | ||
"a72", "a73", "a74", "a75", "a76", "a77", "a78", "a79", "a80", | ||
"a81", "a82", "a83", "a84", "a85", "a86", "a87", "a88", "a89", | ||
"a90", "a91", "a92", "a93", "a94", "a95", "a96", "a97", "a98", | ||
"a99", "a100", "a101", "a102", "a103", "a104", "a105", "a106", "a107", | ||
"a108", "a109", "a110", "a111", "a112", "a113", "a114", "a115", "a116", | ||
"a117", "a118", "a119", "a120", "a121", "a122", "a123", "a124", "a125", | ||
"a126", "a127", "a128", "a129", "a130", "a131", "a132", "a133", "a134", | ||
"a135", "a136", "a137", "a138", "a139", "a140", "a141", "a142", "a143", | ||
"a144", "a145", "a146", "a147", "a148", "a149", "a150", "a151", "a152", | ||
"a153", "a154", "a155", "a156", "a157", "a158", "a159", "a160", "a161", | ||
"a162", "a163", "a164", "a165", "a166", "a167", "a168", "a169", "a170", | ||
"a171", "a172", "a173", "a174", "a175", "a176", "a177", "a178", "a179", | ||
"a180", "a181", "a182", "a183", "a184", "a185", "a186", "a187", "a188", | ||
"a189", "a190", "a191", "a192", "a193", "a194", "a195", "a196", "a197", | ||
"a198", "a199", "a200", "a201", "a202", "a203", "a204", "a205", "a206", | ||
"a207", "a208", "a209", "a210", "a211", "a212", "a213", "a214", "a215", | ||
"a216", "a217", "a218", "a219", "a220", "a221", "a222", "a223", "a224", | ||
"a225", "a226", "a227", "a228", "a229", "a230", "a231", "a232", "a233", | ||
"a234", "a235", "a236", "a237", "a238", "a239", "a240", "a241", "a242", | ||
"a243", "a244", "a245", "a246", "a247", "a248", "a249", "a250", "a251", | ||
"a252", "a253", "a254", "a255" | ||
}; | ||
|
||
} // anonymous namespace | ||
|
||
ArrayRef<const char *> SPIRV64AMDGCNTargetInfo::getGCCRegNames() const { | ||
return llvm::ArrayRef(AMDGPUGCCRegNames); | ||
} | ||
|
||
bool SPIRV64AMDGCNTargetInfo::initFeatureMap( | ||
llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags, StringRef, | ||
const std::vector<std::string> &FeatureVec) const { | ||
// This represents the union of all AMDGCN features. | ||
Features["atomic-ds-pk-add-16-insts"] = true; | ||
Features["atomic-flat-pk-add-16-insts"] = true; | ||
Features["atomic-buffer-global-pk-add-f16-insts"] = true; | ||
Features["atomic-global-pk-add-bf16-inst"] = true; | ||
Features["atomic-fadd-rtn-insts"] = true; | ||
Features["ci-insts"] = true; | ||
Features["dot1-insts"] = true; | ||
Features["dot2-insts"] = true; | ||
Features["dot3-insts"] = true; | ||
Features["dot4-insts"] = true; | ||
Features["dot5-insts"] = true; | ||
Features["dot7-insts"] = true; | ||
Features["dot8-insts"] = true; | ||
Features["dot9-insts"] = true; | ||
Features["dot10-insts"] = true; | ||
Features["dot11-insts"] = true; | ||
Features["dl-insts"] = true; | ||
Features["16-bit-insts"] = true; | ||
Features["dpp"] = true; | ||
Features["gfx8-insts"] = true; | ||
Features["gfx9-insts"] = true; | ||
Features["gfx90a-insts"] = true; | ||
Features["gfx940-insts"] = true; | ||
Features["gfx10-insts"] = true; | ||
Features["gfx10-3-insts"] = true; | ||
Features["gfx11-insts"] = true; | ||
Features["gfx12-insts"] = true; | ||
Features["image-insts"] = true; | ||
Features["fp8-conversion-insts"] = true; | ||
Features["s-memrealtime"] = true; | ||
Features["s-memtime-inst"] = true; | ||
Features["gws"] = true; | ||
Features["fp8-insts"] = true; | ||
Features["fp8-conversion-insts"] = true; | ||
Features["atomic-ds-pk-add-16-insts"] = true; | ||
Features["mai-insts"] = true; | ||
|
||
return TargetInfo::initFeatureMap(Features, Diags, {}, FeatureVec); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not want to maintain 2 copies of either of these things. Can we just move this to a common place? I thought we already moved the Features gunk into lib/TargetSupport
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a common place for the register names, the current design assumes they're a private static detail of a target - suggestions welcome though, I agree the duplication is pretty terrible. As for the features, if you're thinking about extending and then re-using AMDGPU::fillAMDGPUFeatureMap
, I can do that, and it's probably a better solution I had not considered. Is that what you had in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much of this is actually different from the existing target info for AMDGCN? Seems like we're doing a lot of redundant stuff like defining macros or features.
That's part of the point, it's not actually supposed to differ in those particular regards, up to the point where things fork into specific GFXIPs. At the same time, there's no feasible way to re-use any of that, at least not one that I can see with how targets currently work. If you're suggesting that this should actually be based on AMDGPUTargetInfo, that's probably not the right way to go since that sets additional things that do not work with SPIRV at all. |
Yeah, I was unsure how much of this is a subset. We could pull the common stuff into some new base class that both targets then inherit from, but it depends how much code we actually save with that method. I think I agree at the very least we should try to avoid duplicating the register list. |
That's not a bad idea but I suspect we'll run into a physical design issue since there doesn't seem to be a natural place to put the shared base - unless you were thinking about a place in particular? We'd probably have to relocate this to the AMDGCN side, and then duplicate the SPIRV bits/details, and at that point we could just re-use/share the reg defs. |
Depending on how AMDGCN flavored this is, we could concievably just put it in the same |
clang/lib/Basic/Targets/SPIR.cpp
Outdated
}; | ||
|
||
namespace { | ||
const char *AMDGPUGCCRegNames[] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could create a common base class for amdgpu target info and spirv64 amdgcn target info and not duplicate stuff. we can define a static member for reg names.
@bader @asudarsa @michalpaszkowski any opinions from the SPIRV side? I would like to merge this so as to be able to progress some related work, but I'd rather not squat on SPIRV real-estate without an ACK from the landlords:) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with the patch in general, but I'd like either @michalpaszkowski or @VyacheslavLevytskyy to take a look.
Just one question about doubling the test scope for LLVM tests with spir64 target triple.
@@ -1,4 +1,5 @@ | |||
; RUN: llc -O0 -mtriple=spirv64-unknown-unknown %s -o - | FileCheck %s | |||
; RUN: llc -O0 -mtriple=spirv64-amd-amdhsa %s -o - | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if there is a value in testing spirv64-amd-amdhsa
triple in addition to spirv64-unknown-unknown
for most of llvm/test/CodeGen/SPIRV tests. It makes sense to extend testing for vendor extensions mentioned in the description, but I suppose most of the tests cover core functionality which should work any spir64-*
triple. Am I right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, you are right; my thinking was to prevent anyone getting ideas and diverging the "flavoured" variant in ways that breaks its core functionality as a spirv64
target, but that might be needlessly defensive. If we don't feel there's much value here, I'll revert the test spam around core functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rolled back most of these, except for a couple of places where there is a substantive difference (spirv64-amdgcn-amdhsa
encodes the program address space in its DL, which yields slightly different IR in places).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am okay with either approach. I think we can assume that the core functionality will stay the same for all "flavors". In case of any changes in the future to this assumption, we could create larger tests covering multiple features, which will not provide any isolation value but just ensure the features are still there.
+1 on this. I will take a look before end of week on this. Thanks |
; CHECK: OpFunctionEnd | ||
; CHECK: %[[DefFunFp]] = OpFunction %[[TyVoid]] None %[[TyFunFp]] | ||
|
||
target triple = "spir64-unknown-unknown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably this line is not needed here?
; CHECK: OpReturn | ||
; CHECK: OpFunctionEnd | ||
|
||
target triple = "spir64-unknown-unknown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as above, I guess we don't need this line.
%0 = call spir_func addrspace(4) float %fp(ptr addrspace(4) %bar) | ||
%1 = call spir_func addrspace(4) i64 %bar(ptr addrspace(4) %fp, ptr addrspace(4) %data) | ||
ret void | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a general suggestion, if I may, about those two function pointers tests. Neither the extension itself is perfect, nor its current implementation in SPIR-V Backend. About a half of Khronos Translator test cases for the function pointers extension crash when running with our SPIR-V Backend implementation, and this is a near to-do item in our development plans.
Giving all these, probably it's better to freeze in tests as little of function pointers implementation logics as it's possible -- until the implementation is completed in full and all Khronos Translator test cases are successful. In terms of this PR this means that probably it's better to strip those 2 tests of details whenever this is possible, unless you consider each existing CHECK in those tests important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this makes sense; initially we are planning to rely on the translator, until the BE makes it out of the experimental phase, so I think I can remove these and re-add them (if necessary) when that happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlexVlx LGTM! Thank you!
Thank you everyone for the reviews! |
This change seeks to add support for vendor flavoured SPIRV - more specifically, AMDGCN flavoured SPIRV. The aim is to generate SPIRV that carries some extra bits of information that are only usable by AMDGCN targets, forfeiting absolute genericity to obtain greater expressiveness for target features:
--spirv-allow-unknown-intrinsics
option is enabled when using the downstream translatorExisting AMDGCN tests are extended to cover this new target. It is currently dormant / will require some additional changes, but I thought I'd rather put it up for review to get feedback as early as possible. I will note that an alternative option is to place this under AMDGPU, but that seems slightly less natural, since this is still SPIRV, albeit relaxed in terms of preconditions & constrained in terms of postconditions, and only guaranteed to be usable on AMDGCN targets (it is still possible to obtain pristine portable SPIRV through usage of the flavoured target, though).