[Clang][AArch64] Generalise streaming mode checks for builtins. #93802
Author: Sander de Smalen (sdesmalen-arm)
Patch is 217.33 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/93802.diff
6 Files Affected:
diff --git a/clang/include/clang/Basic/arm_sve.td b/clang/include/clang/Basic/arm_sve.td
index 88938a981fd8a..09d34cb048813 100644
--- a/clang/include/clang/Basic/arm_sve.td
+++ b/clang/include/clang/Basic/arm_sve.td
@@ -19,27 +19,27 @@ include "arm_sve_sme_incl.td"
// Loads
// Load one vector (scalar base)
-def SVLD1 : MInst<"svld1[_{2}]", "dPc", "csilUcUsUiUlhfd", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ld1">;
-def SVLD1SB : MInst<"svld1sb_{d}", "dPS", "silUsUiUl", [IsLoad, IsStreamingCompatible], MemEltTyInt8, "aarch64_sve_ld1">;
-def SVLD1UB : MInst<"svld1ub_{d}", "dPW", "silUsUiUl", [IsLoad, IsZExtReturn, IsStreamingCompatible], MemEltTyInt8, "aarch64_sve_ld1">;
-def SVLD1SH : MInst<"svld1sh_{d}", "dPT", "ilUiUl", [IsLoad, IsStreamingCompatible], MemEltTyInt16, "aarch64_sve_ld1">;
-def SVLD1UH : MInst<"svld1uh_{d}", "dPX", "ilUiUl", [IsLoad, IsZExtReturn, IsStreamingCompatible], MemEltTyInt16, "aarch64_sve_ld1">;
-def SVLD1SW : MInst<"svld1sw_{d}", "dPU", "lUl", [IsLoad, IsStreamingCompatible], MemEltTyInt32, "aarch64_sve_ld1">;
-def SVLD1UW : MInst<"svld1uw_{d}", "dPY", "lUl", [IsLoad, IsZExtReturn, IsStreamingCompatible], MemEltTyInt32, "aarch64_sve_ld1">;
+def SVLD1 : MInst<"svld1[_{2}]", "dPc", "csilUcUsUiUlhfd", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ld1">;
+def SVLD1SB : MInst<"svld1sb_{d}", "dPS", "silUsUiUl", [IsLoad, IsSVEOrStreamingSVE], MemEltTyInt8, "aarch64_sve_ld1">;
+def SVLD1UB : MInst<"svld1ub_{d}", "dPW", "silUsUiUl", [IsLoad, IsZExtReturn, IsSVEOrStreamingSVE], MemEltTyInt8, "aarch64_sve_ld1">;
+def SVLD1SH : MInst<"svld1sh_{d}", "dPT", "ilUiUl", [IsLoad, IsSVEOrStreamingSVE], MemEltTyInt16, "aarch64_sve_ld1">;
+def SVLD1UH : MInst<"svld1uh_{d}", "dPX", "ilUiUl", [IsLoad, IsZExtReturn, IsSVEOrStreamingSVE], MemEltTyInt16, "aarch64_sve_ld1">;
+def SVLD1SW : MInst<"svld1sw_{d}", "dPU", "lUl", [IsLoad, IsSVEOrStreamingSVE], MemEltTyInt32, "aarch64_sve_ld1">;
+def SVLD1UW : MInst<"svld1uw_{d}", "dPY", "lUl", [IsLoad, IsZExtReturn, IsSVEOrStreamingSVE], MemEltTyInt32, "aarch64_sve_ld1">;
let TargetGuard = "sve,bf16" in {
- def SVLD1_BF : MInst<"svld1[_{2}]", "dPc", "b", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ld1">;
- def SVLD1_VNUM_BF : MInst<"svld1_vnum[_{2}]", "dPcl", "b", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ld1">;
+ def SVLD1_BF : MInst<"svld1[_{2}]", "dPc", "b", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ld1">;
+ def SVLD1_VNUM_BF : MInst<"svld1_vnum[_{2}]", "dPcl", "b", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ld1">;
}
// Load one vector (scalar base, VL displacement)
-def SVLD1_VNUM : MInst<"svld1_vnum[_{2}]", "dPcl", "csilUcUsUiUlhfd", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ld1">;
-def SVLD1SB_VNUM : MInst<"svld1sb_vnum_{d}", "dPSl", "silUsUiUl", [IsLoad, IsStreamingCompatible], MemEltTyInt8, "aarch64_sve_ld1">;
-def SVLD1UB_VNUM : MInst<"svld1ub_vnum_{d}", "dPWl", "silUsUiUl", [IsLoad, IsZExtReturn, IsStreamingCompatible], MemEltTyInt8, "aarch64_sve_ld1">;
-def SVLD1SH_VNUM : MInst<"svld1sh_vnum_{d}", "dPTl", "ilUiUl", [IsLoad, IsStreamingCompatible], MemEltTyInt16, "aarch64_sve_ld1">;
-def SVLD1UH_VNUM : MInst<"svld1uh_vnum_{d}", "dPXl", "ilUiUl", [IsLoad, IsZExtReturn, IsStreamingCompatible], MemEltTyInt16, "aarch64_sve_ld1">;
-def SVLD1SW_VNUM : MInst<"svld1sw_vnum_{d}", "dPUl", "lUl", [IsLoad, IsStreamingCompatible], MemEltTyInt32, "aarch64_sve_ld1">;
-def SVLD1UW_VNUM : MInst<"svld1uw_vnum_{d}", "dPYl", "lUl", [IsLoad, IsZExtReturn, IsStreamingCompatible], MemEltTyInt32, "aarch64_sve_ld1">;
+def SVLD1_VNUM : MInst<"svld1_vnum[_{2}]", "dPcl", "csilUcUsUiUlhfd", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ld1">;
+def SVLD1SB_VNUM : MInst<"svld1sb_vnum_{d}", "dPSl", "silUsUiUl", [IsLoad, IsSVEOrStreamingSVE], MemEltTyInt8, "aarch64_sve_ld1">;
+def SVLD1UB_VNUM : MInst<"svld1ub_vnum_{d}", "dPWl", "silUsUiUl", [IsLoad, IsZExtReturn, IsSVEOrStreamingSVE], MemEltTyInt8, "aarch64_sve_ld1">;
+def SVLD1SH_VNUM : MInst<"svld1sh_vnum_{d}", "dPTl", "ilUiUl", [IsLoad, IsSVEOrStreamingSVE], MemEltTyInt16, "aarch64_sve_ld1">;
+def SVLD1UH_VNUM : MInst<"svld1uh_vnum_{d}", "dPXl", "ilUiUl", [IsLoad, IsZExtReturn, IsSVEOrStreamingSVE], MemEltTyInt16, "aarch64_sve_ld1">;
+def SVLD1SW_VNUM : MInst<"svld1sw_vnum_{d}", "dPUl", "lUl", [IsLoad, IsSVEOrStreamingSVE], MemEltTyInt32, "aarch64_sve_ld1">;
+def SVLD1UW_VNUM : MInst<"svld1uw_vnum_{d}", "dPYl", "lUl", [IsLoad, IsZExtReturn, IsSVEOrStreamingSVE], MemEltTyInt32, "aarch64_sve_ld1">;
// Load one vector (vector base)
def SVLD1_GATHER_BASES_U : MInst<"svld1_gather[_{2}base]_{d}", "dPu", "ilUiUlfd", [IsGatherLoad], MemEltTyDefault, "aarch64_sve_ld1_gather_scalar_offset">;
@@ -243,27 +243,27 @@ let TargetGuard = "sve,bf16" in {
}
// Load one vector, unextended load, non-temporal (scalar base)
-def SVLDNT1 : MInst<"svldnt1[_{2}]", "dPc", "csilUcUsUiUlhfd", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ldnt1">;
+def SVLDNT1 : MInst<"svldnt1[_{2}]", "dPc", "csilUcUsUiUlhfd", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ldnt1">;
// Load one vector, unextended load, non-temporal (scalar base, VL displacement)
-def SVLDNT1_VNUM : MInst<"svldnt1_vnum[_{2}]", "dPcl", "csilUcUsUiUlhfd", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ldnt1">;
+def SVLDNT1_VNUM : MInst<"svldnt1_vnum[_{2}]", "dPcl", "csilUcUsUiUlhfd", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ldnt1">;
let TargetGuard = "sve,bf16" in {
- def SVLDNT1_BF : MInst<"svldnt1[_{2}]", "dPc", "b", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ldnt1">;
- def SVLDNT1_VNUM_BF : MInst<"svldnt1_vnum[_{2}]", "dPcl", "b", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ldnt1">;
+ def SVLDNT1_BF : MInst<"svldnt1[_{2}]", "dPc", "b", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ldnt1">;
+ def SVLDNT1_VNUM_BF : MInst<"svldnt1_vnum[_{2}]", "dPcl", "b", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ldnt1">;
}
// Load one quadword and replicate (scalar base)
-def SVLD1RQ : SInst<"svld1rq[_{2}]", "dPc", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_ld1rq", [IsStreamingCompatible]>;
+def SVLD1RQ : SInst<"svld1rq[_{2}]", "dPc", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_ld1rq", [IsSVEOrStreamingSVE]>;
let TargetGuard = "sve,bf16" in {
- def SVLD1RQ_BF : SInst<"svld1rq[_{2}]", "dPc", "b", MergeNone, "aarch64_sve_ld1rq", [IsStreamingCompatible]>;
+ def SVLD1RQ_BF : SInst<"svld1rq[_{2}]", "dPc", "b", MergeNone, "aarch64_sve_ld1rq", [IsSVEOrStreamingSVE]>;
}
multiclass StructLoad<string name, string proto, string i> {
- def : SInst<name, proto, "csilUcUsUiUlhfd", MergeNone, i, [IsStructLoad, IsStreamingCompatible]>;
+ def : SInst<name, proto, "csilUcUsUiUlhfd", MergeNone, i, [IsStructLoad, IsSVEOrStreamingSVE]>;
let TargetGuard = "sve,bf16" in {
- def: SInst<name, proto, "b", MergeNone, i, [IsStructLoad, IsStreamingCompatible]>;
+ def: SInst<name, proto, "b", MergeNone, i, [IsStructLoad, IsSVEOrStreamingSVE]>;
}
}
@@ -286,16 +286,16 @@ let TargetGuard = "sve,f64mm,bf16" in {
}
let TargetGuard = "sve,bf16" in {
- def SVBFDOT : SInst<"svbfdot[_{0}]", "MMdd", "b", MergeNone, "aarch64_sve_bfdot", [IsOverloadNone, IsStreamingCompatible]>;
- def SVBFMLALB : SInst<"svbfmlalb[_{0}]", "MMdd", "b", MergeNone, "aarch64_sve_bfmlalb", [IsOverloadNone, IsStreamingCompatible]>;
- def SVBFMLALT : SInst<"svbfmlalt[_{0}]", "MMdd", "b", MergeNone, "aarch64_sve_bfmlalt", [IsOverloadNone, IsStreamingCompatible]>;
- def SVBFMMLA : SInst<"svbfmmla[_{0}]", "MMdd", "b", MergeNone, "aarch64_sve_bfmmla", [IsOverloadNone, IsStreamingCompatible]>;
- def SVBFDOT_N : SInst<"svbfdot[_n_{0}]", "MMda", "b", MergeNone, "aarch64_sve_bfdot", [IsOverloadNone, IsStreamingCompatible]>;
- def SVBFMLAL_N : SInst<"svbfmlalb[_n_{0}]", "MMda", "b", MergeNone, "aarch64_sve_bfmlalb", [IsOverloadNone, IsStreamingCompatible]>;
- def SVBFMLALT_N : SInst<"svbfmlalt[_n_{0}]", "MMda", "b", MergeNone, "aarch64_sve_bfmlalt", [IsOverloadNone, IsStreamingCompatible]>;
- def SVBFDOT_LANE : SInst<"svbfdot_lane[_{0}]", "MMddi", "b", MergeNone, "aarch64_sve_bfdot_lane_v2", [IsOverloadNone, IsStreamingCompatible], [ImmCheck<3, ImmCheck0_3>]>;
- def SVBFMLALB_LANE : SInst<"svbfmlalb_lane[_{0}]", "MMddi", "b", MergeNone, "aarch64_sve_bfmlalb_lane_v2", [IsOverloadNone, IsStreamingCompatible], [ImmCheck<3, ImmCheck0_7>]>;
- def SVBFMLALT_LANE : SInst<"svbfmlalt_lane[_{0}]", "MMddi", "b", MergeNone, "aarch64_sve_bfmlalt_lane_v2", [IsOverloadNone, IsStreamingCompatible], [ImmCheck<3, ImmCheck0_7>]>;
+ def SVBFDOT : SInst<"svbfdot[_{0}]", "MMdd", "b", MergeNone, "aarch64_sve_bfdot", [IsOverloadNone, IsSVEOrStreamingSVE]>;
+ def SVBFMLALB : SInst<"svbfmlalb[_{0}]", "MMdd", "b", MergeNone, "aarch64_sve_bfmlalb", [IsOverloadNone, IsSVEOrStreamingSVE]>;
+ def SVBFMLALT : SInst<"svbfmlalt[_{0}]", "MMdd", "b", MergeNone, "aarch64_sve_bfmlalt", [IsOverloadNone, IsSVEOrStreamingSVE]>;
+ def SVBFMMLA : SInst<"svbfmmla[_{0}]", "MMdd", "b", MergeNone, "aarch64_sve_bfmmla", [IsOverloadNone, IsSVEOrStreamingSVE]>;
+ def SVBFDOT_N : SInst<"svbfdot[_n_{0}]", "MMda", "b", MergeNone, "aarch64_sve_bfdot", [IsOverloadNone, IsSVEOrStreamingSVE]>;
+ def SVBFMLAL_N : SInst<"svbfmlalb[_n_{0}]", "MMda", "b", MergeNone, "aarch64_sve_bfmlalb", [IsOverloadNone, IsSVEOrStreamingSVE]>;
+ def SVBFMLALT_N : SInst<"svbfmlalt[_n_{0}]", "MMda", "b", MergeNone, "aarch64_sve_bfmlalt", [IsOverloadNone, IsSVEOrStreamingSVE]>;
+ def SVBFDOT_LANE : SInst<"svbfdot_lane[_{0}]", "MMddi", "b", MergeNone, "aarch64_sve_bfdot_lane_v2", [IsOverloadNone, IsSVEOrStreamingSVE], [ImmCheck<3, ImmCheck0_3>]>;
+ def SVBFMLALB_LANE : SInst<"svbfmlalb_lane[_{0}]", "MMddi", "b", MergeNone, "aarch64_sve_bfmlalb_lane_v2", [IsOverloadNone, IsSVEOrStreamingSVE], [ImmCheck<3, ImmCheck0_7>]>;
+ def SVBFMLALT_LANE : SInst<"svbfmlalt_lane[_{0}]", "MMddi", "b", MergeNone, "aarch64_sve_bfmlalt_lane_v2", [IsOverloadNone, IsSVEOrStreamingSVE], [ImmCheck<3, ImmCheck0_7>]>;
}
let TargetGuard = "sve2p1" in {
@@ -334,26 +334,26 @@ let TargetGuard = "sve2p1" in {
// Stores
// Store one vector (scalar base)
-def SVST1 : MInst<"svst1[_{d}]", "vPpd", "csilUcUsUiUlhfd", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_st1">;
-def SVST1B_S : MInst<"svst1b[_{d}]", "vPAd", "sil", [IsStore, IsStreamingCompatible], MemEltTyInt8, "aarch64_sve_st1">;
-def SVST1B_U : MInst<"svst1b[_{d}]", "vPEd", "UsUiUl", [IsStore, IsStreamingCompatible], MemEltTyInt8, "aarch64_sve_st1">;
-def SVST1H_S : MInst<"svst1h[_{d}]", "vPBd", "il", [IsStore, IsStreamingCompatible], MemEltTyInt16, "aarch64_sve_st1">;
-def SVST1H_U : MInst<"svst1h[_{d}]", "vPFd", "UiUl", [IsStore, IsStreamingCompatible], MemEltTyInt16, "aarch64_sve_st1">;
-def SVST1W_S : MInst<"svst1w[_{d}]", "vPCd", "l", [IsStore, IsStreamingCompatible], MemEltTyInt32, "aarch64_sve_st1">;
-def SVST1W_U : MInst<"svst1w[_{d}]", "vPGd", "Ul", [IsStore, IsStreamingCompatible], MemEltTyInt32, "aarch64_sve_st1">;
+def SVST1 : MInst<"svst1[_{d}]", "vPpd", "csilUcUsUiUlhfd", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_st1">;
+def SVST1B_S : MInst<"svst1b[_{d}]", "vPAd", "sil", [IsStore, IsSVEOrStreamingSVE], MemEltTyInt8, "aarch64_sve_st1">;
+def SVST1B_U : MInst<"svst1b[_{d}]", "vPEd", "UsUiUl", [IsStore, IsSVEOrStreamingSVE], MemEltTyInt8, "aarch64_sve_st1">;
+def SVST1H_S : MInst<"svst1h[_{d}]", "vPBd", "il", [IsStore, IsSVEOrStreamingSVE], MemEltTyInt16, "aarch64_sve_st1">;
+def SVST1H_U : MInst<"svst1h[_{d}]", "vPFd", "UiUl", [IsStore, IsSVEOrStreamingSVE], MemEltTyInt16, "aarch64_sve_st1">;
+def SVST1W_S : MInst<"svst1w[_{d}]", "vPCd", "l", [IsStore, IsSVEOrStreamingSVE], MemEltTyInt32, "aarch64_sve_st1">;
+def SVST1W_U : MInst<"svst1w[_{d}]", "vPGd", "Ul", [IsStore, IsSVEOrStreamingSVE], MemEltTyInt32, "aarch64_sve_st1">;
// Store one vector (scalar base, VL displacement)
-def SVST1_VNUM : MInst<"svst1_vnum[_{d}]", "vPpld", "csilUcUsUiUlhfd", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_st1">;
-def SVST1B_VNUM_S : MInst<"svst1b_vnum[_{d}]", "vPAld", "sil", [IsStore, IsStreamingCompatible], MemEltTyInt8, "aarch64_sve_st1">;
-def SVST1B_VNUM_U : MInst<"svst1b_vnum[_{d}]", "vPEld", "UsUiUl", [IsStore, IsStreamingCompatible], MemEltTyInt8, "aarch64_sve_st1">;
-def SVST1H_VNUM_S : MInst<"svst1h_vnum[_{d}]", "vPBld", "il", [IsStore, IsStreamingCompatible], MemEltTyInt16, "aarch64_sve_st1">;
-def SVST1H_VNUM_U : MInst<"svst1h_vnum[_{d}]", "vPFld", "UiUl", [IsStore, IsStreamingCompatible], MemEltTyInt16, "aarch64_sve_st1">;
-def SVST1W_VNUM_S : MInst<"svst1w_vnum[_{d}]", "vPCld", "l", [IsStore, IsStreamingCompatible], MemEltTyInt32, "aarch64_sve_st1">;
-def SVST1W_VNUM_U : MInst<"svst1w_vnum[_{d}]", "vPGld", "Ul", [IsStore, IsStreamingCompatible], MemEltTyInt32, "aarch64_sve_st1">;
+def SVST1_VNUM : MInst<"svst1_vnum[_{d}]", "vPpld", "csilUcUsUiUlhfd", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_st1">;
+def SVST1B_VNUM_S : MInst<"svst1b_vnum[_{d}]", "vPAld", "sil", [IsStore, IsSVEOrStreamingSVE], MemEltTyInt8, "aarch64_sve_st1">;
+def SVST1B_VNUM_U : MInst<"svst1b_vnum[_{d}]", "vPEld", "UsUiUl", [IsStore, IsSVEOrStreamingSVE], MemEltTyInt8, "aarch64_sve_st1">;
+def SVST1H_VNUM_S : MInst<"svst1h_vnum[_{d}]", "vPBld", "il", [IsStore, IsSVEOrStreamingSVE], MemEltTyInt16, "aarch64_sve_st1">;
+def SVST1H_VNUM_U : MInst<"svst1h_vnum[_{d}]", "vPFld", "UiUl", [IsStore, IsSVEOrStreamingSVE], MemEltTyInt16, "aarch64_sve_st1">;
+def SVST1W_VNUM_S : MInst<"svst1w_vnum[_{d}]", "vPCld", "l", [IsStore, IsSVEOrStreamingSVE], MemEltTyInt32, "aarch64_sve_st1">;
+def SVST1W_VNUM_U : MInst<"svst1w_vnum[_{d}]", "vPGld", "Ul", [IsStore, IsSVEOrStreamingSVE], MemEltTyInt32, "aarch64_sve_st1">;
let TargetGuard = "sve,bf16" in {
- def SVST1_BF : MInst<"svst1[_{d}]", "vPpd", "b", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_st1">;
- def SVST1_VNUM_BF : MInst<"svst1_vnum[_{d}]", "vPpld", "b", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_st1">;
+ def SVST1_BF : MInst<"svst1[_{d}]", "vPpd", "b", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_st1">;
+ def SVST1_VNUM_BF : MInst<"svst1_vnum[_{d}]", "vPpld", "b", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_st1">;
}
// Store one vector (vector base)
@@ -426,9 +426,9 @@ def SVST1H_SCATTER_INDEX_S : MInst<"svst1h_scatter[_{2}base]_index[_{d}]", "v
def SVST1W_SCATTER_INDEX_S : MInst<"svst1w_scatter[_{2}base]_index[_{d}]", "vPuld", "lUl", [IsScatterStore], MemEltTyInt32, "aarch64_sve_st1_scatter_scalar_offset">;
multiclass StructStore<string name, string proto, string i> {
- def : SInst<name, proto, "csilUcUsUiUlhfd", MergeNone, i, [IsStructStore, IsStreamingCompatible]>;
+ def : SInst<name, proto, "csilUcUsUiUlhfd", MergeNone, i, [IsStructStore, IsSVEOrStreamingSVE]>;
let TargetGuard = "sve,bf16" in {
- def: SInst<name, proto, "b", MergeNone, i, [IsStructStore, IsStreamingCompatible]>;
+ def: SInst<name, proto, "b", MergeNone, i, [IsStructStore, IsSVEOrStreamingSVE]>;
}
}
// Store N vectors into N-element structure (scalar base)
@@ -442,14 +442,14 @@ defm SVST3_VNUM : StructStore<"svst3_vnum[_{d}]", "vPpl3", "aarch64_sve_st3">;
defm SVST4_VNUM : StructStore<"svst4_vnum[_{d}]", "vPpl4", "aarch64_sve_st4">;
// Store one vector, with no truncation, non-temporal (scalar base)
-def SVSTNT1 : MInst<"svstnt1[_{d}]", "vPpd", "csilUcUsUiUlhfd", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_stnt1">;
+def SVSTNT1 : MInst<"svstnt1[_{d}]", "vPpd", "csilUcUsUiUlhfd", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_stnt1">;
// Store one vector, with no truncation, non-temporal (scalar base, VL displacement)
-def SVSTNT1_VNUM : MInst<"svstnt1_vnum[_{d}]", "vPpld", "csilUcUsUiUlhfd", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_stnt1">;
+def SVSTNT1_VNUM : MInst<"svstnt1_vnum[_{d}]", "vPpld", "csilUcUsUiUlhfd", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_stnt1">;
let TargetGuard = "sve,bf16" in {
- def SVSTNT1_BF : MInst<"svstnt1[_{d}]", "vPpd", "b", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_stnt1">;
- def SVSTNT1_VNUM_BF : MInst<"svstnt1_vnum[_{d}]", "vPpld", "b", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_stnt1">;
+ def SVSTNT1_BF : MInst<"svstnt1[_{d}]", "vPpd", "b", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_stnt1">;
+ def SVSTNT1_VNUM_BF : MInst<"svstnt1_vnum[_{d}]", "vPpld", "b", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_stnt1">;
}
let TargetGuard = "sve2p1" in {
@@ -488,16 +488,16 @@ let TargetGuard = "sve2p1" in {
// Prefetches
// Prefetch (Scalar base)
-def SVPRFB : MInst<"svprfb", "vPQJ", "c", [IsPrefetch, IsStreamingCompatible], MemEltTyInt8, "aarch64_sve_prf">;
-def SVPRFH : MInst<"svprfh", "vPQJ", "s", [IsPrefetch, IsStreamingCompatible], MemEltTyInt16, "aarch64_sve_prf">;
-def SVPRFW : MInst<"svprfw", "vPQJ", "i", [IsPrefetch, IsStreamingCompatible], MemEltTyInt32, "aarch64_sve_prf">;
-def SVPRFD : MInst<"svprfd", "vPQJ", "l", [IsPrefetch, IsStreamingCompatible], MemEltTyInt64, "aarch64_sve_prf">;
+def SVPRFB : MInst<"svprfb", "vPQJ", "c", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt8, "aarch64_sve_prf">;
+def SVPRFH : MInst<"svprfh", "vPQJ", "s", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt16, "aarch64_sve_prf">;
+def SVPRFW : MInst<"svprfw", "vPQJ", "i", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt32, "aarch64_sve_prf">;
+def SVPRFD : MInst<"svprfd", "vPQJ", "l", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt64, "aarch64_sve_prf">;
// Prefetch (Scalar base, VL displacement)
-def SVPRFB_VNUM : MInst<"svprfb_vnum", "vPQlJ", "c", [IsPrefetch, IsStreamingCompatible], MemEltTyInt8, "aarch64_sve_prf">;
-def SVPRFH_VNUM : MInst<"svprfh_vnum", "vPQlJ", "s", [IsPrefetch, IsStreamingCompatible], MemEltTyInt16, "aarch64_sve_prf">;
-def SVPRFW_VNUM : MInst<"svprfw_vnum", "vPQlJ", "i", [IsPrefetch, IsStreamingCompatible], MemEltTyInt32, "aarch64_sve_prf">;
-def SVPRFD_VNUM : MInst<"svprfd_vnum", "vPQlJ", "l", [IsPrefetch, IsStreamingCompatible], MemEltTyInt64, "aarch64_sve_prf">;
+def SVPRFB_VNUM : MInst<"svprfb_vnum", "vPQlJ", "c", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt8, "aarch64_sve_prf">;
+def SVPRFH_VNUM : MInst<"svprfh_vnum", "vPQlJ", "s", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt16, "aarch64_sve_prf">;
+def SVPRFW_VNUM : MInst<"svprfw_vnum", "vPQlJ", "i", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt32, "aarch64_sve_prf">;
+def SVPRFD_VNUM : MInst<"svprfd_vnum", "vPQlJ", "l", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt64, "aarch64_sve_prf">;
// Prefetch (Vector bases)
def SVPRFB_GATHER_BASES : MInst<"svprfb_gather[_{2}base]", "vPdJ", "UiUl", [IsGatherPrefetch], MemEltTyInt8, "aarch64_sve_prfb_gather_scalar_offset">;
@@ -543,18 +543,18 @@ def SVADRD : SInst<"svadrd[_{0}base]_[{2}]index", "uud", "ilUiUl", MergeNone, "
//////////////////////////...
[truncated]
LGTM with suggestion
__attribute__((target("+sve2p1,+sme2")))
svfloat32_t good3(svfloat32_t a, svfloat32_t b, svfloat32_t c) __arm_streaming_compatible {
  return svclamp(a, b, c);
}
For completeness, can we also have a test that checks this case with __arm_streaming instead?
This changes the target guards for a small number of intrinsics, which is the minimum set needed to fix the Sema/CodeGen test failures introduced by this patch. I had hoped I could leave these changes to a follow-up patch to keep this patch simple. My next patch will set the correct target guards for *all* the intrinsics (with corresponding RUN lines for the tests), which should complete the work on the Clang side.
Branch updated from 4e98882 to 46713b1.
clang/lib/Sema/SemaARM.cpp
    CachedFD = FD;
  }

  if (SatisfiesSVE && SatisfiesSME)
Does this effectively prevent these builtins from being used in streaming-compatible functions when only SVE feature flags are available?
My updated understanding of streaming-compatible functions is that SME features play no role, and the user is expected to use SVE feature flags to tell the compiler the level of SVE support a streaming-compatible function can have, much like they would for ordinary functions.
@@ -225,7 +225,7 @@ def IsStreamingCompatible : FlagType<0x4000000000>;
 def IsReadZA : FlagType<0x8000000000>;
 def IsWriteZA : FlagType<0x10000000000>;
 def IsReductionQV : FlagType<0x20000000000>;
-def IsStreamingOrSVE2p1 : FlagType<0x40000000000>; // Use for intrinsics that are common between sme/sme2 and sve2p1.
+def IsSVEOrStreamingSVE : FlagType<0x40000000000>; // Use for intrinsics that are common between SVE and SME.
If you'll permit a bit of bikeshedding, I don't think this is a good name. From what I can see, the new flag is used to trigger dynamic resolution to determine whether the builtin is available, based on the target features along with any keywords associated with the function. Perhaps RequiresDynamicVerification? Or, ideally, something shorter with a similar meaning.
* Allow intrinsics in streaming-compatible functions if they satisfy the SVE feature requirements.
* Renamed IsSVEOrStreamingSVE -> VerifyRuntimeMode.

Additionally:
* Fixed the caching mechanism; it previously cached the 'SatisfiesSVE' and 'SatisfiesSME' values as well, which meant it cached per builtin call, not per FunctionDecl.
* As per an offline discussion with @paulwalker-arm, I have changed the warning for undefined behaviour into an error.
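A minimal sketch of the revised decision rule, assuming the two booleans SatisfiesSVE and SatisfiesSME have already been computed from the caller's target features. The enums and checkRuntimeMode are hypothetical names that only model the rule; they are not Clang's API. The key change is that a streaming-compatible caller is accepted when the SVE side of the guard is satisfied, and a remaining mode mismatch is an error rather than a warning.

```cpp
#include <cassert>

// Hypothetical model of the revised rule; these names are not Clang's API.
enum class FnMode { NonStreaming, Streaming, StreamingCompatible };
enum class Verdict { Ok, Error, NotChecked };

// SatisfiesSVE: the caller's features meet the non-streaming (SVE) side of the
// builtin's guard. SatisfiesSME: they meet the streaming (SME) side.
Verdict checkRuntimeMode(bool SatisfiesSVE, bool SatisfiesSME, FnMode Caller) {
  // Both sides available: the builtin is defined in either mode.
  if (SatisfiesSVE && SatisfiesSME)
    return Verdict::Ok;
  // Revised rule: a streaming-compatible caller only needs the SVE side,
  // just as an ordinary function would.
  if (SatisfiesSVE && Caller == FnMode::StreamingCompatible)
    return Verdict::Ok;
  // Only the non-streaming form is available.
  if (SatisfiesSVE)
    return Caller == FnMode::NonStreaming ? Verdict::Ok : Verdict::Error;
  // Only the streaming form is available.
  if (SatisfiesSME)
    return Caller == FnMode::Streaming ? Verdict::Ok : Verdict::Error;
  // Neither side satisfied: assumed to be reported by a separate
  // missing-target-feature diagnostic, so this check stays silent.
  return Verdict::NotChecked;
}
```

For example, checkRuntimeMode(true, false, FnMode::StreamingCompatible) yields Verdict::Ok under this rule, while checkRuntimeMode(true, false, FnMode::Streaming) yields Verdict::Error.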
✅ With the latest revision this PR passed the C/C++ code formatter.
clang/lib/Sema/SemaARM.cpp
@@ -622,7 +679,8 @@ bool SemaARM::CheckSMEBuiltinFunctionCall(unsigned BuiltinID,
   }

   if (BuiltinType)
-    checkArmStreamingBuiltin(SemaRef, TheCall, FD, *BuiltinType);
+    HasError |= checkArmStreamingBuiltin(SemaRef, TheCall, FD, *BuiltinType,
Would it be wrong to return immediately? I ask because there's
  switch (BuiltinID) {
  default:
    return false;
which should then be 'return HasError;'. But if we can return directly, there's less chance of other similar issues.
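To make the reviewer's point concrete, here is a minimal sketch of the two control-flow patterns, following the Sema convention that returning true means an error was diagnosed. streamingModeCheckFails, checkAccumulating and checkEarlyReturn are hypothetical stand-ins, not the real SemaARM functions; the stub always "fails" so the difference is visible.

```cpp
#include <cassert>

// Hypothetical stand-in for the streaming-mode check; it always "fails"
// here so that the two control-flow patterns below can be compared.
static bool streamingModeCheckFails() { return true; }

// Pattern under review: the error is accumulated into HasError, but the
// 'default' exit of the switch still returns a hard-coded 'false'.
bool checkAccumulating(unsigned BuiltinID) {
  bool HasError = false;
  HasError |= streamingModeCheckFails();
  switch (BuiltinID) {
  default:
    return false; // drops the error; would need to be 'return HasError;'
  case 1:
    return HasError;
  }
}

// Suggested alternative: return as soon as the streaming-mode check fails,
// so no later exit path can lose the error.
bool checkEarlyReturn(unsigned BuiltinID) {
  if (streamingModeCheckFails())
    return true;
  switch (BuiltinID) {
  default:
    return false;
  }
}
```

With the accumulating version, every exit path must remember to return HasError; the early return removes that obligation entirely.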
Done
clang/lib/Sema/SemaARM.cpp
static llvm::StringMap<bool> CallerFeatureMapWithoutSVE,
    CallerFeatureMapWithoutSME;
I hope I'm wrong, but I think the use of static here is almost certainly bad, because there's nothing stopping multiple threads from calling checkArmStreamingBuiltin. I looked for other instances, but all I could find involved one-time static initialisation, after which the data is effectively constant.
Random idea:
Rather than filtering the feature map, you could process BuiltinTargetGuards. By that I mean you could split the guard into streaming and non-streaming parts (perhaps that's what "|" effectively means) and then use whichever side is relevant to the function's mode of operation.
Of course you could just remove the cache and leave compile time as a worry for tomorrow.
You're right, not sure what I was thinking here with the 'static' variable :)
I'd rather not go down the route of parsing the BuiltinTargetGuards here, or making assumptions about their format, if we're not going to enforce that format in the arm_sve.td file. For example, one can write (sve,featureX)|(sme2,featureX) or (sve|sme),featureX. Sure, we can assume the former format, but in that case I think we need to have an SVETargetGuard and an SMETargetGuard that we let TableGen combine into a canonical form for the combined TargetGuard.
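The concern above is that guard strings have several equivalent spellings. The following simplified evaluator, written for illustration and not Clang's implementation, assumes ',' means "all of", '|' means "any of" and binds more loosely than ',', and parentheses group. featureX is the placeholder from the comment; sme2 is used on both sides so the two spellings are truly equivalent. Because both spellings give the same answer for every feature set, a parser that wants to recover "the SVE side" and "the SME side" from an arbitrary string cannot rely on textual shape alone.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Simplified guard-expression evaluator, written for illustration only; it is
// not Clang's implementation. Grammar assumed by this sketch:
//   Or   := And ('|' And)*      ('|' binds loosest)
//   And  := Atom (',' Atom)*    (',' means "all of")
//   Atom := '(' Or ')' | feature-name
class GuardParser {
public:
  GuardParser(const std::string &Expr, const std::set<std::string> &Features)
      : Expr(Expr), Features(Features) {}

  bool parseOr() {
    bool Result = parseAnd();
    while (consume('|')) {
      bool Rhs = parseAnd(); // parse unconditionally so the input is consumed
      Result = Result || Rhs;
    }
    return Result;
  }

private:
  const std::string &Expr;
  const std::set<std::string> &Features;
  std::size_t Pos = 0;

  bool consume(char C) {
    if (Pos < Expr.size() && Expr[Pos] == C) {
      ++Pos;
      return true;
    }
    return false;
  }

  bool parseAnd() {
    bool Result = parseAtom();
    while (consume(',')) {
      bool Rhs = parseAtom();
      Result = Result && Rhs;
    }
    return Result;
  }

  bool parseAtom() {
    if (consume('(')) {
      bool Result = parseOr();
      consume(')'); // a missing ')' is tolerated in this sketch
      return Result;
    }
    std::size_t Start = Pos;
    while (Pos < Expr.size() &&
           (std::isalnum(static_cast<unsigned char>(Expr[Pos])) ||
            Expr[Pos] == '-' || Expr[Pos] == '_'))
      ++Pos;
    return Features.count(Expr.substr(Start, Pos - Start)) != 0;
  }
};

// Returns true if the enabled Features satisfy the guard expression.
bool evalGuard(const std::string &Expr, const std::set<std::string> &Features) {
  GuardParser P(Expr, Features);
  return P.parseOr();
}
```

For instance, evalGuard("(sve,featureX)|(sme2,featureX)", {"sve", "featureX"}) and evalGuard("(sve|sme2),featureX", {"sve", "featureX"}) both return true, while both return false for {"sme2"} alone.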
clang/lib/Sema/SemaARM.cpp
// Cache the feature maps, to avoid having to recalculate this for each
// builtin call.
static unsigned CachedODRHash = 0;
As above.
Done
This removes the invalid caching mechanism, which wasn't thread-safe. It also returns directly when it finds a streaming-mode error, rather than using HasError.
clang/lib/Sema/SemaARM.cpp
const FunctionDecl *FD,
SemaARM::ArmStreamingType BuiltinType) {
static bool checkArmStreamingBuiltin(Sema &S, CallExpr *TheCall,
                                     FunctionDecl *FD,
Can FD's constness be restored? I think you only had to remove it because you previously called getODRHash.
This allows code with SVE intrinsics to be compiled with +sme,+nosve, assuming the encompassing function is in the correct mode (see llvm#93802)
One reason to want to split this up is to simplify the code added in llvm#93802, where it checks the SME streaming-mode requirements for a builtin by checking for the absence of SVE. If the target guards are separate, we can generate a table and make the Sema code that verifies the runtime mode simpler.

Another reason is to avoid an issue with a check in SveEmitter.cpp that ensures 'VerifyRuntimeMode' is set correctly for functions that have both SVE and SME target guards:

  if (!Def->isFlagSet(VerifyRuntimeMode) && Def->getGuard().contains("sve") &&
      Def->getGuard().contains("sme"))
    llvm_unreachable("Missing VerifyRuntimeMode flag");

However, if we ever add a new feature with "sme" in its name that is unrelated to FEAT_SME, this code no longer works.

Note that the arm_sve.td and arm_sme.td files could do with a bit of restructuring after this, but it seems better to follow that up in an NFC patch.
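A minimal sketch of why the substring check is fragile and how separate guards make it structural. BuiltinGuards and both functions are hypothetical, and "fsmex" is an invented feature name standing in for "a future feature with sme in its name"; none of this is the actual SveEmitter.cpp code.

```cpp
#include <cassert>
#include <string>

// Hypothetical record for a builtin's guards, modelling the proposed split:
// one guard for non-streaming (SVE) availability and one for streaming (SME)
// availability. Field names are illustrative only.
struct BuiltinGuards {
  std::string Guard;    // combined guard, e.g. "sve2p1|sme2"
  std::string SVEGuard; // e.g. "sve2p1"; empty if unavailable in non-streaming mode
  std::string SMEGuard; // e.g. "sme2"; empty if unavailable in streaming mode
};

// Substring heuristic in the style of the quoted check: any feature name
// that merely contains "sme" (or "sve") is treated as an SME (or SVE) guard.
bool needsVerifyFlagBySubstring(const BuiltinGuards &B) {
  return B.Guard.find("sve") != std::string::npos &&
         B.Guard.find("sme") != std::string::npos;
}

// Structural check: runtime-mode verification is needed exactly when both a
// non-streaming and a streaming guard are present.
bool needsVerifyFlagBySplit(const BuiltinGuards &B) {
  return !B.SVEGuard.empty() && !B.SMEGuard.empty();
}
```

For a guard such as "sve,fsmex" (SVE-only, with the hypothetical "fsmex" feature), the substring check wrongly reports that the flag is needed, while the split-guard check does not.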
PR #76975 added 'IsStreamingOrSVE2p1' to emit a diagnostic when a builtin marked
with 'IsStreamingOrSVE2p1' is used in a non-streaming function that is not
compiled with +sve2p1.

The problem is a bit more complex than only this case. For example, we've marked
lots of builtins with 'IsStreamingCompatible', meaning they can be used in
streaming, streaming-compatible or non-streaming functions. But the code in
SemaChecking doesn't check the appropriate target guards. This issue becomes
relevant when SVE builtins are only available in streaming mode, e.g. when
compiling for SME without SVE.
If we were to add the appropriate target guards, we'd have to add many more
combinations, e.g.:
IsStreamingSMEOrSVE
IsStreamingSME2OrSVE2
IsStreamingSMEOrSVE2p1
IsStreamingSME2OrSVE2p1
etc.
To avoid having to add more combinations (and avoid having to add more in the
future for new extensions), we use a single 'IsSVEOrStreamingSVE' flag for all
builtins that are available in streaming mode for the appropriate SME flags, or
in non-streaming mode for the appropriate SVE flags, or both. The code in
SemaChecking will then verify for which mode (or both) the builtin would be
defined, given the target features of the function/compilation unit.
For example, 'svclamp' is enabled under FEAT_SVE2p1 and FEAT_SME2:
* When we compile for SVE2p1 and SME (but not SME2), the builtin is undefined
  behaviour when called from a streaming function.
* When we compile for SME2 and SVE2 (but not SVE2p1), the builtin is undefined
  behaviour when called from a non-streaming function.
* When we compile for both SVE2p1 and SME2, the builtin can be used in either
  mode (non-streaming, streaming or streaming-compatible).
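The svclamp cases above can be modelled with a short sketch. It assumes the feature spellings "sve2p1" and "sme2", ignores feature implications (e.g. +sme2 implying +sme), and uses hypothetical names rather than Clang's API. It follows the rule as described in this summary, where a streaming-compatible caller needs both sides; the review commit message in this PR later relaxes that case.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model, not Clang's API.
enum class FnMode { NonStreaming, Streaming, StreamingCompatible };

using FeatureSet = std::set<std::string>;

// svclamp is available under FEAT_SVE2p1 (non-streaming side) and FEAT_SME2
// (streaming side). Returns true if the call is well-defined for the caller.
bool svclampIsDefined(const FeatureSet &Features, FnMode Caller) {
  bool SatisfiesSVE = Features.count("sve2p1") != 0;
  bool SatisfiesSME = Features.count("sme2") != 0;
  switch (Caller) {
  case FnMode::NonStreaming:
    return SatisfiesSVE;
  case FnMode::Streaming:
    return SatisfiesSME;
  case FnMode::StreamingCompatible:
    // May run in either mode at runtime, so under the rule described here
    // both sides must be satisfied.
    return SatisfiesSVE && SatisfiesSME;
  }
  return false;
}
```

With {"sve2p1", "sme"} the call is defined only from a non-streaming caller; with {"sme2", "sve2"} only from a streaming caller; with {"sve2p1", "sme2"} from any caller.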