Skip to content

[Clang][AArch64] Generalise streaming mode checks for builtins. #93802

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

sdesmalen-arm
Copy link
Collaborator

PR #76975 added 'IsStreamingOrSVE2p1' to emit a diagnostic when a builtin marked
with 'IsStreamingOrSVE2p1' is used in a non-streaming function that is not
compiled with +sve2p1.

The problem is a bit more complex than only this case. For example, we've marked
lots of builtins with 'IsStreamingCompatible', meaning it can be used in either
streaming, streaming-compatible or non-streaming functions. But the code in
SemaChecking, doesn't check the appropriate target guards. This issue becomes
relevant when SVE builtins are only available in streaming mode, e.g. when
compiling for SME without SVE.

If we were to add the appropriate target guards, we'd have to add many more
combinations, e.g.:
IsStreamingSMEOrSVE
IsStreamingSME2OrSVE2
IsStreamingSMEOrSVE2p1
IsStreamingSME2OrSVE2p1
etc.

To avoid having to add more combinations (and avoid having to add more in the
future for new extensions), we use a single 'IsSVEOrStreamingSVE' flag for all
builtins that are available in streaming mode for the appropriate SME flags, or
in non-streaming mode for the appropriate SVE flags, or both. The code in
SemaChecking will then verify for which mode (or both) the builtin would be
defined, given the target features of the function/compilation unit.

For example:

'svclamp' is enabled under FEAT_SVE2p1 and FEAT_SME2

  • When we compile for SVE2p1 and SME (but not SME2), the builtin is undefined
    behaviour when called from a streaming function.

  • When we compile for SME2 and SVE2 (but not SVE2p1), the builtin is undefined
    behaviour when called from a non-streaming function.

  • When we compile for both SVE2p1 and SME2, the builtin can be used in either
    mode (non-streaming, streaming or streaming-compatible)

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels May 30, 2024
@llvmbot
Copy link
Member

llvmbot commented May 30, 2024

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-clang

Author: Sander de Smalen (sdesmalen-arm)

Changes

PR #76975 added 'IsStreamingOrSVE2p1' to emit a diagnostic when a builtin marked
with 'IsStreamingOrSVE2p1' is used in a non-streaming function that is not
compiled with +sve2p1.

The problem is a bit more complex than only this case. For example, we've marked
lots of builtins with 'IsStreamingCompatible', meaning it can be used in either
streaming, streaming-compatible or non-streaming functions. But the code in
SemaChecking, doesn't check the appropriate target guards. This issue becomes
relevant when SVE builtins are only available in streaming mode, e.g. when
compiling for SME without SVE.

If we were to add the appropriate target guards, we'd have to add many more
combinations, e.g.:
IsStreamingSMEOrSVE
IsStreamingSME2OrSVE2
IsStreamingSMEOrSVE2p1
IsStreamingSME2OrSVE2p1
etc.

To avoid having to add more combinations (and avoid having to add more in the
future for new extensions), we use a single 'IsSVEOrStreamingSVE' flag for all
builtins that are available in streaming mode for the appropriate SME flags, or
in non-streaming mode for the appropriate SVE flags, or both. The code in
SemaChecking will then verify for which mode (or both) the builtin would be
defined, given the target features of the function/compilation unit.

For example:

'svclamp' is enabled under FEAT_SVE2p1 and FEAT_SME2

  • When we compile for SVE2p1 and SME (but not SME2), the builtin is undefined
    behaviour when called from a streaming function.

  • When we compile for SME2 and SVE2 (but not SVE2p1), the builtin is undefined
    behaviour when called from a non-streaming function.

  • When we compile for both SVE2p1 and SME2, the builtin can be used in either
    mode (non-streaming, streaming or streaming-compatible)


Patch is 217.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/93802.diff

6 Files Affected:

  • (modified) clang/include/clang/Basic/arm_sve.td (+751-751)
  • (modified) clang/include/clang/Basic/arm_sve_sme_incl.td (+1-1)
  • (modified) clang/lib/Sema/SemaChecking.cpp (+61-14)
  • (removed) clang/test/Sema/aarch64-sme2-sve2p1-diagnostics.c (-39)
  • (added) clang/test/Sema/aarch64-streaming-sme-or-nonstreaming-sve-builtins.c (+51)
  • (modified) clang/utils/TableGen/SveEmitter.cpp (+3-3)
diff --git a/clang/include/clang/Basic/arm_sve.td b/clang/include/clang/Basic/arm_sve.td
index 88938a981fd8a..09d34cb048813 100644
--- a/clang/include/clang/Basic/arm_sve.td
+++ b/clang/include/clang/Basic/arm_sve.td
@@ -19,27 +19,27 @@ include "arm_sve_sme_incl.td"
 // Loads
 
 // Load one vector (scalar base)
-def SVLD1   : MInst<"svld1[_{2}]", "dPc", "csilUcUsUiUlhfd", [IsLoad, IsStreamingCompatible],               MemEltTyDefault, "aarch64_sve_ld1">;
-def SVLD1SB : MInst<"svld1sb_{d}", "dPS", "silUsUiUl",       [IsLoad, IsStreamingCompatible],               MemEltTyInt8,    "aarch64_sve_ld1">;
-def SVLD1UB : MInst<"svld1ub_{d}", "dPW", "silUsUiUl",       [IsLoad, IsZExtReturn, IsStreamingCompatible], MemEltTyInt8,    "aarch64_sve_ld1">;
-def SVLD1SH : MInst<"svld1sh_{d}", "dPT", "ilUiUl",          [IsLoad, IsStreamingCompatible],               MemEltTyInt16,   "aarch64_sve_ld1">;
-def SVLD1UH : MInst<"svld1uh_{d}", "dPX", "ilUiUl",          [IsLoad, IsZExtReturn, IsStreamingCompatible], MemEltTyInt16,   "aarch64_sve_ld1">;
-def SVLD1SW : MInst<"svld1sw_{d}", "dPU", "lUl",             [IsLoad, IsStreamingCompatible],               MemEltTyInt32,   "aarch64_sve_ld1">;
-def SVLD1UW : MInst<"svld1uw_{d}", "dPY", "lUl",             [IsLoad, IsZExtReturn, IsStreamingCompatible], MemEltTyInt32,   "aarch64_sve_ld1">;
+def SVLD1   : MInst<"svld1[_{2}]", "dPc", "csilUcUsUiUlhfd", [IsLoad, IsSVEOrStreamingSVE],               MemEltTyDefault, "aarch64_sve_ld1">;
+def SVLD1SB : MInst<"svld1sb_{d}", "dPS", "silUsUiUl",       [IsLoad, IsSVEOrStreamingSVE],               MemEltTyInt8,    "aarch64_sve_ld1">;
+def SVLD1UB : MInst<"svld1ub_{d}", "dPW", "silUsUiUl",       [IsLoad, IsZExtReturn, IsSVEOrStreamingSVE], MemEltTyInt8,    "aarch64_sve_ld1">;
+def SVLD1SH : MInst<"svld1sh_{d}", "dPT", "ilUiUl",          [IsLoad, IsSVEOrStreamingSVE],               MemEltTyInt16,   "aarch64_sve_ld1">;
+def SVLD1UH : MInst<"svld1uh_{d}", "dPX", "ilUiUl",          [IsLoad, IsZExtReturn, IsSVEOrStreamingSVE], MemEltTyInt16,   "aarch64_sve_ld1">;
+def SVLD1SW : MInst<"svld1sw_{d}", "dPU", "lUl",             [IsLoad, IsSVEOrStreamingSVE],               MemEltTyInt32,   "aarch64_sve_ld1">;
+def SVLD1UW : MInst<"svld1uw_{d}", "dPY", "lUl",             [IsLoad, IsZExtReturn, IsSVEOrStreamingSVE], MemEltTyInt32,   "aarch64_sve_ld1">;
 
 let TargetGuard = "sve,bf16" in {
-  def SVLD1_BF      : MInst<"svld1[_{2}]",      "dPc",  "b", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ld1">;
-  def SVLD1_VNUM_BF : MInst<"svld1_vnum[_{2}]", "dPcl", "b", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ld1">;
+  def SVLD1_BF      : MInst<"svld1[_{2}]",      "dPc",  "b", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ld1">;
+  def SVLD1_VNUM_BF : MInst<"svld1_vnum[_{2}]", "dPcl", "b", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ld1">;
 }
 
 // Load one vector (scalar base, VL displacement)
-def SVLD1_VNUM   : MInst<"svld1_vnum[_{2}]", "dPcl", "csilUcUsUiUlhfd", [IsLoad, IsStreamingCompatible],               MemEltTyDefault, "aarch64_sve_ld1">;
-def SVLD1SB_VNUM : MInst<"svld1sb_vnum_{d}", "dPSl", "silUsUiUl",       [IsLoad, IsStreamingCompatible],               MemEltTyInt8,    "aarch64_sve_ld1">;
-def SVLD1UB_VNUM : MInst<"svld1ub_vnum_{d}", "dPWl", "silUsUiUl",       [IsLoad, IsZExtReturn, IsStreamingCompatible], MemEltTyInt8,    "aarch64_sve_ld1">;
-def SVLD1SH_VNUM : MInst<"svld1sh_vnum_{d}", "dPTl", "ilUiUl",          [IsLoad, IsStreamingCompatible],               MemEltTyInt16,   "aarch64_sve_ld1">;
-def SVLD1UH_VNUM : MInst<"svld1uh_vnum_{d}", "dPXl", "ilUiUl",          [IsLoad, IsZExtReturn, IsStreamingCompatible], MemEltTyInt16,   "aarch64_sve_ld1">;
-def SVLD1SW_VNUM : MInst<"svld1sw_vnum_{d}", "dPUl", "lUl",             [IsLoad, IsStreamingCompatible],               MemEltTyInt32,   "aarch64_sve_ld1">;
-def SVLD1UW_VNUM : MInst<"svld1uw_vnum_{d}", "dPYl", "lUl",             [IsLoad, IsZExtReturn, IsStreamingCompatible], MemEltTyInt32,   "aarch64_sve_ld1">;
+def SVLD1_VNUM   : MInst<"svld1_vnum[_{2}]", "dPcl", "csilUcUsUiUlhfd", [IsLoad, IsSVEOrStreamingSVE],               MemEltTyDefault, "aarch64_sve_ld1">;
+def SVLD1SB_VNUM : MInst<"svld1sb_vnum_{d}", "dPSl", "silUsUiUl",       [IsLoad, IsSVEOrStreamingSVE],               MemEltTyInt8,    "aarch64_sve_ld1">;
+def SVLD1UB_VNUM : MInst<"svld1ub_vnum_{d}", "dPWl", "silUsUiUl",       [IsLoad, IsZExtReturn, IsSVEOrStreamingSVE], MemEltTyInt8,    "aarch64_sve_ld1">;
+def SVLD1SH_VNUM : MInst<"svld1sh_vnum_{d}", "dPTl", "ilUiUl",          [IsLoad, IsSVEOrStreamingSVE],               MemEltTyInt16,   "aarch64_sve_ld1">;
+def SVLD1UH_VNUM : MInst<"svld1uh_vnum_{d}", "dPXl", "ilUiUl",          [IsLoad, IsZExtReturn, IsSVEOrStreamingSVE], MemEltTyInt16,   "aarch64_sve_ld1">;
+def SVLD1SW_VNUM : MInst<"svld1sw_vnum_{d}", "dPUl", "lUl",             [IsLoad, IsSVEOrStreamingSVE],               MemEltTyInt32,   "aarch64_sve_ld1">;
+def SVLD1UW_VNUM : MInst<"svld1uw_vnum_{d}", "dPYl", "lUl",             [IsLoad, IsZExtReturn, IsSVEOrStreamingSVE], MemEltTyInt32,   "aarch64_sve_ld1">;
 
 // Load one vector (vector base)
 def SVLD1_GATHER_BASES_U   : MInst<"svld1_gather[_{2}base]_{d}",   "dPu", "ilUiUlfd", [IsGatherLoad],               MemEltTyDefault, "aarch64_sve_ld1_gather_scalar_offset">;
@@ -243,27 +243,27 @@ let TargetGuard = "sve,bf16" in {
 }
 
 // Load one vector, unextended load, non-temporal (scalar base)
-def SVLDNT1 : MInst<"svldnt1[_{2}]", "dPc", "csilUcUsUiUlhfd", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ldnt1">;
+def SVLDNT1 : MInst<"svldnt1[_{2}]", "dPc", "csilUcUsUiUlhfd", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ldnt1">;
 
 // Load one vector, unextended load, non-temporal (scalar base, VL displacement)
-def SVLDNT1_VNUM : MInst<"svldnt1_vnum[_{2}]", "dPcl", "csilUcUsUiUlhfd", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ldnt1">;
+def SVLDNT1_VNUM : MInst<"svldnt1_vnum[_{2}]", "dPcl", "csilUcUsUiUlhfd", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ldnt1">;
 
 let TargetGuard = "sve,bf16" in {
-  def SVLDNT1_BF      : MInst<"svldnt1[_{2}]",      "dPc",  "b", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ldnt1">;
-  def SVLDNT1_VNUM_BF : MInst<"svldnt1_vnum[_{2}]", "dPcl", "b", [IsLoad, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_ldnt1">;
+  def SVLDNT1_BF      : MInst<"svldnt1[_{2}]",      "dPc",  "b", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ldnt1">;
+  def SVLDNT1_VNUM_BF : MInst<"svldnt1_vnum[_{2}]", "dPcl", "b", [IsLoad, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_ldnt1">;
 }
 
 // Load one quadword and replicate (scalar base)
-def SVLD1RQ : SInst<"svld1rq[_{2}]", "dPc", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_ld1rq", [IsStreamingCompatible]>;
+def SVLD1RQ : SInst<"svld1rq[_{2}]", "dPc", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_ld1rq", [IsSVEOrStreamingSVE]>;
 
 let TargetGuard = "sve,bf16" in {
-  def SVLD1RQ_BF : SInst<"svld1rq[_{2}]", "dPc",  "b", MergeNone, "aarch64_sve_ld1rq", [IsStreamingCompatible]>;
+  def SVLD1RQ_BF : SInst<"svld1rq[_{2}]", "dPc",  "b", MergeNone, "aarch64_sve_ld1rq", [IsSVEOrStreamingSVE]>;
 }
 
 multiclass StructLoad<string name, string proto, string i> {
-  def : SInst<name, proto, "csilUcUsUiUlhfd", MergeNone, i, [IsStructLoad, IsStreamingCompatible]>;
+  def : SInst<name, proto, "csilUcUsUiUlhfd", MergeNone, i, [IsStructLoad, IsSVEOrStreamingSVE]>;
   let TargetGuard = "sve,bf16" in {
-    def: SInst<name, proto, "b", MergeNone, i, [IsStructLoad, IsStreamingCompatible]>;
+    def: SInst<name, proto, "b", MergeNone, i, [IsStructLoad, IsSVEOrStreamingSVE]>;
   }
 }
 
@@ -286,16 +286,16 @@ let TargetGuard = "sve,f64mm,bf16" in {
 }
 
 let TargetGuard = "sve,bf16" in {
-  def SVBFDOT        : SInst<"svbfdot[_{0}]",        "MMdd",  "b", MergeNone, "aarch64_sve_bfdot",        [IsOverloadNone, IsStreamingCompatible]>;
-  def SVBFMLALB      : SInst<"svbfmlalb[_{0}]",      "MMdd",  "b", MergeNone, "aarch64_sve_bfmlalb",      [IsOverloadNone, IsStreamingCompatible]>;
-  def SVBFMLALT      : SInst<"svbfmlalt[_{0}]",      "MMdd",  "b", MergeNone, "aarch64_sve_bfmlalt",      [IsOverloadNone, IsStreamingCompatible]>;
-  def SVBFMMLA       : SInst<"svbfmmla[_{0}]",       "MMdd",  "b", MergeNone, "aarch64_sve_bfmmla",       [IsOverloadNone, IsStreamingCompatible]>;
-  def SVBFDOT_N      : SInst<"svbfdot[_n_{0}]",      "MMda",  "b", MergeNone, "aarch64_sve_bfdot",        [IsOverloadNone, IsStreamingCompatible]>;
-  def SVBFMLAL_N     : SInst<"svbfmlalb[_n_{0}]",    "MMda",  "b", MergeNone, "aarch64_sve_bfmlalb",      [IsOverloadNone, IsStreamingCompatible]>;
-  def SVBFMLALT_N    : SInst<"svbfmlalt[_n_{0}]",    "MMda",  "b", MergeNone, "aarch64_sve_bfmlalt",      [IsOverloadNone, IsStreamingCompatible]>;
-  def SVBFDOT_LANE   : SInst<"svbfdot_lane[_{0}]",   "MMddi", "b", MergeNone, "aarch64_sve_bfdot_lane_v2",   [IsOverloadNone, IsStreamingCompatible], [ImmCheck<3, ImmCheck0_3>]>;
-  def SVBFMLALB_LANE : SInst<"svbfmlalb_lane[_{0}]", "MMddi", "b", MergeNone, "aarch64_sve_bfmlalb_lane_v2", [IsOverloadNone, IsStreamingCompatible], [ImmCheck<3, ImmCheck0_7>]>;
-  def SVBFMLALT_LANE : SInst<"svbfmlalt_lane[_{0}]", "MMddi", "b", MergeNone, "aarch64_sve_bfmlalt_lane_v2", [IsOverloadNone, IsStreamingCompatible], [ImmCheck<3, ImmCheck0_7>]>;
+  def SVBFDOT        : SInst<"svbfdot[_{0}]",        "MMdd",  "b", MergeNone, "aarch64_sve_bfdot",        [IsOverloadNone, IsSVEOrStreamingSVE]>;
+  def SVBFMLALB      : SInst<"svbfmlalb[_{0}]",      "MMdd",  "b", MergeNone, "aarch64_sve_bfmlalb",      [IsOverloadNone, IsSVEOrStreamingSVE]>;
+  def SVBFMLALT      : SInst<"svbfmlalt[_{0}]",      "MMdd",  "b", MergeNone, "aarch64_sve_bfmlalt",      [IsOverloadNone, IsSVEOrStreamingSVE]>;
+  def SVBFMMLA       : SInst<"svbfmmla[_{0}]",       "MMdd",  "b", MergeNone, "aarch64_sve_bfmmla",       [IsOverloadNone, IsSVEOrStreamingSVE]>;
+  def SVBFDOT_N      : SInst<"svbfdot[_n_{0}]",      "MMda",  "b", MergeNone, "aarch64_sve_bfdot",        [IsOverloadNone, IsSVEOrStreamingSVE]>;
+  def SVBFMLAL_N     : SInst<"svbfmlalb[_n_{0}]",    "MMda",  "b", MergeNone, "aarch64_sve_bfmlalb",      [IsOverloadNone, IsSVEOrStreamingSVE]>;
+  def SVBFMLALT_N    : SInst<"svbfmlalt[_n_{0}]",    "MMda",  "b", MergeNone, "aarch64_sve_bfmlalt",      [IsOverloadNone, IsSVEOrStreamingSVE]>;
+  def SVBFDOT_LANE   : SInst<"svbfdot_lane[_{0}]",   "MMddi", "b", MergeNone, "aarch64_sve_bfdot_lane_v2",   [IsOverloadNone, IsSVEOrStreamingSVE], [ImmCheck<3, ImmCheck0_3>]>;
+  def SVBFMLALB_LANE : SInst<"svbfmlalb_lane[_{0}]", "MMddi", "b", MergeNone, "aarch64_sve_bfmlalb_lane_v2", [IsOverloadNone, IsSVEOrStreamingSVE], [ImmCheck<3, ImmCheck0_7>]>;
+  def SVBFMLALT_LANE : SInst<"svbfmlalt_lane[_{0}]", "MMddi", "b", MergeNone, "aarch64_sve_bfmlalt_lane_v2", [IsOverloadNone, IsSVEOrStreamingSVE], [ImmCheck<3, ImmCheck0_7>]>;
 }
 
 let TargetGuard = "sve2p1" in {
@@ -334,26 +334,26 @@ let TargetGuard = "sve2p1" in {
 // Stores
 
 // Store one vector (scalar base)
-def SVST1    : MInst<"svst1[_{d}]",  "vPpd", "csilUcUsUiUlhfd", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_st1">;
-def SVST1B_S : MInst<"svst1b[_{d}]", "vPAd", "sil",             [IsStore, IsStreamingCompatible], MemEltTyInt8,    "aarch64_sve_st1">;
-def SVST1B_U : MInst<"svst1b[_{d}]", "vPEd", "UsUiUl",          [IsStore, IsStreamingCompatible], MemEltTyInt8,    "aarch64_sve_st1">;
-def SVST1H_S : MInst<"svst1h[_{d}]", "vPBd", "il",              [IsStore, IsStreamingCompatible], MemEltTyInt16,   "aarch64_sve_st1">;
-def SVST1H_U : MInst<"svst1h[_{d}]", "vPFd", "UiUl",            [IsStore, IsStreamingCompatible], MemEltTyInt16,   "aarch64_sve_st1">;
-def SVST1W_S : MInst<"svst1w[_{d}]", "vPCd", "l",               [IsStore, IsStreamingCompatible], MemEltTyInt32,   "aarch64_sve_st1">;
-def SVST1W_U : MInst<"svst1w[_{d}]", "vPGd", "Ul",              [IsStore, IsStreamingCompatible], MemEltTyInt32,   "aarch64_sve_st1">;
+def SVST1    : MInst<"svst1[_{d}]",  "vPpd", "csilUcUsUiUlhfd", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_st1">;
+def SVST1B_S : MInst<"svst1b[_{d}]", "vPAd", "sil",             [IsStore, IsSVEOrStreamingSVE], MemEltTyInt8,    "aarch64_sve_st1">;
+def SVST1B_U : MInst<"svst1b[_{d}]", "vPEd", "UsUiUl",          [IsStore, IsSVEOrStreamingSVE], MemEltTyInt8,    "aarch64_sve_st1">;
+def SVST1H_S : MInst<"svst1h[_{d}]", "vPBd", "il",              [IsStore, IsSVEOrStreamingSVE], MemEltTyInt16,   "aarch64_sve_st1">;
+def SVST1H_U : MInst<"svst1h[_{d}]", "vPFd", "UiUl",            [IsStore, IsSVEOrStreamingSVE], MemEltTyInt16,   "aarch64_sve_st1">;
+def SVST1W_S : MInst<"svst1w[_{d}]", "vPCd", "l",               [IsStore, IsSVEOrStreamingSVE], MemEltTyInt32,   "aarch64_sve_st1">;
+def SVST1W_U : MInst<"svst1w[_{d}]", "vPGd", "Ul",              [IsStore, IsSVEOrStreamingSVE], MemEltTyInt32,   "aarch64_sve_st1">;
 
 // Store one vector (scalar base, VL displacement)
-def SVST1_VNUM    : MInst<"svst1_vnum[_{d}]",  "vPpld", "csilUcUsUiUlhfd", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_st1">;
-def SVST1B_VNUM_S : MInst<"svst1b_vnum[_{d}]", "vPAld", "sil",             [IsStore, IsStreamingCompatible], MemEltTyInt8,    "aarch64_sve_st1">;
-def SVST1B_VNUM_U : MInst<"svst1b_vnum[_{d}]", "vPEld", "UsUiUl",          [IsStore, IsStreamingCompatible], MemEltTyInt8,    "aarch64_sve_st1">;
-def SVST1H_VNUM_S : MInst<"svst1h_vnum[_{d}]", "vPBld", "il",              [IsStore, IsStreamingCompatible], MemEltTyInt16,   "aarch64_sve_st1">;
-def SVST1H_VNUM_U : MInst<"svst1h_vnum[_{d}]", "vPFld", "UiUl",            [IsStore, IsStreamingCompatible], MemEltTyInt16,   "aarch64_sve_st1">;
-def SVST1W_VNUM_S : MInst<"svst1w_vnum[_{d}]", "vPCld", "l",               [IsStore, IsStreamingCompatible], MemEltTyInt32,   "aarch64_sve_st1">;
-def SVST1W_VNUM_U : MInst<"svst1w_vnum[_{d}]", "vPGld", "Ul",              [IsStore, IsStreamingCompatible], MemEltTyInt32,   "aarch64_sve_st1">;
+def SVST1_VNUM    : MInst<"svst1_vnum[_{d}]",  "vPpld", "csilUcUsUiUlhfd", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_st1">;
+def SVST1B_VNUM_S : MInst<"svst1b_vnum[_{d}]", "vPAld", "sil",             [IsStore, IsSVEOrStreamingSVE], MemEltTyInt8,    "aarch64_sve_st1">;
+def SVST1B_VNUM_U : MInst<"svst1b_vnum[_{d}]", "vPEld", "UsUiUl",          [IsStore, IsSVEOrStreamingSVE], MemEltTyInt8,    "aarch64_sve_st1">;
+def SVST1H_VNUM_S : MInst<"svst1h_vnum[_{d}]", "vPBld", "il",              [IsStore, IsSVEOrStreamingSVE], MemEltTyInt16,   "aarch64_sve_st1">;
+def SVST1H_VNUM_U : MInst<"svst1h_vnum[_{d}]", "vPFld", "UiUl",            [IsStore, IsSVEOrStreamingSVE], MemEltTyInt16,   "aarch64_sve_st1">;
+def SVST1W_VNUM_S : MInst<"svst1w_vnum[_{d}]", "vPCld", "l",               [IsStore, IsSVEOrStreamingSVE], MemEltTyInt32,   "aarch64_sve_st1">;
+def SVST1W_VNUM_U : MInst<"svst1w_vnum[_{d}]", "vPGld", "Ul",              [IsStore, IsSVEOrStreamingSVE], MemEltTyInt32,   "aarch64_sve_st1">;
 
 let TargetGuard = "sve,bf16" in {
-  def SVST1_BF      : MInst<"svst1[_{d}]",      "vPpd",  "b", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_st1">;
-  def SVST1_VNUM_BF : MInst<"svst1_vnum[_{d}]", "vPpld", "b", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_st1">;
+  def SVST1_BF      : MInst<"svst1[_{d}]",      "vPpd",  "b", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_st1">;
+  def SVST1_VNUM_BF : MInst<"svst1_vnum[_{d}]", "vPpld", "b", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_st1">;
 }
 
 // Store one vector (vector base)
@@ -426,9 +426,9 @@ def SVST1H_SCATTER_INDEX_S    : MInst<"svst1h_scatter[_{2}base]_index[_{d}]", "v
 def SVST1W_SCATTER_INDEX_S    : MInst<"svst1w_scatter[_{2}base]_index[_{d}]", "vPuld", "lUl",      [IsScatterStore], MemEltTyInt32,   "aarch64_sve_st1_scatter_scalar_offset">;
 
 multiclass StructStore<string name, string proto, string i> {
-  def : SInst<name, proto, "csilUcUsUiUlhfd", MergeNone, i, [IsStructStore, IsStreamingCompatible]>;
+  def : SInst<name, proto, "csilUcUsUiUlhfd", MergeNone, i, [IsStructStore, IsSVEOrStreamingSVE]>;
   let TargetGuard = "sve,bf16" in {
-    def: SInst<name, proto, "b", MergeNone, i, [IsStructStore, IsStreamingCompatible]>;
+    def: SInst<name, proto, "b", MergeNone, i, [IsStructStore, IsSVEOrStreamingSVE]>;
   }
 }
 // Store N vectors into N-element structure (scalar base)
@@ -442,14 +442,14 @@ defm SVST3_VNUM : StructStore<"svst3_vnum[_{d}]", "vPpl3", "aarch64_sve_st3">;
 defm SVST4_VNUM : StructStore<"svst4_vnum[_{d}]", "vPpl4", "aarch64_sve_st4">;
 
 // Store one vector, with no truncation, non-temporal (scalar base)
-def SVSTNT1 : MInst<"svstnt1[_{d}]", "vPpd", "csilUcUsUiUlhfd", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_stnt1">;
+def SVSTNT1 : MInst<"svstnt1[_{d}]", "vPpd", "csilUcUsUiUlhfd", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_stnt1">;
 
 // Store one vector, with no truncation, non-temporal (scalar base, VL displacement)
-def SVSTNT1_VNUM : MInst<"svstnt1_vnum[_{d}]", "vPpld", "csilUcUsUiUlhfd", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_stnt1">;
+def SVSTNT1_VNUM : MInst<"svstnt1_vnum[_{d}]", "vPpld", "csilUcUsUiUlhfd", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_stnt1">;
 
 let TargetGuard = "sve,bf16" in {
-  def SVSTNT1_BF      : MInst<"svstnt1[_{d}]",      "vPpd",  "b", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_stnt1">;
-  def SVSTNT1_VNUM_BF : MInst<"svstnt1_vnum[_{d}]", "vPpld", "b", [IsStore, IsStreamingCompatible], MemEltTyDefault, "aarch64_sve_stnt1">;
+  def SVSTNT1_BF      : MInst<"svstnt1[_{d}]",      "vPpd",  "b", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_stnt1">;
+  def SVSTNT1_VNUM_BF : MInst<"svstnt1_vnum[_{d}]", "vPpld", "b", [IsStore, IsSVEOrStreamingSVE], MemEltTyDefault, "aarch64_sve_stnt1">;
 }
 
 let TargetGuard = "sve2p1" in {
@@ -488,16 +488,16 @@ let TargetGuard = "sve2p1" in {
 // Prefetches
 
 // Prefetch (Scalar base)
-def SVPRFB : MInst<"svprfb", "vPQJ", "c", [IsPrefetch, IsStreamingCompatible], MemEltTyInt8,  "aarch64_sve_prf">;
-def SVPRFH : MInst<"svprfh", "vPQJ", "s", [IsPrefetch, IsStreamingCompatible], MemEltTyInt16, "aarch64_sve_prf">;
-def SVPRFW : MInst<"svprfw", "vPQJ", "i", [IsPrefetch, IsStreamingCompatible], MemEltTyInt32, "aarch64_sve_prf">;
-def SVPRFD : MInst<"svprfd", "vPQJ", "l", [IsPrefetch, IsStreamingCompatible], MemEltTyInt64, "aarch64_sve_prf">;
+def SVPRFB : MInst<"svprfb", "vPQJ", "c", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt8,  "aarch64_sve_prf">;
+def SVPRFH : MInst<"svprfh", "vPQJ", "s", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt16, "aarch64_sve_prf">;
+def SVPRFW : MInst<"svprfw", "vPQJ", "i", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt32, "aarch64_sve_prf">;
+def SVPRFD : MInst<"svprfd", "vPQJ", "l", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt64, "aarch64_sve_prf">;
 
 // Prefetch (Scalar base, VL displacement)
-def SVPRFB_VNUM : MInst<"svprfb_vnum", "vPQlJ", "c", [IsPrefetch, IsStreamingCompatible], MemEltTyInt8,  "aarch64_sve_prf">;
-def SVPRFH_VNUM : MInst<"svprfh_vnum", "vPQlJ", "s", [IsPrefetch, IsStreamingCompatible], MemEltTyInt16, "aarch64_sve_prf">;
-def SVPRFW_VNUM : MInst<"svprfw_vnum", "vPQlJ", "i", [IsPrefetch, IsStreamingCompatible], MemEltTyInt32, "aarch64_sve_prf">;
-def SVPRFD_VNUM : MInst<"svprfd_vnum", "vPQlJ", "l", [IsPrefetch, IsStreamingCompatible], MemEltTyInt64, "aarch64_sve_prf">;
+def SVPRFB_VNUM : MInst<"svprfb_vnum", "vPQlJ", "c", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt8,  "aarch64_sve_prf">;
+def SVPRFH_VNUM : MInst<"svprfh_vnum", "vPQlJ", "s", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt16, "aarch64_sve_prf">;
+def SVPRFW_VNUM : MInst<"svprfw_vnum", "vPQlJ", "i", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt32, "aarch64_sve_prf">;
+def SVPRFD_VNUM : MInst<"svprfd_vnum", "vPQlJ", "l", [IsPrefetch, IsSVEOrStreamingSVE], MemEltTyInt64, "aarch64_sve_prf">;
 
 // Prefetch (Vector bases)
 def SVPRFB_GATHER_BASES : MInst<"svprfb_gather[_{2}base]", "vPdJ", "UiUl", [IsGatherPrefetch], MemEltTyInt8,  "aarch64_sve_prfb_gather_scalar_offset">;
@@ -543,18 +543,18 @@ def SVADRD : SInst<"svadrd[_{0}base]_[{2}]index",  "uud", "ilUiUl", MergeNone, "
 //////////////////////////...
[truncated]

Copy link
Contributor

@aemerson aemerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with suggestion

Comment on lines +18 to +21
__attribute__((target("+sve2p1,+sme2")))
svfloat32_t good3(svfloat32_t a, svfloat32_t b, svfloat32_t c) __arm_streaming_compatible {
return svclamp(a, b, c);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness can we have a test as well to check this case but instead with __arm_streaming as well

PR llvm#76975 added 'IsStreamingOrSVE2p1' to emit a diagnostic when a builtin marked
with 'IsStreamingOrSVE2p1' is used in a non-streaming function that is not
compiled with `+sve2p1`.

The problem is a bit more complex than only this case. For example, we've marked
lots of builtins with 'IsStreamingCompatible', meaning it can be used in either
streaming, streaming-compatible or non-streaming functions. But the code in
SemaChecking, doesn't check the appropriate target guards. This issue becomes
relevant when SVE builtins are only available in streaming mode, e.g. when
compiling for SME without SVE.

If we were to add the appropriate target guards, we'd have to add many more
combinations, e.g.:
  IsStreamingSMEOrSVE
  IsStreamingSME2OrSVE2
  IsStreamingSMEOrSVE2p1
  IsStreamingSME2OrSVE2p1
  etc.

To avoid having to add more combinations (and avoid having to add more in the
future for new extensions), we use a single 'IsSVEOrStreamingSVE' flag for all
builtins that are available in streaming mode for the appropriate SME flags, or
in non-streaming mode for the appropriate SVE flags, or both.  The code in
SemaChecking will then verify for which mode (or both) the builtin would be
defined, given the target features of the function/compilation unit.

For example:

  'svclamp' is enabled under FEAT_SVE2p1 and FEAT_SME2

* When we compile for SVE2p1 and SME (but not SME2), the builtin is undefined
  behaviour when called from a streaming function.

* When we compile for SME2 and SVE2 (but not SVE2p1), the builtin is undefined
  behaviour when called from a non-streaming function.

* When we compile for _both_ SVE2p1 and SME2, the builtin can be used in either
  mode (non-streaming, streaming or streaming-compatible)
This changes the target guards for a small number of intrinsics, which is
the minimum set to fix the test Sema/CodeGen failures introduced by this
patch. I had hoped I could leave these changes to a follow-up patch to keep
this patch simple.

My next patch will set the correct target guards for *all* the intrinsics
(with corresponding RUN lines for the tests), which should complete the
work on the Clang side.
@sdesmalen-arm sdesmalen-arm force-pushed the generalise-streaming-mode-checks-builtins branch from 4e98882 to 46713b1 Compare June 4, 2024 11:27
CachedFD = FD;
}

if (SatisfiesSVE && SatisfiesSME)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this effectively prevent streaming compatible functions when only SVE feature flags are available?

My updated understand of streaming compatible functions is that SME features play no role and the user is expected to use SVE feature flags to direct the compiler to the level of SVE support a streaming compatible function can have, much like they would for ordinary functions.

@@ -225,7 +225,7 @@ def IsStreamingCompatible : FlagType<0x4000000000>;
def IsReadZA : FlagType<0x8000000000>;
def IsWriteZA : FlagType<0x10000000000>;
def IsReductionQV : FlagType<0x20000000000>;
def IsStreamingOrSVE2p1 : FlagType<0x40000000000>; // Use for intrinsics that are common between sme/sme2 and sve2p1.
def IsSVEOrStreamingSVE : FlagType<0x40000000000>; // Use for intrinsics that are common between SVE and SME.
Copy link
Collaborator

@paulwalker-arm paulwalker-arm Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you permit a bit of bike shedding I don't think this is a good name. From what I can see the new flag is used to trigger dynamic resolution to determine if the builtin is available to use based on the target features along with any keywords associated with the function. Perhaps RequiresDynamicVerification? or ideally something shorter that has a similar meaning.

* Allow intrinsics in streaming-compatible functions, if they satisfy the
  SVE feature requirements.
* Renamed IsSVEOrStreamingSVE -> VerifyRuntimeMode

Additionally:

* Fixed caching mechanism; it previously cached the 'SatisfiesSVE' and
  'SatisfiesSME' values as well, which meant it cached per builtin call,
  not per functiondecl.
* As per offline discussion with @paulwalker-arm, I have changed warning
  for undefined behaviour into an error.
Copy link

github-actions bot commented Jun 5, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -622,7 +679,8 @@ bool SemaARM::CheckSMEBuiltinFunctionCall(unsigned BuiltinID,
}

if (BuiltinType)
checkArmStreamingBuiltin(SemaRef, TheCall, FD, *BuiltinType);
HasError |= checkArmStreamingBuiltin(SemaRef, TheCall, FD, *BuiltinType,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be wrong to return immediately? I ask because there's

  switch (BuiltinID) {
  default:
    return false;

which should be return HasError;? but if we can return directly then there's less change of other similar issues.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 573 to 574
static llvm::StringMap<bool> CallerFeatureMapWithoutSVE,
CallerFeatureMapWithoutSME;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope I'm wrong but I think the use of static here is almost certainly bad because there's nothing stopping multiple threads from calling checkArmStreamingBuiltin. I looked for other instances but all I could find involved one time static initialisation after which the data is effectively constant.

Random Idea:
Rather than filtering the feature map you could process BuiltinTargetGuards, by which I mean you could split the guard into streaming and non-streaming (perhaps that's what | effectively means) and then you use whichever side is relevant to the function's mode of operation.

Of course you could just remove the cache and leave compile time as a worry for tomorrow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, not sure what I was thinking here with the 'static' variable :)

I'd rather not go down the route of parsing the BuiltinTargetGuards here or making assumptions on the format if we're not going to ensure this format in the arm_sve.td file. For example, one can write (sve,featureX)|(sme2,featureX) or (sve|sme),featureX. Sure we can assume the former format, but in that case I think we need to have a SVETargetGuard and a SMETargetGuard, that we let TableGen combine into a canonical form for the combined TargetGuard.


// Cache the feature maps, to avoid having to recalculate this for each
// builtin call.
static unsigned CachedODRHash = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

This removes the invalid caching mechanism, that wouldn't be thread-safe.
Also returns directly when it finds a streaming-mode error, rather than using HasError.
const FunctionDecl *FD,
SemaARM::ArmStreamingType BuiltinType) {
static bool checkArmStreamingBuiltin(Sema &S, CallExpr *TheCall,
FunctionDecl *FD,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can FD's constness be restored? I think you only had to remove it because you previously called getODRHash.

@sdesmalen-arm sdesmalen-arm merged commit 1644a31 into llvm:main Jun 16, 2024
7 checks passed
@sdesmalen-arm sdesmalen-arm deleted the generalise-streaming-mode-checks-builtins branch June 16, 2024 16:24
sdesmalen-arm added a commit to sdesmalen-arm/llvm-project that referenced this pull request Jun 17, 2024
This allows code with SVE intrinsics to be compiled with +sme,+nosve,
assuming the encompassing function is in the correct mode (see llvm#93802)
sdesmalen-arm added a commit that referenced this pull request Jun 21, 2024
)

This allows code with SVE intrinsics to be compiled with +sme,+nosve,
assuming the encompassing function is in the correct mode (see #93802)
sdesmalen-arm added a commit to sdesmalen-arm/llvm-project that referenced this pull request Jun 24, 2024
One reason to want to split this up is to simplify the code added
in llvm#93802, where it checks the SME streaming-mode requirements for a
builtin by checking for the absence of SVE. If the target guards are
separate, we can generate a table and make the Sema code to verify the
runtime mode simpler.

Another reason is to avoid an issue with a check in SveEmitter.cpp
where it ensures that the 'VerifyRuntimeMode' is set correctly for
functions that have both SVE and SME target guards:

  if (!Def->isFlagSet(VerifyRuntimeMode) && Def->getGuard().contains("sve") &&
      Def->getGuard().contains("sme"))
    llvm_unreachable("Missing VerifyRuntimeMode flag");

However, if we ever add a new feature with "sme" in the name, even
though it is unrelated to FEAT_SME, then this code no longer works.

Note that the arm_sve.td and arm_sme.td files could do with a bit of
restructuring after this but it seems better to follow that up in an
NFC patch.
sdesmalen-arm added a commit that referenced this pull request Jun 24, 2024
…96482)

One reason to want to split this up is to simplify the code added in
#93802, where it checks the SME streaming-mode requirements for a
builtin by checking for the absence of SVE. If the target guards are
separate, we can generate a table and make the Sema code to verify the
runtime mode simpler.

Another reason is to avoid an issue with a check in SveEmitter.cpp where
it ensures that the 'VerifyRuntimeMode' is set correctly for functions
that have both SVE and SME target guards:

if (!Def->isFlagSet(VerifyRuntimeMode) &&
Def->getGuard().contains("sve") &&
      Def->getGuard().contains("sme"))
    llvm_unreachable("Missing VerifyRuntimeMode flag");

However, if we ever add a new feature with "sme" in the name, even
though it is unrelated to FEAT_SME, then this code no longer works.

Note that the arm_sve.td and arm_sme.td files could do with a bit of
restructuring after this but it seems better to follow that up in an NFC
patch.
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…m#95787)

This allows code with SVE intrinsics to be compiled with +sme,+nosve,
assuming the encompassing function is in the correct mode (see llvm#93802)
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…lvm#96482)

One reason to want to split this up is to simplify the code added in
llvm#93802, where it checks the SME streaming-mode requirements for a
builtin by checking for the absence of SVE. If the target guards are
separate, we can generate a table and make the Sema code to verify the
runtime mode simpler.

Another reason is to avoid an issue with a check in SveEmitter.cpp where
it ensures that the 'VerifyRuntimeMode' is set correctly for functions
that have both SVE and SME target guards:

if (!Def->isFlagSet(VerifyRuntimeMode) &&
Def->getGuard().contains("sve") &&
      Def->getGuard().contains("sme"))
    llvm_unreachable("Missing VerifyRuntimeMode flag");

However, if we ever add a new feature with "sme" in the name, even
though it is unrelated to FEAT_SME, then this code no longer works.

Note that the arm_sve.td and arm_sme.td files could do with a bit of
restructuring after this but it seems better to follow that up in an NFC
patch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 backend:ARM clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants