
[RISCV] Make Zicclsm imply unaligned scalar and vector access #108551


Closed

wants to merge 1 commit

Conversation

wangpc-pp
Contributor

According to the RISC-V profiles specification:

> Zicclsm Misaligned loads and stores to main memory regions with
> both the cacheability and coherence PMAs must be supported.

`Zicclsm` should therefore imply both scalar and vector unaligned access.

This PR moves all LLVM-specific features and extensions above the standard
extension definitions so that the `Zicclsm` definition can reference them
as implied features.

This may break some CPU definitions that don't support unaligned
vector access, like spacemit-x60. But I believe this is the
right thing, so I'd like to gather more comments.

… FeatureUnalignedVectorMem

According to the RISC-V profiles specification:
> Zicclsm Misaligned loads and stores to main memory regions with
> both the cacheability and coherence PMAs must be supported.

`Zicclsm` should imply both scalar and vector unaligned access.
@llvmbot
Member

llvmbot commented Sep 13, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Pengcheng Wang (wangpc-pp)



Full diff: https://github.com/llvm/llvm-project/pull/108551.diff

1 file affected:

  • (modified) llvm/lib/Target/RISCV/RISCVFeatures.td (+128-127)
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index 52f5a637eb740d..54423a5b5cc0f4 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -6,6 +6,132 @@
 //
 //===----------------------------------------------------------------------===//
 
+//===----------------------------------------------------------------------===//
+// LLVM specific features and extensions
+//===----------------------------------------------------------------------===//
+
+// Feature32Bit exists to mark CPUs that support RV32 to distinquish them from
+// tuning CPU names.
+def Feature32Bit
+    : SubtargetFeature<"32bit", "IsRV32", "true", "Implements RV32">;
+def Feature64Bit
+    : SubtargetFeature<"64bit", "IsRV64", "true", "Implements RV64">;
+def IsRV64 : Predicate<"Subtarget->is64Bit()">,
+             AssemblerPredicate<(all_of Feature64Bit),
+                                "RV64I Base Instruction Set">;
+def IsRV32 : Predicate<"!Subtarget->is64Bit()">,
+             AssemblerPredicate<(all_of (not Feature64Bit)),
+                                "RV32I Base Instruction Set">;
+
+defvar RV32 = DefaultMode;
+def RV64           : HwMode<"+64bit", [IsRV64]>;
+
+def FeatureRelax
+    : SubtargetFeature<"relax", "EnableLinkerRelax", "true",
+                       "Enable Linker relaxation.">;
+
+foreach i = {1-31} in
+  def FeatureReserveX#i :
+      SubtargetFeature<"reserve-x"#i, "UserReservedRegister[RISCV::X"#i#"]",
+                       "true", "Reserve X"#i>;
+
+def FeatureSaveRestore : SubtargetFeature<"save-restore", "EnableSaveRestore",
+                                          "true", "Enable save/restore.">;
+
+def FeatureNoTrailingSeqCstFence : SubtargetFeature<"no-trailing-seq-cst-fence",
+                                          "EnableTrailingSeqCstFence",
+                                          "false",
+                                          "Disable trailing fence for seq-cst store.">;
+
+def FeatureUnalignedScalarMem
+   : SubtargetFeature<"unaligned-scalar-mem", "EnableUnalignedScalarMem",
+                      "true", "Has reasonably performant unaligned scalar "
+                      "loads and stores">;
+
+def FeatureUnalignedVectorMem
+   : SubtargetFeature<"unaligned-vector-mem", "EnableUnalignedVectorMem",
+                      "true", "Has reasonably performant unaligned vector "
+                      "loads and stores">;
+
+def FeaturePostRAScheduler : SubtargetFeature<"use-postra-scheduler",
+    "UsePostRAScheduler", "true", "Schedule again after register allocation">;
+
+def FeaturePredictableSelectIsExpensive
+    : SubtargetFeature<"predictable-select-expensive", "PredictableSelectIsExpensive", "true",
+                       "Prefer likely predicted branches over selects">;
+
+def TuneOptimizedZeroStrideLoad
+   : SubtargetFeature<"optimized-zero-stride-load", "HasOptimizedZeroStrideLoad",
+                      "true", "Optimized (perform fewer memory operations)"
+                      "zero-stride vector load">;
+
+def Experimental
+   : SubtargetFeature<"experimental", "HasExperimental",
+                      "true", "Experimental intrinsics">;
+
+// Some vector hardware implementations do not process all VLEN bits in parallel
+// and instead split over multiple cycles. DLEN refers to the datapath width
+// that can be done in parallel.
+def TuneDLenFactor2
+   : SubtargetFeature<"dlen-factor-2", "DLenFactor2", "true",
+                      "Vector unit DLEN(data path width) is half of VLEN">;
+
+def TuneNoDefaultUnroll
+    : SubtargetFeature<"no-default-unroll", "EnableDefaultUnroll", "false",
+                       "Disable default unroll preference.">;
+
+// SiFive 7 is able to fuse integer ALU operations with a preceding branch
+// instruction.
+def TuneShortForwardBranchOpt
+    : SubtargetFeature<"short-forward-branch-opt", "HasShortForwardBranchOpt",
+                       "true", "Enable short forward branch optimization">;
+def HasShortForwardBranchOpt : Predicate<"Subtarget->hasShortForwardBranchOpt()">;
+def NoShortForwardBranchOpt : Predicate<"!Subtarget->hasShortForwardBranchOpt()">;
+
+// Some subtargets require a S2V transfer buffer to move scalars into vectors.
+// FIXME: Forming .vx/.vf/.wx/.wf can reduce register pressure.
+def TuneNoSinkSplatOperands
+    : SubtargetFeature<"no-sink-splat-operands", "SinkSplatOperands",
+                       "false", "Disable sink splat operands to enable .vx, .vf,"
+                       ".wx, and .wf instructions">;
+
+def TunePreferWInst
+    : SubtargetFeature<"prefer-w-inst", "PreferWInst", "true",
+                       "Prefer instructions with W suffix">;
+
+def TuneConditionalCompressedMoveFusion
+    : SubtargetFeature<"conditional-cmv-fusion", "HasConditionalCompressedMoveFusion",
+                       "true", "Enable branch+c.mv fusion">;
+def HasConditionalMoveFusion : Predicate<"Subtarget->hasConditionalMoveFusion()">;
+def NoConditionalMoveFusion  : Predicate<"!Subtarget->hasConditionalMoveFusion()">;
+
+def TuneSiFive7 : SubtargetFeature<"sifive7", "RISCVProcFamily", "SiFive7",
+                                   "SiFive 7-Series processors">;
+
+def TuneVentanaVeyron : SubtargetFeature<"ventana-veyron", "RISCVProcFamily", "VentanaVeyron",
+                                         "Ventana Veyron-Series processors">;
+
+// Assume that lock-free native-width atomics are available, even if the target
+// and operating system combination would not usually provide them. The user
+// is responsible for providing any necessary __sync implementations. Code
+// built with this feature is not ABI-compatible with code built without this
+// feature, if atomic variables are exposed across the ABI boundary.
+def FeatureForcedAtomics : SubtargetFeature<
+    "forced-atomics", "HasForcedAtomics", "true",
+    "Assume that lock-free native-width atomics are available">;
+def HasAtomicLdSt
+    : Predicate<"Subtarget->hasStdExtA() || Subtarget->hasForcedAtomics()">;
+
+def FeatureTaggedGlobals : SubtargetFeature<"tagged-globals",
+    "AllowTaggedGlobals",
+    "true", "Use an instruction sequence for taking the address of a global "
+    "that allows a memory tag in the upper address bits">;
+
+def FeatureForcedSWShadowStack : SubtargetFeature<
+    "forced-sw-shadow-stack", "HasForcedSWShadowStack", "true",
+    "Implement shadow stack with software.">;
+def HasForcedSWShadowStack : Predicate<"Subtarget->hasForcedSWShadowStack()">;
+
 //===----------------------------------------------------------------------===//
 // RISC-V subtarget features and instruction predicates.
 //===----------------------------------------------------------------------===//
@@ -104,7 +230,8 @@ def FeatureStdExtZiccif
 
 def FeatureStdExtZicclsm
     : RISCVExtension<"zicclsm", 1, 0,
-                     "'Zicclsm' (Main Memory Supports Misaligned Loads/Stores)">;
+                     "'Zicclsm' (Main Memory Supports Misaligned Loads/Stores)",
+                     [FeatureUnalignedScalarMem, FeatureUnalignedVectorMem]>;
 
 def FeatureStdExtZiccrse
     : RISCVExtension<"ziccrse", 1, 0,
@@ -1299,129 +1426,3 @@ def HasVendorXwchc
     : Predicate<"Subtarget->hasVendorXwchc()">,
       AssemblerPredicate<(all_of FeatureVendorXwchc),
                          "'Xwchc' (WCH/QingKe additional compressed opcodes)">;
-
-//===----------------------------------------------------------------------===//
-// LLVM specific features and extensions
-//===----------------------------------------------------------------------===//
-
-// Feature32Bit exists to mark CPUs that support RV32 to distinquish them from
-// tuning CPU names.
-def Feature32Bit
-    : SubtargetFeature<"32bit", "IsRV32", "true", "Implements RV32">;
-def Feature64Bit
-    : SubtargetFeature<"64bit", "IsRV64", "true", "Implements RV64">;
-def IsRV64 : Predicate<"Subtarget->is64Bit()">,
-             AssemblerPredicate<(all_of Feature64Bit),
-                                "RV64I Base Instruction Set">;
-def IsRV32 : Predicate<"!Subtarget->is64Bit()">,
-             AssemblerPredicate<(all_of (not Feature64Bit)),
-                                "RV32I Base Instruction Set">;
-
-defvar RV32 = DefaultMode;
-def RV64           : HwMode<"+64bit", [IsRV64]>;
-
-def FeatureRelax
-    : SubtargetFeature<"relax", "EnableLinkerRelax", "true",
-                       "Enable Linker relaxation.">;
-
-foreach i = {1-31} in
-  def FeatureReserveX#i :
-      SubtargetFeature<"reserve-x"#i, "UserReservedRegister[RISCV::X"#i#"]",
-                       "true", "Reserve X"#i>;
-
-def FeatureSaveRestore : SubtargetFeature<"save-restore", "EnableSaveRestore",
-                                          "true", "Enable save/restore.">;
-
-def FeatureNoTrailingSeqCstFence : SubtargetFeature<"no-trailing-seq-cst-fence",
-                                          "EnableTrailingSeqCstFence",
-                                          "false",
-                                          "Disable trailing fence for seq-cst store.">;
-
-def FeatureUnalignedScalarMem
-   : SubtargetFeature<"unaligned-scalar-mem", "EnableUnalignedScalarMem",
-                      "true", "Has reasonably performant unaligned scalar "
-                      "loads and stores">;
-
-def FeatureUnalignedVectorMem
-   : SubtargetFeature<"unaligned-vector-mem", "EnableUnalignedVectorMem",
-                      "true", "Has reasonably performant unaligned vector "
-                      "loads and stores">;
-
-def FeaturePostRAScheduler : SubtargetFeature<"use-postra-scheduler",
-    "UsePostRAScheduler", "true", "Schedule again after register allocation">;
-
-def FeaturePredictableSelectIsExpensive
-    : SubtargetFeature<"predictable-select-expensive", "PredictableSelectIsExpensive", "true",
-                       "Prefer likely predicted branches over selects">;
-
-def TuneOptimizedZeroStrideLoad
-   : SubtargetFeature<"optimized-zero-stride-load", "HasOptimizedZeroStrideLoad",
-                      "true", "Optimized (perform fewer memory operations)"
-                      "zero-stride vector load">;
-
-def Experimental
-   : SubtargetFeature<"experimental", "HasExperimental",
-                      "true", "Experimental intrinsics">;
-
-// Some vector hardware implementations do not process all VLEN bits in parallel
-// and instead split over multiple cycles. DLEN refers to the datapath width
-// that can be done in parallel.
-def TuneDLenFactor2
-   : SubtargetFeature<"dlen-factor-2", "DLenFactor2", "true",
-                      "Vector unit DLEN(data path width) is half of VLEN">;
-
-def TuneNoDefaultUnroll
-    : SubtargetFeature<"no-default-unroll", "EnableDefaultUnroll", "false",
-                       "Disable default unroll preference.">;
-
-// SiFive 7 is able to fuse integer ALU operations with a preceding branch
-// instruction.
-def TuneShortForwardBranchOpt
-    : SubtargetFeature<"short-forward-branch-opt", "HasShortForwardBranchOpt",
-                       "true", "Enable short forward branch optimization">;
-def HasShortForwardBranchOpt : Predicate<"Subtarget->hasShortForwardBranchOpt()">;
-def NoShortForwardBranchOpt : Predicate<"!Subtarget->hasShortForwardBranchOpt()">;
-
-// Some subtargets require a S2V transfer buffer to move scalars into vectors.
-// FIXME: Forming .vx/.vf/.wx/.wf can reduce register pressure.
-def TuneNoSinkSplatOperands
-    : SubtargetFeature<"no-sink-splat-operands", "SinkSplatOperands",
-                       "false", "Disable sink splat operands to enable .vx, .vf,"
-                       ".wx, and .wf instructions">;
-
-def TunePreferWInst
-    : SubtargetFeature<"prefer-w-inst", "PreferWInst", "true",
-                       "Prefer instructions with W suffix">;
-
-def TuneConditionalCompressedMoveFusion
-    : SubtargetFeature<"conditional-cmv-fusion", "HasConditionalCompressedMoveFusion",
-                       "true", "Enable branch+c.mv fusion">;
-def HasConditionalMoveFusion : Predicate<"Subtarget->hasConditionalMoveFusion()">;
-def NoConditionalMoveFusion  : Predicate<"!Subtarget->hasConditionalMoveFusion()">;
-
-def TuneSiFive7 : SubtargetFeature<"sifive7", "RISCVProcFamily", "SiFive7",
-                                   "SiFive 7-Series processors">;
-
-def TuneVentanaVeyron : SubtargetFeature<"ventana-veyron", "RISCVProcFamily", "VentanaVeyron",
-                                         "Ventana Veyron-Series processors">;
-
-// Assume that lock-free native-width atomics are available, even if the target
-// and operating system combination would not usually provide them. The user
-// is responsible for providing any necessary __sync implementations. Code
-// built with this feature is not ABI-compatible with code built without this
-// feature, if atomic variables are exposed across the ABI boundary.
-def FeatureForcedAtomics : SubtargetFeature<
-    "forced-atomics", "HasForcedAtomics", "true",
-    "Assume that lock-free native-width atomics are available">;
-def HasAtomicLdSt
-    : Predicate<"Subtarget->hasStdExtA() || Subtarget->hasForcedAtomics()">;
-
-def FeatureTaggedGlobals : SubtargetFeature<"tagged-globals",
-    "AllowTaggedGlobals",
-    "true", "Use an instruction sequence for taking the address of a global "
-    "that allows a memory tag in the upper address bits">;
-
-def FeatureForcedSWShadowStack : SubtargetFeature<
-    "forced-sw-shadow-stack", "HasForcedSWShadowStack", "true",
-    "Implement shadow stack with software.">;
-def HasForcedSWShadowStack : Predicate<"Subtarget->hasForcedSWShadowStack()">;

@wangpc-pp wangpc-pp changed the title [RISCV] Make FeatureStdExtZicclsm imply FeatureUnalignedScalarMem and FeatureUnalignedVectorMem [RISCV] Make Zicclsm imply unaligned scalar and vector access Sep 13, 2024
@topperc
Collaborator

topperc commented Sep 13, 2024

Zicclsm makes no guarantees about performance of unaligned access. It only says they won't trap. They can be emulated in the kernel and take thousands of cycles.

@dtcxzyw
Member

dtcxzyw commented Sep 13, 2024

> Zicclsm makes no guarantees about performance of unaligned access. It only says they won't trap. They can be emulated in the kernel and take thousands of cycles.

See also https://patchew.org/linux/[email protected]/[email protected]/

Note:

> This introduces a new extension name for this feature.
> This requires misaligned support for all regular load and store
> instructions (including scalar and vector) but not AMOs or other
> specialized forms of memory access. Even though mandated, misaligned
> loads and stores might execute extremely slowly. Standard software
> distributions should assume their existence only for correctness,
> not for performance.

@wangpc-pp wangpc-pp closed this Sep 13, 2024
@wangpc-pp
Contributor Author

I think I misunderstood it, thanks everyone!

@wangpc-pp wangpc-pp deleted the main-riscv-profile-unaligned branch September 14, 2024 04:58
dtcxzyw added a commit that referenced this pull request Oct 5, 2024
This patch is the follow-up of
#94352 with some updates:
1. Add support for more extensions: `zve*`, `zimop`, `zc*`, `zcmop`
and `zawrs`.
2. Use `RISCV_HWPROBE_KEY_MISALIGNED_SCALAR_PERF` to check whether the
processor supports fast misaligned scalar memory access.
#108551 reminded me that the patch
https://lore.kernel.org/all/[email protected]/T/
has been merged. This addresses comment
#94352 (comment).

References:
1. Constants:
https://github.com/torvalds/linux/blame/v6.11-rc7/arch/riscv/include/uapi/asm/hwprobe.h
2. Kernel documentation: https://docs.kernel.org/arch/riscv/hwprobe.html
3. Related commits:
   1. `zve*` support: torvalds/linux@de8f828
   2. `zimop` support: torvalds/linux@36f8960
   3. `zc*` support: torvalds/linux@0ad70db
   4. `zcmop` support: torvalds/linux@fc078ea
   5. `zawrs` support: torvalds/linux@244c18f
   6. Scalar misaligned perf: torvalds/linux@c42e2f0
      and torvalds/linux@1f52888