[RISCV] Make Zicclsm imply unaligned scalar and vector access #108551
Conversation
… FeatureUnalignedVectorMem

According to the RISC-V profiles specification:

> Zicclsm: Misaligned loads and stores to main memory regions with both the cacheability and coherence PMAs must be supported.

`Zicclsm` should imply both scalar and vector unaligned access.
@llvm/pr-subscribers-backend-risc-v

Author: Pengcheng Wang (wangpc-pp)

Changes

According to the RISC-V profiles specification:

> Zicclsm: Misaligned loads and stores to main memory regions with both the cacheability and coherence PMAs must be supported.

`Zicclsm` should imply both scalar and vector unaligned access.

This PR also moves all LLVM-specific features and extensions above the standard extensions in the file. This may break some CPU definitions that don't support unaligned vector access, like `spacemit-x60`, but I believe this is the right thing, so I'd like to gather more comments.

Full diff: https://github.com/llvm/llvm-project/pull/108551.diff

1 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index 52f5a637eb740d..54423a5b5cc0f4 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -6,6 +6,132 @@
//
//===----------------------------------------------------------------------===//
+//===----------------------------------------------------------------------===//
+// LLVM specific features and extensions
+//===----------------------------------------------------------------------===//
+
+// Feature32Bit exists to mark CPUs that support RV32 to distinquish them from
+// tuning CPU names.
+def Feature32Bit
+ : SubtargetFeature<"32bit", "IsRV32", "true", "Implements RV32">;
+def Feature64Bit
+ : SubtargetFeature<"64bit", "IsRV64", "true", "Implements RV64">;
+def IsRV64 : Predicate<"Subtarget->is64Bit()">,
+ AssemblerPredicate<(all_of Feature64Bit),
+ "RV64I Base Instruction Set">;
+def IsRV32 : Predicate<"!Subtarget->is64Bit()">,
+ AssemblerPredicate<(all_of (not Feature64Bit)),
+ "RV32I Base Instruction Set">;
+
+defvar RV32 = DefaultMode;
+def RV64 : HwMode<"+64bit", [IsRV64]>;
+
+def FeatureRelax
+ : SubtargetFeature<"relax", "EnableLinkerRelax", "true",
+ "Enable Linker relaxation.">;
+
+foreach i = {1-31} in
+ def FeatureReserveX#i :
+ SubtargetFeature<"reserve-x"#i, "UserReservedRegister[RISCV::X"#i#"]",
+ "true", "Reserve X"#i>;
+
+def FeatureSaveRestore : SubtargetFeature<"save-restore", "EnableSaveRestore",
+ "true", "Enable save/restore.">;
+
+def FeatureNoTrailingSeqCstFence : SubtargetFeature<"no-trailing-seq-cst-fence",
+ "EnableTrailingSeqCstFence",
+ "false",
+ "Disable trailing fence for seq-cst store.">;
+
+def FeatureUnalignedScalarMem
+ : SubtargetFeature<"unaligned-scalar-mem", "EnableUnalignedScalarMem",
+ "true", "Has reasonably performant unaligned scalar "
+ "loads and stores">;
+
+def FeatureUnalignedVectorMem
+ : SubtargetFeature<"unaligned-vector-mem", "EnableUnalignedVectorMem",
+ "true", "Has reasonably performant unaligned vector "
+ "loads and stores">;
+
+def FeaturePostRAScheduler : SubtargetFeature<"use-postra-scheduler",
+ "UsePostRAScheduler", "true", "Schedule again after register allocation">;
+
+def FeaturePredictableSelectIsExpensive
+ : SubtargetFeature<"predictable-select-expensive", "PredictableSelectIsExpensive", "true",
+ "Prefer likely predicted branches over selects">;
+
+def TuneOptimizedZeroStrideLoad
+ : SubtargetFeature<"optimized-zero-stride-load", "HasOptimizedZeroStrideLoad",
+ "true", "Optimized (perform fewer memory operations)"
+ "zero-stride vector load">;
+
+def Experimental
+ : SubtargetFeature<"experimental", "HasExperimental",
+ "true", "Experimental intrinsics">;
+
+// Some vector hardware implementations do not process all VLEN bits in parallel
+// and instead split over multiple cycles. DLEN refers to the datapath width
+// that can be done in parallel.
+def TuneDLenFactor2
+ : SubtargetFeature<"dlen-factor-2", "DLenFactor2", "true",
+ "Vector unit DLEN(data path width) is half of VLEN">;
+
+def TuneNoDefaultUnroll
+ : SubtargetFeature<"no-default-unroll", "EnableDefaultUnroll", "false",
+ "Disable default unroll preference.">;
+
+// SiFive 7 is able to fuse integer ALU operations with a preceding branch
+// instruction.
+def TuneShortForwardBranchOpt
+ : SubtargetFeature<"short-forward-branch-opt", "HasShortForwardBranchOpt",
+ "true", "Enable short forward branch optimization">;
+def HasShortForwardBranchOpt : Predicate<"Subtarget->hasShortForwardBranchOpt()">;
+def NoShortForwardBranchOpt : Predicate<"!Subtarget->hasShortForwardBranchOpt()">;
+
+// Some subtargets require a S2V transfer buffer to move scalars into vectors.
+// FIXME: Forming .vx/.vf/.wx/.wf can reduce register pressure.
+def TuneNoSinkSplatOperands
+ : SubtargetFeature<"no-sink-splat-operands", "SinkSplatOperands",
+ "false", "Disable sink splat operands to enable .vx, .vf,"
+ ".wx, and .wf instructions">;
+
+def TunePreferWInst
+ : SubtargetFeature<"prefer-w-inst", "PreferWInst", "true",
+ "Prefer instructions with W suffix">;
+
+def TuneConditionalCompressedMoveFusion
+ : SubtargetFeature<"conditional-cmv-fusion", "HasConditionalCompressedMoveFusion",
+ "true", "Enable branch+c.mv fusion">;
+def HasConditionalMoveFusion : Predicate<"Subtarget->hasConditionalMoveFusion()">;
+def NoConditionalMoveFusion : Predicate<"!Subtarget->hasConditionalMoveFusion()">;
+
+def TuneSiFive7 : SubtargetFeature<"sifive7", "RISCVProcFamily", "SiFive7",
+ "SiFive 7-Series processors">;
+
+def TuneVentanaVeyron : SubtargetFeature<"ventana-veyron", "RISCVProcFamily", "VentanaVeyron",
+ "Ventana Veyron-Series processors">;
+
+// Assume that lock-free native-width atomics are available, even if the target
+// and operating system combination would not usually provide them. The user
+// is responsible for providing any necessary __sync implementations. Code
+// built with this feature is not ABI-compatible with code built without this
+// feature, if atomic variables are exposed across the ABI boundary.
+def FeatureForcedAtomics : SubtargetFeature<
+ "forced-atomics", "HasForcedAtomics", "true",
+ "Assume that lock-free native-width atomics are available">;
+def HasAtomicLdSt
+ : Predicate<"Subtarget->hasStdExtA() || Subtarget->hasForcedAtomics()">;
+
+def FeatureTaggedGlobals : SubtargetFeature<"tagged-globals",
+ "AllowTaggedGlobals",
+ "true", "Use an instruction sequence for taking the address of a global "
+ "that allows a memory tag in the upper address bits">;
+
+def FeatureForcedSWShadowStack : SubtargetFeature<
+ "forced-sw-shadow-stack", "HasForcedSWShadowStack", "true",
+ "Implement shadow stack with software.">;
+def HasForcedSWShadowStack : Predicate<"Subtarget->hasForcedSWShadowStack()">;
+
//===----------------------------------------------------------------------===//
// RISC-V subtarget features and instruction predicates.
//===----------------------------------------------------------------------===//
@@ -104,7 +230,8 @@ def FeatureStdExtZiccif
def FeatureStdExtZicclsm
: RISCVExtension<"zicclsm", 1, 0,
- "'Zicclsm' (Main Memory Supports Misaligned Loads/Stores)">;
+ "'Zicclsm' (Main Memory Supports Misaligned Loads/Stores)",
+ [FeatureUnalignedScalarMem, FeatureUnalignedVectorMem]>;
def FeatureStdExtZiccrse
: RISCVExtension<"ziccrse", 1, 0,
@@ -1299,129 +1426,3 @@ def HasVendorXwchc
: Predicate<"Subtarget->hasVendorXwchc()">,
AssemblerPredicate<(all_of FeatureVendorXwchc),
"'Xwchc' (WCH/QingKe additional compressed opcodes)">;
-
-//===----------------------------------------------------------------------===//
-// LLVM specific features and extensions
-//===----------------------------------------------------------------------===//
-
-// Feature32Bit exists to mark CPUs that support RV32 to distinquish them from
-// tuning CPU names.
-def Feature32Bit
- : SubtargetFeature<"32bit", "IsRV32", "true", "Implements RV32">;
-def Feature64Bit
- : SubtargetFeature<"64bit", "IsRV64", "true", "Implements RV64">;
-def IsRV64 : Predicate<"Subtarget->is64Bit()">,
- AssemblerPredicate<(all_of Feature64Bit),
- "RV64I Base Instruction Set">;
-def IsRV32 : Predicate<"!Subtarget->is64Bit()">,
- AssemblerPredicate<(all_of (not Feature64Bit)),
- "RV32I Base Instruction Set">;
-
-defvar RV32 = DefaultMode;
-def RV64 : HwMode<"+64bit", [IsRV64]>;
-
-def FeatureRelax
- : SubtargetFeature<"relax", "EnableLinkerRelax", "true",
- "Enable Linker relaxation.">;
-
-foreach i = {1-31} in
- def FeatureReserveX#i :
- SubtargetFeature<"reserve-x"#i, "UserReservedRegister[RISCV::X"#i#"]",
- "true", "Reserve X"#i>;
-
-def FeatureSaveRestore : SubtargetFeature<"save-restore", "EnableSaveRestore",
- "true", "Enable save/restore.">;
-
-def FeatureNoTrailingSeqCstFence : SubtargetFeature<"no-trailing-seq-cst-fence",
- "EnableTrailingSeqCstFence",
- "false",
- "Disable trailing fence for seq-cst store.">;
-
-def FeatureUnalignedScalarMem
- : SubtargetFeature<"unaligned-scalar-mem", "EnableUnalignedScalarMem",
- "true", "Has reasonably performant unaligned scalar "
- "loads and stores">;
-
-def FeatureUnalignedVectorMem
- : SubtargetFeature<"unaligned-vector-mem", "EnableUnalignedVectorMem",
- "true", "Has reasonably performant unaligned vector "
- "loads and stores">;
-
-def FeaturePostRAScheduler : SubtargetFeature<"use-postra-scheduler",
- "UsePostRAScheduler", "true", "Schedule again after register allocation">;
-
-def FeaturePredictableSelectIsExpensive
- : SubtargetFeature<"predictable-select-expensive", "PredictableSelectIsExpensive", "true",
- "Prefer likely predicted branches over selects">;
-
-def TuneOptimizedZeroStrideLoad
- : SubtargetFeature<"optimized-zero-stride-load", "HasOptimizedZeroStrideLoad",
- "true", "Optimized (perform fewer memory operations)"
- "zero-stride vector load">;
-
-def Experimental
- : SubtargetFeature<"experimental", "HasExperimental",
- "true", "Experimental intrinsics">;
-
-// Some vector hardware implementations do not process all VLEN bits in parallel
-// and instead split over multiple cycles. DLEN refers to the datapath width
-// that can be done in parallel.
-def TuneDLenFactor2
- : SubtargetFeature<"dlen-factor-2", "DLenFactor2", "true",
- "Vector unit DLEN(data path width) is half of VLEN">;
-
-def TuneNoDefaultUnroll
- : SubtargetFeature<"no-default-unroll", "EnableDefaultUnroll", "false",
- "Disable default unroll preference.">;
-
-// SiFive 7 is able to fuse integer ALU operations with a preceding branch
-// instruction.
-def TuneShortForwardBranchOpt
- : SubtargetFeature<"short-forward-branch-opt", "HasShortForwardBranchOpt",
- "true", "Enable short forward branch optimization">;
-def HasShortForwardBranchOpt : Predicate<"Subtarget->hasShortForwardBranchOpt()">;
-def NoShortForwardBranchOpt : Predicate<"!Subtarget->hasShortForwardBranchOpt()">;
-
-// Some subtargets require a S2V transfer buffer to move scalars into vectors.
-// FIXME: Forming .vx/.vf/.wx/.wf can reduce register pressure.
-def TuneNoSinkSplatOperands
- : SubtargetFeature<"no-sink-splat-operands", "SinkSplatOperands",
- "false", "Disable sink splat operands to enable .vx, .vf,"
- ".wx, and .wf instructions">;
-
-def TunePreferWInst
- : SubtargetFeature<"prefer-w-inst", "PreferWInst", "true",
- "Prefer instructions with W suffix">;
-
-def TuneConditionalCompressedMoveFusion
- : SubtargetFeature<"conditional-cmv-fusion", "HasConditionalCompressedMoveFusion",
- "true", "Enable branch+c.mv fusion">;
-def HasConditionalMoveFusion : Predicate<"Subtarget->hasConditionalMoveFusion()">;
-def NoConditionalMoveFusion : Predicate<"!Subtarget->hasConditionalMoveFusion()">;
-
-def TuneSiFive7 : SubtargetFeature<"sifive7", "RISCVProcFamily", "SiFive7",
- "SiFive 7-Series processors">;
-
-def TuneVentanaVeyron : SubtargetFeature<"ventana-veyron", "RISCVProcFamily", "VentanaVeyron",
- "Ventana Veyron-Series processors">;
-
-// Assume that lock-free native-width atomics are available, even if the target
-// and operating system combination would not usually provide them. The user
-// is responsible for providing any necessary __sync implementations. Code
-// built with this feature is not ABI-compatible with code built without this
-// feature, if atomic variables are exposed across the ABI boundary.
-def FeatureForcedAtomics : SubtargetFeature<
- "forced-atomics", "HasForcedAtomics", "true",
- "Assume that lock-free native-width atomics are available">;
-def HasAtomicLdSt
- : Predicate<"Subtarget->hasStdExtA() || Subtarget->hasForcedAtomics()">;
-
-def FeatureTaggedGlobals : SubtargetFeature<"tagged-globals",
- "AllowTaggedGlobals",
- "true", "Use an instruction sequence for taking the address of a global "
- "that allows a memory tag in the upper address bits">;
-
-def FeatureForcedSWShadowStack : SubtargetFeature<
- "forced-sw-shadow-stack", "HasForcedSWShadowStack", "true",
- "Implement shadow stack with software.">;
-def HasForcedSWShadowStack : Predicate<"Subtarget->hasForcedSWShadowStack()">;
Zicclsm makes no guarantees about performance of unaligned access. It only says they won't trap. They can be emulated in the kernel and take thousands of cycles.
See also https://patchew.org/linux/[email protected]/[email protected]/
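To illustrate the distinction drawn above, here is a minimal C sketch (illustrative only, not part of the patch): on a Zicclsm system the misaligned access is guaranteed not to fault the process, but it may still be trapped and emulated by firmware or the kernel, so "supported" says nothing about speed.

```c
#include <stdint.h>
#include <string.h>

// Misaligned 8-byte load through a cast. Under Zicclsm this must not raise
// a visible fault, but it may be emulated in a trap handler at a cost of
// thousands of cycles. (Strictly, the cast is also undefined behavior in C;
// it is shown only to illustrate the hardware-level access.)
uint64_t load_misaligned(const uint8_t *p) {
    return *(const uint64_t *)(p + 1); // misaligned if p is 8-byte aligned
}

// Portable form: memcpy lets the compiler choose byte loads, or a single
// unaligned load only when it believes the target handles it quickly,
// which is exactly what features like unaligned-scalar-mem control.
uint64_t load_portable(const uint8_t *p) {
    uint64_t v;
    memcpy(&v, p + 1, sizeof(v));
    return v;
}
```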
I think I misunderstood it, thanks everyone!
This patch is the follow-up of #94352 with some updates:

1. Add support for more extensions: `zve*`, `zimop`, `zc*`, `zcmop` and `zawrs`.
2. Use `RISCV_HWPROBE_KEY_MISALIGNED_SCALAR_PERF` to check whether the processor supports fast misaligned scalar memory access (a minimal usage sketch follows below). #108551 reminds me that the patch https://lore.kernel.org/all/[email protected]/T/ has been merged. Addresses comment #94352 (comment).

References:

1. Constants: https://github.com/torvalds/linux/blame/v6.11-rc7/arch/riscv/include/uapi/asm/hwprobe.h
2. Documentation: https://docs.kernel.org/arch/riscv/hwprobe.html
3. Related commits:
   1. `zve*` support: torvalds/linux@de8f828
   2. `zimop` support: torvalds/linux@36f8960
   3. `zc*` support: torvalds/linux@0ad70db
   4. `zcmop` support: torvalds/linux@fc078ea
   5. `zawrs` support: torvalds/linux@244c18f
   6. Scalar misaligned perf: torvalds/linux@c42e2f0 and torvalds/linux@1f52888
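For reference, a minimal sketch of the hwprobe query mentioned in point 2 above, assuming Linux 6.11+ UAPI headers on a riscv64 host (the struct and the `RISCV_HWPROBE_*` constants come from `<asm/hwprobe.h>`; error handling is kept minimal):

```c
#include <asm/hwprobe.h>   // struct riscv_hwprobe and RISCV_HWPROBE_* constants
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

// Query whether misaligned scalar accesses are fast on this hardware,
// via the raw riscv_hwprobe syscall (skipping the vDSO wrapper for brevity).
int main(void) {
    struct riscv_hwprobe pair = {
        .key = RISCV_HWPROBE_KEY_MISALIGNED_SCALAR_PERF,
    };
    // Arguments: pairs, pair_count, cpusetsize, cpus, flags.
    if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) != 0)
        return 1;
    if (pair.value == RISCV_HWPROBE_MISALIGNED_SCALAR_FAST)
        puts("misaligned scalar access is fast");
    else
        puts("misaligned scalar access is emulated, slow, or unsupported");
    return 0;
}
```

A runtime or a `-mcpu=native`-style driver could use a probe like this to decide whether enabling `unaligned-scalar-mem` is actually profitable, rather than inferring it from Zicclsm alone.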