[RISCV] Select atomic_{load/store} to pseudos and expand them later #67108
Conversation
In the previous implementation, we insert leading/trailing fences in `AtomicExpandPass` and change the memory order to `monotonic` before ISel; atomic loads/stores are then selected to normal load/store instructions. With this patch, we no longer insert fences for atomic load/store before ISel. Instead, we select them to pseudos and expand them later, just like the other atomic operations, in `RISCVExpandAtomicPseudo`. I do this for two reasons:

1. All atomic operations except atomic load/store are expanded in `RISCVExpandAtomicPseudo`, which is inconsistent and a little confusing when digging into the implementation.
2. On some hardware implementations, `load+fence` and `fence+store` can be fused into optimized macro instructions. This requires that the fence and the load/store stay adjacent. We could achieve this by defining `DAGMutation`s in `RISCVMacroFusion.cpp`, but I think this expansion approach is more straightforward, since we treat atomic load/store as whole instructions.
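For reference, here is a minimal sketch (my own illustration, not part of the patch) of the sequences the expansion produces on a plain WMO target, assuming no Ztso and the seq_cst trailing fence disabled; the expected instructions in the comments follow `emitLeadingFence`/`emitTrailingFence` in the diff below:

```cpp
// Sketch of the fence placement around atomic loads/stores (WMO, no Ztso,
// seq_cst trailing fence off). Comments show the expected RISC-V sequence.
#include <atomic>

int load_acquire(const std::atomic<int> &x) {
  return x.load(std::memory_order_acquire);  // lw; fence r, rw
}

int load_seq_cst(const std::atomic<int> &x) {
  return x.load(std::memory_order_seq_cst);  // fence rw, rw; lw; fence r, rw
}

void store_release(std::atomic<int> &x, int v) {
  x.store(v, std::memory_order_release);     // fence rw, w; sw
}

void store_seq_cst(std::atomic<int> &x, int v) {
  x.store(v, std::memory_order_seq_cst);     // fence rw, w; sw
  // (plus a trailing fence rw, rw when enableSeqCstTrailingFence() is set)
}
```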
@llvm/pr-subscribers-backend-risc-v
Full diff: https://github.com/llvm/llvm-project/pull/67108.diff 5 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp b/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
index bb772fc5da92244..ba71b18d77c7abb 100644
--- a/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
+++ b/llvm/lib/Target/RISCV/RISCVExpandAtomicPseudoInsts.cpp
@@ -48,6 +48,9 @@ class RISCVExpandAtomicPseudo : public MachineFunctionPass {
bool expandMBB(MachineBasicBlock &MBB);
bool expandMI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
MachineBasicBlock::iterator &NextMBBI);
+ bool expandAtomicLoadStore(MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator MBBI,
+ unsigned int Opcode, bool IsLoad);
bool expandAtomicBinOp(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI, AtomicRMWInst::BinOp,
bool IsMasked, int Width,
@@ -111,6 +114,22 @@ bool RISCVExpandAtomicPseudo::expandMI(MachineBasicBlock &MBB,
// expanded instructions for each pseudo is correct in the Size field of the
// tablegen definition for the pseudo.
switch (MBBI->getOpcode()) {
+ case RISCV::PseudoAtomicLB:
+ return expandAtomicLoadStore(MBB, MBBI, RISCV::LB, true);
+ case RISCV::PseudoAtomicLH:
+ return expandAtomicLoadStore(MBB, MBBI, RISCV::LH, true);
+ case RISCV::PseudoAtomicLW:
+ return expandAtomicLoadStore(MBB, MBBI, RISCV::LW, true);
+ case RISCV::PseudoAtomicLD:
+ return expandAtomicLoadStore(MBB, MBBI, RISCV::LD, true);
+ case RISCV::PseudoAtomicSB:
+ return expandAtomicLoadStore(MBB, MBBI, RISCV::SB, false);
+ case RISCV::PseudoAtomicSH:
+ return expandAtomicLoadStore(MBB, MBBI, RISCV::SH, false);
+ case RISCV::PseudoAtomicSW:
+ return expandAtomicLoadStore(MBB, MBBI, RISCV::SW, false);
+ case RISCV::PseudoAtomicSD:
+ return expandAtomicLoadStore(MBB, MBBI, RISCV::SD, false);
case RISCV::PseudoAtomicLoadNand32:
return expandAtomicBinOp(MBB, MBBI, AtomicRMWInst::Nand, false, 32,
NextMBBI);
@@ -385,6 +404,91 @@ static void doMaskedAtomicBinOpExpansion(const RISCVInstrInfo *TII,
.addMBB(LoopMBB);
}
+static void insertFence(const RISCVInstrInfo *TII, MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator MBBI, DebugLoc DL,
+ AtomicOrdering Ordering) {
+ // fence acq_rel -> fence.tso
+ if (Ordering == AtomicOrdering::AcquireRelease) {
+ BuildMI(MBB, MBBI, DL, TII->get(RISCV::FENCE_TSO));
+ } else {
+ int Pred, Succ;
+ switch (Ordering) {
+ default:
+ llvm_unreachable("Unsupported AtomicOrdering");
+ case AtomicOrdering::Acquire:
+ // fence acquire -> fence r, rw
+ Pred = 0b10;
+ Succ = 0b11;
+ break;
+ case AtomicOrdering::Release:
+ // fence release -> fence rw, w
+ Pred = 0b11;
+ Succ = 0b01;
+ break;
+ case AtomicOrdering::SequentiallyConsistent:
+ // fence seq_cst -> fence rw, rw
+ Pred = 0b11;
+ Succ = 0b11;
+ break;
+ }
+ BuildMI(MBB, MBBI, DL, TII->get(RISCV::FENCE)).addImm(Pred).addImm(Succ);
+ }
+}
+
+static void emitLeadingFence(const RISCVSubtarget *Subtarget,
+ const RISCVInstrInfo *TII, MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator MBBI, DebugLoc DL,
+ AtomicOrdering Ordering, bool IsLoad) {
+ if (Subtarget->hasStdExtZtso()) {
+ if (IsLoad && Ordering == AtomicOrdering::SequentiallyConsistent)
+ insertFence(TII, MBB, MBBI, DL, Ordering);
+ return;
+ }
+
+ if (IsLoad && Ordering == AtomicOrdering::SequentiallyConsistent) {
+ insertFence(TII, MBB, MBBI, DL, Ordering);
+ return;
+ }
+
+ if (!IsLoad && isReleaseOrStronger(Ordering))
+ insertFence(TII, MBB, MBBI, DL, AtomicOrdering::Release);
+}
+
+static void emitTrailingFence(const RISCVSubtarget *Subtarget,
+ const RISCVInstrInfo *TII, MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator MBBI, DebugLoc DL,
+ AtomicOrdering Ordering, bool IsLoad) {
+ if (Subtarget->hasStdExtZtso()) {
+ if (!IsLoad && Ordering == AtomicOrdering::SequentiallyConsistent)
+ insertFence(TII, MBB, MBBI, DL, Ordering);
+ return;
+ }
+
+ if (IsLoad && isAcquireOrStronger(Ordering)) {
+ insertFence(TII, MBB, MBBI, DL, AtomicOrdering::Acquire);
+ return;
+ }
+
+ if (Subtarget->enableSeqCstTrailingFence() && !IsLoad &&
+ Ordering == AtomicOrdering::SequentiallyConsistent)
+ insertFence(TII, MBB, MBBI, DL, AtomicOrdering::SequentiallyConsistent);
+}
+
+bool RISCVExpandAtomicPseudo::expandAtomicLoadStore(
+ MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
+ unsigned int Opcode, bool IsLoad) {
+ auto Ordering = static_cast<AtomicOrdering>(MBBI->getOperand(3).getImm());
+ DebugLoc DL = MBBI->getDebugLoc();
+ emitLeadingFence(STI, TII, MBB, MBBI, DL, Ordering, IsLoad);
+ BuildMI(MBB, MBBI, DL, TII->get(Opcode))
+ .add(MBBI->getOperand(0))
+ .add(MBBI->getOperand(1))
+ .add(MBBI->getOperand(2));
+ emitTrailingFence(STI, TII, MBB, MBBI, DL, Ordering, IsLoad);
+ MBBI->eraseFromParent();
+ return true;
+}
+
bool RISCVExpandAtomicPseudo::expandAtomicBinOp(
MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
AtomicRMWInst::BinOp BinOp, bool IsMasked, int Width,
@@ -554,7 +658,7 @@ bool RISCVExpandAtomicPseudo::expandAtomicMinMaxOp(
computeAndAddLiveIns(LiveRegs, *DoneMBB);
return true;
-}
+ }
// If a BNE on the cmpxchg comparison result immediately follows the cmpxchg
// operation, it can be folded into the cmpxchg expansion by
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index f1cea6c6756f4fc..552666e8abbd88a 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -17770,39 +17770,6 @@ void RISCVTargetLowering::LowerAsmOperandForConstraint(
TargetLowering::LowerAsmOperandForConstraint(Op, Constraint, Ops, DAG);
}
-Instruction *RISCVTargetLowering::emitLeadingFence(IRBuilderBase &Builder,
- Instruction *Inst,
- AtomicOrdering Ord) const {
- if (Subtarget.hasStdExtZtso()) {
- if (isa<LoadInst>(Inst) && Ord == AtomicOrdering::SequentiallyConsistent)
- return Builder.CreateFence(Ord);
- return nullptr;
- }
-
- if (isa<LoadInst>(Inst) && Ord == AtomicOrdering::SequentiallyConsistent)
- return Builder.CreateFence(Ord);
- if (isa<StoreInst>(Inst) && isReleaseOrStronger(Ord))
- return Builder.CreateFence(AtomicOrdering::Release);
- return nullptr;
-}
-
-Instruction *RISCVTargetLowering::emitTrailingFence(IRBuilderBase &Builder,
- Instruction *Inst,
- AtomicOrdering Ord) const {
- if (Subtarget.hasStdExtZtso()) {
- if (isa<StoreInst>(Inst) && Ord == AtomicOrdering::SequentiallyConsistent)
- return Builder.CreateFence(Ord);
- return nullptr;
- }
-
- if (isa<LoadInst>(Inst) && isAcquireOrStronger(Ord))
- return Builder.CreateFence(AtomicOrdering::Acquire);
- if (Subtarget.enableSeqCstTrailingFence() && isa<StoreInst>(Inst) &&
- Ord == AtomicOrdering::SequentiallyConsistent)
- return Builder.CreateFence(AtomicOrdering::SequentiallyConsistent);
- return nullptr;
-}
-
TargetLowering::AtomicExpansionKind
RISCVTargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
// atomicrmw {fadd,fsub} must be expanded to use compare-exchange, as floating
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index 815b9be47f56026..52ace789a702c07 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -613,14 +613,6 @@ class RISCVTargetLowering : public TargetLowering {
bool preferZeroCompareBranch() const override { return true; }
- bool shouldInsertFencesForAtomic(const Instruction *I) const override {
- return isa<LoadInst>(I) || isa<StoreInst>(I);
- }
- Instruction *emitLeadingFence(IRBuilderBase &Builder, Instruction *Inst,
- AtomicOrdering Ord) const override;
- Instruction *emitTrailingFence(IRBuilderBase &Builder, Instruction *Inst,
- AtomicOrdering Ord) const override;
-
bool isFMAFasterThanFMulAndFAdd(const MachineFunction &MF,
EVT VT) const override;
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 816ceaf95607e71..e56a91026702e31 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -1270,11 +1270,45 @@ unsigned RISCVInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
if (MI.isMetaInstruction())
return 0;
+ const MachineFunction &MF = *MI.getParent()->getParent();
+ const auto &ST = MF.getSubtarget<RISCVSubtarget>();
+
unsigned Opcode = MI.getOpcode();
+ switch (Opcode) {
+ default:
+ break;
+ case RISCV::PseudoAtomicLB:
+ case RISCV::PseudoAtomicLH:
+ case RISCV::PseudoAtomicLW:
+ case RISCV::PseudoAtomicLD: {
+ auto Ordering = static_cast<AtomicOrdering>(MI.getOperand(3).getImm());
+ switch (Ordering) {
+ default:
+ return 4;
+ case AtomicOrdering::Acquire:
+ return 8;
+ case AtomicOrdering::SequentiallyConsistent:
+ return ST.hasStdExtZtso() ? 8 : 12;
+ }
+ }
+ case RISCV::PseudoAtomicSB:
+ case RISCV::PseudoAtomicSH:
+ case RISCV::PseudoAtomicSW:
+ case RISCV::PseudoAtomicSD: {
+ auto Ordering = static_cast<AtomicOrdering>(MI.getOperand(3).getImm());
+ switch (Ordering) {
+ default:
+ return 4;
+ case AtomicOrdering::Release:
+ return 8;
+ case AtomicOrdering::SequentiallyConsistent:
+ return ST.hasStdExtZtso() ? 8 : ST.enableSeqCstTrailingFence() ? 12 : 8;
+ }
+ }
+ }
if (Opcode == TargetOpcode::INLINEASM ||
Opcode == TargetOpcode::INLINEASM_BR) {
- const MachineFunction &MF = *MI.getParent()->getParent();
const auto &TM = static_cast<const RISCVTargetMachine &>(MF.getTarget());
return getInlineAsmLength(MI.getOperand(0).getSymbolName(),
*TM.getMCAsmInfo());
@@ -1282,8 +1316,6 @@ unsigned RISCVInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
if (!MI.memoperands_empty()) {
MachineMemOperand *MMO = *(MI.memoperands_begin());
- const MachineFunction &MF = *MI.getParent()->getParent();
- const auto &ST = MF.getSubtarget<RISCVSubtarget>();
if (ST.hasStdExtZihintntl() && MMO->isNonTemporal()) {
if (ST.hasStdExtCOrZca() && ST.enableRVCHintInstrs()) {
if (isCompressibleInst(MI, STI))
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoA.td b/llvm/lib/Target/RISCV/RISCVInstrInfoA.td
index c43af14bb7f7005..5d806de621f61be 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoA.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoA.td
@@ -110,21 +110,95 @@ defm AMOCAS_Q : AMO_rr_aq_rl<0b00101, 0b100, "amocas.q">;
//===----------------------------------------------------------------------===//
// Atomic load/store are available under both +a and +force-atomics.
-// Fences will be inserted for atomic load/stores according to the logic in
-// RISCVTargetLowering::{emitLeadingFence,emitTrailingFence}.
+
+// Pseudo atomic load instructions.
+class PseudoAtomicLoad
+ : Pseudo<(outs GPR:$rd), (ins GPRMem:$rs1, simm12:$imm12, ixlenimm:$ordering), []> {
+ let hasSideEffects = 1;
+ let mayLoad = 1;
+ let mayStore = 0;
+}
+
+// Pseudo atomic store instructions.
+class PseudoAtomicStore
+ : Pseudo<(outs), (ins GPR:$rs2, GPRMem:$rs1, simm12:$imm12, ixlenimm:$ordering), []> {
+ let hasSideEffects = 1;
+ let mayLoad = 0;
+ let mayStore = 1;
+}
+
+let IsAtomic = 1 in {
+class AtomicLoadPatFrag<PatFrags base> : PatFrag<(ops node:$ptr), (base node:$ptr)>;
+class AtomicStorePatFrag<PatFrags base> : PatFrag<(ops node:$val, node:$ptr),
+ (base node:$val, node:$ptr)>;
+}
+
+// An atomic load operation that doesn't actually need to be atomic.
+let IsAtomicOrderingAcquireOrStronger = 0 in
+class relaxed_load<PatFrags base> : AtomicLoadPatFrag<base>;
+
+// An atomic load operation that needs acquire semantics.
+let IsAtomicOrderingAcquire = 1 in
+class acquire_load<PatFrags base> : AtomicLoadPatFrag<base>;
+
+// An atomic load operation that needs sequential consistency semantics.
+let IsAtomicOrderingSequentiallyConsistent = 1 in
+class seq_cst_load<PatFrags base> : AtomicLoadPatFrag<base>;
+
+// An atomic store operation that doesn't actually need to be atomic.
+let IsAtomicOrderingReleaseOrStronger = 0 in
+class relaxed_store<PatFrags base> : AtomicStorePatFrag<base>;
+
+// An atomic store operation that needs release semantics.
+let IsAtomicOrderingRelease = 1 in
+class release_store<PatFrags base> : AtomicStorePatFrag<base>;
+
+// An atomic store operation that needs sequential consistency semantics.
+let IsAtomicOrderingSequentiallyConsistent = 1 in
+class seq_cst_store<PatFrags base> : AtomicStorePatFrag<base>;
+
+multiclass AtomicLdPat<PatFrag LoadOp, RVInst Inst> {
+ def : Pat<(XLenVT (relaxed_load<LoadOp> (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12))),
+ (Inst GPR:$rs1, simm12:$imm12, 2)>;
+ def : Pat<(XLenVT (acquire_load<LoadOp> (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12))),
+ (Inst GPR:$rs1, simm12:$imm12, 4)>;
+ def : Pat<(XLenVT (seq_cst_load<LoadOp> (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12))),
+ (Inst GPR:$rs1, simm12:$imm12, 7)>;
+}
+
+multiclass AtomicStPat<PatFrag StoreOp, RVInst Inst> {
+ def : Pat<(relaxed_store<StoreOp> (XLenVT GPR:$rs2), (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12)),
+ (Inst GPR:$rs2, GPR:$rs1, simm12:$imm12, 2)>;
+ def : Pat<(release_store<StoreOp> (XLenVT GPR:$rs2), (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12)),
+ (Inst GPR:$rs2, GPR:$rs1, simm12:$imm12, 5)>;
+ def : Pat<(seq_cst_store<StoreOp> (XLenVT GPR:$rs2), (AddrRegImm (XLenVT GPR:$rs1), simm12:$imm12)),
+ (Inst GPR:$rs2, GPR:$rs1, simm12:$imm12, 7)>;
+}
+
let Predicates = [HasAtomicLdSt] in {
- def : LdPat<atomic_load_8, LB>;
- def : LdPat<atomic_load_16, LH>;
- def : LdPat<atomic_load_32, LW>;
+ def PseudoAtomicLB : PseudoAtomicLoad;
+ def PseudoAtomicLH : PseudoAtomicLoad;
+ def PseudoAtomicLW : PseudoAtomicLoad;
+
+ def PseudoAtomicSB : PseudoAtomicStore;
+ def PseudoAtomicSH : PseudoAtomicStore;
+ def PseudoAtomicSW : PseudoAtomicStore;
- def : StPat<atomic_store_8, SB, GPR, XLenVT>;
- def : StPat<atomic_store_16, SH, GPR, XLenVT>;
- def : StPat<atomic_store_32, SW, GPR, XLenVT>;
+ defm : AtomicLdPat<atomic_load_8, PseudoAtomicLB>;
+ defm : AtomicLdPat<atomic_load_16, PseudoAtomicLH>;
+ defm : AtomicLdPat<atomic_load_32, PseudoAtomicLW>;
+
+ defm : AtomicStPat<atomic_store_8, PseudoAtomicSB>;
+ defm : AtomicStPat<atomic_store_16, PseudoAtomicSH>;
+ defm : AtomicStPat<atomic_store_32, PseudoAtomicSW>;
}
let Predicates = [HasAtomicLdSt, IsRV64] in {
- def : LdPat<atomic_load_64, LD, i64>;
- def : StPat<atomic_store_64, SD, GPR, i64>;
+ def PseudoAtomicLD : PseudoAtomicLoad;
+ def PseudoAtomicSD : PseudoAtomicStore;
+
+ defm : AtomicLdPat<atomic_load_64, PseudoAtomicLD>;
+ defm : AtomicStPat<atomic_store_64, PseudoAtomicSD>;
}
/// AMOs
Force-pushed from a675062 to ef38045.
✅ With the latest revision this PR passed the C/C++ code formatter.
If we want to make …
Ping.
Ping.
First of all, sorry that this fell off the review queue. Some thoughts: …
Oh, and was #69685 just an opportune moment to re-advertise this patch, or did you see that work interacting with this change in some way? If so, could you elaborate, please? Many thanks.
Do other architectures use this approach? Having custom pseudo-based lowering for something SDAG can represent directly just fine is pretty sad (and adds more work for us downstream, too); it's a bunch of code that doesn't really need to exist. I would much rather see the generic lowering be adapted to support the optimisations you want (if they are in fact important) rather than making things clunkier like this. I won't block the patch, but I will heavily encourage this change to not be applied if at all possible.
Fused atomic load/store are just like …

Before this PR, the lowering path of atomic load/store is: …

After this PR, it will be like: …

I don't know if the above statement has made my intention of this PR clearer; let me know if there are any questions. :-)
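To make the two paths concrete, a small worked example (my own sketch, assuming a non-Ztso target) of an acquire load going through both pipelines:

```cpp
// How an acquire load flows through the two lowering pipelines.
#include <atomic>

int consume_flag(const std::atomic<int> &flag) {
  // Before this PR: AtomicExpandPass rewrites this into a monotonic load plus
  // a trailing IR fence; ISel then selects an ordinary LW and an independent
  // FENCE node, so nothing keeps the two instructions adjacent.
  //
  // After this PR: ISel selects a single PseudoAtomicLW carrying the acquire
  // ordering as an operand, and RISCVExpandAtomicPseudo expands it to
  // `lw` + `fence r, rw` in one place, keeping the pair together for a
  // potential load+fence macro fusion.
  return flag.load(std::memory_order_acquire);
}
```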
If we have Zalasr, wouldn't we disable the fence emission rather than trying to pattern match it?
Yes, just don't expand them into memory access + fence... you don't need a pseudo for that.
Yes, I agree. We can just do the instruction selection like AArch64, and the patterns will be like these in this PR except that the selected instructions are … The existence of pseudos is to unify the implementation of …