[RISCV] Add branch+c.mv macrofusion for sifive-p450. #76169
Conversation
sifive-p450 supports a very restricted version of the short forward branch optimization from the sifive-7-series. For sifive-p450, a branch over a single c.mv can be macrofused as a conditional move operation. Due to encoding restrictions on c.mv, we can't conditionally move from X0. That would require c.li instead.
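For illustration, a minimal sketch of the shape the fusion targets; the registers and label are arbitrary:

# A branch over a single c.mv: sifive-p450 can macrofuse this pair
# into one conditional-move operation.
bnez a2, 1f          # skip the move when a2 != 0
c.mv a0, a1          # compressed register move; rs2 must not be x0
1:

# Conditionally moving zero would need c.li a0, 0 instead of c.mv,
# which is why the pseudo excludes X0 sources.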
@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)

Patch is 24.99 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/76169.diff

12 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp b/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
index 24a13f93af880e..a39f0671a6dc28 100644
--- a/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
+++ b/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
@@ -109,6 +109,7 @@ bool RISCVExpandPseudo::expandMI(MachineBasicBlock &MBB,
return expandRV32ZdinxStore(MBB, MBBI);
case RISCV::PseudoRV32ZdinxLD:
return expandRV32ZdinxLoad(MBB, MBBI);
+ case RISCV::PseudoCCMOVGPRNoX0:
case RISCV::PseudoCCMOVGPR:
case RISCV::PseudoCCADD:
case RISCV::PseudoCCSUB:
@@ -191,7 +192,8 @@ bool RISCVExpandPseudo::expandCCOp(MachineBasicBlock &MBB,
Register DestReg = MI.getOperand(0).getReg();
assert(MI.getOperand(4).getReg() == DestReg);
- if (MI.getOpcode() == RISCV::PseudoCCMOVGPR) {
+ if (MI.getOpcode() == RISCV::PseudoCCMOVGPR ||
+ MI.getOpcode() == RISCV::PseudoCCMOVGPRNoX0) {
// Add MV.
BuildMI(TrueBB, DL, TII->get(RISCV::ADDI), DestReg)
.add(MI.getOperand(5))
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index 2095446c694bde..f02413b27a8d17 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -996,6 +996,12 @@ def TuneShortForwardBranchOpt
def HasShortForwardBranchOpt : Predicate<"Subtarget->hasShortForwardBranchOpt()">;
def NoShortForwardBranchOpt : Predicate<"!Subtarget->hasShortForwardBranchOpt()">;
+def TuneCMOVBranchOpt
+ : SubtargetFeature<"cmov-branch-opt", "HasCMOVBranchOpt",
+ "true", "Enable branch+c.mv optimization">;
+def CanUseCMOVBranchOpt : Predicate<"Subtarget->canUseCMOVBranchOpt()">;
+def NoCMOVBranchOpt : Predicate<"!Subtarget->canUseCMOVBranchOpt()">;
+
def TuneSiFive7 : SubtargetFeature<"sifive7", "RISCVProcFamily", "SiFive7",
"SiFive 7-Series processors",
[TuneNoDefaultUnroll,
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 97d76ca494cbee..4fc84dfa9c141b 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -6908,7 +6908,8 @@ static SDValue combineSelectToBinOp(SDNode *N, SelectionDAG &DAG,
MVT VT = N->getSimpleValueType(0);
SDLoc DL(N);
- if (!Subtarget.hasShortForwardBranchOpt()) {
+ if (!Subtarget.hasShortForwardBranchOpt() &&
+ !Subtarget.canUseCMOVBranchOpt()) {
// (select c, -1, y) -> -c | y
if (isAllOnesConstant(TrueV)) {
SDValue Neg = DAG.getNegative(CondV, DL, VT);
@@ -7072,7 +7073,8 @@ SDValue RISCVTargetLowering::lowerSELECT(SDValue Op, SelectionDAG &DAG) const {
// (select c, t, f) -> (or (czero_eqz t, c), (czero_nez f, c))
// Unless we have the short forward branch optimization.
- if (!Subtarget.hasShortForwardBranchOpt())
+ if (!Subtarget.hasShortForwardBranchOpt() &&
+ !Subtarget.canUseCMOVBranchOpt())
return DAG.getNode(
ISD::OR, DL, VT,
DAG.getNode(RISCVISD::CZERO_EQZ, DL, VT, TrueV, CondV),
@@ -12099,7 +12101,8 @@ static SDValue combineSelectAndUse(SDNode *N, SDValue Slct, SDValue OtherOp,
if (VT.isVector())
return SDValue();
- if (!Subtarget.hasShortForwardBranchOpt()) {
+ if (!Subtarget.hasShortForwardBranchOpt() &&
+ !Subtarget.canUseCMOVBranchOpt()) {
// (select cond, x, (and x, c)) has custom lowering with Zicond.
if ((!Subtarget.hasStdExtZicond() &&
!Subtarget.hasVendorXVentanaCondOps()) ||
@@ -14328,7 +14331,7 @@ static SDValue performSELECTCombine(SDNode *N, SelectionDAG &DAG,
if (SDValue V = useInversedSetcc(N, DAG, Subtarget))
return V;
- if (Subtarget.hasShortForwardBranchOpt())
+ if (Subtarget.hasShortForwardBranchOpt() || Subtarget.canUseCMOVBranchOpt())
return SDValue();
SDValue TrueVal = N->getOperand(1);
@@ -15066,7 +15069,8 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
return DAG.getNode(RISCVISD::SELECT_CC, DL, N->getValueType(0),
{LHS, RHS, CC, TrueV, FalseV});
- if (!Subtarget.hasShortForwardBranchOpt()) {
+ if (!Subtarget.hasShortForwardBranchOpt() &&
+ !Subtarget.canUseCMOVBranchOpt()) {
// (select c, -1, y) -> -c | y
if (isAllOnesConstant(TrueV)) {
SDValue C = DAG.getSetCC(DL, VT, LHS, RHS, CCVal);
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 1dcff7eb563e20..2ccb40ca3a71cf 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -2646,6 +2646,7 @@ bool RISCVInstrInfo::findCommutedOpIndices(const MachineInstr &MI,
case RISCV::TH_MULSH:
// Operands 2 and 3 are commutable.
return fixCommutedOpIndices(SrcOpIdx1, SrcOpIdx2, 2, 3);
+ case RISCV::PseudoCCMOVGPRNoX0:
case RISCV::PseudoCCMOVGPR:
// Operands 4 and 5 are commutable.
return fixCommutedOpIndices(SrcOpIdx1, SrcOpIdx2, 4, 5);
@@ -2802,6 +2803,7 @@ MachineInstr *RISCVInstrInfo::commuteInstructionImpl(MachineInstr &MI,
return TargetInstrInfo::commuteInstructionImpl(WorkingMI, false, OpIdx1,
OpIdx2);
}
+ case RISCV::PseudoCCMOVGPRNoX0:
case RISCV::PseudoCCMOVGPR: {
// CCMOV can be commuted by inverting the condition.
auto CC = static_cast<RISCVCC::CondCode>(MI.getOperand(3).getImm());
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index edc08187d8f775..f09904c6647bb5 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -1387,6 +1387,24 @@ def PseudoCCMOVGPR : Pseudo<(outs GPR:$dst),
ReadSFBALU, ReadSFBALU]>;
}
+// This should always expand to a branch+c.mv so the size is 6 or 4 if the
+// branch is compressible.
+let Predicates = [CanUseCMOVBranchOpt, NoShortForwardBranchOpt],
+ Constraints = "$dst = $falsev", isCommutable = 1, Size = 6 in {
+// This instruction moves $truev to $dst when the condition is true. It will
+// be expanded to control flow in RISCVExpandPseudoInsts.
+// We use GPRNoX0 because c.mv cannot encode X0.
+def PseudoCCMOVGPRNoX0 : Pseudo<(outs GPRNoX0:$dst),
+ (ins GPR:$lhs, GPR:$rhs, ixlenimm:$cc,
+ GPRNoX0:$falsev, GPRNoX0:$truev),
+ [(set GPRNoX0:$dst,
+ (riscv_selectcc_frag:$cc (XLenVT GPR:$lhs),
+ (XLenVT GPR:$rhs),
+ cond, (XLenVT GPRNoX0:$truev),
+ (XLenVT GPRNoX0:$falsev)))]>,
+ Sched<[WriteCMOV, ReadCMOV, ReadCMOV, ReadCMOV, ReadCMOV]>;
+}
+
// Conditional binops, that updates update $dst to (op rs1, rs2) when condition
// is true. Returns $falsev otherwise. Selected by optimizeSelect.
// TODO: Can we use DefaultOperands on the regular binop to accomplish this more
@@ -1535,7 +1553,7 @@ multiclass SelectCC_GPR_rrirr<DAGOperand valty, ValueType vt> {
(IntCCtoRISCVCC $cc), valty:$truev, valty:$falsev)>;
}
-let Predicates = [NoShortForwardBranchOpt] in
+let Predicates = [NoCMOVBranchOpt, NoShortForwardBranchOpt] in
defm Select_GPR : SelectCC_GPR_rrirr<GPR, XLenVT>;
class SelectCompressOpt<CondCode Cond>
diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index 16c79519fcacc1..49d4eaec7a0492 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -233,7 +233,8 @@ def SIFIVE_P450 : RISCVProcessorModel<"sifive-p450", NoSchedModel,
FeatureStdExtZba,
FeatureStdExtZbb,
FeatureStdExtZbs,
- FeatureStdExtZfhmin]>;
+ FeatureStdExtZfhmin],
+ [TuneCMOVBranchOpt]>;
def SYNTACORE_SCR1_BASE : RISCVProcessorModel<"syntacore-scr1-base",
SyntacoreSCR1Model,
diff --git a/llvm/lib/Target/RISCV/RISCVSchedRocket.td b/llvm/lib/Target/RISCV/RISCVSchedRocket.td
index bb9dfe5d012409..94f2e65560f81c 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedRocket.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedRocket.td
@@ -248,4 +248,5 @@ defm : UnsupportedSchedZbkx;
defm : UnsupportedSchedZfa;
defm : UnsupportedSchedZfh;
defm : UnsupportedSchedSFB;
+defm : UnsupportedSchedCMOV;
}
diff --git a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
index f531ab2fac8f9f..403866d2c65271 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
@@ -1213,4 +1213,5 @@ defm : UnsupportedSchedZbc;
defm : UnsupportedSchedZbkb;
defm : UnsupportedSchedZbkx;
defm : UnsupportedSchedZfa;
+defm : UnsupportedSchedCMOV;
}
diff --git a/llvm/lib/Target/RISCV/RISCVSchedSyntacoreSCR1.td b/llvm/lib/Target/RISCV/RISCVSchedSyntacoreSCR1.td
index 06ad2075b07361..69ed92bddb565f 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedSyntacoreSCR1.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedSyntacoreSCR1.td
@@ -207,4 +207,5 @@ defm : UnsupportedSchedZbkb;
defm : UnsupportedSchedZbkx;
defm : UnsupportedSchedZfa;
defm : UnsupportedSchedZfh;
+defm : UnsupportedSchedCMOV;
}
diff --git a/llvm/lib/Target/RISCV/RISCVSchedule.td b/llvm/lib/Target/RISCV/RISCVSchedule.td
index f6c1b096ad90c4..540142e1c97c2c 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedule.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedule.td
@@ -112,6 +112,10 @@ def WriteFST16 : SchedWrite; // Floating point sp store
def WriteFST32 : SchedWrite; // Floating point sp store
def WriteFST64 : SchedWrite; // Floating point dp store
+// CMOV for sifive-p450.
+def WriteCMOV : SchedWrite;
+def ReadCMOV : SchedRead;
+
// short forward branch for Bullet
def WriteSFB : SchedWrite;
def ReadSFBJmp : SchedRead;
@@ -256,6 +260,14 @@ def : ReadAdvance<ReadSFBALU, 0>;
} // Unsupported = true
}
+multiclass UnsupportedSchedCMOV {
+let Unsupported = true in {
+def : WriteRes<WriteCMOV, []>;
+
+def : ReadAdvance<ReadCMOV, 0>;
+} // Unsupported = true
+}
+
multiclass UnsupportedSchedZfa {
let Unsupported = true in {
def : WriteRes<WriteFRoundF16, []>;
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.h b/llvm/lib/Target/RISCV/RISCVSubtarget.h
index 7540218633bfcb..cc41be53dd2fd9 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.h
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.h
@@ -150,6 +150,12 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
bool hasHalfFPLoadStoreMove() const {
return HasStdExtZfhmin || HasStdExtZfbfmin;
}
+
+ bool canUseCMOVBranchOpt() const {
+ // Can only predicate c.mv so requires the C or Zca extensions.
+ return HasCMOVBranchOpt && hasStdExtCOrZca();
+ }
+
bool is64Bit() const { return IsRV64; }
MVT getXLenVT() const {
return is64Bit() ? MVT::i64 : MVT::i32;
diff --git a/llvm/test/CodeGen/RISCV/cmov-branch-opt.ll b/llvm/test/CodeGen/RISCV/cmov-branch-opt.ll
new file mode 100644
index 00000000000000..b48b4e0d1a3b83
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/cmov-branch-opt.ll
@@ -0,0 +1,461 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv64 -mattr=+c -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefix=NOCMOV %s
+; RUN: llc -mtriple=riscv64 -mattr=+cmov-branch-opt,+c -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=CMOV,CMOV-NOZICOND %s
+; RUN: llc -mtriple=riscv64 -mattr=+cmov-branch-opt,+c,+experimental-zicond -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=CMOV,CMOV-ZICOND %s
+; RUN: llc -mtriple=riscv64 -mattr=+short-forward-branch-opt -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=SHORT_FORWARD,SFB-NOZICOND %s
+; RUN: llc -mtriple=riscv64 -mattr=+short-forward-branch-opt,+c -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=SHORT_FORWARD,SFB-NOZICOND %s
+; RUN: llc -mtriple=riscv64 -mattr=+short-forward-branch-opt,+experimental-zicond -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=SHORT_FORWARD,SFB-ZICOND %s
+
+; The conditional move optimization in sifive-p450 requires that only a
+; single c.mv instruction appears in the branch shadow.
+
+; The sifive-7-series can predicate an xor.
+
+define signext i32 @test1(i32 signext %x, i32 signext %y, i32 signext %z) {
+; NOCMOV-LABEL: test1:
+; NOCMOV: # %bb.0:
+; NOCMOV-NEXT: snez a2, a2
+; NOCMOV-NEXT: addi a2, a2, -1
+; NOCMOV-NEXT: and a1, a1, a2
+; NOCMOV-NEXT: xor a0, a0, a1
+; NOCMOV-NEXT: ret
+;
+; CMOV-LABEL: test1:
+; CMOV: # %bb.0:
+; CMOV-NEXT: xor a1, a1, a0
+; CMOV-NEXT: bnez a2, .LBB0_2
+; CMOV-NEXT: # %bb.1:
+; CMOV-NEXT: mv a0, a1
+; CMOV-NEXT: .LBB0_2:
+; CMOV-NEXT: ret
+;
+; SHORT_FORWARD-LABEL: test1:
+; SHORT_FORWARD: # %bb.0:
+; SHORT_FORWARD-NEXT: bnez a2, .LBB0_2
+; SHORT_FORWARD-NEXT: # %bb.1:
+; SHORT_FORWARD-NEXT: xor a0, a0, a1
+; SHORT_FORWARD-NEXT: .LBB0_2:
+; SHORT_FORWARD-NEXT: ret
+ %c = icmp eq i32 %z, 0
+ %a = xor i32 %x, %y
+ %b = select i1 %c, i32 %a, i32 %x
+ ret i32 %b
+}
+
+define signext i32 @test2(i32 signext %x, i32 signext %y, i32 signext %z) {
+; NOCMOV-LABEL: test2:
+; NOCMOV: # %bb.0:
+; NOCMOV-NEXT: seqz a2, a2
+; NOCMOV-NEXT: addi a2, a2, -1
+; NOCMOV-NEXT: and a1, a1, a2
+; NOCMOV-NEXT: xor a0, a0, a1
+; NOCMOV-NEXT: ret
+;
+; CMOV-LABEL: test2:
+; CMOV: # %bb.0:
+; CMOV-NEXT: xor a1, a1, a0
+; CMOV-NEXT: beqz a2, .LBB1_2
+; CMOV-NEXT: # %bb.1:
+; CMOV-NEXT: mv a0, a1
+; CMOV-NEXT: .LBB1_2:
+; CMOV-NEXT: ret
+;
+; SHORT_FORWARD-LABEL: test2:
+; SHORT_FORWARD: # %bb.0:
+; SHORT_FORWARD-NEXT: beqz a2, .LBB1_2
+; SHORT_FORWARD-NEXT: # %bb.1:
+; SHORT_FORWARD-NEXT: xor a0, a0, a1
+; SHORT_FORWARD-NEXT: .LBB1_2:
+; SHORT_FORWARD-NEXT: ret
+ %c = icmp eq i32 %z, 0
+ %a = xor i32 %x, %y
+ %b = select i1 %c, i32 %x, i32 %a
+ ret i32 %b
+}
+
+; Make sure we don't share the same basic block for two selects with the same
+; condition.
+define signext i32 @test3(i32 signext %v, i32 signext %w, i32 signext %x, i32 signext %y, i32 signext %z) {
+; NOCMOV-LABEL: test3:
+; NOCMOV: # %bb.0:
+; NOCMOV-NEXT: seqz a4, a4
+; NOCMOV-NEXT: addi a4, a4, -1
+; NOCMOV-NEXT: and a1, a1, a4
+; NOCMOV-NEXT: xor a0, a0, a1
+; NOCMOV-NEXT: and a3, a3, a4
+; NOCMOV-NEXT: xor a2, a2, a3
+; NOCMOV-NEXT: addw a0, a0, a2
+; NOCMOV-NEXT: ret
+;
+; CMOV-LABEL: test3:
+; CMOV: # %bb.0:
+; CMOV-NEXT: xor a1, a1, a0
+; CMOV-NEXT: bnez a4, .LBB2_2
+; CMOV-NEXT: # %bb.1:
+; CMOV-NEXT: mv a1, a0
+; CMOV-NEXT: .LBB2_2:
+; CMOV-NEXT: xor a0, a2, a3
+; CMOV-NEXT: bnez a4, .LBB2_4
+; CMOV-NEXT: # %bb.3:
+; CMOV-NEXT: mv a0, a2
+; CMOV-NEXT: .LBB2_4:
+; CMOV-NEXT: addw a0, a0, a1
+; CMOV-NEXT: ret
+;
+; SHORT_FORWARD-LABEL: test3:
+; SHORT_FORWARD: # %bb.0:
+; SHORT_FORWARD-NEXT: beqz a4, .LBB2_2
+; SHORT_FORWARD-NEXT: # %bb.1:
+; SHORT_FORWARD-NEXT: xor a0, a0, a1
+; SHORT_FORWARD-NEXT: .LBB2_2:
+; SHORT_FORWARD-NEXT: beqz a4, .LBB2_4
+; SHORT_FORWARD-NEXT: # %bb.3:
+; SHORT_FORWARD-NEXT: xor a2, a2, a3
+; SHORT_FORWARD-NEXT: .LBB2_4:
+; SHORT_FORWARD-NEXT: addw a0, a0, a2
+; SHORT_FORWARD-NEXT: ret
+ %c = icmp eq i32 %z, 0
+ %a = xor i32 %v, %w
+ %b = select i1 %c, i32 %v, i32 %a
+ %d = xor i32 %x, %y
+ %e = select i1 %c, i32 %x, i32 %d
+ %f = add i32 %b, %e
+ ret i32 %f
+}
+
+define signext i32 @test4(i32 signext %x, i32 signext %y, i32 signext %z) {
+; NOCMOV-LABEL: test4:
+; NOCMOV: # %bb.0:
+; NOCMOV-NEXT: snez a0, a2
+; NOCMOV-NEXT: addi a0, a0, -1
+; NOCMOV-NEXT: andi a0, a0, 3
+; NOCMOV-NEXT: ret
+;
+; CMOV-NOZICOND-LABEL: test4:
+; CMOV-NOZICOND: # %bb.0:
+; CMOV-NOZICOND-NEXT: li a1, 0
+; CMOV-NOZICOND-NEXT: li a0, 3
+; CMOV-NOZICOND-NEXT: beqz a2, .LBB3_2
+; CMOV-NOZICOND-NEXT: # %bb.1:
+; CMOV-NOZICOND-NEXT: mv a0, a1
+; CMOV-NOZICOND-NEXT: .LBB3_2:
+; CMOV-NOZICOND-NEXT: ret
+;
+; CMOV-ZICOND-LABEL: test4:
+; CMOV-ZICOND: # %bb.0:
+; CMOV-ZICOND-NEXT: li a0, 3
+; CMOV-ZICOND-NEXT: czero.nez a0, a0, a2
+; CMOV-ZICOND-NEXT: ret
+;
+; SFB-NOZICOND-LABEL: test4:
+; SFB-NOZICOND: # %bb.0:
+; SFB-NOZICOND-NEXT: li a0, 3
+; SFB-NOZICOND-NEXT: beqz a2, .LBB3_2
+; SFB-NOZICOND-NEXT: # %bb.1:
+; SFB-NOZICOND-NEXT: li a0, 0
+; SFB-NOZICOND-NEXT: .LBB3_2:
+; SFB-NOZICOND-NEXT: ret
+;
+; SFB-ZICOND-LABEL: test4:
+; SFB-ZICOND: # %bb.0:
+; SFB-ZICOND-NEXT: li a0, 3
+; SFB-ZICOND-NEXT: czero.nez a0, a0, a2
+; SFB-ZICOND-NEXT: ret
+ %c = icmp eq i32 %z, 0
+ %a = select i1 %c, i32 3, i32 0
+ ret i32 %a
+}
+
+define i16 @select_xor_1(i16 %A, i8 %cond) {
+; NOCMOV-LABEL: select_xor_1:
+; NOCMOV: # %bb.0: # %entry
+; NOCMOV-NEXT: slli a1, a1, 63
+; NOCMOV-NEXT: srai a1, a1, 63
+; NOCMOV-NEXT: andi a1, a1, 43
+; NOCMOV-NEXT: xor a0, a0, a1
+; NOCMOV-NEXT: ret
+;
+; CMOV-LABEL: select_xor_1:
+; CMOV: # %bb.0: # %entry
+; CMOV-NEXT: andi a1, a1, 1
+; CMOV-NEXT: xori a2, a0, 43
+; CMOV-NEXT: beqz a1, .LBB4_2
+; CMOV-NEXT: # %bb.1: # %entry
+; CMOV-NEXT: mv a0, a2
+; CMOV-NEXT: .LBB4_2: # %entry
+; CMOV-NEXT: ret
+;
+; SHORT_FORWARD-LABEL: select_xor_1:
+; SHORT_FORWARD: # %bb.0: # %entry
+; SHORT_FORWARD-NEXT: andi a1, a1, 1
+; SHORT_FORWARD-NEXT: beqz a1, .LBB4_2
+; SHORT_FORWARD-NEXT: # %bb.1: # %entry
+; SHORT_FORWARD-NEXT: xori a0, a0, 43
+; SHORT_FORWARD-NEXT: .LBB4_2: # %entry
+; SHORT_FORWARD-NEXT: ret
+entry:
+ %and = and i8 %cond, 1
+ %cmp10 = icmp eq i8 %and, 0
+ %0 = xor i16 %A, 43
+ %1 = select i1 %cmp10, i16 %A, i16 %0
+ ret i16 %1
+}
+
+; Equivalent to above, but with icmp ne (and %cond, 1), 1 instead of
+; icmp eq (and %cond, 1), 0
+define i16 @select_xor_1b(i16 %A, i8 %cond) {
+; NOCMOV-LABEL: select_xor_1b:
+; NOCMOV: # %bb.0: # %entry
+; NOCMOV-NEXT: slli a1, a1, 63
+; NOCMOV-NEXT: srai a1, a1, 63
+; NOCMOV-NEXT: andi a1, a1, 43
+; NOCMOV-NEXT: xor a0, a0, a1
+; NOCMOV-NEXT: ret
+;
+; CMOV-LABEL: select_xor_1b:
+; CMOV: # %bb.0: # %entry
+; CMOV-NEXT: andi a1, a1, 1
+; CMOV-NEXT: xori a2, a0, 43
+; CMOV-NEXT: beqz a1, .LBB5_2
+; CMOV-NEXT: # %bb.1: # %entry
+; CMOV-NEXT: mv a0, a2
+; CMOV-NEXT: .LBB5_2: # %entry
+; CMOV-NEXT: ret
+;
+; SHORT_FORWARD-LABEL: select_xor_1b:
+; SHORT_FORWARD: # %bb.0: # %entry
+; SHORT_FORWARD-NEXT: andi a1, a1, 1
+; SHORT_FORWARD-NEXT: beqz a1, .LBB5_2
+; SHORT_FORWARD-NEXT: # %bb.1: # %entry
+; SHORT_FORWARD-NEXT: xori a0, a0, 43
+; SHORT_FORWARD-NEXT: .LBB5_2: # %entry
+; SHORT_FORWARD-NEXT: ret
+entry:
+ %and = and i8 %cond, 1
+ %cmp10 = icmp ne i8 %and, 1
+ %0 = xor i16 %A, 43
+ %1 = select i1 %cmp10, i16 %A, i16 %0
+ ret i16 %1
+}
+
+define i32 @select_xor_2(i32 %A, i32 %B, i8 %cond) {
+; NOCMOV-LABEL: select_xor_2:
+; NOCMOV: # %bb.0: # %entry
+; NOCMOV-NEXT: slli a2, a2, 63
+; NOCMOV-NEXT: srai a2, a2, 63
+; NOCMOV-NEXT: and a1, a1, a2
+; NOCMOV-NEXT: xor a0, a0, a1
+; NOCMOV-NEXT: ret
+;
+; CMOV-LABEL: select_xor_2:
+; CMOV: # %bb.0: # %entry
+; CMOV-NEXT: andi a2, a2, 1
+; CMOV-NEXT: xor a1, a1, a0
+; CMOV-NEXT: beqz a2, .LBB6_2
+; CMOV-NEXT: # %bb.1: # %entry
+; CMOV-NEXT: mv a0, a1
+; CMOV-NEXT: .LBB6_2: # %entry
+; CMOV-NEXT: ret
+;
+; SFB-ZICOND-LABEL: select_xor_2:
+; SFB-ZICOND: # %bb.0: # %entry
+; SFB-ZICOND-NEXT: andi a2, a2, 1
+; SFB-ZICOND-NEXT: beqz a2, .LBB6_2
+; SFB-ZICOND-NEXT: # %bb.1: # %entry
+; SFB-ZICOND-NEXT: xor a0, a1, a0
+; SFB-ZICOND-NEXT: .LBB6_2: # %entry
+; SFB-ZICOND-NEXT: ret
+entry:
+ %and = and i8 %cond, 1
+ %cmp10 = icmp eq i8 %and, 0
+ %0 = xor i32 %B, %A
+ %1 = select i1 %cmp10, i32 %A, i32 %0
+ ret i32 %1
+}
+
+; Equivalent to above, but with icmp ne (and %cond, 1), 1 instead of
+; icmp eq (and %cond, 1), 0
+define i32 @select_xor_2b(i32 %A, i32 %B, i8 %cond) {
+; NO...
[truncated]
// This should always expand to a branch+c.mv so the size is 6 or 4 if the
// branch is compressible.
let Predicates = [CanUseCMOVBranchOpt, NoShortForwardBranchOpt],
    Constraints = "$dst = $falsev", isCommutable = 1, Size = 6 in {
Do you want the isSelect = 1 from the other version? Or was that an intentional omission?
It's intentional. isSelect enables the ALU op folding code which we don't want.
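As a rough size check on the Size = 6 comment in the hunk above (registers and labels hypothetical): a full-size branch plus c.mv is 6 bytes, and the pair shrinks to 4 bytes when the branch itself compresses:

bne  a0, a1, 1f      # 4-byte branch + 2-byte c.mv = 6 bytes
c.mv a2, a3
1:

c.bnez a0, 1f        # 2-byte branch (compare vs zero, rs1 in x8-x15) -> 4 bytes total
c.mv   a2, a3
1: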
@@ -6908,7 +6908,8 @@ static SDValue combineSelectToBinOp(SDNode *N, SelectionDAG &DAG,
   MVT VT = N->getSimpleValueType(0);
   SDLoc DL(N);

-  if (!Subtarget.hasShortForwardBranchOpt()) {
+  if (!Subtarget.hasShortForwardBranchOpt() &&
Is it the case that all machines with ShortForwardBranch also implement the new fusion? If so, we could adjust the flag checks here a bit.
The pattern is a subset of ShortForwardBranch, so yes. Would need to make ShortForwardBranch imply TuneCMOVBranchOpt in tablegen, which I think is doable.
Actually it's more complicated, because ShortForwardBranch can fuse mv or c.mv, but CMOVBranchOpt requires C. So I don't think the dependency alone would be enough, but we could have a single subtarget function to wrap this.
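A sketch of the difference being discussed (hypothetical registers): sifive-7 SFB can predicate an arbitrary single ALU op in the branch shadow, including the uncompressed mv, while the p450 fusion only covers the 2-byte c.mv, hence the C/Zca requirement:

# SFB (sifive-7-series): any single ALU op in the shadow works, e.g.
bnez a2, 1f
xor  a0, a0, a1      # uncompressed op is fine here
1:

# p450 cmov-branch-opt: only a branch over c.mv fuses
bnez a2, 1f
c.mv a0, a1
1: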
@@ -996,6 +996,12 @@ def TuneShortForwardBranchOpt
 def HasShortForwardBranchOpt : Predicate<"Subtarget->hasShortForwardBranchOpt()">;
 def NoShortForwardBranchOpt : Predicate<"!Subtarget->hasShortForwardBranchOpt()">;

+def TuneCMOVBranchOpt
This naming is hard for me to parse at a glance. A couple ideas:
- TuneCMovFusion
- TuneShortForwardBranchCMovOnly
@@ -112,6 +112,10 @@ def WriteFST16 : SchedWrite; // Floating point sp store
 def WriteFST32 : SchedWrite; // Floating point sp store
 def WriteFST64 : SchedWrite; // Floating point dp store

+// CMOV for sifive-p450.
This is intrusive; can we just use InstRW to override the resources in the sifive-p450 schedule model? Or is it possible that we may reuse WriteCMOV/ReadCMOV someday? For example, Zbt is back to the Zb* extensions.
Why is having a WriteCMOV and a ReadCMOV intrusive? How are these Sched<Read|Write> different than any other Sched<Read|Write>? I understand that not every CPU may use the CMOV instructions, but this is also the case for CPUs that do not use vector instructions. It is my understanding that if a CPU does not implement vector instructions, then it doesn't need to specify the behavior of WriteRes and ReadAdvance for vector instructions (UnsupportedV), as is the case in the RocketModel. It would be the same thing for CMOV instructions.

I think WriteCMOV and ReadCMOV are minimally invasive. It is my goal to use InstRW as little as possible in the SchedModels -- SchedRead and SchedWrite aim to replace InstRW, in my opinion.
Thanks, your thought makes sense to me. My concern is that we won't have CMOV in the near future, and we would need an UnsupportedCMOV in almost all new schedule models (except SiFive's), which is already the situation for UnsupportedSchedSFB. This is because SFB and this CMOV are only for vendor extensions/features, and my thought/strategy is that we should separate vendor-specific parts from standard parts.

Actually, I think there should be a RISCVInstrInfoSFB.td which is split out from RISCVInstrInfo.td.
Ping
LGTM.