[RISCV] Add branch+c.mv macrofusion for sifive-p450. #76169
Conversation
sifive-p450 supports a very restricted version of the short forward branch optimization from the sifive-7-series. For sifive-p450, a branch over a single c.mv can be macrofused as a conditional move operation. Due to encoding restrictions on c.mv, we can't conditionally move from X0. That would require c.li instead.
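For illustration, a minimal sketch of the shape the fusion targets; the registers and label are arbitrary:

# A branch over a single c.mv: sifive-p450 can macrofuse this pair
# into one conditional-move operation.
bnez a2, 1f          # skip the move when a2 != 0
c.mv a0, a1          # compressed register move; rs2 must not be x0
1:

# Conditionally moving zero would need c.li a0, 0 instead of c.mv,
# which is why the pseudo excludes X0 sources.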
@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)

Patch is 24.99 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/76169.diff

12 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp b/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
index 24a13f93af880e..a39f0671a6dc28 100644
--- a/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
+++ b/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
@@ -109,6 +109,7 @@ bool RISCVExpandPseudo::expandMI(MachineBasicBlock &MBB,
return expandRV32ZdinxStore(MBB, MBBI);
case RISCV::PseudoRV32ZdinxLD:
return expandRV32ZdinxLoad(MBB, MBBI);
+ case RISCV::PseudoCCMOVGPRNoX0:
case RISCV::PseudoCCMOVGPR:
case RISCV::PseudoCCADD:
case RISCV::PseudoCCSUB:
@@ -191,7 +192,8 @@ bool RISCVExpandPseudo::expandCCOp(MachineBasicBlock &MBB,
Register DestReg = MI.getOperand(0).getReg();
assert(MI.getOperand(4).getReg() == DestReg);
- if (MI.getOpcode() == RISCV::PseudoCCMOVGPR) {
+ if (MI.getOpcode() == RISCV::PseudoCCMOVGPR ||
+ MI.getOpcode() == RISCV::PseudoCCMOVGPRNoX0) {
// Add MV.
BuildMI(TrueBB, DL, TII->get(RISCV::ADDI), DestReg)
.add(MI.getOperand(5))
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index 2095446c694bde..f02413b27a8d17 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -996,6 +996,12 @@ def TuneShortForwardBranchOpt
def HasShortForwardBranchOpt : Predicate<"Subtarget->hasShortForwardBranchOpt()">;
def NoShortForwardBranchOpt : Predicate<"!Subtarget->hasShortForwardBranchOpt()">;
+def TuneCMOVBranchOpt
+ : SubtargetFeature<"cmov-branch-opt", "HasCMOVBranchOpt",
+ "true", "Enable branch+c.mv optimization">;
+def CanUseCMOVBranchOpt : Predicate<"Subtarget->canUseCMOVBranchOpt()">;
+def NoCMOVBranchOpt : Predicate<"!Subtarget->canUseCMOVBranchOpt()">;
+
def TuneSiFive7 : SubtargetFeature<"sifive7", "RISCVProcFamily", "SiFive7",
"SiFive 7-Series processors",
[TuneNoDefaultUnroll,
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 97d76ca494cbee..4fc84dfa9c141b 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -6908,7 +6908,8 @@ static SDValue combineSelectToBinOp(SDNode *N, SelectionDAG &DAG,
MVT VT = N->getSimpleValueType(0);
SDLoc DL(N);
- if (!Subtarget.hasShortForwardBranchOpt()) {
+ if (!Subtarget.hasShortForwardBranchOpt() &&
+ !Subtarget.canUseCMOVBranchOpt()) {
// (select c, -1, y) -> -c | y
if (isAllOnesConstant(TrueV)) {
SDValue Neg = DAG.getNegative(CondV, DL, VT);
@@ -7072,7 +7073,8 @@ SDValue RISCVTargetLowering::lowerSELECT(SDValue Op, SelectionDAG &DAG) const {
// (select c, t, f) -> (or (czero_eqz t, c), (czero_nez f, c))
// Unless we have the short forward branch optimization.
- if (!Subtarget.hasShortForwardBranchOpt())
+ if (!Subtarget.hasShortForwardBranchOpt() &&
+ !Subtarget.canUseCMOVBranchOpt())
return DAG.getNode(
ISD::OR, DL, VT,
DAG.getNode(RISCVISD::CZERO_EQZ, DL, VT, TrueV, CondV),
@@ -12099,7 +12101,8 @@ static SDValue combineSelectAndUse(SDNode *N, SDValue Slct, SDValue OtherOp,
if (VT.isVector())
return SDValue();
- if (!Subtarget.hasShortForwardBranchOpt()) {
+ if (!Subtarget.hasShortForwardBranchOpt() &&
+ !Subtarget.canUseCMOVBranchOpt()) {
// (select cond, x, (and x, c)) has custom lowering with Zicond.
if ((!Subtarget.hasStdExtZicond() &&
!Subtarget.hasVendorXVentanaCondOps()) ||
@@ -14328,7 +14331,7 @@ static SDValue performSELECTCombine(SDNode *N, SelectionDAG &DAG,
if (SDValue V = useInversedSetcc(N, DAG, Subtarget))
return V;
- if (Subtarget.hasShortForwardBranchOpt())
+ if (Subtarget.hasShortForwardBranchOpt() || Subtarget.canUseCMOVBranchOpt())
return SDValue();
SDValue TrueVal = N->getOperand(1);
@@ -15066,7 +15069,8 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
return DAG.getNode(RISCVISD::SELECT_CC, DL, N->getValueType(0),
{LHS, RHS, CC, TrueV, FalseV});
- if (!Subtarget.hasShortForwardBranchOpt()) {
+ if (!Subtarget.hasShortForwardBranchOpt() &&
+ !Subtarget.canUseCMOVBranchOpt()) {
// (select c, -1, y) -> -c | y
if (isAllOnesConstant(TrueV)) {
SDValue C = DAG.getSetCC(DL, VT, LHS, RHS, CCVal);
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 1dcff7eb563e20..2ccb40ca3a71cf 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -2646,6 +2646,7 @@ bool RISCVInstrInfo::findCommutedOpIndices(const MachineInstr &MI,
case RISCV::TH_MULSH:
// Operands 2 and 3 are commutable.
return fixCommutedOpIndices(SrcOpIdx1, SrcOpIdx2, 2, 3);
+ case RISCV::PseudoCCMOVGPRNoX0:
case RISCV::PseudoCCMOVGPR:
// Operands 4 and 5 are commutable.
return fixCommutedOpIndices(SrcOpIdx1, SrcOpIdx2, 4, 5);
@@ -2802,6 +2803,7 @@ MachineInstr *RISCVInstrInfo::commuteInstructionImpl(MachineInstr &MI,
return TargetInstrInfo::commuteInstructionImpl(WorkingMI, false, OpIdx1,
OpIdx2);
}
+ case RISCV::PseudoCCMOVGPRNoX0:
case RISCV::PseudoCCMOVGPR: {
// CCMOV can be commuted by inverting the condition.
auto CC = static_cast<RISCVCC::CondCode>(MI.getOperand(3).getImm());
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index edc08187d8f775..f09904c6647bb5 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -1387,6 +1387,24 @@ def PseudoCCMOVGPR : Pseudo<(outs GPR:$dst),
ReadSFBALU, ReadSFBALU]>;
}
+// This should always expand to a branch+c.mv so the size is 6 or 4 if the
+// branch is compressible.
+let Predicates = [CanUseCMOVBranchOpt, NoShortForwardBranchOpt],
+ Constraints = "$dst = $falsev", isCommutable = 1, Size = 6 in {
+// This instruction moves $truev to $dst when the condition is true. It will
+// be expanded to control flow in RISCVExpandPseudoInsts.
+// We use GPRNoX0 because c.mv cannot encode X0.
+def PseudoCCMOVGPRNoX0 : Pseudo<(outs GPRNoX0:$dst),
+ (ins GPR:$lhs, GPR:$rhs, ixlenimm:$cc,
+ GPRNoX0:$falsev, GPRNoX0:$truev),
+ [(set GPRNoX0:$dst,
+ (riscv_selectcc_frag:$cc (XLenVT GPR:$lhs),
+ (XLenVT GPR:$rhs),
+ cond, (XLenVT GPRNoX0:$truev),
+ (XLenVT GPRNoX0:$falsev)))]>,
+ Sched<[WriteCMOV, ReadCMOV, ReadCMOV, ReadCMOV, ReadCMOV]>;
+}
+
// Conditional binops, that updates update $dst to (op rs1, rs2) when condition
// is true. Returns $falsev otherwise. Selected by optimizeSelect.
// TODO: Can we use DefaultOperands on the regular binop to accomplish this more
@@ -1535,7 +1553,7 @@ multiclass SelectCC_GPR_rrirr<DAGOperand valty, ValueType vt> {
(IntCCtoRISCVCC $cc), valty:$truev, valty:$falsev)>;
}
-let Predicates = [NoShortForwardBranchOpt] in
+let Predicates = [NoCMOVBranchOpt, NoShortForwardBranchOpt] in
defm Select_GPR : SelectCC_GPR_rrirr<GPR, XLenVT>;
class SelectCompressOpt<CondCode Cond>
diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index 16c79519fcacc1..49d4eaec7a0492 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -233,7 +233,8 @@ def SIFIVE_P450 : RISCVProcessorModel<"sifive-p450", NoSchedModel,
FeatureStdExtZba,
FeatureStdExtZbb,
FeatureStdExtZbs,
- FeatureStdExtZfhmin]>;
+ FeatureStdExtZfhmin],
+ [TuneCMOVBranchOpt]>;
def SYNTACORE_SCR1_BASE : RISCVProcessorModel<"syntacore-scr1-base",
SyntacoreSCR1Model,
diff --git a/llvm/lib/Target/RISCV/RISCVSchedRocket.td b/llvm/lib/Target/RISCV/RISCVSchedRocket.td
index bb9dfe5d012409..94f2e65560f81c 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedRocket.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedRocket.td
@@ -248,4 +248,5 @@ defm : UnsupportedSchedZbkx;
defm : UnsupportedSchedZfa;
defm : UnsupportedSchedZfh;
defm : UnsupportedSchedSFB;
+defm : UnsupportedSchedCMOV;
}
diff --git a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
index f531ab2fac8f9f..403866d2c65271 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
@@ -1213,4 +1213,5 @@ defm : UnsupportedSchedZbc;
defm : UnsupportedSchedZbkb;
defm : UnsupportedSchedZbkx;
defm : UnsupportedSchedZfa;
+defm : UnsupportedSchedCMOV;
}
diff --git a/llvm/lib/Target/RISCV/RISCVSchedSyntacoreSCR1.td b/llvm/lib/Target/RISCV/RISCVSchedSyntacoreSCR1.td
index 06ad2075b07361..69ed92bddb565f 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedSyntacoreSCR1.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedSyntacoreSCR1.td
@@ -207,4 +207,5 @@ defm : UnsupportedSchedZbkb;
defm : UnsupportedSchedZbkx;
defm : UnsupportedSchedZfa;
defm : UnsupportedSchedZfh;
+defm : UnsupportedSchedCMOV;
}
diff --git a/llvm/lib/Target/RISCV/RISCVSchedule.td b/llvm/lib/Target/RISCV/RISCVSchedule.td
index f6c1b096ad90c4..540142e1c97c2c 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedule.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedule.td
@@ -112,6 +112,10 @@ def WriteFST16 : SchedWrite; // Floating point sp store
def WriteFST32 : SchedWrite; // Floating point sp store
def WriteFST64 : SchedWrite; // Floating point dp store
+// CMOV for sifive-p450.
+def WriteCMOV : SchedWrite;
+def ReadCMOV : SchedRead;
+
// short forward branch for Bullet
def WriteSFB : SchedWrite;
def ReadSFBJmp : SchedRead;
@@ -256,6 +260,14 @@ def : ReadAdvance<ReadSFBALU, 0>;
} // Unsupported = true
}
+multiclass UnsupportedSchedCMOV {
+let Unsupported = true in {
+def : WriteRes<WriteCMOV, []>;
+
+def : ReadAdvance<ReadCMOV, 0>;
+} // Unsupported = true
+}
+
multiclass UnsupportedSchedZfa {
let Unsupported = true in {
def : WriteRes<WriteFRoundF16, []>;
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.h b/llvm/lib/Target/RISCV/RISCVSubtarget.h
index 7540218633bfcb..cc41be53dd2fd9 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.h
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.h
@@ -150,6 +150,12 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
bool hasHalfFPLoadStoreMove() const {
return HasStdExtZfhmin || HasStdExtZfbfmin;
}
+
+ bool canUseCMOVBranchOpt() const {
+ // Can only predicate c.mv so requires the C or Zca extensions.
+ return HasCMOVBranchOpt && hasStdExtCOrZca();
+ }
+
bool is64Bit() const { return IsRV64; }
MVT getXLenVT() const {
return is64Bit() ? MVT::i64 : MVT::i32;
diff --git a/llvm/test/CodeGen/RISCV/cmov-branch-opt.ll b/llvm/test/CodeGen/RISCV/cmov-branch-opt.ll
new file mode 100644
index 00000000000000..b48b4e0d1a3b83
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/cmov-branch-opt.ll
@@ -0,0 +1,461 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv64 -mattr=+c -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefix=NOCMOV %s
+; RUN: llc -mtriple=riscv64 -mattr=+cmov-branch-opt,+c -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=CMOV,CMOV-NOZICOND %s
+; RUN: llc -mtriple=riscv64 -mattr=+cmov-branch-opt,+c,+experimental-zicond -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=CMOV,CMOV-ZICOND %s
+; RUN: llc -mtriple=riscv64 -mattr=+short-forward-branch-opt -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=SHORT_FORWARD,SFB-NOZICOND %s
+; RUN: llc -mtriple=riscv64 -mattr=+short-forward-branch-opt,+c -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=SHORT_FORWARD,SFB-NOZICOND %s
+; RUN: llc -mtriple=riscv64 -mattr=+short-forward-branch-opt,+experimental-zicond -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefixes=SHORT_FORWARD,SFB-ZICOND %s
+
+; The conditional move optimization in sifive-p450 requires that only a
+; single c.mv instruction appears in the branch shadow.
+
+; The sifive-7-series can predicate an xor.
+
+define signext i32 @test1(i32 signext %x, i32 signext %y, i32 signext %z) {
+; NOCMOV-LABEL: test1:
+; NOCMOV: # %bb.0:
+; NOCMOV-NEXT: snez a2, a2
+; NOCMOV-NEXT: addi a2, a2, -1
+; NOCMOV-NEXT: and a1, a1, a2
+; NOCMOV-NEXT: xor a0, a0, a1
+; NOCMOV-NEXT: ret
+;
+; CMOV-LABEL: test1:
+; CMOV: # %bb.0:
+; CMOV-NEXT: xor a1, a1, a0
+; CMOV-NEXT: bnez a2, .LBB0_2
+; CMOV-NEXT: # %bb.1:
+; CMOV-NEXT: mv a0, a1
+; CMOV-NEXT: .LBB0_2:
+; CMOV-NEXT: ret
+;
+; SHORT_FORWARD-LABEL: test1:
+; SHORT_FORWARD: # %bb.0:
+; SHORT_FORWARD-NEXT: bnez a2, .LBB0_2
+; SHORT_FORWARD-NEXT: # %bb.1:
+; SHORT_FORWARD-NEXT: xor a0, a0, a1
+; SHORT_FORWARD-NEXT: .LBB0_2:
+; SHORT_FORWARD-NEXT: ret
+ %c = icmp eq i32 %z, 0
+ %a = xor i32 %x, %y
+ %b = select i1 %c, i32 %a, i32 %x
+ ret i32 %b
+}
+
+define signext i32 @test2(i32 signext %x, i32 signext %y, i32 signext %z) {
+; NOCMOV-LABEL: test2:
+; NOCMOV: # %bb.0:
+; NOCMOV-NEXT: seqz a2, a2
+; NOCMOV-NEXT: addi a2, a2, -1
+; NOCMOV-NEXT: and a1, a1, a2
+; NOCMOV-NEXT: xor a0, a0, a1
+; NOCMOV-NEXT: ret
+;
+; CMOV-LABEL: test2:
+; CMOV: # %bb.0:
+; CMOV-NEXT: xor a1, a1, a0
+; CMOV-NEXT: beqz a2, .LBB1_2
+; CMOV-NEXT: # %bb.1:
+; CMOV-NEXT: mv a0, a1
+; CMOV-NEXT: .LBB1_2:
+; CMOV-NEXT: ret
+;
+; SHORT_FORWARD-LABEL: test2:
+; SHORT_FORWARD: # %bb.0:
+; SHORT_FORWARD-NEXT: beqz a2, .LBB1_2
+; SHORT_FORWARD-NEXT: # %bb.1:
+; SHORT_FORWARD-NEXT: xor a0, a0, a1
+; SHORT_FORWARD-NEXT: .LBB1_2:
+; SHORT_FORWARD-NEXT: ret
+ %c = icmp eq i32 %z, 0
+ %a = xor i32 %x, %y
+ %b = select i1 %c, i32 %x, i32 %a
+ ret i32 %b
+}
+
+; Make sure we don't share the same basic block for two selects with the same
+; condition.
+define signext i32 @test3(i32 signext %v, i32 signext %w, i32 signext %x, i32 signext %y, i32 signext %z) {
+; NOCMOV-LABEL: test3:
+; NOCMOV: # %bb.0:
+; NOCMOV-NEXT: seqz a4, a4
+; NOCMOV-NEXT: addi a4, a4, -1
+; NOCMOV-NEXT: and a1, a1, a4
+; NOCMOV-NEXT: xor a0, a0, a1
+; NOCMOV-NEXT: and a3, a3, a4
+; NOCMOV-NEXT: xor a2, a2, a3
+; NOCMOV-NEXT: addw a0, a0, a2
+; NOCMOV-NEXT: ret
+;
+; CMOV-LABEL: test3:
+; CMOV: # %bb.0:
+; CMOV-NEXT: xor a1, a1, a0
+; CMOV-NEXT: bnez a4, .LBB2_2
+; CMOV-NEXT: # %bb.1:
+; CMOV-NEXT: mv a1, a0
+; CMOV-NEXT: .LBB2_2:
+; CMOV-NEXT: xor a0, a2, a3
+; CMOV-NEXT: bnez a4, .LBB2_4
+; CMOV-NEXT: # %bb.3:
+; CMOV-NEXT: mv a0, a2
+; CMOV-NEXT: .LBB2_4:
+; CMOV-NEXT: addw a0, a0, a1
+; CMOV-NEXT: ret
+;
+; SHORT_FORWARD-LABEL: test3:
+; SHORT_FORWARD: # %bb.0:
+; SHORT_FORWARD-NEXT: beqz a4, .LBB2_2
+; SHORT_FORWARD-NEXT: # %bb.1:
+; SHORT_FORWARD-NEXT: xor a0, a0, a1
+; SHORT_FORWARD-NEXT: .LBB2_2:
+; SHORT_FORWARD-NEXT: beqz a4, .LBB2_4
+; SHORT_FORWARD-NEXT: # %bb.3:
+; SHORT_FORWARD-NEXT: xor a2, a2, a3
+; SHORT_FORWARD-NEXT: .LBB2_4:
+; SHORT_FORWARD-NEXT: addw a0, a0, a2
+; SHORT_FORWARD-NEXT: ret
+ %c = icmp eq i32 %z, 0
+ %a = xor i32 %v, %w
+ %b = select i1 %c, i32 %v, i32 %a
+ %d = xor i32 %x, %y
+ %e = select i1 %c, i32 %x, i32 %d
+ %f = add i32 %b, %e
+ ret i32 %f
+}
+
+define signext i32 @test4(i32 signext %x, i32 signext %y, i32 signext %z) {
+; NOCMOV-LABEL: test4:
+; NOCMOV: # %bb.0:
+; NOCMOV-NEXT: snez a0, a2
+; NOCMOV-NEXT: addi a0, a0, -1
+; NOCMOV-NEXT: andi a0, a0, 3
+; NOCMOV-NEXT: ret
+;
+; CMOV-NOZICOND-LABEL: test4:
+; CMOV-NOZICOND: # %bb.0:
+; CMOV-NOZICOND-NEXT: li a1, 0
+; CMOV-NOZICOND-NEXT: li a0, 3
+; CMOV-NOZICOND-NEXT: beqz a2, .LBB3_2
+; CMOV-NOZICOND-NEXT: # %bb.1:
+; CMOV-NOZICOND-NEXT: mv a0, a1
+; CMOV-NOZICOND-NEXT: .LBB3_2:
+; CMOV-NOZICOND-NEXT: ret
+;
+; CMOV-ZICOND-LABEL: test4:
+; CMOV-ZICOND: # %bb.0:
+; CMOV-ZICOND-NEXT: li a0, 3
+; CMOV-ZICOND-NEXT: czero.nez a0, a0, a2
+; CMOV-ZICOND-NEXT: ret
+;
+; SFB-NOZICOND-LABEL: test4:
+; SFB-NOZICOND: # %bb.0:
+; SFB-NOZICOND-NEXT: li a0, 3
+; SFB-NOZICOND-NEXT: beqz a2, .LBB3_2
+; SFB-NOZICOND-NEXT: # %bb.1:
+; SFB-NOZICOND-NEXT: li a0, 0
+; SFB-NOZICOND-NEXT: .LBB3_2:
+; SFB-NOZICOND-NEXT: ret
+;
+; SFB-ZICOND-LABEL: test4:
+; SFB-ZICOND: # %bb.0:
+; SFB-ZICOND-NEXT: li a0, 3
+; SFB-ZICOND-NEXT: czero.nez a0, a0, a2
+; SFB-ZICOND-NEXT: ret
+ %c = icmp eq i32 %z, 0
+ %a = select i1 %c, i32 3, i32 0
+ ret i32 %a
+}
+
+define i16 @select_xor_1(i16 %A, i8 %cond) {
+; NOCMOV-LABEL: select_xor_1:
+; NOCMOV: # %bb.0: # %entry
+; NOCMOV-NEXT: slli a1, a1, 63
+; NOCMOV-NEXT: srai a1, a1, 63
+; NOCMOV-NEXT: andi a1, a1, 43
+; NOCMOV-NEXT: xor a0, a0, a1
+; NOCMOV-NEXT: ret
+;
+; CMOV-LABEL: select_xor_1:
+; CMOV: # %bb.0: # %entry
+; CMOV-NEXT: andi a1, a1, 1
+; CMOV-NEXT: xori a2, a0, 43
+; CMOV-NEXT: beqz a1, .LBB4_2
+; CMOV-NEXT: # %bb.1: # %entry
+; CMOV-NEXT: mv a0, a2
+; CMOV-NEXT: .LBB4_2: # %entry
+; CMOV-NEXT: ret
+;
+; SHORT_FORWARD-LABEL: select_xor_1:
+; SHORT_FORWARD: # %bb.0: # %entry
+; SHORT_FORWARD-NEXT: andi a1, a1, 1
+; SHORT_FORWARD-NEXT: beqz a1, .LBB4_2
+; SHORT_FORWARD-NEXT: # %bb.1: # %entry
+; SHORT_FORWARD-NEXT: xori a0, a0, 43
+; SHORT_FORWARD-NEXT: .LBB4_2: # %entry
+; SHORT_FORWARD-NEXT: ret
+entry:
+ %and = and i8 %cond, 1
+ %cmp10 = icmp eq i8 %and, 0
+ %0 = xor i16 %A, 43
+ %1 = select i1 %cmp10, i16 %A, i16 %0
+ ret i16 %1
+}
+
+; Equivalent to above, but with icmp ne (and %cond, 1), 1 instead of
+; icmp eq (and %cond, 1), 0
+define i16 @select_xor_1b(i16 %A, i8 %cond) {
+; NOCMOV-LABEL: select_xor_1b:
+; NOCMOV: # %bb.0: # %entry
+; NOCMOV-NEXT: slli a1, a1, 63
+; NOCMOV-NEXT: srai a1, a1, 63
+; NOCMOV-NEXT: andi a1, a1, 43
+; NOCMOV-NEXT: xor a0, a0, a1
+; NOCMOV-NEXT: ret
+;
+; CMOV-LABEL: select_xor_1b:
+; CMOV: # %bb.0: # %entry
+; CMOV-NEXT: andi a1, a1, 1
+; CMOV-NEXT: xori a2, a0, 43
+; CMOV-NEXT: beqz a1, .LBB5_2
+; CMOV-NEXT: # %bb.1: # %entry
+; CMOV-NEXT: mv a0, a2
+; CMOV-NEXT: .LBB5_2: # %entry
+; CMOV-NEXT: ret
+;
+; SHORT_FORWARD-LABEL: select_xor_1b:
+; SHORT_FORWARD: # %bb.0: # %entry
+; SHORT_FORWARD-NEXT: andi a1, a1, 1
+; SHORT_FORWARD-NEXT: beqz a1, .LBB5_2
+; SHORT_FORWARD-NEXT: # %bb.1: # %entry
+; SHORT_FORWARD-NEXT: xori a0, a0, 43
+; SHORT_FORWARD-NEXT: .LBB5_2: # %entry
+; SHORT_FORWARD-NEXT: ret
+entry:
+ %and = and i8 %cond, 1
+ %cmp10 = icmp ne i8 %and, 1
+ %0 = xor i16 %A, 43
+ %1 = select i1 %cmp10, i16 %A, i16 %0
+ ret i16 %1
+}
+
+define i32 @select_xor_2(i32 %A, i32 %B, i8 %cond) {
+; NOCMOV-LABEL: select_xor_2:
+; NOCMOV: # %bb.0: # %entry
+; NOCMOV-NEXT: slli a2, a2, 63
+; NOCMOV-NEXT: srai a2, a2, 63
+; NOCMOV-NEXT: and a1, a1, a2
+; NOCMOV-NEXT: xor a0, a0, a1
+; NOCMOV-NEXT: ret
+;
+; CMOV-LABEL: select_xor_2:
+; CMOV: # %bb.0: # %entry
+; CMOV-NEXT: andi a2, a2, 1
+; CMOV-NEXT: xor a1, a1, a0
+; CMOV-NEXT: beqz a2, .LBB6_2
+; CMOV-NEXT: # %bb.1: # %entry
+; CMOV-NEXT: mv a0, a1
+; CMOV-NEXT: .LBB6_2: # %entry
+; CMOV-NEXT: ret
+;
+; SFB-ZICOND-LABEL: select_xor_2:
+; SFB-ZICOND: # %bb.0: # %entry
+; SFB-ZICOND-NEXT: andi a2, a2, 1
+; SFB-ZICOND-NEXT: beqz a2, .LBB6_2
+; SFB-ZICOND-NEXT: # %bb.1: # %entry
+; SFB-ZICOND-NEXT: xor a0, a1, a0
+; SFB-ZICOND-NEXT: .LBB6_2: # %entry
+; SFB-ZICOND-NEXT: ret
+entry:
+ %and = and i8 %cond, 1
+ %cmp10 = icmp eq i8 %and, 0
+ %0 = xor i32 %B, %A
+ %1 = select i1 %cmp10, i32 %A, i32 %0
+ ret i32 %1
+}
+
+; Equivalent to above, but with icmp ne (and %cond, 1), 1 instead of
+; icmp eq (and %cond, 1), 0
+define i32 @select_xor_2b(i32 %A, i32 %B, i8 %cond) {
+; NO...
[truncated]
// This should always expand to a branch+c.mv so the size is 6 or 4 if the
// branch is compressible.
let Predicates = [CanUseCMOVBranchOpt, NoShortForwardBranchOpt],
    Constraints = "$dst = $falsev", isCommutable = 1, Size = 6 in {
Do you want the isSelect = 1 from the other version? Or was that an intentional omission?
It's intentional. isSelect enables the ALU op folding code which we don't want.
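As a rough size check on the Size = 6 comment in the hunk above (registers and labels hypothetical): a full-size branch plus c.mv is 6 bytes, and the pair shrinks to 4 bytes when the branch itself compresses:

bne  a0, a1, 1f      # 4-byte branch + 2-byte c.mv = 6 bytes
c.mv a2, a3
1:

c.bnez a0, 1f        # 2-byte branch (compare vs zero, rs1 in x8-x15) -> 4 bytes total
c.mv   a2, a3
1: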
@@ -6908,7 +6908,8 @@ static SDValue combineSelectToBinOp(SDNode *N, SelectionDAG &DAG,
   MVT VT = N->getSimpleValueType(0);
   SDLoc DL(N);

-  if (!Subtarget.hasShortForwardBranchOpt()) {
+  if (!Subtarget.hasShortForwardBranchOpt() &&
Is it the case that all machines with ShortForwardBranch also implement the new fusion? If so, we could adjust the flag checks here a bit.
The pattern is a subset of ShortForwardBranch, so yes. Would need to make ShortForwardBranch imply TuneCMOVBranchOpt in tablegen, which I think is doable.
Actually it's more complicated, because ShortForwardBranch can fuse mv or c.mv, but CMOVBranchOpt requires C. So I don't think the dependency alone would be enough, but we could have a single subtarget function to wrap this.
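A sketch of the difference being discussed (hypothetical registers): sifive-7 SFB can predicate an arbitrary single ALU op in the branch shadow, including the uncompressed mv, while the p450 fusion only covers the 2-byte c.mv, hence the C/Zca requirement:

# SFB (sifive-7-series): any single ALU op in the shadow works, e.g.
bnez a2, 1f
xor  a0, a0, a1      # uncompressed op is fine here
1:

# p450 cmov-branch-opt: only a branch over c.mv fuses
bnez a2, 1f
c.mv a0, a1
1: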
@@ -996,6 +996,12 @@ def TuneShortForwardBranchOpt
 def HasShortForwardBranchOpt : Predicate<"Subtarget->hasShortForwardBranchOpt()">;
 def NoShortForwardBranchOpt : Predicate<"!Subtarget->hasShortForwardBranchOpt()">;

+def TuneCMOVBranchOpt
This naming is hard for me to parse at a glance. A couple ideas:
- TuneCMovFusion
- TuneShortForwardBranchCMovOnly
@@ -112,6 +112,10 @@ def WriteFST16 : SchedWrite; // Floating point sp store
 def WriteFST32 : SchedWrite; // Floating point sp store
 def WriteFST64 : SchedWrite; // Floating point dp store

+// CMOV for sifive-p450.
This is intrusive; can we just use InstRW to override the resources in the sifive-p450 schedule model? Or is it possible that we may reuse WriteCMOV/ReadCMOV someday? For example, Zbt is back to the Zb* extensions.
Why is having a WriteCMOV and a ReadCMOV intrusive? How are these Sched<Read|Write> different than any other Sched<Read|Write>? I understand that not every CPU may use the CMOV instructions, but this is also the case for CPUs that do not use vector instructions. It is my understanding that if a CPU does not implement vector instructions, then it doesn't need to specify the behavior of WriteRes and ReadAdvance for vector instructions (UnsupportedV), as is the case in the RocketModel. It would be the same thing for CMOV instructions.

I think WriteCMOV and ReadCMOV are minimally invasive. It is my goal to use InstRW as little as possible in the SchedModels -- SchedRead and SchedWrite aim to replace InstRW, in my opinion.
Thanks, your thought makes sense to me. My concern is that we won't have CMOV in the near future, and we would need an UnsupportedCMOV in almost all new schedule models (except SiFive's), which is already the situation for UnsupportedSchedSFB. This is because SFB and this CMOV are only for vendor extensions/features, and my thought/strategy is that we should separate vendor-specific parts from standard parts.

Actually, I think there should be a RISCVInstrInfoSFB.td which is split out from RISCVInstrInfo.td.
Ping
LGTM.