[RISCV][VLOPT] Compute demanded VLs up front #124530
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

This replaces the worklist by instead computing what VL is demanded by each instruction's users first. checkUsers essentially already did this, so it's been renamed to computeDemandedVL. The demanded VLs are stored in a DenseMap, and then we can just do a single forward pass of tryReduceVL where we check if a candidate's demanded VL is less than its VLOp. This means the pass should now be linear in complexity, and allows us to relax the restriction on tied operands more easily, as in #124066.

Note that in order to avoid std::optional inside the DenseMap, I've removed the std::optionals and replaced them with VLMAX or 0 constant operands.

Patch is 86.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124530.diff

5 Files Affected:
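Before the (truncated) diff, here is a distilled sketch of the new per-block structure of runOnMachineFunction in this revision. It is condensed from the patch below; debug output and the unreachable-block check are elided, and MadeChange is the pass's existing result flag:

```cpp
// Sketch only: condensed from the diff below, not the literal code.
for (MachineBasicBlock &MBB : MF) {
  // Pass 1 (bottom-up): for each candidate, record what VL its users demand.
  for (const MachineInstr &MI : reverse(MBB))
    if (isCandidate(MI))
      DemandedVLs.insert({&MI, computeDemandedVL(MI)});

  // Pass 2 (top-down): shrink a candidate's VL operand whenever the demanded
  // VL is provably smaller than its current VL operand.
  for (MachineInstr &MI : MBB)
    if (isCandidate(MI))
      MadeChange |= tryReduceVL(MI);

  DemandedVLs.clear();
}
```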
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index bd02880b0d7129..006490c50be4de 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -4232,6 +4232,8 @@ unsigned RISCV::getDestLog2EEW(const MCInstrDesc &Desc, unsigned Log2SEW) {
/// Given two VL operands, do we know that LHS <= RHS?
bool RISCV::isVLKnownLE(const MachineOperand &LHS, const MachineOperand &RHS) {
+ if (LHS.isImm() && LHS.getImm() == 0)
+ return true;
if (LHS.isReg() && RHS.isReg() && LHS.getReg().isVirtual() &&
LHS.getReg() == RHS.getReg())
return true;
diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index 976c65e51c2059..c5508fd23c03a1 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -33,6 +33,7 @@ namespace {
class RISCVVLOptimizer : public MachineFunctionPass {
const MachineRegisterInfo *MRI;
const MachineDominatorTree *MDT;
+ const TargetInstrInfo *TII;
public:
static char ID;
@@ -50,12 +51,15 @@ class RISCVVLOptimizer : public MachineFunctionPass {
StringRef getPassName() const override { return PASS_NAME; }
private:
- std::optional<MachineOperand> getMinimumVLForUser(MachineOperand &UserOp);
- /// Returns the largest common VL MachineOperand that may be used to optimize
- /// MI. Returns std::nullopt if it failed to find a suitable VL.
- std::optional<MachineOperand> checkUsers(MachineInstr &MI);
+ MachineOperand getMinimumVLForUser(MachineOperand &UserOp);
+ /// Computes the VL of \p MI that is actually used by its users.
+ MachineOperand computeDemandedVL(const MachineInstr &MI);
bool tryReduceVL(MachineInstr &MI);
bool isCandidate(const MachineInstr &MI) const;
+
+ /// For a given instruction, records what elements of it are demanded by
+ /// downstream users.
+ DenseMap<const MachineInstr *, MachineOperand> DemandedVLs;
};
} // end anonymous namespace
@@ -77,6 +81,15 @@ static bool isVectorRegClass(Register R, const MachineRegisterInfo *MRI) {
return RISCVRI::isVRegClass(RC->TSFlags);
}
+/// Return true if \p MO is a physical or virtual vector register, false
+/// otherwise.
+static bool isVectorOperand(const MachineOperand &MO,
+ const MachineRegisterInfo *MRI) {
+ if (!MO.isReg())
+ return false;
+ return isVectorRegClass(MO.getReg(), MRI);
+}
+
/// Represents the EMUL and EEW of a MachineOperand.
struct OperandInfo {
// Represent as 1,2,4,8, ... and fractional indicator. This is because
@@ -1202,15 +1215,14 @@ bool RISCVVLOptimizer::isCandidate(const MachineInstr &MI) const {
return true;
}
-std::optional<MachineOperand>
-RISCVVLOptimizer::getMinimumVLForUser(MachineOperand &UserOp) {
+MachineOperand RISCVVLOptimizer::getMinimumVLForUser(MachineOperand &UserOp) {
const MachineInstr &UserMI = *UserOp.getParent();
const MCInstrDesc &Desc = UserMI.getDesc();
if (!RISCVII::hasVLOp(Desc.TSFlags) || !RISCVII::hasSEWOp(Desc.TSFlags)) {
LLVM_DEBUG(dbgs() << " Abort due to lack of VL, assume that"
" use VLMAX\n");
- return std::nullopt;
+ return MachineOperand::CreateImm(RISCV::VLMaxSentinel);
}
// Instructions like reductions may use a vector register as a scalar
@@ -1226,50 +1238,63 @@ RISCVVLOptimizer::getMinimumVLForUser(MachineOperand &UserOp) {
}
unsigned VLOpNum = RISCVII::getVLOpNum(Desc);
- const MachineOperand &VLOp = UserMI.getOperand(VLOpNum);
+ const MachineOperand VLOp = UserMI.getOperand(VLOpNum);
// Looking for an immediate or a register VL that isn't X0.
assert((!VLOp.isReg() || VLOp.getReg() != RISCV::X0) &&
"Did not expect X0 VL");
+
+ // If we know the demanded VL of UserMI, then we can reduce the VL it
+ // requires.
+ if (DemandedVLs.contains(&UserMI)) {
+ // We can only shrink the demanded VL if the elementwise result doesn't
+ // depend on VL (i.e. not vredsum/viota etc.)
+ // Also conservatively restrict to supported instructions for now.
+ // TODO: Can we remove the isSupportedInstr check?
+ if (!RISCVII::elementsDependOnVL(
+ TII->get(RISCV::getRVVMCOpcode(UserMI.getOpcode())).TSFlags) &&
+ isSupportedInstr(UserMI)) {
+ const MachineOperand &DemandedVL = DemandedVLs.at(&UserMI);
+ if (RISCV::isVLKnownLE(DemandedVL, VLOp))
+ return DemandedVL;
+ }
+ }
+
return VLOp;
}
-std::optional<MachineOperand> RISCVVLOptimizer::checkUsers(MachineInstr &MI) {
- // FIXME: Avoid visiting each user for each time we visit something on the
- // worklist, combined with an extra visit from the outer loop. Restructure
- // along lines of an instcombine style worklist which integrates the outer
- // pass.
- std::optional<MachineOperand> CommonVL;
+MachineOperand RISCVVLOptimizer::computeDemandedVL(const MachineInstr &MI) {
+ const MachineOperand &VLMAX = MachineOperand::CreateImm(RISCV::VLMaxSentinel);
+ MachineOperand DemandedVL = MachineOperand::CreateImm(0);
+
for (auto &UserOp : MRI->use_operands(MI.getOperand(0).getReg())) {
const MachineInstr &UserMI = *UserOp.getParent();
LLVM_DEBUG(dbgs() << " Checking user: " << UserMI << "\n");
if (mayReadPastVL(UserMI)) {
LLVM_DEBUG(dbgs() << " Abort because used by unsafe instruction\n");
- return std::nullopt;
+ return VLMAX;
}
// Tied operands might pass through.
if (UserOp.isTied()) {
LLVM_DEBUG(dbgs() << " Abort because user used as tied operand\n");
- return std::nullopt;
+ return VLMAX;
}
- auto VLOp = getMinimumVLForUser(UserOp);
- if (!VLOp)
- return std::nullopt;
+ const MachineOperand &VLOp = getMinimumVLForUser(UserOp);
// Use the largest VL among all the users. If we cannot determine this
// statically, then we cannot optimize the VL.
- if (!CommonVL || RISCV::isVLKnownLE(*CommonVL, *VLOp)) {
- CommonVL = *VLOp;
- LLVM_DEBUG(dbgs() << " User VL is: " << VLOp << "\n");
- } else if (!RISCV::isVLKnownLE(*VLOp, *CommonVL)) {
+ if (RISCV::isVLKnownLE(DemandedVL, VLOp)) {
+ DemandedVL = VLOp;
+ LLVM_DEBUG(dbgs() << " Demanded VL is: " << VLOp << "\n");
+ } else if (!RISCV::isVLKnownLE(VLOp, DemandedVL)) {
LLVM_DEBUG(dbgs() << " Abort because cannot determine a common VL\n");
- return std::nullopt;
+ return VLMAX;
}
if (!RISCVII::hasSEWOp(UserMI.getDesc().TSFlags)) {
LLVM_DEBUG(dbgs() << " Abort due to lack of SEW operand\n");
- return std::nullopt;
+ return VLMAX;
}
std::optional<OperandInfo> ConsumerInfo = getOperandInfo(UserOp, MRI);
@@ -1279,7 +1304,7 @@ std::optional<MachineOperand> RISCVVLOptimizer::checkUsers(MachineInstr &MI) {
LLVM_DEBUG(dbgs() << " Abort due to unknown operand information.\n");
LLVM_DEBUG(dbgs() << " ConsumerInfo is: " << ConsumerInfo << "\n");
LLVM_DEBUG(dbgs() << " ProducerInfo is: " << ProducerInfo << "\n");
- return std::nullopt;
+ return VLMAX;
}
// If the operand is used as a scalar operand, then the EEW must be
@@ -1294,53 +1319,51 @@ std::optional<MachineOperand> RISCVVLOptimizer::checkUsers(MachineInstr &MI) {
<< " Abort due to incompatible information for EMUL or EEW.\n");
LLVM_DEBUG(dbgs() << " ConsumerInfo is: " << ConsumerInfo << "\n");
LLVM_DEBUG(dbgs() << " ProducerInfo is: " << ProducerInfo << "\n");
- return std::nullopt;
+ return VLMAX;
}
}
- return CommonVL;
+ return DemandedVL;
}
bool RISCVVLOptimizer::tryReduceVL(MachineInstr &MI) {
LLVM_DEBUG(dbgs() << "Trying to reduce VL for " << MI << "\n");
- auto CommonVL = checkUsers(MI);
- if (!CommonVL)
- return false;
+ const MachineOperand &CommonVL = DemandedVLs.at(&MI);
- assert((CommonVL->isImm() || CommonVL->getReg().isVirtual()) &&
+ assert((CommonVL.isImm() || CommonVL.getReg().isVirtual()) &&
"Expected VL to be an Imm or virtual Reg");
unsigned VLOpNum = RISCVII::getVLOpNum(MI.getDesc());
MachineOperand &VLOp = MI.getOperand(VLOpNum);
- if (!RISCV::isVLKnownLE(*CommonVL, VLOp)) {
- LLVM_DEBUG(dbgs() << " Abort due to CommonVL not <= VLOp.\n");
+ if (!RISCV::isVLKnownLE(CommonVL, VLOp)) {
+ LLVM_DEBUG(dbgs() << " Abort due to DemandedVL not <= VLOp.\n");
return false;
}
- if (CommonVL->isIdenticalTo(VLOp)) {
+ if (CommonVL.isIdenticalTo(VLOp)) {
LLVM_DEBUG(
- dbgs() << " Abort due to CommonVL == VLOp, no point in reducing.\n");
+ dbgs()
+ << " Abort due to DemandedVL == VLOp, no point in reducing.\n");
return false;
}
- if (CommonVL->isImm()) {
+ if (CommonVL.isImm()) {
LLVM_DEBUG(dbgs() << " Reduce VL from " << VLOp << " to "
- << CommonVL->getImm() << " for " << MI << "\n");
- VLOp.ChangeToImmediate(CommonVL->getImm());
+ << CommonVL.getImm() << " for " << MI << "\n");
+ VLOp.ChangeToImmediate(CommonVL.getImm());
return true;
}
- const MachineInstr *VLMI = MRI->getVRegDef(CommonVL->getReg());
+ const MachineInstr *VLMI = MRI->getVRegDef(CommonVL.getReg());
if (!MDT->dominates(VLMI, &MI))
return false;
- LLVM_DEBUG(
- dbgs() << " Reduce VL from " << VLOp << " to "
- << printReg(CommonVL->getReg(), MRI->getTargetRegisterInfo())
- << " for " << MI << "\n");
+ LLVM_DEBUG(dbgs() << " Reduce VL from " << VLOp << " to "
+ << printReg(CommonVL.getReg(), MRI->getTargetRegisterInfo())
+ << " for " << MI << "\n");
// All our checks passed. We can reduce VL.
- VLOp.ChangeToRegister(CommonVL->getReg(), false);
+ VLOp.ChangeToRegister(CommonVL.getReg(), false);
return true;
}
@@ -1355,52 +1378,33 @@ bool RISCVVLOptimizer::runOnMachineFunction(MachineFunction &MF) {
if (!ST.hasVInstructions())
return false;
- SetVector<MachineInstr *> Worklist;
- auto PushOperands = [this, &Worklist](MachineInstr &MI,
- bool IgnoreSameBlock) {
- for (auto &Op : MI.operands()) {
- if (!Op.isReg() || !Op.isUse() || !Op.getReg().isVirtual() ||
- !isVectorRegClass(Op.getReg(), MRI))
- continue;
+ TII = ST.getInstrInfo();
- MachineInstr *DefMI = MRI->getVRegDef(Op.getReg());
- if (!isCandidate(*DefMI))
- continue;
-
- if (IgnoreSameBlock && DefMI->getParent() == MI.getParent())
- continue;
-
- Worklist.insert(DefMI);
- }
- };
-
- // Do a first pass eagerly rewriting in roughly reverse instruction
- // order, populate the worklist with any instructions we might need to
- // revisit. We avoid adding definitions to the worklist if they're
- // in the same block - we're about to visit them anyways.
bool MadeChange = false;
for (MachineBasicBlock &MBB : MF) {
// Avoid unreachable blocks as they have degenerate dominance
if (!MDT->isReachableFromEntry(&MBB))
continue;
- for (auto &MI : make_range(MBB.rbegin(), MBB.rend())) {
+ // For each instruction that defines a vector, compute what VL its
+ // downstream users demand.
+ for (const auto &MI : reverse(MBB)) {
+ if (!isCandidate(MI))
+ continue;
+ DemandedVLs.insert({&MI, computeDemandedVL(MI)});
+ }
+
+ // Then go through and see if we can reduce the VL of any instructions to
+ // only what's demanded.
+ for (auto &MI : MBB) {
if (!isCandidate(MI))
continue;
if (!tryReduceVL(MI))
continue;
MadeChange = true;
- PushOperands(MI, /*IgnoreSameBlock*/ true);
}
- }
- while (!Worklist.empty()) {
- assert(MadeChange);
- MachineInstr &MI = *Worklist.pop_back_val();
- assert(isCandidate(MI));
- if (!tryReduceVL(MI))
- continue;
- PushOperands(MI, /*IgnoreSameBlock*/ false);
+ DemandedVLs.clear();
}
return MadeChange;
diff --git a/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir b/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
index edcd32c4098bca..2684e7c3b139ca 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
@@ -8,8 +8,10 @@ body: |
; CHECK-LABEL: name: vop_vi
; CHECK: %x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, 1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vop_vi_incompatible_eew
@@ -18,8 +20,10 @@ body: |
; CHECK-LABEL: name: vop_vi_incompatible_eew
; CHECK: %x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+ $v8 = COPY %y
...
---
name: vop_vi_incompatible_emul
@@ -28,8 +32,10 @@ body: |
; CHECK-LABEL: name: vop_vi_incompatible_emul
; CHECK: %x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vop_vv
@@ -38,8 +44,10 @@ body: |
; CHECK-LABEL: name: vop_vv
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vop_vv_incompatible_eew
@@ -48,8 +56,10 @@ body: |
; CHECK-LABEL: name: vop_vv_incompatible_eew
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+ $v8 = COPY %y
...
---
@@ -59,8 +69,10 @@ body: |
; CHECK-LABEL: name: vop_vv_incompatible_emul
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vwop_vv_vd
@@ -69,8 +81,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vd
; CHECK: early-clobber %x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+ $v8 = COPY %y
...
---
name: vwop_vv_vd_incompatible_eew
@@ -79,8 +93,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vd_incompatible_eew
; CHECK: early-clobber %x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vwop_vv_vd_incompatible_emul
@@ -89,8 +105,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vd_incompatible_emul
; CHECK: early-clobber %x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 4 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vwop_vv_vs2
@@ -99,8 +117,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vs2
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8m2 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8m2 = COPY %y
...
---
name: vwop_vv_vs2_incompatible_eew
@@ -109,8 +129,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vs2_incompatible_eew
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8m2 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+ $v8m2 = COPY %y
...
---
name: vwop_vv_vs2_incompatible_emul
@@ -119,8 +141,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vs2_incompatible_emul
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vr = PseudoVWADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVWADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vwop_vv_vs1
@@ -129,8 +153,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vs1
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8m2 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8m2 = COPY %y
...
---
name: vwop_vv_vs1_incompatible_eew
@@ -139,8 +165,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vs1_incompatible_eew
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vrm2 = PseudoVWADD_VV_M1 $noreg, $noreg, %x, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8m2 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vrm2 = PseudoVWADD_VV_M1 $noreg, $noreg, %x, 1, 4 /* e16 */, 0
+ $v8m2 = COPY %y
...
---
name: vwop_vv_vs1_incompatible_emul
@@ -149,8 +177,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vs1_incompatible_emul
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, %x, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, %x, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vwop_wv_vd
@@ -159,8 +189,10 @@ body: |
; CHECK-LABEL: name: vwop_wv_vd
; CHECK: early-clobber %x:vr = PseudoVWADD_WV_MF2 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVWADD_WV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+ $v8 = COPY %y
...
---
name: vwop_wv_vd_incompatible_eew
@@ -169,8 +201,10 @@ body: |
; CHECK-LABEL: name: vwop_wv_vd_incompatible_eew
; ...
[truncated]
Force-pushed from 2f64994 to 169a45a
if (LHS.isImm() && LHS.getImm() == 0)
  return true;
This is needed to be able to replace the std::optional with CreateImm(0) in computeDemandedVL
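As a rough illustration (not code from the patch, and UserVL is hypothetical here), the property being added is that an immediate 0 is known to be <= any other VL, so it can act as the starting value for the running maximum that computeDemandedVL takes over all users:

```cpp
// Sketch only: mirrors the max-over-users logic in computeDemandedVL.
MachineOperand DemandedVL = MachineOperand::CreateImm(0);
// ... for each user, with UserVL the VL returned by getMinimumVLForUser ...
if (RISCV::isVLKnownLE(DemandedVL, UserVL))
  DemandedVL = UserVL;
// An instruction whose result has no users keeps DemandedVL == 0.
```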
%vl:gprnox0 = COPY $x1
%x:vr = PseudoVADD_VV_MF4 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0 /* tu, mu */
%y:vr = PseudoVNSRL_WV_MF4 $noreg, %x, $noreg, %vl, 4 /* e16 */, 0 /* tu, mu */
$v8 = COPY %y
Because we're now checking what elements are demanded, any instruction without a use will have a VL of zero demanded, which will end up reducing everything else in the basic block to VL=0. I've added COPYs in the .mir tests to prevent this.
I don't think these copies are required now, are they? Adding them is a reasonable thing to do, I'm just confirming this change doesn't need them any more for my own understanding. If so, please separate this into its own review.
-  /// MI. Returns std::nullopt if it failed to find a suitable VL.
-  std::optional<MachineOperand> checkUsers(MachineInstr &MI);
+  MachineOperand getMinimumVLForUser(MachineOperand &UserOp);
+  /// Computes the VL of \p MI that is actually used by its users.
largest common VL?
Specify what happens if it has no users?
This no longer computes the largest common VL, but instead the minimum demanded VL.
E.g. if MI has a user with VL=2, it's still possible that this would return VL=1 if the user's demanded VL was only 1, despite its VL operand being higher. I think there are some other comments in the function that need updating now that you mention it.
for (const auto &MI : reverse(MBB)) {
  if (!isCandidate(MI))
    continue;
  DemandedVLs.insert({&MI, computeDemandedVL(MI)});
Does the demanded VL of an earlier instruction always stay the same, even if we will eventually optimize the VL of a later (user) instruction?
Yes, because now we don't reduce the VL of anything until after we've analysed the demanded VLs for everything.
computeDemandedVL will work out the minimum possible VL for everything up front, which previously would have required multiple iterations of tryReduceVL. It's able to propagate the VL without mutating anything because of the change to getMinimumVLForUser that peeks through the user's demandedVL.
// If we know the demanded VL of UserMI, then we can reduce the VL it
// requires.
if (DemandedVLs.contains(&UserMI)) {
Is it really possible to have a VLOp but not be part of DemandedVLs?
I thought the same thing at first, but then found out we have vcpop.m etc., which don't define any vector operands but do use vector operands. So in that case we can't reduce the demanded VL since it's not elementwise.
I'm confused. vcpop.m is not a supported instruction. Thus, its VL should never be reduced, and it shouldn't have an element in the map (not a candidate). It can be a valid user of a supported instruction, but that seems fine? We'd return the VL of the vcpop.m instruction?
Oops, ignore my comment. When I wrote it I was thinking about my initial version of this patch when I computed the demandedVL for every instruction that defined a vector, not just candidates, which is why I was seeing vcpop.m etc.
In the latest version of the patch we might still have candidate instructions that have aborted, i.e. returned std::nullopt in checkUsers, and so don't have an entry.
LGTM, but please wait to see if there are any other comments.
const MachineInstr &UserMI = *UserOp.getParent();
const MCInstrDesc &Desc = UserMI.getDesc();

if (!RISCVII::hasVLOp(Desc.TSFlags) || !RISCVII::hasSEWOp(Desc.TSFlags)) {
  LLVM_DEBUG(dbgs() << "    Abort due to lack of VL, assume that"
                       " use VLMAX\n");
-  return std::nullopt;
+  return MachineOperand::CreateImm(RISCV::VLMaxSentinel);
Not related to this patch, but I wonder if we should avoid calling this function when a user does not have a VL operand. The debug output says to assume it uses VLMAX, but I wonder if it just doesn't depend on VL and could have been excluded from calculating the demanded VL.
If a user doesn't have a VL operand it could be a non-pseudo instruction like a COPY etc., in which case I think we need to preserve the entire VL? I.e. it would still demand VLMAX.
I think you are right. Imagine we return the DEST of the copy; then we need all lanes of the copy. Please ignore my original comment. It does make me wonder if there is any future work related to the VL optimizer looking through copies, though.
I actually have a patch that deals with copies of V0 that I hope to post soon.
We currently can't propagate VL through any mask uses because they have to be copied to a physical register first, so my solution was to keep track of all the uses of V0 in a map, similar to how we do it for defs in RISCVVectorPeephole.
-    for (auto &MI : make_range(MBB.rbegin(), MBB.rend())) {
+    // For each instruction that defines a vector, compute what VL its
+    // downstream users demand.
+    for (const auto &MI : reverse(MBB)) {
The reverse can probably be landed as it's an unrelated NFC.
A couple of high level points here:
- This is not NFC - as pointed out by your own test changes.
- I am really not a fan of the API change to remove std::optional, and it's not clear why you're doing this? You can have a DenseMap with an optional as a value?
- The initial demanded VL is sound, but not precise. As you reduce a transitive user, you can reduce the VL of an instruction, which in turn reduces that of the instruction's defs. Your backward walk achieves this within a basic block, but we lose the cross-block case.
- Per (3) you do need to invalidate. Alternatively, you could adapt the worklist scheme during the computation.
- Given (4) - the worklist variant - I'd probably write this as memoization, not upfront computation. I'd guess (not having tried to actually write it) that the code structure would be cleaner.
It should be possible to analyze all the blocks up front with a reverse post-order traversal, which would handle the cross-block case. I'd like to explore that first since I'm a bit hesitant to go down the invalidation path. I'm worried it might lead to more compile-time edge cases like the one in #123878.
Force-pushed from 38d35f1 to 7fe39a2
✅ With the latest revision this PR passed the C/C++ code formatter.
I've undone all the std::optional API changes and stacked this on top of #124734. The analysis should now also be precise across blocks as it's computed globally in a post-order traversal, thanks for catching that. The diff should be much smaller now; I think the initial version made the change look a lot larger than it is.
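For reference, a minimal sketch of what that global, cross-block analysis might look like, assuming post_order from llvm/ADT/PostOrderIterator.h together with this patch's DemandedVLs map and checkUsers; the exact names in the committed code may differ:

```cpp
#include "llvm/ADT/PostOrderIterator.h"

// Walk blocks in post-order (each block after its successors, modulo back
// edges) and instructions bottom-up within each block, so a user's demanded
// VL is already recorded by the time the instructions feeding it are visited.
for (MachineBasicBlock *MBB : post_order(&MF)) {
  if (!MDT->isReachableFromEntry(MBB))
    continue;
  for (MachineInstr &MI : reverse(*MBB))
    if (isCandidate(MI))
      DemandedVLs.insert({&MI, checkUsers(MI)});
}
```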
Force-pushed from 4a84c81 to 689f4e4
  return false;
auto CommonVL = std::make_optional(DemandedVLs.at(&MI));
Are we just using CommonVL as an optional that always has a value below? I think we could be using it as a MachineOperand.
Yeah, I changed it back to an optional to avoid the '->' to '.' diff below, and was hoping to clean it up in a follow-up. If reviewers would prefer, I can convert it in this PR.
Feel free to clean it up in a follow up.
/// For a given instruction, records what elements of it are demanded by
/// downstream users.
DenseMap<const MachineInstr *, MachineOperand> DemandedVLs;
Please just have the value be an optional. This simplifies the code, and has no real additional cost. Please stop trying to micro-optimize this. If you want to come back to it, please feel free, but let's do the simple version first.
To be clear, this is not just a stylistic point. I am suspecting from your code structure below that you have a bug in your handling of the std::nullopt case, and want as much of that logic to disappear from this review as possible.
More than happy to do so! Hope this didn't come across as an optimisation/stylistic point, that wasn't my intention.
Force-pushed from 689f4e4 to f7ea8d4
LGTM w/comments addressed. On the COPY point, if that's unclear, please ask.
@@ -12,15 +12,15 @@
 define <vscale x 4 x i32> @same_vl_imm(<vscale x 4 x i32> %passthru, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
 ; CHECK: User VL is: 4
-; CHECK-NEXT: Abort due to CommonVL == VLOp, no point in reducing.
+; CHECK: Abort due to CommonVL == VLOp, no point in reducing.
Why drop NEXT?
tryReduceVL is no longer called immediately after checkUsers, so the CHECK-NEXT failed.
This replaces the worklist by instead computing what VL is demanded by each instruction's users first. checkUsers essentially already did this, so it's been renamed to computeDemandedVL. The demanded VLs are stored in a DenseMap, and then we can just do a single forward pass of tryReduceVL where we check if a candidate's demanded VL is less than its VLOp. This means the pass should now be linear in complexity, and allows us to relax the restriction on tied operands more easily, as in llvm#124066. Note that in order to avoid std::optional inside the DenseMap, I've removed the std::optionals and replaced them with VLMAX or 0 constant operands.
…onVL -> DemandedVL
…for unreachable blocks.
…emandedVL must be a candidate
Force-pushed from f7ea8d4 to 6264a89
Not after I added back std::optional in checkUsers; I've removed them now.
… demanded (#124066) The motivation for this is to allow reducing the vl when a user is a ternary pseudo, where the third operand is tied and also acts as a passthru. When checking the users of an instruction, we currently bail if the user is used as a passthru because all of its elements past vl will be used for the tail. We can allow passthru users if we know the tail of their result isn't used, which we will have computed beforehand after #124530. It's worth noting that this all holds regardless of the tail policy, because tail agnostic still ends up using the passthru. I've checked that SPEC CPU 2017 + llvm-test-suite pass with this (on qemu with rvv_ta_all_1s=true). Fixes #123760
…unction I was running into failed assertions of `isCandidate(UserMI)` in `getMinimumVLForUser`, but only occurring with `-enable-machine-outliner=never`. I believe this is a red herring, and it just so happens the memory allocation pattern on my machine exposed the bug with that flag. DemandedVLs is never cleared, which means it accumulates more MachineInstr pointer keys over time, and it's possible that when e.g. running on function 'b', a MachineInstr pointer points to the same memory location used for a candidate in 'a'. This causes the assertion to fail. Comment left on #124530 with more information.
I've directly committed 52c1162 to fix the fact that DemandedVLs was never cleared between runs on different functions. After some reduction (made a bit dodgy as the crash isn't guaranteed to happen), I have a test case that reproduces the failed assertion. I've directly committed what seems to be the most obvious fix.
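For illustration, the fix presumably amounts to clearing the map before the pass returns, along the lines of this sketch (see 52c1162 for the actual change):

```cpp
// At the end of RISCVVLOptimizer::runOnMachineFunction (sketch):
// DemandedVLs is a pass member keyed by MachineInstr pointers, so clear it
// before returning; otherwise stale keys from this function could alias
// freshly allocated instructions when the pass runs on the next function.
DemandedVLs.clear();
return MadeChange;
```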
Thanks for fixing that. I think we've had issues before with forgetting to clear structures in passes.
Thanks. I really should have caught that in review - oops.
This replaces the worklist by instead computing what VL is demanded by each instruction's users first, which is done via checkUsers.
The demanded VLs are stored in a DenseMap, and then we can just do a single forward pass of tryReduceVL where we check if a candidate's demanded VL is less than its VLOp.
This means the pass should now be linear in complexity, and allows us to relax the restriction on tied operands more easily, as in #124066.
Stacked on #124734