[RISCV][VLOPT] Compute demanded VLs up front #124530


Merged
merged 10 commits into from
Jan 29, 2025

Conversation

lukel97
Contributor

@lukel97 lukel97 commented Jan 27, 2025

This replaces the worklist by instead computing what VL is demanded by each instruction's users first, which is done via checkUsers.

The demanded VLs are stored in a DenseMap, and then we can just do a single forward pass of tryReduceVL where we check if a candidate's demanded VL is less than its VLOp.

This means the pass should now be linear in complexity, and allows us to relax the restriction on tied operands more easily, as in #124066.

Stacked on #124734
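A minimal sketch of the two-phase scheme described above. All names here are hypothetical stand-ins: VLs are plain ints, VLMax models the VLMAX sentinel, and non-candidate instructions (e.g. COPYs to physical registers) never get a map entry and so demand their full VL. The real pass works on MachineInstr/MachineOperand and also checks EEW/EMUL compatibility and whether results depend elementwise on VL.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

constexpr int VLMax = 1000; // stand-in for the VLMAX sentinel

struct Inst {
  int VL;                 // the instruction's VL operand
  bool Candidate;         // can this instruction's VL be shrunk?
  std::vector<int> Users; // indices of instructions reading its result
};

// Phase 1: walk the block in reverse, recording for each candidate the
// largest VL any user actually demands (0 for an unused result).
// Phase 2: one forward pass shrinks VL operands to their demanded VL.
// Each instruction is visited a constant number of times, so the whole
// thing is linear (no worklist, no revisiting).
int optimize(std::vector<Inst> &Block) {
  std::unordered_map<int, int> DemandedVLs;
  for (int I = (int)Block.size() - 1; I >= 0; --I) {
    if (!Block[I].Candidate)
      continue;
    int Demanded = 0; // max over an empty user set
    for (int U : Block[I].Users) {
      int UserVL = Block[U].VL;
      auto It = DemandedVLs.find(U);
      if (It != DemandedVLs.end() && It->second <= UserVL)
        UserVL = It->second; // peek through the user's own demanded VL
      Demanded = std::max(Demanded, UserVL);
    }
    DemandedVLs[I] = Demanded;
  }
  int NumReduced = 0;
  for (int I = 0; I < (int)Block.size(); ++I) {
    auto It = DemandedVLs.find(I);
    if (It != DemandedVLs.end() && It->second < Block[I].VL) {
      Block[I].VL = It->second;
      ++NumReduced;
    }
  }
  return NumReduced;
}
```

For example, a chain %x (VLMAX) -> %y (VL=2) -> COPY shrinks %x to 2 in a single pass, while a dead definition demands 0 and cascades 0 backwards, which is why the .mir tests in this patch grow COPYs of otherwise-unused results.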

@llvmbot
Member

llvmbot commented Jan 27, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

This replaces the worklist by instead computing what VL is demanded by each instruction's users first.

checkUsers essentially already did this, so it's been renamed to computeDemandedVL.

The demanded VLs are stored in a DenseMap, and then we can just do a single forward pass of tryReduceVL where we check if a candidate's demanded VL is less than its VLOp.

This means the pass should now be linear in complexity, and allows us to relax the restriction on tied operands more easily, as in #124066.

Note that in order to avoid std::optional inside the DenseMap, I've removed the std::optionals and replaced them with VLMAX or 0 constant operands.


Patch is 86.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124530.diff

5 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+2)
  • (modified) llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp (+81-77)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir (+274-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vl-opt.mir (+28)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vlopt-same-vl.ll (+5-5)
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index bd02880b0d7129..006490c50be4de 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -4232,6 +4232,8 @@ unsigned RISCV::getDestLog2EEW(const MCInstrDesc &Desc, unsigned Log2SEW) {
 
 /// Given two VL operands, do we know that LHS <= RHS?
 bool RISCV::isVLKnownLE(const MachineOperand &LHS, const MachineOperand &RHS) {
+  if (LHS.isImm() && LHS.getImm() == 0)
+    return true;
   if (LHS.isReg() && RHS.isReg() && LHS.getReg().isVirtual() &&
       LHS.getReg() == RHS.getReg())
     return true;
diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index 976c65e51c2059..c5508fd23c03a1 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -33,6 +33,7 @@ namespace {
 class RISCVVLOptimizer : public MachineFunctionPass {
   const MachineRegisterInfo *MRI;
   const MachineDominatorTree *MDT;
+  const TargetInstrInfo *TII;
 
 public:
   static char ID;
@@ -50,12 +51,15 @@ class RISCVVLOptimizer : public MachineFunctionPass {
   StringRef getPassName() const override { return PASS_NAME; }
 
 private:
-  std::optional<MachineOperand> getMinimumVLForUser(MachineOperand &UserOp);
-  /// Returns the largest common VL MachineOperand that may be used to optimize
-  /// MI. Returns std::nullopt if it failed to find a suitable VL.
-  std::optional<MachineOperand> checkUsers(MachineInstr &MI);
+  MachineOperand getMinimumVLForUser(MachineOperand &UserOp);
+  /// Computes the VL of \p MI that is actually used by its users.
+  MachineOperand computeDemandedVL(const MachineInstr &MI);
   bool tryReduceVL(MachineInstr &MI);
   bool isCandidate(const MachineInstr &MI) const;
+
+  /// For a given instruction, records what elements of it are demanded by
+  /// downstream users.
+  DenseMap<const MachineInstr *, MachineOperand> DemandedVLs;
 };
 
 } // end anonymous namespace
@@ -77,6 +81,15 @@ static bool isVectorRegClass(Register R, const MachineRegisterInfo *MRI) {
   return RISCVRI::isVRegClass(RC->TSFlags);
 }
 
+/// Return true if \p MO is a physical or virtual vector register, false
+/// otherwise.
+static bool isVectorOperand(const MachineOperand &MO,
+                            const MachineRegisterInfo *MRI) {
+  if (!MO.isReg())
+    return false;
+  return isVectorRegClass(MO.getReg(), MRI);
+}
+
 /// Represents the EMUL and EEW of a MachineOperand.
 struct OperandInfo {
   // Represent as 1,2,4,8, ... and fractional indicator. This is because
@@ -1202,15 +1215,14 @@ bool RISCVVLOptimizer::isCandidate(const MachineInstr &MI) const {
   return true;
 }
 
-std::optional<MachineOperand>
-RISCVVLOptimizer::getMinimumVLForUser(MachineOperand &UserOp) {
+MachineOperand RISCVVLOptimizer::getMinimumVLForUser(MachineOperand &UserOp) {
   const MachineInstr &UserMI = *UserOp.getParent();
   const MCInstrDesc &Desc = UserMI.getDesc();
 
   if (!RISCVII::hasVLOp(Desc.TSFlags) || !RISCVII::hasSEWOp(Desc.TSFlags)) {
     LLVM_DEBUG(dbgs() << "    Abort due to lack of VL, assume that"
                          " use VLMAX\n");
-    return std::nullopt;
+    return MachineOperand::CreateImm(RISCV::VLMaxSentinel);
   }
 
   // Instructions like reductions may use a vector register as a scalar
@@ -1226,50 +1238,63 @@ RISCVVLOptimizer::getMinimumVLForUser(MachineOperand &UserOp) {
   }
 
   unsigned VLOpNum = RISCVII::getVLOpNum(Desc);
-  const MachineOperand &VLOp = UserMI.getOperand(VLOpNum);
+  const MachineOperand VLOp = UserMI.getOperand(VLOpNum);
   // Looking for an immediate or a register VL that isn't X0.
   assert((!VLOp.isReg() || VLOp.getReg() != RISCV::X0) &&
          "Did not expect X0 VL");
+
+  // If we know the demanded VL of UserMI, then we can reduce the VL it
+  // requires.
+  if (DemandedVLs.contains(&UserMI)) {
+    // We can only shrink the demanded VL if the elementwise result doesn't
+    // depend on VL (i.e. not vredsum/viota etc.)
+    // Also conservatively restrict to supported instructions for now.
+    // TODO: Can we remove the isSupportedInstr check?
+    if (!RISCVII::elementsDependOnVL(
+            TII->get(RISCV::getRVVMCOpcode(UserMI.getOpcode())).TSFlags) &&
+        isSupportedInstr(UserMI)) {
+      const MachineOperand &DemandedVL = DemandedVLs.at(&UserMI);
+      if (RISCV::isVLKnownLE(DemandedVL, VLOp))
+        return DemandedVL;
+    }
+  }
+
   return VLOp;
 }
 
-std::optional<MachineOperand> RISCVVLOptimizer::checkUsers(MachineInstr &MI) {
-  // FIXME: Avoid visiting each user for each time we visit something on the
-  // worklist, combined with an extra visit from the outer loop. Restructure
-  // along lines of an instcombine style worklist which integrates the outer
-  // pass.
-  std::optional<MachineOperand> CommonVL;
+MachineOperand RISCVVLOptimizer::computeDemandedVL(const MachineInstr &MI) {
+  const MachineOperand &VLMAX = MachineOperand::CreateImm(RISCV::VLMaxSentinel);
+  MachineOperand DemandedVL = MachineOperand::CreateImm(0);
+
   for (auto &UserOp : MRI->use_operands(MI.getOperand(0).getReg())) {
     const MachineInstr &UserMI = *UserOp.getParent();
     LLVM_DEBUG(dbgs() << "  Checking user: " << UserMI << "\n");
     if (mayReadPastVL(UserMI)) {
       LLVM_DEBUG(dbgs() << "    Abort because used by unsafe instruction\n");
-      return std::nullopt;
+      return VLMAX;
     }
 
     // Tied operands might pass through.
     if (UserOp.isTied()) {
       LLVM_DEBUG(dbgs() << "    Abort because user used as tied operand\n");
-      return std::nullopt;
+      return VLMAX;
     }
 
-    auto VLOp = getMinimumVLForUser(UserOp);
-    if (!VLOp)
-      return std::nullopt;
+    const MachineOperand &VLOp = getMinimumVLForUser(UserOp);
 
     // Use the largest VL among all the users. If we cannot determine this
     // statically, then we cannot optimize the VL.
-    if (!CommonVL || RISCV::isVLKnownLE(*CommonVL, *VLOp)) {
-      CommonVL = *VLOp;
-      LLVM_DEBUG(dbgs() << "    User VL is: " << VLOp << "\n");
-    } else if (!RISCV::isVLKnownLE(*VLOp, *CommonVL)) {
+    if (RISCV::isVLKnownLE(DemandedVL, VLOp)) {
+      DemandedVL = VLOp;
+      LLVM_DEBUG(dbgs() << "    Demanded VL is: " << VLOp << "\n");
+    } else if (!RISCV::isVLKnownLE(VLOp, DemandedVL)) {
       LLVM_DEBUG(dbgs() << "    Abort because cannot determine a common VL\n");
-      return std::nullopt;
+      return VLMAX;
     }
 
     if (!RISCVII::hasSEWOp(UserMI.getDesc().TSFlags)) {
       LLVM_DEBUG(dbgs() << "    Abort due to lack of SEW operand\n");
-      return std::nullopt;
+      return VLMAX;
     }
 
     std::optional<OperandInfo> ConsumerInfo = getOperandInfo(UserOp, MRI);
@@ -1279,7 +1304,7 @@ std::optional<MachineOperand> RISCVVLOptimizer::checkUsers(MachineInstr &MI) {
       LLVM_DEBUG(dbgs() << "    Abort due to unknown operand information.\n");
       LLVM_DEBUG(dbgs() << "      ConsumerInfo is: " << ConsumerInfo << "\n");
       LLVM_DEBUG(dbgs() << "      ProducerInfo is: " << ProducerInfo << "\n");
-      return std::nullopt;
+      return VLMAX;
     }
 
     // If the operand is used as a scalar operand, then the EEW must be
@@ -1294,53 +1319,51 @@ std::optional<MachineOperand> RISCVVLOptimizer::checkUsers(MachineInstr &MI) {
           << "    Abort due to incompatible information for EMUL or EEW.\n");
       LLVM_DEBUG(dbgs() << "      ConsumerInfo is: " << ConsumerInfo << "\n");
       LLVM_DEBUG(dbgs() << "      ProducerInfo is: " << ProducerInfo << "\n");
-      return std::nullopt;
+      return VLMAX;
     }
   }
 
-  return CommonVL;
+  return DemandedVL;
 }
 
 bool RISCVVLOptimizer::tryReduceVL(MachineInstr &MI) {
   LLVM_DEBUG(dbgs() << "Trying to reduce VL for " << MI << "\n");
 
-  auto CommonVL = checkUsers(MI);
-  if (!CommonVL)
-    return false;
+  const MachineOperand &CommonVL = DemandedVLs.at(&MI);
 
-  assert((CommonVL->isImm() || CommonVL->getReg().isVirtual()) &&
+  assert((CommonVL.isImm() || CommonVL.getReg().isVirtual()) &&
          "Expected VL to be an Imm or virtual Reg");
 
   unsigned VLOpNum = RISCVII::getVLOpNum(MI.getDesc());
   MachineOperand &VLOp = MI.getOperand(VLOpNum);
 
-  if (!RISCV::isVLKnownLE(*CommonVL, VLOp)) {
-    LLVM_DEBUG(dbgs() << "    Abort due to CommonVL not <= VLOp.\n");
+  if (!RISCV::isVLKnownLE(CommonVL, VLOp)) {
+    LLVM_DEBUG(dbgs() << "    Abort due to DemandedVL not <= VLOp.\n");
     return false;
   }
 
-  if (CommonVL->isIdenticalTo(VLOp)) {
+  if (CommonVL.isIdenticalTo(VLOp)) {
     LLVM_DEBUG(
-        dbgs() << "    Abort due to CommonVL == VLOp, no point in reducing.\n");
+        dbgs()
+        << "    Abort due to DemandedVL == VLOp, no point in reducing.\n");
     return false;
   }
 
-  if (CommonVL->isImm()) {
+  if (CommonVL.isImm()) {
     LLVM_DEBUG(dbgs() << "  Reduce VL from " << VLOp << " to "
-                      << CommonVL->getImm() << " for " << MI << "\n");
-    VLOp.ChangeToImmediate(CommonVL->getImm());
+                      << CommonVL.getImm() << " for " << MI << "\n");
+    VLOp.ChangeToImmediate(CommonVL.getImm());
     return true;
   }
-  const MachineInstr *VLMI = MRI->getVRegDef(CommonVL->getReg());
+  const MachineInstr *VLMI = MRI->getVRegDef(CommonVL.getReg());
   if (!MDT->dominates(VLMI, &MI))
     return false;
-  LLVM_DEBUG(
-      dbgs() << "  Reduce VL from " << VLOp << " to "
-             << printReg(CommonVL->getReg(), MRI->getTargetRegisterInfo())
-             << " for " << MI << "\n");
+  LLVM_DEBUG(dbgs() << "  Reduce VL from " << VLOp << " to "
+                    << printReg(CommonVL.getReg(), MRI->getTargetRegisterInfo())
+                    << " for " << MI << "\n");
 
   // All our checks passed. We can reduce VL.
-  VLOp.ChangeToRegister(CommonVL->getReg(), false);
+  VLOp.ChangeToRegister(CommonVL.getReg(), false);
   return true;
 }
 
@@ -1355,52 +1378,33 @@ bool RISCVVLOptimizer::runOnMachineFunction(MachineFunction &MF) {
   if (!ST.hasVInstructions())
     return false;
 
-  SetVector<MachineInstr *> Worklist;
-  auto PushOperands = [this, &Worklist](MachineInstr &MI,
-                                        bool IgnoreSameBlock) {
-    for (auto &Op : MI.operands()) {
-      if (!Op.isReg() || !Op.isUse() || !Op.getReg().isVirtual() ||
-          !isVectorRegClass(Op.getReg(), MRI))
-        continue;
+  TII = ST.getInstrInfo();
 
-      MachineInstr *DefMI = MRI->getVRegDef(Op.getReg());
-      if (!isCandidate(*DefMI))
-        continue;
-
-      if (IgnoreSameBlock && DefMI->getParent() == MI.getParent())
-        continue;
-
-      Worklist.insert(DefMI);
-    }
-  };
-
-  // Do a first pass eagerly rewriting in roughly reverse instruction
-  // order, populate the worklist with any instructions we might need to
-  // revisit.  We avoid adding definitions to the worklist if they're
-  // in the same block - we're about to visit them anyways.
   bool MadeChange = false;
   for (MachineBasicBlock &MBB : MF) {
     // Avoid unreachable blocks as they have degenerate dominance
     if (!MDT->isReachableFromEntry(&MBB))
       continue;
 
-    for (auto &MI : make_range(MBB.rbegin(), MBB.rend())) {
+    // For each instruction that defines a vector, compute what VL its
+    // downstream users demand.
+    for (const auto &MI : reverse(MBB)) {
+      if (!isCandidate(MI))
+        continue;
+      DemandedVLs.insert({&MI, computeDemandedVL(MI)});
+    }
+
+    // Then go through and see if we can reduce the VL of any instructions to
+    // only what's demanded.
+    for (auto &MI : MBB) {
       if (!isCandidate(MI))
         continue;
       if (!tryReduceVL(MI))
         continue;
       MadeChange = true;
-      PushOperands(MI, /*IgnoreSameBlock*/ true);
     }
-  }
 
-  while (!Worklist.empty()) {
-    assert(MadeChange);
-    MachineInstr &MI = *Worklist.pop_back_val();
-    assert(isCandidate(MI));
-    if (!tryReduceVL(MI))
-      continue;
-    PushOperands(MI, /*IgnoreSameBlock*/ false);
+    DemandedVLs.clear();
   }
 
   return MadeChange;
diff --git a/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir b/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
index edcd32c4098bca..2684e7c3b139ca 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
@@ -8,8 +8,10 @@ body: |
     ; CHECK-LABEL: name: vop_vi
     ; CHECK: %x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, 1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0
     %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+    $v8 = COPY %y
 ...
 ---
 name: vop_vi_incompatible_eew
@@ -18,8 +20,10 @@ body: |
     ; CHECK-LABEL: name: vop_vi_incompatible_eew
     ; CHECK: %x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0
     %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+    $v8 = COPY %y
 ...
 ---
 name: vop_vi_incompatible_emul
@@ -28,8 +32,10 @@ body: |
     ; CHECK-LABEL: name: vop_vi_incompatible_emul
     ; CHECK: %x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0
     %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+    $v8 = COPY %y
 ...
 ---
 name: vop_vv
@@ -38,8 +44,10 @@ body: |
     ; CHECK-LABEL: name: vop_vv
     ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
     %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+    $v8 = COPY %y
 ...
 ---
 name: vop_vv_incompatible_eew
@@ -48,8 +56,10 @@ body: |
     ; CHECK-LABEL: name: vop_vv_incompatible_eew
     ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
     %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+    $v8 = COPY %y
 
 ...
 ---
@@ -59,8 +69,10 @@ body: |
     ; CHECK-LABEL: name: vop_vv_incompatible_emul
     ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
     %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+    $v8 = COPY %y
 ...
 ---
 name: vwop_vv_vd
@@ -69,8 +81,10 @@ body: |
     ; CHECK-LABEL: name: vwop_vv_vd
     ; CHECK: early-clobber %x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
     %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+    $v8 = COPY %y
 ...
 ---
 name: vwop_vv_vd_incompatible_eew
@@ -79,8 +93,10 @@ body: |
     ; CHECK-LABEL: name: vwop_vv_vd_incompatible_eew
     ; CHECK: early-clobber %x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
     %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+    $v8 = COPY %y
 ...
 ---
 name: vwop_vv_vd_incompatible_emul
@@ -89,8 +105,10 @@ body: |
     ; CHECK-LABEL: name: vwop_vv_vd_incompatible_emul
     ; CHECK: early-clobber %x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
     %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 4 /* e8 */, 0
+    $v8 = COPY %y
 ...
 ---
 name: vwop_vv_vs2
@@ -99,8 +117,10 @@ body: |
     ; CHECK-LABEL: name: vwop_vv_vs2
     ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: early-clobber %y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8m2 = COPY %y
     %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
     %y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+    $v8m2 = COPY %y
 ...
 ---
 name: vwop_vv_vs2_incompatible_eew
@@ -109,8 +129,10 @@ body: |
     ; CHECK-LABEL: name: vwop_vv_vs2_incompatible_eew
     ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: early-clobber %y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8m2 = COPY %y
     %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
     %y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+    $v8m2 = COPY %y
 ...
 ---
 name: vwop_vv_vs2_incompatible_emul
@@ -119,8 +141,10 @@ body: |
     ; CHECK-LABEL: name: vwop_vv_vs2_incompatible_emul
     ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: early-clobber %y:vr = PseudoVWADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
     %y:vr = PseudoVWADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+    $v8 = COPY %y
 ...
 ---
 name: vwop_vv_vs1
@@ -129,8 +153,10 @@ body: |
     ; CHECK-LABEL: name: vwop_vv_vs1
     ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: early-clobber %y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8m2 = COPY %y
     %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
     %y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+    $v8m2 = COPY %y
 ...
 ---
 name: vwop_vv_vs1_incompatible_eew
@@ -139,8 +165,10 @@ body: |
     ; CHECK-LABEL: name: vwop_vv_vs1_incompatible_eew
     ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: early-clobber %y:vrm2 = PseudoVWADD_VV_M1 $noreg, $noreg, %x, 1, 4 /* e16 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8m2 = COPY %y
     %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
     %y:vrm2 = PseudoVWADD_VV_M1 $noreg, $noreg, %x, 1, 4 /* e16 */, 0
+    $v8m2 = COPY %y
 ...
 ---
 name: vwop_vv_vs1_incompatible_emul
@@ -149,8 +177,10 @@ body: |
     ; CHECK-LABEL: name: vwop_vv_vs1_incompatible_emul
     ; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: early-clobber %y:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, %x, 1, 3 /* e8 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
     %y:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, %x, 1, 3 /* e8 */, 0
+    $v8 = COPY %y
 ...
 ---
 name: vwop_wv_vd
@@ -159,8 +189,10 @@ body: |
     ; CHECK-LABEL: name: vwop_wv_vd
     ; CHECK: early-clobber %x:vr = PseudoVWADD_WV_MF2 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
     ; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+    ; CHECK-NEXT: $v8 = COPY %y
     %x:vr = PseudoVWADD_WV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
     %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+    $v8 = COPY %y
 ...
 ---
 name: vwop_wv_vd_incompatible_eew
@@ -169,8 +201,10 @@ body: |
     ; CHECK-LABEL: name: vwop_wv_vd_incompatible_eew
     ; ...
[truncated]

@lukel97 lukel97 force-pushed the vloptimizer/demandedVLs branch 2 times, most recently from 2f64994 to 169a45a Compare January 27, 2025 11:13
Comment on lines 4235 to 4236
if (LHS.isImm() && LHS.getImm() == 0)
return true;
Contributor Author

This is needed to be able to replace the std::optional with CreateImm(0) in computeDemandedVL.
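A self-contained sketch of the ordering relation with the new zero-immediate early-out. Op and its int encoding are hypothetical; the real code compares MachineOperands, and RISCV::VLMaxSentinel is (to my understanding) the immediate -1, modeled the same way here.

```cpp
#include <cassert>

// Minimal stand-in for a VL MachineOperand: an immediate or a virtual reg.
struct Op {
  bool IsImm;
  int Val; // immediate value, or register id
};

constexpr int VLMaxSentinel = -1; // modeled after RISCV::VLMaxSentinel

// Do we know that LHS <= RHS? A demanded VL of 0 (no elements used) is
// trivially <= any other VL, which is the early-out this patch adds; a
// register is only comparable to itself, and VLMAX is the top element.
bool isVLKnownLE(const Op &LHS, const Op &RHS) {
  if (LHS.IsImm && LHS.Val == 0)
    return true; // the new special case
  if (!LHS.IsImm && !RHS.IsImm && LHS.Val == RHS.Val)
    return true; // same virtual register
  if (LHS.IsImm && LHS.Val == VLMaxSentinel)
    return RHS.IsImm && RHS.Val == VLMaxSentinel;
  if (RHS.IsImm && RHS.Val == VLMaxSentinel)
    return true; // everything is <= VLMAX
  if (LHS.IsImm && RHS.IsImm)
    return LHS.Val <= RHS.Val;
  return false; // e.g. register vs. immediate: unknown
}
```

With this, CreateImm(0) can serve as the identity element when taking the maximum over a (possibly empty) set of user VLs, which is what lets the std::optional go away.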

%vl:gprnox0 = COPY $x1
%x:vr = PseudoVADD_VV_MF4 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0 /* tu, mu */
%y:vr = PseudoVNSRL_WV_MF4 $noreg, %x, $noreg, %vl, 4 /* e16 */, 0 /* tu, mu */
$v8 = COPY %y
Contributor Author

Because we're now checking what elements are demanded, any instruction without a use will have a demanded VL of zero, which will end up reducing everything else in the basic block to VL=0. I've added COPYs in the .mir tests to prevent this.

Collaborator

I don't think these copies are required now, are they? Adding them is a reasonable thing to do; I'm just confirming for my own understanding that this change doesn't need them any more. If so, please separate this into its own review.

/// MI. Returns std::nullopt if it failed to find a suitable VL.
std::optional<MachineOperand> checkUsers(MachineInstr &MI);
MachineOperand getMinimumVLForUser(MachineOperand &UserOp);
/// Computes the VL of \p MI that is actually used by its users.
Contributor

largest common VL?

Contributor

Specify what happens if it has no users?

Contributor Author

This no longer computes the largest common VL, but instead the minimum demanded VL.

E.g. if MI has a user with VL=2, it's still possible that this would return VL=1 if the user's demanded VL is only 1, despite its VL operand being higher. Now that you mention it, I think there are some other comments in the function that need to be updated.

for (const auto &MI : reverse(MBB)) {
if (!isCandidate(MI))
continue;
DemandedVLs.insert({&MI, computeDemandedVL(MI)});
Contributor

Does the demanded VL of an earlier instruction always stay the same, even if we will eventually optimize the VL of a later (user) instruction?

Contributor Author

Yes, because now we don't reduce the VL of anything until after we've analysed the demanded VLs for everything.

computeDemandedVL will work out the minimum possible VL for everything up front, which previously would have required multiple iterations of tryReduceVL. It's able to propagate the VL without mutating anything because of the change to getMinimumVLForUser that peeks through the user's demanded VL.


// If we know the demanded VL of UserMI, then we can reduce the VL it
// requires.
if (DemandedVLs.contains(&UserMI)) {
Contributor

Is it really possible to have a VLOp but not be part of DemandedVLs?

Contributor Author

I thought the same thing at first, but then found out we have vcpop.m etc., which don't define any vector operands but do use them. So in that case we can't reduce the demanded VL since it's not elementwise.

Collaborator

I'm confused. vcpop.m is not a supported instruction. Thus, its VL should never be reduced, and it shouldn't have an entry in the map (it's not a candidate). It can be a valid user of a supported instruction, but that seems fine? We'd return the VL of the vcpop.m instruction?

Contributor Author

Oops, ignore my comment. When I wrote it I was thinking about my initial version of this patch, which computed the demanded VL for every instruction that defined a vector, not just candidates; that's why I was seeing vcpop.m etc.

In the latest version of the patch we still might have candidate instructions that have aborted, i.e. returned std::nullopt in checkUsers, and so don't have an entry.

Contributor

@michaelmaitland michaelmaitland left a comment

LGTM, but please wait to see if there are any other comments.

const MachineInstr &UserMI = *UserOp.getParent();
const MCInstrDesc &Desc = UserMI.getDesc();

if (!RISCVII::hasVLOp(Desc.TSFlags) || !RISCVII::hasSEWOp(Desc.TSFlags)) {
LLVM_DEBUG(dbgs() << " Abort due to lack of VL, assume that"
" use VLMAX\n");
return std::nullopt;
return MachineOperand::CreateImm(RISCV::VLMaxSentinel);
Contributor

Not related to this patch, but I wonder if we should avoid calling this function when a user does not have a VL operand. The debug output says that assume it uses VLMAX, but I wonder if it just doesn't depend on VL and could have been excluded from calculating the demanded VL.

Contributor Author

If a user doesn't have a VL operand it could be a non-pseudo instruction like a COPY, in which case I think we need to preserve the entire VL, i.e. it would still demand VLMAX.

Contributor

I think you are right. Imagine we return the DEST of the copy; then we need all lanes of the copy. Please ignore my original comment. It does make me wonder if there is any future work related to the VL optimizer looking through copies, though.

Contributor Author

I actually have a patch that deals with copies of V0 that I hope to post soon.

We currently can't propagate VL through any mask uses because they have to be copied to a physical register first, so my solution was to keep track of all the uses of V0 in a map, similar to how we do it for defs in RISCVVectorPeephole.

for (auto &MI : make_range(MBB.rbegin(), MBB.rend())) {
// For each instruction that defines a vector, compute what VL its
// downstream users demand.
for (const auto &MI : reverse(MBB)) {
Collaborator

The reverse can probably be landed as it's an unrelated NFC.

Collaborator

@preames preames left a comment

A couple of high level points here:

  1. This is not NFC - as pointed out by your own test changes.
  2. I am really not a fan of the API change to remove std::optional, and it's not clear why you're doing this? You can have a DenseMap with an optional as a value?
  3. The initial demanded VL is sound, but not precise. As you reduce a transitive user, you can reduce the VL of an instruction, which in turn reduces the VL of that instruction's defs. Your backward walk achieves this within a basic block, but we lose the cross-block case.
  4. Per (3) you do need to invalidate. Alternatively, you could adapt the worklist scheme during the computation.
  5. Given (4), the worklist variant, I'd probably write this as memoization, not upfront computation. I'd guess (not having tried to actually write it) that the code structure would be cleaner.

@lukel97
Contributor Author

lukel97 commented Jan 27, 2025

  1. The initial demanded VL is sound, but not precise. As you reduce a transitive user, you can reduce the VL of an instruction, which in turn reduces the VL of that instruction's defs. Your backward walk achieves this within a basic block, but we lose the cross-block case.

It should be possible to analyze all the blocks up front with a reverse post-order traversal, which would handle the cross-block case.

I'd like to explore that case first since I'm a bit hesitant to go down the invalidation path. I'm worried it might lead to more compile-time edge cases like the one in #123878
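A toy model of the scheme under discussion (this is a sketch of the idea, not the actual RISCVVLOptimizer code; instruction indices stand in for MachineInstr pointers, and every use is assumed element-wise): walk the block in reverse so each user is visited before its def, record the VL each producer's users demand, then do a single forward pass that shrinks VL operands.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// Each toy "instruction" has a VL operand and the indices of later
// instructions that read its result element-wise.
struct Instr {
  int VLOp;
  std::vector<int> Users;
};

// Backward pass: an instruction only needs to produce as many elements
// as its most-demanding user reads. A user reads at most
// min(its VL operand, its own demanded VL), so demand propagates
// transitively in one reverse walk.
void reduceVLs(std::vector<Instr> &Block) {
  std::unordered_map<int, int> DemandedVLs;
  for (int I = (int)Block.size() - 1; I >= 0; --I) {
    if (Block[I].Users.empty())
      continue; // roots (e.g. stores): demand unknown, leave VL alone
    int Demanded = 0;
    for (int U : Block[I].Users) {
      int UserVL = Block[U].VLOp;
      auto It = DemandedVLs.find(U);
      if (It != DemandedVLs.end())
        UserVL = std::min(UserVL, It->second);
      Demanded = std::max(Demanded, UserVL);
    }
    DemandedVLs[I] = Demanded;
  }
  // Single forward pass: reduce VL where the demand is strictly smaller.
  for (int I = 0; I < (int)Block.size(); ++I) {
    auto It = DemandedVLs.find(I);
    if (It != DemandedVLs.end() && It->second < Block[I].VLOp)
      Block[I].VLOp = It->second;
  }
}
```

For example, a chain 0 → 1 → 2 with VL operands 8, 4, 2 reduces both producers down to VL 2, since only the first two elements of the chain are ever read.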

@lukel97 force-pushed the vloptimizer/demandedVLs branch from 38d35f1 to 7fe39a2 on January 28, 2025 12:03
@lukel97 changed the title from "[RISCV][VLOPT] Compute demanded VLs up front. NFC" to "[RISCV][VLOPT] Compute demanded VLs up front" on January 28, 2025

github-actions bot commented Jan 28, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@lukel97
Contributor Author

lukel97 commented Jan 28, 2025

I've undone all the std::optional API changes and stacked this on top of #124734. The analysis should now also be precise across blocks as it's computed globally in post-order traversal, thanks for catching that.

The diff should be much smaller now; I think the initial version made the change look a lot larger than it is.

@lukel97 force-pushed the vloptimizer/demandedVLs branch from 4a84c81 to 689f4e4 on January 28, 2025 12:32
return false;
auto CommonVL = std::make_optional(DemandedVLs.at(&MI));
Contributor
Are we just using CommonVL as an optional that always has a value below? I think we could be using it as a MachineOperand.

Contributor Author
Yeah, I changed it back to an optional to avoid the -> to . changes in the diff below, and was hoping to clean it up in a follow-up. If reviewers would prefer, I can convert it in this PR.

Contributor
Feel free to clean it up in a follow up.


/// For a given instruction, records what elements of it are demanded by
/// downstream users.
DenseMap<const MachineInstr *, MachineOperand> DemandedVLs;
Collaborator

Please just have the value be an optional. This simplifies the code, and has no real additional cost. Please stop trying to micro-optimize this. If you want to come back to it, please feel free, but let's do the simple version first.

To be clear, this is not just a stylistic point. I am suspecting from your code structure below that you have a bug in your handling of the std::nullopt case, and want as much of that logic to disappear from this review as possible.

Contributor Author

More than happy to do so! Hope this didn't come across as an optimisation/stylistic point, that wasn't my intention.
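For illustration, the suggested shape looks something like this (a sketch only: std::unordered_map stands in for llvm::DenseMap, and the names are made up): the map value is an optional, with std::nullopt meaning "every element is demanded, don't reduce this VL".

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>

struct Instr { int Id; }; // stand-in for MachineInstr

// nullopt = all elements of this instruction's result are demanded.
using DemandedVLMap = std::unordered_map<const Instr *, std::optional<int>>;

// Can MI's VL operand be shrunk below VLOp? Only when a demanded VL was
// computed for it and that demand is strictly smaller.
bool canReduceVL(const DemandedVLMap &DemandedVLs, const Instr &MI, int VLOp) {
  auto It = DemandedVLs.find(&MI);
  if (It == DemandedVLs.end() || !It->second)
    return false; // never analyzed, or all elements demanded
  return *It->second < VLOp;
}
```

The nullopt case then disappears from callers: anything feeding an unsupported user (e.g. vcpop.m, which reads all elements) simply maps to nullopt and is left alone.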


// If we know the demanded VL of UserMI, then we can reduce the VL it
// requires.
if (DemandedVLs.contains(&UserMI)) {
Collaborator
I'm confused. vcpop.m is not a supported instruction. Thus, its VL should never be reduced, and it shouldn't have an element in the map (not a candidate). It can be a valid user of a supported instruction, but that seems fine? We'd return the VL of the vcpop.m instruction?

@lukel97 force-pushed the vloptimizer/demandedVLs branch from 689f4e4 to f7ea8d4 on January 28, 2025 17:31
@preames (Collaborator) left a comment

LGTM w/comments addressed. On the COPY point, if that's unclear, please ask.

%vl:gprnox0 = COPY $x1
%x:vr = PseudoVADD_VV_MF4 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0 /* tu, mu */
%y:vr = PseudoVNSRL_WV_MF4 $noreg, %x, $noreg, %vl, 4 /* e16 */, 0 /* tu, mu */
$v8 = COPY %y
Collaborator
I don't think these copies are required now, are there? Adding them is a reasonable thing to do, I'm just confirming this change doesn't need them any more for my own understanding. If so, please separate this into its own review.

@@ -12,15 +12,15 @@

define <vscale x 4 x i32> @same_vl_imm(<vscale x 4 x i32> %passthru, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
; CHECK: User VL is: 4
; CHECK-NEXT: Abort due to CommonVL == VLOp, no point in reducing.
; CHECK: Abort due to CommonVL == VLOp, no point in reducing.
Contributor
Why drop NEXT?

Contributor Author
tryReduceVL is no longer called immediately after checkUsers, so the CHECK-NEXT failed.

This replaces the worklist by instead computing what VL is demanded by each instruction's users first.

checkUsers essentially already did this, so it's been renamed to computeDemandedVL.

The demanded VLs are stored in a DenseMap, and then we can just do a single forward pass of tryReduceVL where we check if a candidate's demanded VL is less than its VLOp.

This means the pass should now be linear in complexity, and allows us to relax the restriction on tied operands more easily as in llvm#124066.

Note that in order to avoid std::optional inside the DenseMap, I've removed the std::optionals and replaced them with VLMAX or 0 constant operands.
@lukel97 force-pushed the vloptimizer/demandedVLs branch from f7ea8d4 to 6264a89 on January 29, 2025 03:20
@lukel97
Contributor Author

lukel97 commented Jan 29, 2025

I don't think these copies are required now, are there?

Not after I added back std::optional in checkUsers; I've removed them now.

@lukel97 merged commit 8675cd3 into llvm:main on Jan 29, 2025
8 checks passed
lukel97 added a commit that referenced this pull request Jan 30, 2025
… demanded (#124066)

The motivation for this is to allow reducing the VL when a user is a
ternary pseudo, where the third operand is tied and also acts as a
passthru.

When checking the users of an instruction, we currently bail if the user
is used as a passthru because all of its elements past vl will be used
for the tail.

We can allow passthru users if we know the tail of their result isn't
used, which we will have computed beforehand after #124530

It's worth noting that this is all irrespective of the tail policy,
because tail agnostic still ends up using the passthru.

I've checked that SPEC CPU 2017 + llvm-test-suite pass with this (on
qemu with rvv_ta_all_1s=true)

Fixes #123760
asb added a commit that referenced this pull request Feb 2, 2025
…unction

I was running into failed assertions of `isCandidate(UserMI)` in
`getMinimumVLForUser`, but only occurring with
`-enable-machine-outliner=never`. I believe this is a red herring, and
it just so happens the memory allocation pattern on my machine exposed
the bug with that flag.

DemandedVLs is never cleared, which means it accumulates more
MachineInstr pointer keys over time, and it's possible that when e.g.
running on function 'b', a MachineInstr pointer points to the same
memory location used for a candidate in 'a'. This causes the assertion
to fail.

Comment left on #124530 with more information.
@asb
Contributor

asb commented Feb 2, 2025

I've directly committed 52c1162 to fix the fact that DemandedVLs is never cleared. I'll leave details of my reproducer here, but because triggering the failed assertions depends on the precise sequence of memory reuse it may not be portable to other systems (which is why I haven't committed it as a test case).

After some reduction (made a bit dodgy as the crash isn't guaranteed to happen), the following, invoked with llc -enable-machine-outliner=never < reduced.ll, will usually cause the assertion isCandidate(UserMI) in getMinimumVLForUser to fail. Seemingly, that option happens to change the memory allocation pattern such that a candidate MachineInstr* added to DemandedVLs when iterating over some earlier function aliases with a MachineInstr* when iterating over a later function (reallocations have presumably taken place).

I've directly committed what seems to be the most obvious fix.

; ModuleID = 'out.ll'
source_filename = "pr53645-2.c"
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64-unknown-linux-gnu"

@u = global [2 x <8 x i16>] [<8 x i16> <i16 73, i16 -5, i16 0, i16 174, i16 921, i16 -1, i16 17, i16 178>, <8 x i16> <i16 1, i16 8173, i16 -1, i16 -64, i16 12, i16 29612, i16 128, i16 8912>]
@s = global [2 x <8 x i16>] [<8 x i16> <i16 73, i16 -9123, i16 32761, i16 8191, i16 16371, i16 1201, i16 12701, i16 9999>, <8 x i16> <i16 9903, i16 -1, i16 -7323, i16 0, i16 -7, i16 -323, i16 9124, i16 -9199>]

; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(argmem: readwrite) uwtable vscale_range(2,1024)
define void @ur65656565(ptr noundef writeonly captures(none) initializes((0, 16)) %x, ptr noundef readonly captures(none) %y) #0 {
entry:
  %0 = load <8 x i16>, ptr %y, align 16, !tbaa !9
  %rem = urem <8 x i16> %0, <i16 6, i16 5, i16 6, i16 5, i16 6, i16 5, i16 6, i16 5>
  store <8 x i16> %rem, ptr %x, align 16, !tbaa !9
  ret void
}

; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(argmem: readwrite) uwtable vscale_range(2,1024)
define void @uq77777777(ptr noundef writeonly captures(none) initializes((0, 16)) %x, ptr noundef readonly captures(none) %y) #0 {
entry:
  %0 = load <8 x i16>, ptr %y, align 16, !tbaa !9
  %div = udiv <8 x i16> %0, splat (i16 7)
  store <8 x i16> %div, ptr %x, align 16, !tbaa !9
  ret void
}

; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(argmem: readwrite) uwtable vscale_range(2,1024)
define void @sq77777777(ptr noundef writeonly captures(none) initializes((0, 16)) %x, ptr noundef readonly captures(none) %y) #0 {
entry:
  %0 = load <8 x i16>, ptr %y, align 16, !tbaa !9
  %div = sdiv <8 x i16> %0, splat (i16 7)
  store <8 x i16> %div, ptr %x, align 16, !tbaa !9
  ret void
}

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.start.p0(i64 immarg, ptr captures(none)) #1

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: readwrite)
declare void @llvm.lifetime.end.p0(i64 immarg, ptr captures(none)) #1

attributes #0 = { mustprogress nofree noinline norecurse nosync nounwind willreturn memory(argmem: readwrite) uwtable vscale_range(2,1024) "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic-rv64" "target-features"="+64bit,+a,+b,+c,+d,+f,+m,+relax,+supm,+v,+za64rs,+zaamo,+zalrsc,+zawrs,+zba,+zbb,+zbs,+zca,+zcb,+zcmop,+zfa,+zfhmin,+zic64b,+zicbom,+zicbop,+zicboz,+ziccamoa,+ziccif,+zicclsm,+ziccrse,+zicntr,+zicond,+zicsr,+zihintntl,+zihintpause,+zihpm,+zimop,+zkt,+zmmul,+zvbb,+zve32f,+zve32x,+zve64d,+zve64f,+zve64x,+zvfhmin,+zvkb,+zvkt,+zvl128b,+zvl32b,+zvl64b,-e,-experimental-sdext,-experimental-sdtrig,-experimental-smctr,-experimental-ssctr,-experimental-svukte,-experimental-xqcia,-experimental-xqciac,-experimental-xqcicli,-experimental-xqcicm,-experimental-xqcics,-experimental-xqcicsr,-experimental-xqciint,-experimental-xqcilo,-experimental-xqcilsm,-experimental-xqcisls,-experimental-zalasr,-experimental-zicfilp,-experimental-zicfiss,-experimental-zvbc32e,-experimental-zvkgs,-h,-sha,-shcounterenw,-shgatpa,-shtvala,-shvsatpa,-shvstvala,-shvstvecd,-smaia,-smcdeleg,-smcsrind,-smdbltrp,-smepmp,-smmpm,-smnpm,-smrnmi,-smstateen,-ssaia,-ssccfg,-ssccptr,-sscofpmf,-sscounterenw,-sscsrind,-ssdbltrp,-ssnpm,-sspm,-ssqosid,-ssstateen,-ssstrict,-sstc,-sstvala,-sstvecd,-ssu64xl,-svade,-svadu,-svbare,-svinval,-svnapot,-svpbmt,-svvptc,-xcvalu,-xcvbi,-xcvbitmanip,-xcvelw,-xcvmac,-xcvmem,-xcvsimd,-xmipscmove,-xmipslsp,-xsfcease,-xsfvcp,-xsfvfnrclipxfqf,-xsfvfwmaccqqq,-xsfvqmaccdod,-xsfvqmaccqoq,-xsifivecdiscarddlone,-xsifivecflushdlone,-xtheadba,-xtheadbb,-xtheadbs,-xtheadcmo,-xtheadcondmov,-xtheadfmemidx,-xtheadmac,-xtheadmemidx,-xtheadmempair,-xtheadsync,-xtheadvdot,-xventanacondops,-xwchc,-za128rs,-zabha,-zacas,-zama16b,-zbc,-zbkb,-zbkc,-zbkx,-zcd,-zce,-zcf,-zcmp,-zcmt,-zdinx,-zfbfmin,-zfh,-zfinx,-zhinx,-zhinxmin,-zifencei,-zk,-zkn,-zknd,-zkne,-zknh,-zkr,-zks,-zksed,-zksh,-ztso,-zvbc,-zvfbfmin,-zvfbfwma,-zvfh,-zvkg,-zvkn,-zvknc,-zvkned,-zvkng,-zvknha,
-zvknhb,-zvks,-zvksc,-zvksed,-zvksg,-zvksh,-zvl1024b,-zvl16384b,-zvl2048b,-zvl256b,-zvl32768b,-zvl4096b,-zvl512b,-zvl65536b,-zvl8192b" }
attributes #1 = { nocallback nofree nosync nounwind willreturn memory(argmem: readwrite) }

!llvm.module.flags = !{!0, !1, !2, !4, !5, !6, !7}
!llvm.ident = !{!8}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 1, !"target-abi", !"lp64d"}
!2 = !{i32 6, !"riscv-isa", !3}
!3 = !{!"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_b1p0_v1p0_zic64b1p0_zicbom1p0_zicbop1p0_zicboz1p0_ziccamoa1p0_ziccif1p0_zicclsm1p0_ziccrse1p0_zicntr2p0_zicond1p0_zicsr2p0_zihintntl1p0_zihintpause2p0_zihpm2p0_zimop1p0_zmmul1p0_za64rs1p0_zaamo1p0_zalrsc1p0_zawrs1p0_zfa1p0_zfhmin1p0_zca1p0_zcb1p0_zcmop1p0_zba1p0_zbb1p0_zbs1p0_zkt1p0_zvbb1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvfhmin1p0_zvkb1p0_zvkt1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0_supm1p0"}
!4 = !{i32 8, !"PIC Level", i32 2}
!5 = !{i32 7, !"PIE Level", i32 2}
!6 = !{i32 7, !"uwtable", i32 2}
!7 = !{i32 8, !"SmallDataLimit", i32 0}
!8 = !{!"clang version 21.0.0git"}
!9 = !{!10, !10, i64 0}
!10 = !{!"omnipotent char", !11, i64 0}
!11 = !{!"Simple C/C++ TBAA"}
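The bug and the fix can be shown in miniature (plain C++ stand-ins, not the actual pass): the pass object outlives any single function, so a pointer-keyed cache must be reset at the start of each run; otherwise an instruction from a later function whose allocation reuses a freed address picks up a stale entry.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct Instr { int VL; };                       // stand-in for MachineInstr
struct Function { std::vector<Instr> Instrs; }; // stand-in for MachineFunction

struct VLOptimizer {
  // Keyed by instruction pointer, like DemandedVLs in the real pass.
  std::unordered_map<const Instr *, int> DemandedVLs;

  void run(const Function &F) {
    DemandedVLs.clear(); // the fix (52c1162): drop state from prior functions
    for (const Instr &I : F.Instrs)
      DemandedVLs[&I] = I.VL; // in the real pass this is computed from users
  }
};
```

Without the clear(), entries keyed by an earlier function's (possibly freed and reallocated) instruction addresses linger in the map and can alias with later functions' instructions.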

@lukel97
Contributor Author

lukel97 commented Feb 3, 2025

I've directly committed 52c1162 to fix the fact that DemandedVLs is never cleared.

Thanks for fixing that. I think we've previously had issues with forgetting to clear structures in passes before.

@preames
Collaborator

preames commented Feb 3, 2025

I've directly committed 52c1162 to fix the fact that DemandedVLs is never cleared.

Thanks. I really should have caught that in review - oops.
