[RISCV][VLOPT] Compute demanded VLs up front #124530
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

This replaces the worklist by instead computing what VL is demanded by each instruction's users first. checkUsers essentially already did this, so it's been renamed to computeDemandedVL. The demanded VLs are stored in a DenseMap, and then we can just do a single forward pass of tryReduceVL where we check if a candidate's demanded VL is less than its VLOp. This means the pass should now be linear in complexity, and allows us to relax the restriction on tied operands more easily, as in #124066.

Note that in order to avoid std::optional inside the DenseMap, I've removed the std::optionals and replaced them with VLMAX or 0 constant operands.

Patch is 86.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124530.diff

5 Files Affected:
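Before the (truncated) diff, here is a distilled sketch of the new per-block structure of runOnMachineFunction in this revision. It is condensed from the patch below; debug output and the unreachable-block check are elided, and MadeChange is the pass's existing result flag:

```cpp
// Sketch only: condensed from the diff below, not the literal code.
for (MachineBasicBlock &MBB : MF) {
  // Pass 1 (bottom-up): for each candidate, record what VL its users demand.
  for (const MachineInstr &MI : reverse(MBB))
    if (isCandidate(MI))
      DemandedVLs.insert({&MI, computeDemandedVL(MI)});

  // Pass 2 (top-down): shrink a candidate's VL operand whenever the demanded
  // VL is provably smaller than its current VL operand.
  for (MachineInstr &MI : MBB)
    if (isCandidate(MI))
      MadeChange |= tryReduceVL(MI);

  DemandedVLs.clear();
}
```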
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index bd02880b0d7129..006490c50be4de 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -4232,6 +4232,8 @@ unsigned RISCV::getDestLog2EEW(const MCInstrDesc &Desc, unsigned Log2SEW) {
/// Given two VL operands, do we know that LHS <= RHS?
bool RISCV::isVLKnownLE(const MachineOperand &LHS, const MachineOperand &RHS) {
+ if (LHS.isImm() && LHS.getImm() == 0)
+ return true;
if (LHS.isReg() && RHS.isReg() && LHS.getReg().isVirtual() &&
LHS.getReg() == RHS.getReg())
return true;
diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index 976c65e51c2059..c5508fd23c03a1 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -33,6 +33,7 @@ namespace {
class RISCVVLOptimizer : public MachineFunctionPass {
const MachineRegisterInfo *MRI;
const MachineDominatorTree *MDT;
+ const TargetInstrInfo *TII;
public:
static char ID;
@@ -50,12 +51,15 @@ class RISCVVLOptimizer : public MachineFunctionPass {
StringRef getPassName() const override { return PASS_NAME; }
private:
- std::optional<MachineOperand> getMinimumVLForUser(MachineOperand &UserOp);
- /// Returns the largest common VL MachineOperand that may be used to optimize
- /// MI. Returns std::nullopt if it failed to find a suitable VL.
- std::optional<MachineOperand> checkUsers(MachineInstr &MI);
+ MachineOperand getMinimumVLForUser(MachineOperand &UserOp);
+ /// Computes the VL of \p MI that is actually used by its users.
+ MachineOperand computeDemandedVL(const MachineInstr &MI);
bool tryReduceVL(MachineInstr &MI);
bool isCandidate(const MachineInstr &MI) const;
+
+ /// For a given instruction, records what elements of it are demanded by
+ /// downstream users.
+ DenseMap<const MachineInstr *, MachineOperand> DemandedVLs;
};
} // end anonymous namespace
@@ -77,6 +81,15 @@ static bool isVectorRegClass(Register R, const MachineRegisterInfo *MRI) {
return RISCVRI::isVRegClass(RC->TSFlags);
}
+/// Return true if \p MO is a physical or virtual vector register, false
+/// otherwise.
+static bool isVectorOperand(const MachineOperand &MO,
+ const MachineRegisterInfo *MRI) {
+ if (!MO.isReg())
+ return false;
+ return isVectorRegClass(MO.getReg(), MRI);
+}
+
/// Represents the EMUL and EEW of a MachineOperand.
struct OperandInfo {
// Represent as 1,2,4,8, ... and fractional indicator. This is because
@@ -1202,15 +1215,14 @@ bool RISCVVLOptimizer::isCandidate(const MachineInstr &MI) const {
return true;
}
-std::optional<MachineOperand>
-RISCVVLOptimizer::getMinimumVLForUser(MachineOperand &UserOp) {
+MachineOperand RISCVVLOptimizer::getMinimumVLForUser(MachineOperand &UserOp) {
const MachineInstr &UserMI = *UserOp.getParent();
const MCInstrDesc &Desc = UserMI.getDesc();
if (!RISCVII::hasVLOp(Desc.TSFlags) || !RISCVII::hasSEWOp(Desc.TSFlags)) {
LLVM_DEBUG(dbgs() << " Abort due to lack of VL, assume that"
" use VLMAX\n");
- return std::nullopt;
+ return MachineOperand::CreateImm(RISCV::VLMaxSentinel);
}
// Instructions like reductions may use a vector register as a scalar
@@ -1226,50 +1238,63 @@ RISCVVLOptimizer::getMinimumVLForUser(MachineOperand &UserOp) {
}
unsigned VLOpNum = RISCVII::getVLOpNum(Desc);
- const MachineOperand &VLOp = UserMI.getOperand(VLOpNum);
+ const MachineOperand VLOp = UserMI.getOperand(VLOpNum);
// Looking for an immediate or a register VL that isn't X0.
assert((!VLOp.isReg() || VLOp.getReg() != RISCV::X0) &&
"Did not expect X0 VL");
+
+ // If we know the demanded VL of UserMI, then we can reduce the VL it
+ // requires.
+ if (DemandedVLs.contains(&UserMI)) {
+ // We can only shrink the demanded VL if the elementwise result doesn't
+ // depend on VL (i.e. not vredsum/viota etc.)
+ // Also conservatively restrict to supported instructions for now.
+ // TODO: Can we remove the isSupportedInstr check?
+ if (!RISCVII::elementsDependOnVL(
+ TII->get(RISCV::getRVVMCOpcode(UserMI.getOpcode())).TSFlags) &&
+ isSupportedInstr(UserMI)) {
+ const MachineOperand &DemandedVL = DemandedVLs.at(&UserMI);
+ if (RISCV::isVLKnownLE(DemandedVL, VLOp))
+ return DemandedVL;
+ }
+ }
+
return VLOp;
}
-std::optional<MachineOperand> RISCVVLOptimizer::checkUsers(MachineInstr &MI) {
- // FIXME: Avoid visiting each user for each time we visit something on the
- // worklist, combined with an extra visit from the outer loop. Restructure
- // along lines of an instcombine style worklist which integrates the outer
- // pass.
- std::optional<MachineOperand> CommonVL;
+MachineOperand RISCVVLOptimizer::computeDemandedVL(const MachineInstr &MI) {
+ const MachineOperand &VLMAX = MachineOperand::CreateImm(RISCV::VLMaxSentinel);
+ MachineOperand DemandedVL = MachineOperand::CreateImm(0);
+
for (auto &UserOp : MRI->use_operands(MI.getOperand(0).getReg())) {
const MachineInstr &UserMI = *UserOp.getParent();
LLVM_DEBUG(dbgs() << " Checking user: " << UserMI << "\n");
if (mayReadPastVL(UserMI)) {
LLVM_DEBUG(dbgs() << " Abort because used by unsafe instruction\n");
- return std::nullopt;
+ return VLMAX;
}
// Tied operands might pass through.
if (UserOp.isTied()) {
LLVM_DEBUG(dbgs() << " Abort because user used as tied operand\n");
- return std::nullopt;
+ return VLMAX;
}
- auto VLOp = getMinimumVLForUser(UserOp);
- if (!VLOp)
- return std::nullopt;
+ const MachineOperand &VLOp = getMinimumVLForUser(UserOp);
// Use the largest VL among all the users. If we cannot determine this
// statically, then we cannot optimize the VL.
- if (!CommonVL || RISCV::isVLKnownLE(*CommonVL, *VLOp)) {
- CommonVL = *VLOp;
- LLVM_DEBUG(dbgs() << " User VL is: " << VLOp << "\n");
- } else if (!RISCV::isVLKnownLE(*VLOp, *CommonVL)) {
+ if (RISCV::isVLKnownLE(DemandedVL, VLOp)) {
+ DemandedVL = VLOp;
+ LLVM_DEBUG(dbgs() << " Demanded VL is: " << VLOp << "\n");
+ } else if (!RISCV::isVLKnownLE(VLOp, DemandedVL)) {
LLVM_DEBUG(dbgs() << " Abort because cannot determine a common VL\n");
- return std::nullopt;
+ return VLMAX;
}
if (!RISCVII::hasSEWOp(UserMI.getDesc().TSFlags)) {
LLVM_DEBUG(dbgs() << " Abort due to lack of SEW operand\n");
- return std::nullopt;
+ return VLMAX;
}
std::optional<OperandInfo> ConsumerInfo = getOperandInfo(UserOp, MRI);
@@ -1279,7 +1304,7 @@ std::optional<MachineOperand> RISCVVLOptimizer::checkUsers(MachineInstr &MI) {
LLVM_DEBUG(dbgs() << " Abort due to unknown operand information.\n");
LLVM_DEBUG(dbgs() << " ConsumerInfo is: " << ConsumerInfo << "\n");
LLVM_DEBUG(dbgs() << " ProducerInfo is: " << ProducerInfo << "\n");
- return std::nullopt;
+ return VLMAX;
}
// If the operand is used as a scalar operand, then the EEW must be
@@ -1294,53 +1319,51 @@ std::optional<MachineOperand> RISCVVLOptimizer::checkUsers(MachineInstr &MI) {
<< " Abort due to incompatible information for EMUL or EEW.\n");
LLVM_DEBUG(dbgs() << " ConsumerInfo is: " << ConsumerInfo << "\n");
LLVM_DEBUG(dbgs() << " ProducerInfo is: " << ProducerInfo << "\n");
- return std::nullopt;
+ return VLMAX;
}
}
- return CommonVL;
+ return DemandedVL;
}
bool RISCVVLOptimizer::tryReduceVL(MachineInstr &MI) {
LLVM_DEBUG(dbgs() << "Trying to reduce VL for " << MI << "\n");
- auto CommonVL = checkUsers(MI);
- if (!CommonVL)
- return false;
+ const MachineOperand &CommonVL = DemandedVLs.at(&MI);
- assert((CommonVL->isImm() || CommonVL->getReg().isVirtual()) &&
+ assert((CommonVL.isImm() || CommonVL.getReg().isVirtual()) &&
"Expected VL to be an Imm or virtual Reg");
unsigned VLOpNum = RISCVII::getVLOpNum(MI.getDesc());
MachineOperand &VLOp = MI.getOperand(VLOpNum);
- if (!RISCV::isVLKnownLE(*CommonVL, VLOp)) {
- LLVM_DEBUG(dbgs() << " Abort due to CommonVL not <= VLOp.\n");
+ if (!RISCV::isVLKnownLE(CommonVL, VLOp)) {
+ LLVM_DEBUG(dbgs() << " Abort due to DemandedVL not <= VLOp.\n");
return false;
}
- if (CommonVL->isIdenticalTo(VLOp)) {
+ if (CommonVL.isIdenticalTo(VLOp)) {
LLVM_DEBUG(
- dbgs() << " Abort due to CommonVL == VLOp, no point in reducing.\n");
+ dbgs()
+ << " Abort due to DemandedVL == VLOp, no point in reducing.\n");
return false;
}
- if (CommonVL->isImm()) {
+ if (CommonVL.isImm()) {
LLVM_DEBUG(dbgs() << " Reduce VL from " << VLOp << " to "
- << CommonVL->getImm() << " for " << MI << "\n");
- VLOp.ChangeToImmediate(CommonVL->getImm());
+ << CommonVL.getImm() << " for " << MI << "\n");
+ VLOp.ChangeToImmediate(CommonVL.getImm());
return true;
}
- const MachineInstr *VLMI = MRI->getVRegDef(CommonVL->getReg());
+ const MachineInstr *VLMI = MRI->getVRegDef(CommonVL.getReg());
if (!MDT->dominates(VLMI, &MI))
return false;
- LLVM_DEBUG(
- dbgs() << " Reduce VL from " << VLOp << " to "
- << printReg(CommonVL->getReg(), MRI->getTargetRegisterInfo())
- << " for " << MI << "\n");
+ LLVM_DEBUG(dbgs() << " Reduce VL from " << VLOp << " to "
+ << printReg(CommonVL.getReg(), MRI->getTargetRegisterInfo())
+ << " for " << MI << "\n");
// All our checks passed. We can reduce VL.
- VLOp.ChangeToRegister(CommonVL->getReg(), false);
+ VLOp.ChangeToRegister(CommonVL.getReg(), false);
return true;
}
@@ -1355,52 +1378,33 @@ bool RISCVVLOptimizer::runOnMachineFunction(MachineFunction &MF) {
if (!ST.hasVInstructions())
return false;
- SetVector<MachineInstr *> Worklist;
- auto PushOperands = [this, &Worklist](MachineInstr &MI,
- bool IgnoreSameBlock) {
- for (auto &Op : MI.operands()) {
- if (!Op.isReg() || !Op.isUse() || !Op.getReg().isVirtual() ||
- !isVectorRegClass(Op.getReg(), MRI))
- continue;
+ TII = ST.getInstrInfo();
- MachineInstr *DefMI = MRI->getVRegDef(Op.getReg());
- if (!isCandidate(*DefMI))
- continue;
-
- if (IgnoreSameBlock && DefMI->getParent() == MI.getParent())
- continue;
-
- Worklist.insert(DefMI);
- }
- };
-
- // Do a first pass eagerly rewriting in roughly reverse instruction
- // order, populate the worklist with any instructions we might need to
- // revisit. We avoid adding definitions to the worklist if they're
- // in the same block - we're about to visit them anyways.
bool MadeChange = false;
for (MachineBasicBlock &MBB : MF) {
// Avoid unreachable blocks as they have degenerate dominance
if (!MDT->isReachableFromEntry(&MBB))
continue;
- for (auto &MI : make_range(MBB.rbegin(), MBB.rend())) {
+ // For each instruction that defines a vector, compute what VL its
+ // downstream users demand.
+ for (const auto &MI : reverse(MBB)) {
+ if (!isCandidate(MI))
+ continue;
+ DemandedVLs.insert({&MI, computeDemandedVL(MI)});
+ }
+
+ // Then go through and see if we can reduce the VL of any instructions to
+ // only what's demanded.
+ for (auto &MI : MBB) {
if (!isCandidate(MI))
continue;
if (!tryReduceVL(MI))
continue;
MadeChange = true;
- PushOperands(MI, /*IgnoreSameBlock*/ true);
}
- }
- while (!Worklist.empty()) {
- assert(MadeChange);
- MachineInstr &MI = *Worklist.pop_back_val();
- assert(isCandidate(MI));
- if (!tryReduceVL(MI))
- continue;
- PushOperands(MI, /*IgnoreSameBlock*/ false);
+ DemandedVLs.clear();
}
return MadeChange;
diff --git a/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir b/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
index edcd32c4098bca..2684e7c3b139ca 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/vl-opt-op-info.mir
@@ -8,8 +8,10 @@ body: |
; CHECK-LABEL: name: vop_vi
; CHECK: %x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, 1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vop_vi_incompatible_eew
@@ -18,8 +20,10 @@ body: |
; CHECK-LABEL: name: vop_vi_incompatible_eew
; CHECK: %x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+ $v8 = COPY %y
...
---
name: vop_vi_incompatible_emul
@@ -28,8 +32,10 @@ body: |
; CHECK-LABEL: name: vop_vi_incompatible_emul
; CHECK: %x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VI_M1 $noreg, $noreg, 9, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vop_vv
@@ -38,8 +44,10 @@ body: |
; CHECK-LABEL: name: vop_vv
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vop_vv_incompatible_eew
@@ -48,8 +56,10 @@ body: |
; CHECK-LABEL: name: vop_vv_incompatible_eew
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+ $v8 = COPY %y
...
---
@@ -59,8 +69,10 @@ body: |
; CHECK-LABEL: name: vop_vv_incompatible_emul
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vwop_vv_vd
@@ -69,8 +81,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vd
; CHECK: early-clobber %x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+ $v8 = COPY %y
...
---
name: vwop_vv_vd_incompatible_eew
@@ -79,8 +93,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vd_incompatible_eew
; CHECK: early-clobber %x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vwop_vv_vd_incompatible_emul
@@ -89,8 +105,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vd_incompatible_emul
; CHECK: early-clobber %x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_MF2 $noreg, %x, $noreg, 1, 4 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vwop_vv_vs2
@@ -99,8 +117,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vs2
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8m2 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8m2 = COPY %y
...
---
name: vwop_vv_vs2_incompatible_eew
@@ -109,8 +129,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vs2_incompatible_eew
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8m2 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+ $v8m2 = COPY %y
...
---
name: vwop_vv_vs2_incompatible_emul
@@ -119,8 +141,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vs2_incompatible_emul
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vr = PseudoVWADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVWADD_VV_MF2 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vwop_vv_vs1
@@ -129,8 +153,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vs1
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8m2 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vrm2 = PseudoVWADD_VV_M1 $noreg, %x, $noreg, 1, 3 /* e8 */, 0
+ $v8m2 = COPY %y
...
---
name: vwop_vv_vs1_incompatible_eew
@@ -139,8 +165,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vs1_incompatible_eew
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vrm2 = PseudoVWADD_VV_M1 $noreg, $noreg, %x, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8m2 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vrm2 = PseudoVWADD_VV_M1 $noreg, $noreg, %x, 1, 4 /* e16 */, 0
+ $v8m2 = COPY %y
...
---
name: vwop_vv_vs1_incompatible_emul
@@ -149,8 +177,10 @@ body: |
; CHECK-LABEL: name: vwop_vv_vs1_incompatible_emul
; CHECK: %x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: early-clobber %y:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, %x, 1, 3 /* e8 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVADD_VV_M1 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVWADD_VV_MF2 $noreg, $noreg, %x, 1, 3 /* e8 */, 0
+ $v8 = COPY %y
...
---
name: vwop_wv_vd
@@ -159,8 +189,10 @@ body: |
; CHECK-LABEL: name: vwop_wv_vd
; CHECK: early-clobber %x:vr = PseudoVWADD_WV_MF2 $noreg, $noreg, $noreg, 1, 3 /* e8 */, 0 /* tu, mu */
; CHECK-NEXT: %y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0 /* tu, mu */
+ ; CHECK-NEXT: $v8 = COPY %y
%x:vr = PseudoVWADD_WV_MF2 $noreg, $noreg, $noreg, -1, 3 /* e8 */, 0
%y:vr = PseudoVADD_VV_M1 $noreg, %x, $noreg, 1, 4 /* e16 */, 0
+ $v8 = COPY %y
...
---
name: vwop_wv_vd_incompatible_eew
@@ -169,8 +201,10 @@ body: |
; CHECK-LABEL: name: vwop_wv_vd_incompatible_eew
; ...
[truncated]
Force-pushed from 2f64994 to 169a45a
if (LHS.isImm() && LHS.getImm() == 0)
  return true;
This is needed to be able to replace the std::optional with CreateImm(0) in computeDemandedVL
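As a rough illustration (not code from the patch, and UserVL is hypothetical here), the property being added is that an immediate 0 is known to be <= any other VL, so it can act as the starting value for the running maximum that computeDemandedVL takes over all users:

```cpp
// Sketch only: mirrors the max-over-users logic in computeDemandedVL.
MachineOperand DemandedVL = MachineOperand::CreateImm(0);
// ... for each user, with UserVL the VL returned by getMinimumVLForUser ...
if (RISCV::isVLKnownLE(DemandedVL, UserVL))
  DemandedVL = UserVL;
// An instruction whose result has no users keeps DemandedVL == 0.
```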
%vl:gprnox0 = COPY $x1
%x:vr = PseudoVADD_VV_MF4 $noreg, $noreg, $noreg, -1, 4 /* e16 */, 0 /* tu, mu */
%y:vr = PseudoVNSRL_WV_MF4 $noreg, %x, $noreg, %vl, 4 /* e16 */, 0 /* tu, mu */
$v8 = COPY %y
Because we're now checking what elements are demanded, any instruction without a use will have a VL of zero demanded, which will end up reducing everything else in the basic block to VL=0. I've added COPYs in the .mir tests to prevent this.
I don't think these copies are required now, are they? Adding them is a reasonable thing to do, I'm just confirming this change doesn't need them any more for my own understanding. If so, please separate this into its own review.
-  /// MI. Returns std::nullopt if it failed to find a suitable VL.
-  std::optional<MachineOperand> checkUsers(MachineInstr &MI);
+  MachineOperand getMinimumVLForUser(MachineOperand &UserOp);
+  /// Computes the VL of \p MI that is actually used by its users.
largest common VL?
Specify what happens if it has no users?
This no longer computes the largest common VL, but instead the minimum demanded VL.
E.g. if MI has a user with VL=2, it's still possible that this would return VL=1 if the user's demanded VL was only 1, despite its VL operand being higher. I think there are some other comments in the function that need updating now that you mention it.
for (const auto &MI : reverse(MBB)) {
  if (!isCandidate(MI))
    continue;
  DemandedVLs.insert({&MI, computeDemandedVL(MI)});
Does the demanded VL of an earlier instruction always stay the same, even if we will eventually optimize the VL of a later (user) instruction?
Yes, because now we don't reduce the VL of anything until after we've analysed the demanded VLs for everything.
computeDemandedVL will work out the minimum possible VL for everything up front, which previously would have required multiple iterations of tryReduceVL. It's able to propagate the VL without mutating anything because of the change to getMinimumVLForUser that peeks through the user's demandedVL.
// If we know the demanded VL of UserMI, then we can reduce the VL it
// requires.
if (DemandedVLs.contains(&UserMI)) {
Is it really possible to have a VLOp but not be part of DemandedVLs?
I thought the same thing at first, but then found out we have vcpop.m etc., which don't define any vector operands but do use vector operands. So in that case we can't reduce the demanded VL since it's not elementwise.
I'm confused. vcpop.m is not a supported instruction. Thus, its VL should never be reduced, and it shouldn't have an element in the map (not a candidate). It can be a valid user of a supported instruction, but that seems fine? We'd return the VL of the vcpop.m instruction?
Oops, ignore my comment. When I wrote it I was thinking about my initial version of this patch when I computed the demandedVL for every instruction that defined a vector, not just candidates, which is why I was seeing vcpop.m etc.
In the latest version of the patch we might still have candidate instructions that have aborted, i.e. returned std::nullopt in checkUsers, and so don't have an entry.
LGTM, but please wait to see if there are any other comments.
const MachineInstr &UserMI = *UserOp.getParent();
const MCInstrDesc &Desc = UserMI.getDesc();

if (!RISCVII::hasVLOp(Desc.TSFlags) || !RISCVII::hasSEWOp(Desc.TSFlags)) {
  LLVM_DEBUG(dbgs() << "    Abort due to lack of VL, assume that"
                       " use VLMAX\n");
-  return std::nullopt;
+  return MachineOperand::CreateImm(RISCV::VLMaxSentinel);
Not related to this patch, but I wonder if we should avoid calling this function when a user does not have a VL operand. The debug output says to assume it uses VLMAX, but I wonder if it just doesn't depend on VL and could have been excluded from calculating the demanded VL.
If a user doesn't have a VL operand it could be a non-pseudo instruction like a COPY etc., in which case I think we need to preserve the entire VL? I.e. it would still demand VLMAX.
I think you are right. Imagine we return the DEST of the copy; then we need all lanes of the copy. Please ignore my original comment. It does make me wonder if there is any future work related to the VL optimizer looking through copies, though.
I actually have a patch that deals with copies of V0 that I hope to post soon.
We currently can't propagate VL through any mask uses because they have to be copied to a physical register first, so my solution was to keep track of all the uses of V0 in a map, similar to how we do it for defs in RISCVVectorPeephole.
-    for (auto &MI : make_range(MBB.rbegin(), MBB.rend())) {
+    // For each instruction that defines a vector, compute what VL its
+    // downstream users demand.
+    for (const auto &MI : reverse(MBB)) {
The reverse can probably be landed as it's an unrelated NFC.
A couple of high level points here:
- This is not NFC - as pointed out by your own test changes.
- I am really not a fan of the API change to remove std::optional, and it's not clear why you're doing this? You can have a DenseMap with an optional as a value?
- The initial demanded VL is sound, but not precise. As you reduce a transitive user, you can reduce the VL of an instruction, which in turn reduces that of the instruction's defs. Your backward walk achieves this within a basic block, but we lose the cross-block case.
- Per (3) you do need to invalidate. Alternatively, you could adapt the worklist scheme during the computation.
- Given (4) - the worklist variant - I'd probably write this as memoization, not upfront computation. I'd guess (not having tried to actually write it) that the code structure would be cleaner.
It should be possible to analyze all the blocks up front with a reverse post-order traversal, which would handle the cross-block case. I'd like to explore that first since I'm a bit hesitant to go down the invalidation path. I'm worried it might lead to more compile-time edge cases like the one in #123878.
Force-pushed from 38d35f1 to 7fe39a2
✅ With the latest revision this PR passed the C/C++ code formatter.
I've undone all the std::optional API changes and stacked this on top of #124734. The analysis should now also be precise across blocks as it's computed globally in a post-order traversal, thanks for catching that. The diff should be much smaller now; I think the initial version made the change look a lot larger than it is.
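For reference, a minimal sketch of what that global, cross-block analysis might look like, assuming post_order from llvm/ADT/PostOrderIterator.h together with this patch's DemandedVLs map and checkUsers; the exact names in the committed code may differ:

```cpp
#include "llvm/ADT/PostOrderIterator.h"

// Walk blocks in post-order (each block after its successors, modulo back
// edges) and instructions bottom-up within each block, so a user's demanded
// VL is already recorded by the time the instructions feeding it are visited.
for (MachineBasicBlock *MBB : post_order(&MF)) {
  if (!MDT->isReachableFromEntry(MBB))
    continue;
  for (MachineInstr &MI : reverse(*MBB))
    if (isCandidate(MI))
      DemandedVLs.insert({&MI, checkUsers(MI)});
}
```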
Force-pushed from 4a84c81 to 689f4e4
  return false;
auto CommonVL = std::make_optional(DemandedVLs.at(&MI));
Are we just using CommonVL as an optional that always has a value below? I think we could be using it as a MachineOperand.
Yeah, I changed it back to an optional to avoid the '->' to '.' diff below, and was hoping to clean it up in a follow-up. If reviewers would prefer, I can convert it in this PR.
Feel free to clean it up in a follow up.
/// For a given instruction, records what elements of it are demanded by
/// downstream users.
DenseMap<const MachineInstr *, MachineOperand> DemandedVLs;
Please just have the value be an optional. This simplifies the code, and has no real additional cost. Please stop trying to micro-optimize this. If you want to come back to it, please feel free, but let's do the simple version first.
To be clear, this is not just a stylistic point. I am suspecting from your code structure below that you have a bug in your handling of the std::nullopt case, and want as much of that logic to disappear from this review as possible.
More than happy to do so! Hope this didn't come across as an optimisation/stylistic point, that wasn't my intention.
Force-pushed from 689f4e4 to f7ea8d4
LGTM w/comments addressed. On the COPY point, if that's unclear, please ask.
@@ -12,15 +12,15 @@
 define <vscale x 4 x i32> @same_vl_imm(<vscale x 4 x i32> %passthru, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
 ; CHECK: User VL is: 4
-; CHECK-NEXT: Abort due to CommonVL == VLOp, no point in reducing.
+; CHECK: Abort due to CommonVL == VLOp, no point in reducing.
Why drop NEXT?
tryReduceVL is no longer called immediately after checkUsers, so the CHECK-NEXT failed.
This replaces the worklist by instead computing what VL is demanded by each instruction's users first. checkUsers essentially already did this, so it's been renamed to computeDemandedVL. The demanded VLs are stored in a DenseMap, and then we can just do a single forward pass of tryReduceVL where we check if a candidate's demanded VL is less than its VLOp. This means the pass should now be linear in complexity, and allows us to relax the restriction on tied operands more easily, as in llvm#124066. Note that in order to avoid std::optional inside the DenseMap, I've removed the std::optionals and replaced them with VLMAX or 0 constant operands.
…onVL -> DemandedVL
…for unreachable blocks.
…emandedVL must be a candidate
Force-pushed from f7ea8d4 to 6264a89
Not after I added back std::optional in checkUsers; I've removed them now.
… demanded (#124066) The motivation for this is to allow reducing the vl when a user is a ternary pseudo, where the third operand is tied and also acts as a passthru. When checking the users of an instruction, we currently bail if the user is used as a passthru because all of its elements past vl will be used for the tail. We can allow passthru users if we know the tail of their result isn't used, which we will have computed beforehand after #124530. It's worth noting that this all holds regardless of the tail policy, because tail agnostic still ends up using the passthru. I've checked that SPEC CPU 2017 + llvm-test-suite pass with this (on qemu with rvv_ta_all_1s=true). Fixes #123760
…unction I was running into failed assertions of `isCandidate(UserMI)` in `getMinimumVLForUser`, but only occurring with `-enable-machine-outliner=never`. I believe this is a red herring, and it just so happens the memory allocation pattern on my machine exposed the bug with that flag. DemandedVLs is never cleared, which means it accumulates more MachineInstr pointer keys over time, and it's possible that when e.g. running on function 'b', a MachineInstr pointer points to the same memory location used for a candidate in 'a'. This causes the assertion to fail. Comment left on #124530 with more information.
I've directly committed 52c1162 to fix the fact that DemandedVLs was never cleared between runs on different functions. After some reduction (made a bit dodgy as the crash isn't guaranteed to happen), I have a test case that reproduces the failed assertion. I've directly committed what seems to be the most obvious fix.
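For illustration, the fix presumably amounts to clearing the map before the pass returns, along the lines of this sketch (see 52c1162 for the actual change):

```cpp
// At the end of RISCVVLOptimizer::runOnMachineFunction (sketch):
// DemandedVLs is a pass member keyed by MachineInstr pointers, so clear it
// before returning; otherwise stale keys from this function could alias
// freshly allocated instructions when the pass runs on the next function.
DemandedVLs.clear();
return MadeChange;
```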
Thanks for fixing that. I think we've had issues before with forgetting to clear structures in passes.
Thanks. I really should have caught that in review - oops.
This replaces the worklist by instead computing what VL is demanded by each instruction's users first, which is done via checkUsers.
The demanded VLs are stored in a DenseMap, and then we can just do a single forward pass of tryReduceVL where we check if a candidate's demanded VL is less than its VLOp.
This means the pass should now be linear in complexity, and allows us to relax the restriction on tied operands more easily, as in #124066.
Stacked on #124734