-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[RISCV][VLOPT] Don't reduce the VL is the same as CommonVL #123878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-risc-v Author: Michael Maitland (michaelmaitland) ChangesThis fixes #123862. Full diff: https://github.com/llvm/llvm-project/pull/123878.diff 2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index 54ca8ccd8d9e90..9182e1f751933c 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -1320,6 +1320,13 @@ bool RISCVVLOptimizer::tryReduceVL(MachineInstr &OrigMI) {
}
if (CommonVL->isImm()) {
+ if (CommonVL->isImm() && VLOp.isImm() &&
+ VLOp.getImm() == CommonVL->getImm()) {
+ LLVM_DEBUG(dbgs() << " VL is already reduced to" << VLOp << " for "
+ << MI << "\n");
+ continue;
+ }
+
LLVM_DEBUG(dbgs() << " Reduce VL from " << VLOp << " to "
<< CommonVL->getImm() << " for " << MI << "\n");
VLOp.ChangeToImmediate(CommonVL->getImm());
diff --git a/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll b/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll
index 1cc30f077feb4a..d6143f69288e66 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll
@@ -150,3 +150,14 @@ define <vscale x 4 x i32> @dont_optimize_tied_def(<vscale x 4 x i32> %a, <vscale
ret <vscale x 4 x i32> %2
}
+define <vscale x 4 x i32> @same_vl_imm(<vscale x 4 x i32> %passthru, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
+; CHECK-LABEL: same_vl_imm:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetivli zero, 4, e32, m2, ta, ma
+; CHECK-NEXT: vadd.vv v8, v10, v12
+; CHECK-NEXT: vadd.vv v8, v8, v10
+; CHECK-NEXT: ret
+ %v = call <vscale x 4 x i32> @llvm.riscv.vadd.nxv4i32.nxv4i32(<vscale x 4 x i32> poison, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b, iXLen 4)
+ %w = call <vscale x 4 x i32> @llvm.riscv.vadd.nxv4i32.nxv4i32(<vscale x 4 x i32> poison, <vscale x 4 x i32> %v, <vscale x 4 x i32> %a, iXLen 4)
+ ret <vscale x 4 x i32> %w
+}
|
Thanks! That seems to have worked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is an infinite loop, but a pathologically slow case where the chain of instructions is really long. But pruning the worklist when the CommonVL is the same seems like a sensible way to manage it.
If so, I think we should add a list of handled instructions to remove duplicated instructions in: llvm-project/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp Lines 1341 to 1355 in ebb27cc
Just like: llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp Lines 15777 to 15799 in ebb27cc
Never mind, I didn't notice we are using |
The problem is that we visit every instruction in the basic block in the outer loop. And we visit every earlier instruction anytime we make a change using a worklist. So the problem is that we process most of the graph using the worklist, but then the outer loop still causes us to revisit everything again. Each time we run through the worklist all over again. What we should probably do is pre-load the worklist with the entire basic block instead of using the outer loop. The worklist is a SetVector so won't add anything more than once. |
The really long hang happens during the first call to Would making the worklist a queue instead of a stack also help? |
@@ -4244,6 +4244,21 @@ bool RISCV::isVLKnownLE(const MachineOperand &LHS, const MachineOperand &RHS) { | |||
return LHS.getImm() <= RHS.getImm(); | |||
} | |||
|
|||
/// Given two VL operands, do we know that LHS < RHS? | |||
bool RISCV::isVLKnownLT(const MachineOperand &LHS, const MachineOperand &RHS) { | |||
if (LHS.isReg() && RHS.isReg() && LHS.getReg().isVirtual() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put an early-exit here which checks for LHS and RHS both being immediates and this simplifies greatly.
This implements a suggestion by Craig in PR llvm#123878. We can move the worklist management out of the per-instruction work and do it once at the end of scanning all the instructions. This should reduce repeat visitation of the same instruction when no changes can be made. Note that this does not remove the inherent O(N^2) in the algorithm. We're still potentially visiiting every user of every def. I also included a guard for unreachable blocks since that had been mentioned as a possible cause. It seems we've rulled that out, but guarding for this case is still a good idea.
I implemented this suggestion in #123973. I want to be careful to say I don't see this as an alternative, but as a possible additional fix or defense in depth. I do want to see this change move forward as it will reduce the depth of the recursive (worklist) walk when we're not achieving anything useful. |
I appreciate all of the comments here. I agree that we should improve how we manage our worklist. This was something that @preames pointed out as an area for improvement in the original addition of this patch. I'd like to land this patch independently of any improvements to the management of the worklist. I think that this change and a worklist change are beneficial on their own. Thank you @lukel97 for pointing out that this is not a case of infinite loop, but a case that is pathologically slow. I've added the pathologically slow test case to this PR, since this PR fixes the slowness problem. |
# RUN: llc %s -o - -mtriple=riscv64 -mattr=+v -run-pass=riscv-vl-optimizer -verify-machineinstrs | FileCheck %s | ||
|
||
--- | ||
name: test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know that adding this is a good idea. This won't show up as a test failure if this ever fails, it will only show up as a long running test. I'm not sure that's likely to get noticed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed this concern.
c0d7aa0
to
39a0cb7
Compare
|
||
; REQUIRES: asserts | ||
|
||
define <vscale x 4 x i32> @same_vl_imm(<vscale x 4 x i32> %passthru, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment which explains why you're checking for the particular debug output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though, honestly, this seems like borderline bad practice. I would be fine with this landing without a dedicated test since it is NFC and "only" improves compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please let me know what you prefer. This test case is testing observable change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This implements a suggestion by Craig in PR #123878. We can move the worklist management out of the per-instruction work and do it once at the end of scanning all the instructions. This should reduce repeat visitation of the same instruction when no changes can be made. Note that this does not remove the inherent O(N^2) in the algorithm. We're still potentially visiiting every user of every def. I also included a guard for unreachable blocks since that had been mentioned as a possible cause. It seems we've rulled that out, but guarding for this case is still a good idea.
9616eca
to
b48d918
Compare
78385e6
to
3e45873
Compare
This fixes #123862.