
[RISCV][VLOPT] Don't reduce the VL if it is the same as CommonVL #123878

Merged

michaelmaitland merged 7 commits into llvm:main from the vlopt-same-vl branch on Jan 22, 2025

Conversation

michaelmaitland
Contributor

This fixes #123862.

@llvmbot
Member

llvmbot commented Jan 22, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Michael Maitland (michaelmaitland)

Changes

This fixes #123862.


Full diff: https://github.com/llvm/llvm-project/pull/123878.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp (+7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vl-opt.ll (+11)
diff --git a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
index 54ca8ccd8d9e90..9182e1f751933c 100644
--- a/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVLOptimizer.cpp
@@ -1320,6 +1320,13 @@ bool RISCVVLOptimizer::tryReduceVL(MachineInstr &OrigMI) {
     }
 
     if (CommonVL->isImm()) {
+      if (CommonVL->isImm() && VLOp.isImm() &&
+          VLOp.getImm() == CommonVL->getImm()) {
+        LLVM_DEBUG(dbgs() << "  VL is already reduced to" << VLOp << " for "
+                          << MI << "\n");
+        continue;
+      }
+
       LLVM_DEBUG(dbgs() << "  Reduce VL from " << VLOp << " to "
                         << CommonVL->getImm() << " for " << MI << "\n");
       VLOp.ChangeToImmediate(CommonVL->getImm());
diff --git a/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll b/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll
index 1cc30f077feb4a..d6143f69288e66 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vl-opt.ll
@@ -150,3 +150,14 @@ define <vscale x 4 x i32> @dont_optimize_tied_def(<vscale x 4 x i32> %a, <vscale
   ret <vscale x 4 x i32> %2
 }
 
+define <vscale x 4 x i32> @same_vl_imm(<vscale x 4 x i32> %passthru, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
+; CHECK-LABEL: same_vl_imm:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetivli zero, 4, e32, m2, ta, ma
+; CHECK-NEXT:    vadd.vv v8, v10, v12
+; CHECK-NEXT:    vadd.vv v8, v8, v10
+; CHECK-NEXT:    ret
+  %v = call <vscale x 4 x i32> @llvm.riscv.vadd.nxv4i32.nxv4i32(<vscale x 4 x i32> poison, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b, iXLen 4)
+  %w = call <vscale x 4 x i32> @llvm.riscv.vadd.nxv4i32.nxv4i32(<vscale x 4 x i32> poison, <vscale x 4 x i32> %v, <vscale x 4 x i32> %a, iXLen 4)
+  ret <vscale x 4 x i32> %w
+}

@MaheshRavishankar
Contributor

Thanks! That seems to have worked.

Contributor

@lukel97 left a comment


I don't think this is an infinite loop, but a pathologically slow case where the chain of instructions is really long. But pruning the worklist when the CommonVL is the same seems like a sensible way to manage it.

@wangpc-pp
Contributor

wangpc-pp commented Jan 22, 2025

I don't think this is an infinite loop, but a pathologically slow case where the chain of instructions is really long. But pruning the worklist when the CommonVL is the same seems like a sensible way to manage it.

If so, I think we should add a list of handled instructions to remove duplicated instructions in:

// Now add all inputs to this instruction to the worklist.
for (auto &Op : MI.operands()) {
  if (!Op.isReg() || !Op.isUse() || !Op.getReg().isVirtual())
    continue;
  if (!isVectorRegClass(Op.getReg(), MRI))
    continue;
  MachineInstr *DefMI = MRI->getVRegDef(Op.getReg());
  if (!isCandidate(*DefMI))
    continue;
  Worklist.insert(DefMI);
}

Just like:
SmallVector<SDNode *> Worklist;
SmallSet<SDNode *, 8> Inserted;
Worklist.push_back(N);
Inserted.insert(N);
SmallVector<CombineResult> CombinesToApply;
while (!Worklist.empty()) {
  SDNode *Root = Worklist.pop_back_val();
  NodeExtensionHelper LHS(Root, 0, DAG, Subtarget);
  NodeExtensionHelper RHS(Root, 1, DAG, Subtarget);
  auto AppendUsersIfNeeded = [&Worklist, &Subtarget,
                              &Inserted](const NodeExtensionHelper &Op) {
    if (Op.needToPromoteOtherUsers()) {
      for (SDUse &Use : Op.OrigOperand->uses()) {
        SDNode *TheUser = Use.getUser();
        if (!NodeExtensionHelper::isSupportedRoot(TheUser, Subtarget))
          return false;
        // We only support the first 2 operands of FMA.
        if (Use.getOperandNo() >= 2)
          return false;
        if (Inserted.insert(TheUser).second)
          Worklist.push_back(TheUser);


Never mind, I didn't notice we are using SetVector here. :-)

@topperc
Collaborator

topperc commented Jan 22, 2025

I don't think this is an infinite loop, but a pathologically slow case where the chain of instructions is really long. But pruning the worklist when the CommonVL is the same seems like a sensible way to manage it.

If so, I think we should add a list of handled instructions to remove duplicated instructions in:

// Now add all inputs to this instruction to the worklist.
for (auto &Op : MI.operands()) {
  if (!Op.isReg() || !Op.isUse() || !Op.getReg().isVirtual())
    continue;
  if (!isVectorRegClass(Op.getReg(), MRI))
    continue;
  MachineInstr *DefMI = MRI->getVRegDef(Op.getReg());
  if (!isCandidate(*DefMI))
    continue;
  Worklist.insert(DefMI);
}

Just like:

SmallVector<SDNode *> Worklist;
SmallSet<SDNode *, 8> Inserted;
Worklist.push_back(N);
Inserted.insert(N);
SmallVector<CombineResult> CombinesToApply;
while (!Worklist.empty()) {
  SDNode *Root = Worklist.pop_back_val();
  NodeExtensionHelper LHS(Root, 0, DAG, Subtarget);
  NodeExtensionHelper RHS(Root, 1, DAG, Subtarget);
  auto AppendUsersIfNeeded = [&Worklist, &Subtarget,
                              &Inserted](const NodeExtensionHelper &Op) {
    if (Op.needToPromoteOtherUsers()) {
      for (SDUse &Use : Op.OrigOperand->uses()) {
        SDNode *TheUser = Use.getUser();
        if (!NodeExtensionHelper::isSupportedRoot(TheUser, Subtarget))
          return false;
        // We only support the first 2 operands of FMA.
        if (Use.getOperandNo() >= 2)
          return false;
        if (Inserted.insert(TheUser).second)
          Worklist.push_back(TheUser);

The problem is that we visit every instruction in the basic block in the outer loop. And we visit every earlier instruction anytime we make a change using a worklist. So the problem is that we process most of the graph using the worklist, but then the outer loop still causes us to revisit everything again. Each time we run through the worklist all over again.

What we should probably do is pre-load the worklist with the entire basic block instead of using the outer loop. The worklist is a SetVector so won't add anything more than once.
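A minimal sketch of that shape, for illustration only (it is not the code that eventually landed in #123973, and it assumes the pass's existing isCandidate() and tryReduceVL() helpers plus a SetVector worklist):

  // Seed the worklist once with every candidate instruction in the block,
  // instead of restarting it from each instruction of an outer loop.
  // SetVector keeps insertion order and drops duplicates, so each
  // instruction appears in the queue at most once at a time.
  SetVector<MachineInstr *> Worklist;
  for (MachineInstr &MI : MBB)
    if (isCandidate(MI))
      Worklist.insert(&MI);

  bool MadeChange = false;
  while (!Worklist.empty()) {
    MachineInstr *MI = Worklist.pop_back_val();
    // Any producers that become worth revisiting can be re-inserted here
    // (or inside tryReduceVL); the SetVector de-duplicates those as well.
    MadeChange |= tryReduceVL(*MI);
  }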

@lukel97
Contributor

lukel97 commented Jan 22, 2025

The problem is that we visit every instruction in the basic block in the outer loop. And we visit every earlier instruction anytime we make a change using a worklist. So the problem is that we process most of the graph using the worklist, but then the outer loop still causes us to revisit everything again. Each time we run through the worklist all over again.

What we should probably do is pre-load the worklist with the entire basic block instead of using the outer loop. The worklist is a SetVector so won't add anything more than once.

The really long hang happens during the first call to tryReduce. But adding everything to the worklist up front would still probably help.

Would making the worklist a queue instead of a stack also help?

@@ -4244,6 +4244,21 @@ bool RISCV::isVLKnownLE(const MachineOperand &LHS, const MachineOperand &RHS) {
   return LHS.getImm() <= RHS.getImm();
 }
 
+/// Given two VL operands, do we know that LHS < RHS?
+bool RISCV::isVLKnownLT(const MachineOperand &LHS, const MachineOperand &RHS) {
+  if (LHS.isReg() && RHS.isReg() && LHS.getReg().isVirtual() &&
Collaborator


Put an early-exit here which checks for LHS and RHS both being immediates and this simplifies greatly.
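For illustration only, a hedged sketch of that suggestion (the helper name is hypothetical, and this is not necessarily what landed; it assumes the RISCV::VLMaxSentinel immediate encoding already used by isVLKnownLE):

  // Sketch: early-exit on the immediate/immediate case and answer
  // conservatively ("not known") for everything else.
  static bool isVLKnownLTSketch(const MachineOperand &LHS,
                                const MachineOperand &RHS) {
    if (LHS.isImm() && RHS.isImm()) {
      // VLMAX is encoded as a sentinel immediate and may coincide with any
      // concrete VL, so refuse to claim a strict ordering when either side
      // is the sentinel.
      if (LHS.getImm() == RISCV::VLMaxSentinel ||
          RHS.getImm() == RISCV::VLMaxSentinel)
        return false;
      return LHS.getImm() < RHS.getImm();
    }
    // Register VLs give us nothing we can prove strictly here.
    return false;
  }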

preames added a commit to preames/llvm-project that referenced this pull request Jan 22, 2025
This implements a suggestion by Craig in PR llvm#123878.  We can move the
worklist management out of the per-instruction work and do it once
at the end of scanning all the instructions.  This should reduce
repeat visitation of the same instruction when no changes can be made.

Note that this does not remove the inherent O(N^2) in the algorithm.
We're still potentially visiting every user of every def.

I also included a guard for unreachable blocks since that had been
mentioned as a possible cause. It seems we've ruled that out, but
guarding for this case is still a good idea.
@preames
Collaborator

preames commented Jan 22, 2025

The problem is that we visit every instruction in the basic block in the outer loop. And we visit every earlier instruction anytime we make a change using a worklist. So the problem is that we process most of the graph using the worklist, but then the outer loop still causes us to revisit everything again. Each time we run through the worklist all over again.

What we should probably do is pre-load the worklist with the entire basic block instead of using the outer loop. The worklist is a SetVector so won't add anything more than once.

I implemented this suggestion in #123973. I want to be careful to say I don't see this as an alternative, but as a possible additional fix or defense in depth. I do want to see this change move forward as it will reduce the depth of the recursive (worklist) walk when we're not achieving anything useful.

@michaelmaitland
Contributor Author

I appreciate all of the comments here. I agree that we should improve how we manage our worklist. This was something that @preames pointed out as an area for improvement in the original addition of this patch. I'd like to land this patch independently of any improvements to the management of the worklist. I think that this change and a worklist change are beneficial on their own.

Thank you @lukel97 for pointing out that this is not a case of infinite loop, but a case that is pathologically slow. I've added the pathologically slow test case to this PR, since this PR fixes the slowness problem.

# RUN: llc %s -o - -mtriple=riscv64 -mattr=+v -run-pass=riscv-vl-optimizer -verify-machineinstrs | FileCheck %s

---
name: test
Collaborator


I don't know that adding this is a good idea. This won't show up as a test failure if this ever fails, it will only show up as a long running test. I'm not sure that's likely to get noticed.

Contributor Author


I have addressed this concern.


; REQUIRES: asserts

define <vscale x 4 x i32> @same_vl_imm(<vscale x 4 x i32> %passthru, <vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
Collaborator


Please add a comment which explains why you're checking for the particular debug output.

Collaborator


Though, honestly, this seems like borderline bad practice. I would be fine with this landing without a dedicated test since it is NFC and "only" improves compile time.

Contributor Author

@michaelmaitland Jan 22, 2025


Please let me know what you prefer. This test case is testing an observable change.

Collaborator

@preames left a comment


LGTM

preames added a commit that referenced this pull request Jan 22, 2025
This implements a suggestion by Craig in PR #123878. We can move the
worklist management out of the per-instruction work and do it once at
the end of scanning all the instructions. This should reduce repeat
visitation of the same instruction when no changes can be made.

Note that this does not remove the inherent O(N^2) in the algorithm.
We're still potentially visiting every user of every def.

I also included a guard for unreachable blocks since that had been
mentioned as a possible cause. It seems we've ruled that out, but
guarding for this case is still a good idea.
@michaelmaitland merged commit 1687aa2 into llvm:main on Jan 22, 2025
5 of 6 checks passed
@michaelmaitland deleted the vlopt-same-vl branch on January 22, 2025 at 18:50
Development

Successfully merging this pull request may close these issues.

Generation of Object file gets stuck in infinite loop after enabling RISCVVLOptimizer
7 participants