Skip to content

[LV] Don't skip instrs with side-effects in reg pressure computation. #126415

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 19, 2025

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Feb 9, 2025

calculateRegisterUsage adds end points for each user of an instruction to Ends and ignores instructions not added to it, i.e. instructions with no users.

This means things like stores aren't included, which in turn means values that are only used in stores are also not included for consideration. This means we underestimate the register usage in cases where the only users are things like stores.

Update the code to don't skip instructions without users (i.e. not in Ends) if they have side-effects.

@llvmbot
Copy link
Member

llvmbot commented Feb 9, 2025

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-backend-powerpc

Author: Florian Hahn (fhahn)

Changes

calculateRegisterUsage adds end points for each user of an instruction to Ends and ignores instructions not added to it, i.e. instructions with no users.

This means things like stores aren't included, which in turn means values that are only used in stores are also not included for consideration. This means we underestimate the register usage in cases where the only users are things like stores.

Update the code to don't skip instructions without users (i.e. not in Ends) if they have side-effects.


Full diff: https://github.com/llvm/llvm-project/pull/126415.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll (+2-1)
  • (modified) llvm/test/Transforms/LoopVectorize/LoongArch/reg-usage.ll (+2-1)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/reg-usage.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (+6-4)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index dacee6445072aab..5f31a186cfd6e7c 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -5253,7 +5253,7 @@ LoopVectorizationCostModel::calculateRegisterUsage(ArrayRef<ElementCount> VFs) {
       OpenIntervals.erase(ToRemove);
 
     // Ignore instructions that are never used within the loop.
-    if (!Ends.count(I))
+    if (!Ends.count(I) && !I->mayHaveSideEffects())
       continue;
 
     // Skip ignored values.
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll b/llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll
index 8239d32445c1054..961ef64acd4a46c 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll
@@ -14,8 +14,9 @@
 define void @get_invariant_reg_usage(ptr %z) {
 ; CHECK: LV: Checking a loop in 'get_invariant_reg_usage'
 ; CHECK: LV(REG): VF = vscale x 16
-; CHECK-NEXT: LV(REG): Found max usage: 1 item
+; CHECK-NEXT: LV(REG): Found max usage: 2 item
 ; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 3 registers
+; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 1 registers 
 ; CHECK-NEXT: LV(REG): Found invariant usage: 2 item
 ; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
 ; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 8 registers 
diff --git a/llvm/test/Transforms/LoopVectorize/LoongArch/reg-usage.ll b/llvm/test/Transforms/LoopVectorize/LoongArch/reg-usage.ll
index f45a2f0f5b7e8f3..021ef0d543a18ef 100644
--- a/llvm/test/Transforms/LoopVectorize/LoongArch/reg-usage.ll
+++ b/llvm/test/Transforms/LoopVectorize/LoongArch/reg-usage.ll
@@ -15,8 +15,9 @@ define void @bar(ptr %A, i32 signext %n) {
 ; CHECK-SCALAR-NEXT: LV(REG): RegisterClass: LoongArch::GPRRC, 1 registers
 ; CHECK-SCALAR-NEXT: LV: The target has 30 registers of LoongArch::GPRRC register class
 ; CHECK-SCALAR-NEXT: LV: The target has 32 registers of LoongArch::FPRRC register class
-; CHECK-VECTOR:      LV(REG): Found max usage: 1 item
+; CHECK-VECTOR:      LV(REG): Found max usage: 2 item
 ; CHECK-VECTOR-NEXT: LV(REG): RegisterClass: LoongArch::VRRC, 3 registers
+; CHECK-VECTOR-NEXT: LV(REG): RegisterClass: LoongArch::GPRRC, 1 registers
 ; CHECK-VECTOR-NEXT: LV(REG): Found invariant usage: 1 item
 ; CHECK-VECTOR-NEXT: LV(REG): RegisterClass: LoongArch::GPRRC, 1 registers
 ; CHECK-VECTOR-NEXT: LV: The target has 32 registers of LoongArch::VRRC register class
diff --git a/llvm/test/Transforms/LoopVectorize/PowerPC/reg-usage.ll b/llvm/test/Transforms/LoopVectorize/PowerPC/reg-usage.ll
index aac9aff3d391f5d..db4b580a3967718 100644
--- a/llvm/test/Transforms/LoopVectorize/PowerPC/reg-usage.ll
+++ b/llvm/test/Transforms/LoopVectorize/PowerPC/reg-usage.ll
@@ -179,7 +179,7 @@ define void @double_(ptr nocapture %A, i32 %n) nounwind uwtable ssp {
 
 ;CHECK-PWR9: LV(REG): VF = 1
 ;CHECK-PWR9: LV(REG): Found max usage: 2 item
-;CHECK-PWR9-NEXT: LV(REG): RegisterClass: PPC::GPRRC, 2 registers
+;CHECK-PWR9-NEXT: LV(REG): RegisterClass: PPC::GPRRC, 3 registers
 ;CHECK-PWR9-NEXT: LV(REG): RegisterClass: PPC::VSXRC, 5 registers
 ;CHECK-PWR9: LV(REG): Found invariant usage: 1 item
 ;CHECK-PWR9-NEXT: LV(REG): RegisterClass: PPC::GPRRC, 1 registers
@@ -248,7 +248,7 @@ define void @fp16_(ptr nocapture readonly %pIn, ptr nocapture %pOut, i32 %numRow
 ;CHECK-LABEL: fp16_
 ;CHECK: LV(REG): VF = 1
 ;CHECK: LV(REG): Found max usage: 2 item
-;CHECK: LV(REG): RegisterClass: PPC::GPRRC, 4 registers
+;CHECK: LV(REG): RegisterClass: PPC::GPRRC, 5 registers
 ;CHECK: LV(REG): RegisterClass: PPC::VSXRC, 2 registers
 entry:
   %tmp.0.extract.trunc = trunc i32 %scale.coerce to i16
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
index f630f4f21e065ff..4e563c242ee848c 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@@ -128,8 +128,9 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV(REG): At #5 Interval # 3
 ; CHECK-NEXT:  LV(REG): At #6 Interval # 3
 ; CHECK-NEXT:  LV(REG): At #7 Interval # 3
-; CHECK-NEXT:  LV(REG): At #9 Interval # 1
-; CHECK-NEXT:  LV(REG): At #10 Interval # 2
+; CHECK-NEXT:  LV(REG): At #8 Interval # 3
+; CHECK-NEXT:  LV(REG): At #9 Interval # 2
+; CHECK-NEXT:  LV(REG): At #10 Interval # 3
 ; CHECK-NEXT:  LV(REG): VF = vscale x 4
 ; CHECK-NEXT:  LV(REG): Found max usage: 2 item
 ; CHECK-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 3 registers
@@ -377,8 +378,9 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV(REG): At #5 Interval # 3
 ; CHECK-NEXT:  LV(REG): At #6 Interval # 3
 ; CHECK-NEXT:  LV(REG): At #7 Interval # 3
-; CHECK-NEXT:  LV(REG): At #9 Interval # 1
-; CHECK-NEXT:  LV(REG): At #10 Interval # 2
+; CHECK-NEXT:  LV(REG): At #8 Interval # 3
+; CHECK-NEXT:  LV(REG): At #9 Interval # 2
+; CHECK-NEXT:  LV(REG): At #10 Interval # 3
 ; CHECK-NEXT:  LV(REG): VF = vscale x 4
 ; CHECK-NEXT:  LV(REG): Found max usage: 2 item
 ; CHECK-NEXT:  LV(REG): RegisterClass: RISCV::GPRRC, 3 registers

Copy link
Collaborator

@SamTebbs33 SamTebbs33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes a lot of sense, LGTM.

@@ -5253,7 +5253,7 @@ LoopVectorizationCostModel::calculateRegisterUsage(ArrayRef<ElementCount> VFs) {
OpenIntervals.erase(ToRemove);

// Ignore instructions that are never used within the loop.
if (!Ends.count(I))
if (!Ends.count(I) && !I->mayHaveSideEffects())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this may also include calls to intrinsics or functions that return void?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, any instruction that is not explicitly used and has side-effects. So here we don't skip intrinsics any more or calls that return void.

Instructions that should not count as 'real' users, e.g. llvm.assume() calls are skipped below (via ValuesToIgnore).

calculateRegisterUsage adds end points for each user of an instruction to
Ends and ignores instructions not added to it, i.e. instructions with no
users.

This means things like stores aren't included, which in turn means
values that are only used in stores are also not included for
consideration. This means we underestimate the register usage in cases
where the only users are things like stores.

Update the code to don't skip instructions without users (i.e. not in
Ends) if they have side-effects.
@fhahn fhahn force-pushed the lv-reg-usage-sideeffects branch from 40dd9b2 to c214984 Compare March 19, 2025 12:20
@fhahn fhahn merged commit 11b8699 into llvm:main Mar 19, 2025
9 of 10 checks passed
@fhahn fhahn deleted the lv-reg-usage-sideeffects branch March 19, 2025 15:13
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 19, 2025
…omputation. (#126415)

calculateRegisterUsage adds end points for each user of an instruction
to Ends and ignores instructions not added to it, i.e. instructions with
no users.

This means things like stores aren't included, which in turn means
values that are only used in stores are also not included for
consideration. This means we underestimate the register usage in cases
where the only users are things like stores.

Update the code to don't skip instructions without users (i.e. not in
Ends) if they have side-effects.

PR: llvm/llvm-project#126415
fhahn added a commit that referenced this pull request Apr 8, 2025
Add a version of calculateRegisterUsage that works estimates register
usage for a VPlan. This mostly just ports the existing code, with some
updates to figure out what recipes will generate vectors vs scalars.

There are number of changes in the computed register usages, but they
should be more accurate w.r.t. to the generated vector code.

There are the following changes:

 * Scalar usage increases in most cases by 1, as we always create a
   scalar canonical IV, which is alive across the loop and is not
   considered by the legacy implementation

 * Output is ordered by insertion, now scalar registers are added first
   due the canonical IV phi.

 * Using the VPlan, we now also more precisely know if an induction will
   be vectorized or scalarized.

Depends on #126415

PR: #126437
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 8, 2025
…26437)

Add a version of calculateRegisterUsage that works estimates register
usage for a VPlan. This mostly just ports the existing code, with some
updates to figure out what recipes will generate vectors vs scalars.

There are number of changes in the computed register usages, but they
should be more accurate w.r.t. to the generated vector code.

There are the following changes:

 * Scalar usage increases in most cases by 1, as we always create a
   scalar canonical IV, which is alive across the loop and is not
   considered by the legacy implementation

 * Output is ordered by insertion, now scalar registers are added first
   due the canonical IV phi.

 * Using the VPlan, we now also more precisely know if an induction will
   be vectorized or scalarized.

Depends on llvm/llvm-project#126415

PR: llvm/llvm-project#126437
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
Add a version of calculateRegisterUsage that works estimates register
usage for a VPlan. This mostly just ports the existing code, with some
updates to figure out what recipes will generate vectors vs scalars.

There are number of changes in the computed register usages, but they
should be more accurate w.r.t. to the generated vector code.

There are the following changes:

 * Scalar usage increases in most cases by 1, as we always create a
   scalar canonical IV, which is alive across the loop and is not
   considered by the legacy implementation

 * Output is ordered by insertion, now scalar registers are added first
   due the canonical IV phi.

 * Using the VPlan, we now also more precisely know if an induction will
   be vectorized or scalarized.

Depends on llvm#126415

PR: llvm#126437
SamTebbs33 pushed a commit to SamTebbs33/llvm-project that referenced this pull request May 1, 2025
Add a version of calculateRegisterUsage that works estimates register
usage for a VPlan. This mostly just ports the existing code, with some
updates to figure out what recipes will generate vectors vs scalars.

There are number of changes in the computed register usages, but they
should be more accurate w.r.t. to the generated vector code.

There are the following changes:

 * Scalar usage increases in most cases by 1, as we always create a
   scalar canonical IV, which is alive across the loop and is not
   considered by the legacy implementation

 * Output is ordered by insertion, now scalar registers are added first
   due the canonical IV phi.

 * Using the VPlan, we now also more precisely know if an induction will
   be vectorized or scalarized.

Depends on llvm#126415

PR: llvm#126437
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants