[RegAlloc] Scale the spill weight by target factor #113675
Conversation
@llvm/pr-subscribers-backend-loongarch @llvm/pr-subscribers-llvm-regalloc

Author: Pengcheng Wang (wangpc-pp)

Changes

Currently, the spill weight is only determined by isDef/isUse and block frequency. However, for registers with different register classes, the costs of spilling them are different.

For example, for `LMUL>1` registers (in which several physical registers compound a bigger logical register), the costs are larger than in the `LMUL=1` case (in which there is only one physical register).

To solve this problem, a new target hook `getSpillWeightFactor` is added. Targets can override the default factor according to the register class.

For RISC-V, the factors are set to the `RegClassWeight`, which is used to track register pressure.

I believe all the targets can benefit from this change, but I will only customize RISC-V in this patch.

Partially fixes #113489.

Patch is 870.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/113675.diff

78 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/LiveIntervals.h b/llvm/include/llvm/CodeGen/LiveIntervals.h
index 161bb247a0e968..a58ba178ac8484 100644
--- a/llvm/include/llvm/CodeGen/LiveIntervals.h
+++ b/llvm/include/llvm/CodeGen/LiveIntervals.h
@@ -117,14 +117,14 @@ class LiveIntervals {
/// If \p PSI is provided the calculation is altered for optsize functions.
static float getSpillWeight(bool isDef, bool isUse,
const MachineBlockFrequencyInfo *MBFI,
- const MachineInstr &MI,
+ const MachineInstr &MI, unsigned Factor = 1,
ProfileSummaryInfo *PSI = nullptr);
/// Calculate the spill weight to assign to a single instruction.
/// If \p PSI is provided the calculation is altered for optsize functions.
static float getSpillWeight(bool isDef, bool isUse,
const MachineBlockFrequencyInfo *MBFI,
- const MachineBasicBlock *MBB,
+ const MachineBasicBlock *MBB, unsigned Factor = 1,
ProfileSummaryInfo *PSI = nullptr);
LiveInterval &getInterval(Register Reg) {
diff --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
index 292fa3c94969be..8726d2e33dbc83 100644
--- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
@@ -926,6 +926,9 @@ class TargetRegisterInfo : public MCRegisterInfo {
/// Returns a -1 terminated array of pressure set IDs.
virtual const int *getRegUnitPressureSets(unsigned RegUnit) const = 0;
+ /// Get the factor of spill weight for this register class.
+ virtual unsigned getSpillWeightFactor(const TargetRegisterClass *RC) const;
+
/// Get a list of 'hint' registers that the register allocator should try
/// first when allocating a physical register for the virtual register
/// VirtReg. These registers are effectively moved to the front of the
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index f361c956092e88..8c3ab0d1e43a89 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -189,6 +189,7 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
// Do not update future local split artifacts.
bool ShouldUpdateLI = !IsLocalSplitArtifact;
+ unsigned Factor = TRI.getSpillWeightFactor(MRI.getRegClass(LI.reg()));
if (IsLocalSplitArtifact) {
MachineBasicBlock *LocalMBB = LIS.getMBBFromIndex(*End);
assert(LocalMBB == LIS.getMBBFromIndex(*Start) &&
@@ -199,10 +200,10 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
// localLI = COPY other
// ...
// other = COPY localLI
- TotalWeight +=
- LiveIntervals::getSpillWeight(true, false, &MBFI, LocalMBB, PSI);
- TotalWeight +=
- LiveIntervals::getSpillWeight(false, true, &MBFI, LocalMBB, PSI);
+ TotalWeight += LiveIntervals::getSpillWeight(true, false, &MBFI, LocalMBB,
+ Factor, PSI);
+ TotalWeight += LiveIntervals::getSpillWeight(false, true, &MBFI, LocalMBB,
+ Factor, PSI);
NumInstr += 2;
}
@@ -274,7 +275,8 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
// Calculate instr weight.
bool Reads, Writes;
std::tie(Reads, Writes) = MI->readsWritesVirtualRegister(LI.reg());
- Weight = LiveIntervals::getSpillWeight(Writes, Reads, &MBFI, *MI, PSI);
+ Weight =
+ LiveIntervals::getSpillWeight(Writes, Reads, &MBFI, *MI, Factor, PSI);
// Give extra weight to what looks like a loop induction variable update.
if (Writes && IsExiting && LIS.isLiveOutOfMBB(LI, MBB))
diff --git a/llvm/lib/CodeGen/LiveIntervals.cpp b/llvm/lib/CodeGen/LiveIntervals.cpp
index 21a316cf99a217..48f4538122be3e 100644
--- a/llvm/lib/CodeGen/LiveIntervals.cpp
+++ b/llvm/lib/CodeGen/LiveIntervals.cpp
@@ -877,15 +877,15 @@ LiveIntervals::hasPHIKill(const LiveInterval &LI, const VNInfo *VNI) const {
float LiveIntervals::getSpillWeight(bool isDef, bool isUse,
const MachineBlockFrequencyInfo *MBFI,
- const MachineInstr &MI,
+ const MachineInstr &MI, unsigned Factor,
ProfileSummaryInfo *PSI) {
- return getSpillWeight(isDef, isUse, MBFI, MI.getParent(), PSI);
+ return getSpillWeight(isDef, isUse, MBFI, MI.getParent(), Factor, PSI);
}
float LiveIntervals::getSpillWeight(bool isDef, bool isUse,
const MachineBlockFrequencyInfo *MBFI,
const MachineBasicBlock *MBB,
- ProfileSummaryInfo *PSI) {
+ unsigned Factor, ProfileSummaryInfo *PSI) {
float Weight = isDef + isUse;
const auto *MF = MBB->getParent();
// When optimizing for size we only consider the codesize impact of spilling
@@ -893,7 +893,7 @@ float LiveIntervals::getSpillWeight(bool isDef, bool isUse,
if (PSI && (MF->getFunction().hasOptSize() ||
llvm::shouldOptimizeForSize(MF, PSI, MBFI)))
return Weight;
- return Weight * MBFI->getBlockFreqRelativeToEntryBlock(MBB);
+ return Weight * MBFI->getBlockFreqRelativeToEntryBlock(MBB) * Factor;
}
LiveRange::Segment
diff --git a/llvm/lib/CodeGen/TargetRegisterInfo.cpp b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
index ac9a3d6f0d1a60..d1f02489db62cb 100644
--- a/llvm/lib/CodeGen/TargetRegisterInfo.cpp
+++ b/llvm/lib/CodeGen/TargetRegisterInfo.cpp
@@ -415,6 +415,11 @@ bool TargetRegisterInfo::shouldRewriteCopySrc(const TargetRegisterClass *DefRC,
return shareSameRegisterFile(*this, DefRC, DefSubReg, SrcRC, SrcSubReg);
}
+unsigned
+TargetRegisterInfo::getSpillWeightFactor(const TargetRegisterClass *RC) const {
+ return 1;
+}
+
// Compute target-independent register allocator hints to help eliminate copies.
bool TargetRegisterInfo::getRegAllocationHints(
Register VirtReg, ArrayRef<MCPhysReg> Order,
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
index 26195ef721db39..884a62c3e70679 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
@@ -803,6 +803,11 @@ RISCVRegisterInfo::getRegisterCostTableIndex(const MachineFunction &MF) const {
: 0;
}
+unsigned
+RISCVRegisterInfo::getSpillWeightFactor(const TargetRegisterClass *RC) const {
+ return getRegClassWeight(RC).RegWeight;
+}
+
// Add two address hints to improve chances of being able to use a compressed
// instruction.
bool RISCVRegisterInfo::getRegAllocationHints(
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.h b/llvm/lib/Target/RISCV/RISCVRegisterInfo.h
index 6ddb1eb9c14d5e..51e7f9d3b0cc1d 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.h
@@ -127,6 +127,8 @@ struct RISCVRegisterInfo : public RISCVGenRegisterInfo {
unsigned getRegisterCostTableIndex(const MachineFunction &MF) const override;
+ unsigned getSpillWeightFactor(const TargetRegisterClass *RC) const override;
+
bool getRegAllocationHints(Register VirtReg, ArrayRef<MCPhysReg> Order,
SmallVectorImpl<MCPhysReg> &Hints,
const MachineFunction &MF, const VirtRegMap *VRM,
diff --git a/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll b/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll
index cd2208e31eb6d3..b37454b3b24434 100644
--- a/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/abs-vp.ll
@@ -561,18 +561,7 @@ declare <vscale x 16 x i64> @llvm.vp.abs.nxv16i64(<vscale x 16 x i64>, i1 immarg
define <vscale x 16 x i64> @vp_abs_nxv16i64(<vscale x 16 x i64> %va, <vscale x 16 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_abs_nxv16i64:
; CHECK: # %bb.0:
-; CHECK-NEXT: addi sp, sp, -16
-; CHECK-NEXT: .cfi_def_cfa_offset 16
-; CHECK-NEXT: csrr a1, vlenb
-; CHECK-NEXT: slli a1, a1, 4
-; CHECK-NEXT: sub sp, sp, a1
-; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
-; CHECK-NEXT: vmv1r.v v24, v0
-; CHECK-NEXT: csrr a1, vlenb
-; CHECK-NEXT: slli a1, a1, 3
-; CHECK-NEXT: add a1, sp, a1
-; CHECK-NEXT: addi a1, a1, 16
-; CHECK-NEXT: vs8r.v v8, (a1) # Unknown-size Folded Spill
+; CHECK-NEXT: vmv1r.v v7, v0
; CHECK-NEXT: csrr a1, vlenb
; CHECK-NEXT: srli a2, a1, 3
; CHECK-NEXT: vsetvli a3, zero, e8, mf4, ta, ma
@@ -582,28 +571,16 @@ define <vscale x 16 x i64> @vp_abs_nxv16i64(<vscale x 16 x i64> %va, <vscale x 1
; CHECK-NEXT: addi a3, a3, -1
; CHECK-NEXT: and a2, a3, a2
; CHECK-NEXT: vsetvli zero, a2, e64, m8, ta, ma
-; CHECK-NEXT: vrsub.vi v8, v16, 0, v0.t
-; CHECK-NEXT: vmax.vv v8, v16, v8, v0.t
-; CHECK-NEXT: addi a2, sp, 16
-; CHECK-NEXT: vs8r.v v8, (a2) # Unknown-size Folded Spill
+; CHECK-NEXT: vrsub.vi v24, v16, 0, v0.t
+; CHECK-NEXT: vmax.vv v16, v16, v24, v0.t
; CHECK-NEXT: bltu a0, a1, .LBB46_2
; CHECK-NEXT: # %bb.1:
; CHECK-NEXT: mv a0, a1
; CHECK-NEXT: .LBB46_2:
-; CHECK-NEXT: vmv1r.v v0, v24
-; CHECK-NEXT: slli a1, a1, 3
-; CHECK-NEXT: add a1, sp, a1
-; CHECK-NEXT: addi a1, a1, 16
-; CHECK-NEXT: vl8r.v v8, (a1) # Unknown-size Folded Reload
+; CHECK-NEXT: vmv1r.v v0, v7
; CHECK-NEXT: vsetvli zero, a0, e64, m8, ta, ma
-; CHECK-NEXT: vrsub.vi v16, v8, 0, v0.t
-; CHECK-NEXT: vmax.vv v8, v8, v16, v0.t
-; CHECK-NEXT: addi a0, sp, 16
-; CHECK-NEXT: vl8r.v v16, (a0) # Unknown-size Folded Reload
-; CHECK-NEXT: csrr a0, vlenb
-; CHECK-NEXT: slli a0, a0, 4
-; CHECK-NEXT: add sp, sp, a0
-; CHECK-NEXT: addi sp, sp, 16
+; CHECK-NEXT: vrsub.vi v24, v8, 0, v0.t
+; CHECK-NEXT: vmax.vv v8, v8, v24, v0.t
; CHECK-NEXT: ret
%v = call <vscale x 16 x i64> @llvm.vp.abs.nxv16i64(<vscale x 16 x i64> %va, i1 false, <vscale x 16 x i1> %m, i32 %evl)
ret <vscale x 16 x i64> %v
diff --git a/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll b/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll
index afce04d107e728..94bc2851a6bf40 100644
--- a/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll
@@ -2307,7 +2307,7 @@ define <vscale x 7 x i64> @vp_bitreverse_nxv7i64(<vscale x 7 x i64> %va, <vscale
; RV32-NEXT: vsll.vx v24, v24, a3, v0.t
; RV32-NEXT: vor.vv v16, v16, v24, v0.t
; RV32-NEXT: csrr a4, vlenb
-; RV32-NEXT: slli a4, a4, 4
+; RV32-NEXT: slli a4, a4, 3
; RV32-NEXT: add a4, sp, a4
; RV32-NEXT: addi a4, a4, 16
; RV32-NEXT: vs8r.v v16, (a4) # Unknown-size Folded Spill
@@ -2315,28 +2315,30 @@ define <vscale x 7 x i64> @vp_bitreverse_nxv7i64(<vscale x 7 x i64> %va, <vscale
; RV32-NEXT: vsetvli a5, zero, e64, m8, ta, ma
; RV32-NEXT: vlse64.v v16, (a4), zero
; RV32-NEXT: csrr a4, vlenb
-; RV32-NEXT: slli a4, a4, 3
+; RV32-NEXT: slli a4, a4, 4
; RV32-NEXT: add a4, sp, a4
; RV32-NEXT: addi a4, a4, 16
; RV32-NEXT: vs8r.v v16, (a4) # Unknown-size Folded Spill
; RV32-NEXT: lui a4, 4080
; RV32-NEXT: vsetvli zero, a0, e64, m8, ta, ma
-; RV32-NEXT: vand.vx v24, v8, a4, v0.t
-; RV32-NEXT: vsll.vi v24, v24, 24, v0.t
-; RV32-NEXT: addi a5, sp, 16
-; RV32-NEXT: vs8r.v v24, (a5) # Unknown-size Folded Spill
-; RV32-NEXT: vand.vv v24, v8, v16, v0.t
-; RV32-NEXT: vsll.vi v16, v24, 8, v0.t
-; RV32-NEXT: vl8r.v v24, (a5) # Unknown-size Folded Reload
-; RV32-NEXT: vor.vv v16, v24, v16, v0.t
+; RV32-NEXT: vand.vx v16, v8, a4, v0.t
+; RV32-NEXT: vsll.vi v24, v16, 24, v0.t
; RV32-NEXT: csrr a5, vlenb
; RV32-NEXT: slli a5, a5, 4
; RV32-NEXT: add a5, sp, a5
; RV32-NEXT: addi a5, a5, 16
+; RV32-NEXT: vl8r.v v16, (a5) # Unknown-size Folded Reload
+; RV32-NEXT: vand.vv v16, v8, v16, v0.t
+; RV32-NEXT: vsll.vi v16, v16, 8, v0.t
+; RV32-NEXT: vor.vv v16, v24, v16, v0.t
+; RV32-NEXT: csrr a5, vlenb
+; RV32-NEXT: slli a5, a5, 3
+; RV32-NEXT: add a5, sp, a5
+; RV32-NEXT: addi a5, a5, 16
; RV32-NEXT: vl8r.v v24, (a5) # Unknown-size Folded Reload
; RV32-NEXT: vor.vv v16, v24, v16, v0.t
; RV32-NEXT: csrr a5, vlenb
-; RV32-NEXT: slli a5, a5, 4
+; RV32-NEXT: slli a5, a5, 3
; RV32-NEXT: add a5, sp, a5
; RV32-NEXT: addi a5, a5, 16
; RV32-NEXT: vs8r.v v16, (a5) # Unknown-size Folded Spill
@@ -2350,7 +2352,7 @@ define <vscale x 7 x i64> @vp_bitreverse_nxv7i64(<vscale x 7 x i64> %va, <vscale
; RV32-NEXT: vand.vx v24, v24, a4, v0.t
; RV32-NEXT: vsrl.vi v8, v8, 8, v0.t
; RV32-NEXT: csrr a1, vlenb
-; RV32-NEXT: slli a1, a1, 3
+; RV32-NEXT: slli a1, a1, 4
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16
; RV32-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
@@ -2360,7 +2362,7 @@ define <vscale x 7 x i64> @vp_bitreverse_nxv7i64(<vscale x 7 x i64> %va, <vscale
; RV32-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
; RV32-NEXT: vor.vv v8, v8, v16, v0.t
; RV32-NEXT: csrr a1, vlenb
-; RV32-NEXT: slli a1, a1, 4
+; RV32-NEXT: slli a1, a1, 3
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16
; RV32-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
@@ -2668,7 +2670,7 @@ define <vscale x 8 x i64> @vp_bitreverse_nxv8i64(<vscale x 8 x i64> %va, <vscale
; RV32-NEXT: vsll.vx v24, v24, a3, v0.t
; RV32-NEXT: vor.vv v16, v16, v24, v0.t
; RV32-NEXT: csrr a4, vlenb
-; RV32-NEXT: slli a4, a4, 4
+; RV32-NEXT: slli a4, a4, 3
; RV32-NEXT: add a4, sp, a4
; RV32-NEXT: addi a4, a4, 16
; RV32-NEXT: vs8r.v v16, (a4) # Unknown-size Folded Spill
@@ -2676,28 +2678,30 @@ define <vscale x 8 x i64> @vp_bitreverse_nxv8i64(<vscale x 8 x i64> %va, <vscale
; RV32-NEXT: vsetvli a5, zero, e64, m8, ta, ma
; RV32-NEXT: vlse64.v v16, (a4), zero
; RV32-NEXT: csrr a4, vlenb
-; RV32-NEXT: slli a4, a4, 3
+; RV32-NEXT: slli a4, a4, 4
; RV32-NEXT: add a4, sp, a4
; RV32-NEXT: addi a4, a4, 16
; RV32-NEXT: vs8r.v v16, (a4) # Unknown-size Folded Spill
; RV32-NEXT: lui a4, 4080
; RV32-NEXT: vsetvli zero, a0, e64, m8, ta, ma
-; RV32-NEXT: vand.vx v24, v8, a4, v0.t
-; RV32-NEXT: vsll.vi v24, v24, 24, v0.t
-; RV32-NEXT: addi a5, sp, 16
-; RV32-NEXT: vs8r.v v24, (a5) # Unknown-size Folded Spill
-; RV32-NEXT: vand.vv v24, v8, v16, v0.t
-; RV32-NEXT: vsll.vi v16, v24, 8, v0.t
-; RV32-NEXT: vl8r.v v24, (a5) # Unknown-size Folded Reload
-; RV32-NEXT: vor.vv v16, v24, v16, v0.t
+; RV32-NEXT: vand.vx v16, v8, a4, v0.t
+; RV32-NEXT: vsll.vi v24, v16, 24, v0.t
; RV32-NEXT: csrr a5, vlenb
; RV32-NEXT: slli a5, a5, 4
; RV32-NEXT: add a5, sp, a5
; RV32-NEXT: addi a5, a5, 16
+; RV32-NEXT: vl8r.v v16, (a5) # Unknown-size Folded Reload
+; RV32-NEXT: vand.vv v16, v8, v16, v0.t
+; RV32-NEXT: vsll.vi v16, v16, 8, v0.t
+; RV32-NEXT: vor.vv v16, v24, v16, v0.t
+; RV32-NEXT: csrr a5, vlenb
+; RV32-NEXT: slli a5, a5, 3
+; RV32-NEXT: add a5, sp, a5
+; RV32-NEXT: addi a5, a5, 16
; RV32-NEXT: vl8r.v v24, (a5) # Unknown-size Folded Reload
; RV32-NEXT: vor.vv v16, v24, v16, v0.t
; RV32-NEXT: csrr a5, vlenb
-; RV32-NEXT: slli a5, a5, 4
+; RV32-NEXT: slli a5, a5, 3
; RV32-NEXT: add a5, sp, a5
; RV32-NEXT: addi a5, a5, 16
; RV32-NEXT: vs8r.v v16, (a5) # Unknown-size Folded Spill
@@ -2711,7 +2715,7 @@ define <vscale x 8 x i64> @vp_bitreverse_nxv8i64(<vscale x 8 x i64> %va, <vscale
; RV32-NEXT: vand.vx v24, v24, a4, v0.t
; RV32-NEXT: vsrl.vi v8, v8, 8, v0.t
; RV32-NEXT: csrr a1, vlenb
-; RV32-NEXT: slli a1, a1, 3
+; RV32-NEXT: slli a1, a1, 4
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16
; RV32-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
@@ -2721,7 +2725,7 @@ define <vscale x 8 x i64> @vp_bitreverse_nxv8i64(<vscale x 8 x i64> %va, <vscale
; RV32-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
; RV32-NEXT: vor.vv v8, v8, v16, v0.t
; RV32-NEXT: csrr a1, vlenb
-; RV32-NEXT: slli a1, a1, 4
+; RV32-NEXT: slli a1, a1, 3
; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16
; RV32-NEXT: vl8r.v v16, (a1) # Unknown-size Folded Reload
@@ -3010,89 +3014,65 @@ declare <vscale x 64 x i16> @llvm.vp.bitreverse.nxv64i16(<vscale x 64 x i16>, <v
define <vscale x 64 x i16> @vp_bitreverse_nxv64i16(<vscale x 64 x i16> %va, <vscale x 64 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vp_bitreverse_nxv64i16:
; CHECK: # %bb.0:
-; CHECK-NEXT: addi sp, sp, -16
-; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: vmv1r.v v7, v0
; CHECK-NEXT: csrr a1, vlenb
-; CHECK-NEXT: slli a1, a1, 4
-; CHECK-NEXT: sub sp, sp, a1
-; CHECK-NEXT: .cfi_escape 0x0f, 0x0d, 0x72, 0x00, 0x11, 0x10, 0x22, 0x11, 0x10, 0x92, 0xa2, 0x38, 0x00, 0x1e, 0x22 # sp + 16 + 16 * vlenb
-; CHECK-NEXT: vmv1r.v v24, v0
-; CHECK-NEXT: csrr a1, vlenb
-; CHECK-NEXT: slli a1, a1, 3
-; CHECK-NEXT: add a1, sp, a1
-; CHECK-NEXT: addi a1, a1, 16
-; CHECK-NEXT: vs8r.v v8, (a1) # Unknown-size Folded Spill
-; CHECK-NEXT: csrr a2, vlenb
-; CHECK-NEXT: srli a1, a2, 1
+; CHECK-NEXT: srli a2, a1, 1
; CHECK-NEXT: vsetvli a3, zero, e8, m1, ta, ma
-; CHECK-NEXT: vslidedown.vx v0, v0, a1
-; CHECK-NEXT: slli a2, a2, 2
-; CHECK-NEXT: sub a1, a0, a2
-; CHECK-NEXT: sltu a3, a0, a1
+; CHECK-NEXT: vslidedown.vx v0, v0, a2
+; CHECK-NEXT: slli a1, a1, 2
+; CHECK-NEXT: sub a2, a0, a1
+; CHECK-NEXT: sltu a3, a0, a2
; CHECK-NEXT: addi a3, a3, -1
-; CHECK-NEXT: and a1, a3, a1
-; CHECK-NEXT: vsetvli zero, a1, e16, m8, ta, ma
-; CHECK-NEXT: vsrl.vi v8, v16, 8, v0.t
+; CHECK-NEXT: and a2, a3, a2
+; CHECK-NEXT: vsetvli zero, a2, e16, m8, ta, ma
+; CHECK-NEXT: vsrl.vi v24, v16, 8, v0.t
; CHECK-NEXT: vsll.vi v16, v16, 8, v0.t
-; CHECK-NEXT: vor.vv v16, v16, v8, v0.t
-; CHECK-NEXT: vsrl.vi v8, v16, 4, v0.t
-; CHECK-NEXT: lui a1, 1
-; CHECK-NEXT: addi a1, a1, -241
-; CHECK-NEXT: vand.vx v8, v8, a1, v0.t
-; CHECK-NEXT: vand.vx v16, v16, a1, v0.t
+; CHECK-NEXT: vor.vv v16, v16, v24, v0.t
+; CHECK-NEXT: vsrl.vi v24, v16, 4, v0.t
+; CHECK-NEXT: lui a2, 1
+; CHECK-NEXT: addi a2, a2, -241
+; CHECK-NEXT: vand.vx v24, v24, a2, v0.t
+; CHECK-NEXT: vand.vx v16, v16, a2, v0.t
; CHECK-NEXT: vsll.vi v16, v16, 4, v0.t
-; CHECK-NEXT: vor.vv v16, v8, v16, v0.t
-; CHECK-NEXT: vsrl.vi v8, v16, 2, v0.t
+; CHECK-NEXT: vor.vv v16, v24, v16, v0.t
+; CHECK-NEXT: vsrl.vi v24, v16, 2, v0.t
; CHECK-NEXT: lui a3, 3
; CHECK-NEXT: addi a3, a3, 819
-; CHECK-NEXT: vand.vx v8, v8, a3, v0.t
+; CHECK-NEXT: vand.vx v24, v24, a3, v0.t
; CHECK-NEXT: vand.vx v16, v16, a3, v0.t
; CHECK-NEXT: vsll.vi v16, v16, 2, v0.t
-; CHECK-NEXT: vor.vv v16, v8, v16, v0.t
-; CHECK-NEXT: vsrl.vi v8, v16, 1, v0.t
+; CHECK-NEXT: vor.vv v16, v24, v16, v0.t
+; CHECK-NEXT: vsrl.vi v24, v16, 1, v0.t
; CHECK-NEXT: lui a4, 5
; CHECK-NEXT: addi a4, a4, 1365
-; CHECK-NEXT: vand.vx v8, v8, a4, v0.t
+; CHECK-NEXT: vand.vx v24, v24, a4, v0.t
; CHECK-NEXT: vand.vx v16, v16, a4, v0.t
; CHECK-NEXT: vsll.vi v16, v16, 1, v0.t
-; CHECK-NEXT: vor.vv v8, v8, v16, v0.t
-; CHECK-NEXT: addi a5, sp, 16
-; CHECK-NEXT: vs8r.v v8, (a5) # Unknown-size Folded Spill
-; CHECK-NEXT: bltu a0, a2, .LBB46_2
+; CHECK-NEXT: vor.vv v16, v24, v16, v0.t
+; CHECK-NEXT: bltu a0, a1, .LBB46_2
; CHECK-NEXT: # %bb.1:
-; CHECK-NEXT: mv a0, a2
+; CHECK-NEXT: mv a0, a1
; CHECK-NEXT: .LBB46_2:
-; CHECK-NEXT: vmv1r.v v0, v24
-; CHECK-NEXT: csrr a2, vlenb
-; CHECK-NEXT: slli a2, a2, 3
-; CHECK-NEXT: add a2, sp, a2
-; CHECK-NEXT: addi a2, a2, 16
-; CHECK-NEXT: vl8r.v v8, (a2) # Unknown-size Folded Reload
+; CHECK-NEXT: vmv1r.v v0, v7
; CHECK-NEXT: vsetvli zero, a0, e16, m8, ta, ma
-; CHECK-NEXT: vsrl.vi v16, v8, 8, v0.t
+; CHECK-NEXT: vsrl.vi v24, v8, 8, v0.t
; CHECK-NEXT: vsll.vi v8, v8, 8, v0.t
-; CHECK-NEXT: vor.vv v8, v8, v16, v0.t
-; CHECK-NEXT: vsrl.vi v16, v8, 4, v0.t...
[truncated]
; RV32-NEXT: vand.vv v16, v8, v16, v0.t
; RV32-NEXT: vsll.vi v16, v16, 8, v0.t
; RV32-NEXT: vor.vv v16, v24, v16, v0.t
; RV32-NEXT: csrr a5, vlenb
This can be considered a small regression because we have more reads of vscale (vlenb).
Thanks for looking at this. I also tried this a few months ago and saw the same test diff, where we're able to remove a lot of LMUL 8 spills.
However, whilst it helps the in-tree tests a lot, when I tried it out on SPEC CPU 2017 I got a 50% increase in spills generated overall. Have you been able to check if you run into this too?
My approach also didn't use the reg class weight, but instead I scaled the weight by the physical register size in bytes. From what I remember investigating, the register class weight was related to register pressure calculation, which seemed orthogonal to spilling.
Here is the result of statistics (
I don't see an increase in spills, but the decrease is also imperceptible.
Scaling the weight by bytes may be an overestimate. I wanted to scale the weight by the number of physical registers, which happens to be the register class weight.
My bad, I was getting two things conflated. I had also tried a different approach where, instead of adjusting the spill weights, I changed the eviction advisor to aggressively perform local reassignment (i.e. when trying to evict an m8 register, it would now try to evict an m1 register instead if it could be immediately assigned a register): lukel97@4075bfc. Off the top of my head I think this was the one that led to all the spilling in SPEC (but also fixed the spilling in-tree). But I checked out this PR locally and tried it with -march=rva23u64 -O3 -flto and still got a slight increase in the number of spills. I think with LTO there are more areas of high register pressure that this affects.
With that said, I'm not sure if these spills or reloads are significant for performance. And on the llvm-test-suite loop vectorization microbenchmarks it actually removes all of the spills.
I will try and do a run on deepsjeng and imagick to check this, but I would be overall in favour of this.
For what it's worth, I also previously tried scaling it by the number of reg units: lukel97@65bd27e. But I think this might actually be the same thing that the reg class weight should reflect?
I am (conceptually) in favor of this direction. I haven't looked at the code yet in detail, and plan on deferring that until the empirical measurement discussion has converged.
@lukel97 Any update on the perf measurements you mentioned in your last comment? (Also FYI @mikhailramalho)
Ping.
Have you tried taking LaneBitMask into account? The LaneBitMask indicates the regunits occupied by the RegClass.
> // Weight override for register pressure calculation. This is the value

I think it's viable, except that regunits introduced by aliases might not be included in calculating the SpillWeight.
Yes, I tried it before. It is just the same as lukel97@65bd27e I think.
ping @lukel97 and @mikhailramalho re: BP3 perf measurements.
I've kicked off a run now, will report back when the results are ready. I'm wondering if this is the right place to discourage spilling of high LMUL registers. There are multiple ways we could do this, and increasing the spill weights is just one; e.g. local reassignment is another. I have a feeling that the spill weights might really be intended to calculate the frequency of spills, not the overall cost. E.g. there is the possibility that spilling one M8 register allows 8 M1 registers to avoid spills. FWIW this approach is also what I originally tried, but I think it would be good to hear some thoughts from people that have worked more with the greedy register allocator.
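A back-of-the-envelope illustration of the one-M8-versus-eight-M1 trade-off raised in the comment above (a sketch with assumed, not measured, numbers): under the scaling proposed here, an M8 live interval whose unscaled weight is $W$ is assigned weight $8W$, while an M1 interval with the same uses keeps weight $W$, so

$$\underbrace{8 \cdot W}_{\text{spilling one M8 interval}} \quad\approx\quad \underbrace{8 \times (1 \cdot W)}_{\text{spilling eight comparable M1 intervals}}$$

That is, the scaled weights model the memory traffic of a vs8r.v/vl8r.v pair (eight vector registers), but they do not by themselves say anything about the number of spill events.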
A few off-topic observations from the diffs.
; RV32-NEXT: add a4, sp, a4
; RV32-NEXT: addi a4, a4, 16
; RV32-NEXT: vs8r.v v16, (a4) # Unknown-size Folded Spill
; RV32-NEXT: addi a4, sp, 8
; RV32-NEXT: vsetvli a5, zero, e64, m8, ta, ma
; RV32-NEXT: vlse64.v v16, (a4), zero
; RV32-NEXT: csrr a4, vlenb
; RV32-NEXT: slli a4, a4, 3
Off topic for this review, but just an observation.
If I'm reading this code right, we're spilling a zero-strided load from a constant on the stack. I think this is coming from SPLAT_VECTOR_SPLIT_I64_VL.
This is a missed rematerialization optimization if we can prove the stack slot is constant.
We could probably also just use a vector select between two constants (i.e. vmv.v.x + vmerge.vx) here instead. This form wouldn't be easy to remat.
This is also a case where we have a splat of one vreg value to a whole register group. (We don't model it that way today.) We could spill only one vreg here, and "remat" via a chain of whole register moves. This pattern comes up with e.g. vrgather.vi and a few other shuffles, so this might be interesting to explore as a way to decrease register pressure in these cases.
; RV32-NEXT: add a3, sp, a3
; RV32-NEXT: addi a3, a3, 16
; RV32-NEXT: vs8r.v v16, (a3) # Unknown-size Folded Spill
; RV32-NEXT: vs8r.v v8, (a3) # Unknown-size Folded Spill
This case looks suspicious. We have existing logic to rematerialize a vmv.v.x instead of spilling it. Why didn't that kick in here? Both before and after.
; CHECK-NEXT: fsrm a1
; CHECK-NEXT: addi a1, sp, 16
This is weird? Is fsrm maybe marked as invalidating vector registers? We have a spill immediately before a fill from the same address a few lines later?
I'm not seeing any measurable runtime difference on SPEC CPU 2017 or MicroBenchmarks/LoopVectorization.
The direction sounds fine to me.
Ping. @arsenm Can I make it a hook now?
It's still not a hook, or that hook already exists. You can adjust the spill weights of the classes instead of adding another layer of control on top of the existing control.
No. I mean, yes, we can adjust the spill weight of a register class, but we don't count it in when calculating the spill weights of LiveIntervals. And this PR is trying to do it.
Ok, I stumbled back into this by analyzing another case where LMUL8 spilling goes bad. I'm going to briefly explain the issue I was chasing, and then return to the status of this patch. If we have a case which uses masks intermixed with m8 code, but for some reason the mask can't always be placed in v0, we can end up with cases where the register allocator will fragment an m8, and then be unable to "fix" the mistake. In the particular example, we happen to assign v8-v15, then v16-v23, then assign two masks to v24 and v25 respectively. Because one of the instructions uses a mask, v0 is unavailable for an m8 value. Next we try to assign an m8, and find no registers are available. The key thing here is that we should have evicted v24 and v25, placed the m8 in v24-v31, and then reassigned the two m1 values into v7 and v6 respectively. The key problem which prevented the eviction is that the spill weight of the m8 was perceived to be lower than either of the m1s, much less both. This is definitely backwards. Applying this patch has exactly the desired effect, and undoes the fragmentation. In terms of understanding this patch, I find it helpful to think of the scaling as applying just after the call to LiveIntervals::getSpillWeight. (We're multiplying an add reduction, so this is equivalent on real numbers. On floats, it's "mostly so".) What we're doing here is increasing the cost of the spill or fill instruction. This is 100% accurate, as an m8 spill does cost 8x an m1 spill. I have applied this patch locally, and manually reviewed the RISC-V changes. All but a very small handful are improvements. I want this change. I also inspected the values returned by getRegClassWeight to confirm they were reasonable for the RISC-V register classes; they are. @wangpc-pp I think it's clear that we can't enable this for all targets at this point. Given that, we need to move this back under a target hook. You seemed a bit confused by @arsenm's comments on that. Do you want to take a second attempt, or would you like me to put up an alternate patch?
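A minimal worked form of the equivalence described above, assuming the per-instruction weight shown in the diff, (isDef + isUse) * freq(MBB) * Factor, and ignoring the extra loop-induction-variable bonus and floating-point rounding:

$$F \cdot \sum_i (\mathrm{isDef}_i + \mathrm{isUse}_i)\,\mathrm{freq}(B_i) \;=\; \sum_i F \cdot (\mathrm{isDef}_i + \mathrm{isUse}_i)\,\mathrm{freq}(B_i)$$

In other words, folding the factor into each call to LiveIntervals::getSpillWeight is the same as scaling the final reduction, which models every spill/fill of an M8 value as costing 8x the equivalent M1 operation.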
Big thanks for raising this and for the detailed analysis! I will make it a hook and retry it!
I added some statistics to collect the total "lmul" that is spilled and reloaded statically with this change:
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 2fdf6bd36e88..f4b5a5b29e71 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -18,6 +18,7 @@
#include "RISCVSubtarget.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/CodeGen/LiveIntervals.h"
@@ -43,6 +44,10 @@ using namespace llvm;
#define GET_INSTRINFO_NAMED_OPS
#include "RISCVGenInstrInfo.inc"
+#define DEBUG_TYPE "riscv-instr-info"
+STATISTIC(TotalLMULSpilled, "Total LMUL spilled");
+STATISTIC(TotalLMULReloaded, "Total LMUL reloaded");
+
static cl::opt<bool> PreferWholeRegisterMove(
"riscv-prefer-whole-register-move", cl::init(false), cl::Hidden,
cl::desc("Prefer whole register move for vector registers."));
@@ -615,12 +620,16 @@ void RISCVInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB,
IsScalableVector = false;
} else if (RISCV::VRRegClass.hasSubClassEq(RC)) {
Opcode = RISCV::VS1R_V;
+ TotalLMULSpilled += 1;
} else if (RISCV::VRM2RegClass.hasSubClassEq(RC)) {
Opcode = RISCV::VS2R_V;
+ TotalLMULSpilled += 2;
} else if (RISCV::VRM4RegClass.hasSubClassEq(RC)) {
Opcode = RISCV::VS4R_V;
+ TotalLMULSpilled += 4;
} else if (RISCV::VRM8RegClass.hasSubClassEq(RC)) {
Opcode = RISCV::VS8R_V;
+ TotalLMULSpilled += 8;
} else if (RISCV::VRN2M1RegClass.hasSubClassEq(RC))
Opcode = RISCV::PseudoVSPILL2_M1;
else if (RISCV::VRN2M2RegClass.hasSubClassEq(RC))
@@ -706,12 +715,16 @@ void RISCVInstrInfo::loadRegFromStackSlot(
IsScalableVector = false;
} else if (RISCV::VRRegClass.hasSubClassEq(RC)) {
Opcode = RISCV::VL1RE8_V;
+ TotalLMULReloaded += 1;
} else if (RISCV::VRM2RegClass.hasSubClassEq(RC)) {
Opcode = RISCV::VL2RE8_V;
+ TotalLMULReloaded += 2;
} else if (RISCV::VRM4RegClass.hasSubClassEq(RC)) {
Opcode = RISCV::VL4RE8_V;
+ TotalLMULReloaded += 4;
} else if (RISCV::VRM8RegClass.hasSubClassEq(RC)) {
Opcode = RISCV::VL8RE8_V;
+ TotalLMULReloaded += 8;
} else if (RISCV::VRN2M1RegClass.hasSubClassEq(RC))
Opcode = RISCV::PseudoVRELOAD2_M1;
else if (RISCV::VRN2M2RegClass.hasSubClassEq(RC))
Running it against SPEC CPU 2017, -O3 -march=rva23u64, it looks like this helps a good bit on imagick/gcc/deepsjeng!
Program riscv-instr-info.TotalLMULReloaded riscv-instr-info.TotalLMULSpilled
lhs rhs diff lhs rhs diff
FP2017rate/508.namd_r/508.namd_r 6.00 6.00 0.0% 1.00 1.00 0.0%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s 5.00 5.00 0.0% 4.00 4.00 0.0%
INT2017spe...00.perlbench_s/600.perlbench_s 8.00 8.00 0.0% 4.00 4.00 0.0%
INT2017speed/625.x264_s/625.x264_s 43.00 43.00 0.0% 47.00 47.00 0.0%
INT2017rate/525.x264_r/525.x264_r 43.00 43.00 0.0% 47.00 47.00 0.0%
INT2017rat...23.xalancbmk_r/523.xalancbmk_r 6.00 6.00 0.0% 6.00 6.00 0.0%
INT2017rate/520.omnetpp_r/520.omnetpp_r 5.00 5.00 0.0% 4.00 4.00 0.0%
INT2017rat...00.perlbench_r/500.perlbench_r 8.00 8.00 0.0% 4.00 4.00 0.0%
FP2017speed/644.nab_s/644.nab_s 25.00 25.00 0.0% 25.00 25.00 0.0%
FP2017speed/619.lbm_s/619.lbm_s 42.00 42.00 0.0% 42.00 42.00 0.0%
FP2017rate/544.nab_r/544.nab_r 25.00 25.00 0.0% 25.00 25.00 0.0%
FP2017rate/519.lbm_r/519.lbm_r 42.00 42.00 0.0% 42.00 42.00 0.0%
FP2017rate/511.povray_r/511.povray_r 122.00 122.00 0.0% 66.00 66.00 0.0%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s 6.00 6.00 0.0% 6.00 6.00 0.0%
FP2017speed/638.imagick_s/638.imagick_s 5054.00 5053.00 -0.0% 4433.00 3803.00 -14.2%
FP2017rate/538.imagick_r/538.imagick_r 5054.00 5053.00 -0.0% 4433.00 3803.00 -14.2%
FP2017rate/510.parest_r/510.parest_r 1349.00 1343.00 -0.4% 1089.00 1083.00 -0.6%
FP2017rate/526.blender_r/526.blender_r 1138.00 1127.00 -1.0% 1064.00 1025.00 -3.7%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s 284.00 274.00 -3.5% 154.00 132.00 -14.3%
INT2017rat...31.deepsjeng_r/531.deepsjeng_r 284.00 274.00 -3.5% 154.00 132.00 -14.3%
INT2017speed/602.gcc_s/602.gcc_s 113.00 83.00 -26.5% 107.00 77.00 -28.0%
INT2017rate/502.gcc_r/502.gcc_r 113.00 83.00 -26.5% 107.00 77.00 -28.0%
INT2017rate/505.mcf_r/505.mcf_r 0.00 0.00
INT2017rate/541.leela_r/541.leela_r 0.00 0.00
INT2017rate/557.xz_r/557.xz_r 0.00 0.00
INT2017speed/605.mcf_s/605.mcf_s 0.00 0.00
INT2017speed/641.leela_s/641.leela_s 0.00 0.00
INT2017speed/657.xz_s/657.xz_s 0.00 0.00
Geomean difference -3.1% -5.8%
I think I was previously skeptical because I only measured the absolute number of spills/reloads, which included scalars + didn't show much of a change:
Program regalloc.NumSpills regalloc.NumReloads
lhs rhs diff lhs rhs diff
FP2017rate/526.blender_r/526.blender_r 13411.00 13430.00 0.1% 27478.00 27509.00 0.1%
INT2017speed/602.gcc_s/602.gcc_s 11376.00 11381.00 0.0% 25795.00 25800.00 0.0%
INT2017rate/502.gcc_r/502.gcc_r 11376.00 11381.00 0.0% 25795.00 25800.00 0.0%
FP2017rate/508.namd_r/508.namd_r 6729.00 6729.00 0.0% 16370.00 16370.00 0.0%
FP2017rate/510.parest_r/510.parest_r 44293.00 44293.00 0.0% 87404.00 87404.00 0.0%
INT2017speed/641.leela_s/641.leela_s 310.00 310.00 0.0% 449.00 449.00 0.0%
INT2017speed/625.x264_s/625.x264_s 2147.00 2147.00 0.0% 4598.00 4598.00 0.0%
INT2017spe...23.xalancbmk_s/623.xalancbmk_s 1822.00 1822.00 0.0% 2969.00 2969.00 0.0%
INT2017spe...ed/620.omnetpp_s/620.omnetpp_s 719.00 719.00 0.0% 1210.00 1210.00 0.0%
INT2017speed/605.mcf_s/605.mcf_s 123.00 123.00 0.0% 372.00 372.00 0.0%
INT2017spe...00.perlbench_s/600.perlbench_s 4375.00 4375.00 0.0% 9740.00 9740.00 0.0%
INT2017rate/557.xz_r/557.xz_r 300.00 300.00 0.0% 603.00 603.00 0.0%
INT2017rate/541.leela_r/541.leela_r 310.00 310.00 0.0% 449.00 449.00 0.0%
INT2017rate/525.x264_r/525.x264_r 2147.00 2147.00 0.0% 4598.00 4598.00 0.0%
INT2017rat...23.xalancbmk_r/523.xalancbmk_r 1822.00 1822.00 0.0% 2969.00 2969.00 0.0%
INT2017rate/520.omnetpp_r/520.omnetpp_r 719.00 719.00 0.0% 1210.00 1210.00 0.0%
INT2017rate/505.mcf_r/505.mcf_r 123.00 123.00 0.0% 372.00 372.00 0.0%
INT2017rat...00.perlbench_r/500.perlbench_r 4375.00 4375.00 0.0% 9740.00 9740.00 0.0%
FP2017speed/644.nab_s/644.nab_s 713.00 713.00 0.0% 1066.00 1066.00 0.0%
FP2017speed/619.lbm_s/619.lbm_s 88.00 88.00 0.0% 90.00 90.00 0.0%
FP2017rate/544.nab_r/544.nab_r 713.00 713.00 0.0% 1066.00 1066.00 0.0%
FP2017rate/519.lbm_r/519.lbm_r 90.00 90.00 0.0% 92.00 92.00 0.0%
FP2017rate/511.povray_r/511.povray_r 1571.00 1571.00 0.0% 3043.00 3043.00 0.0%
INT2017speed/657.xz_s/657.xz_s 300.00 300.00 0.0% 603.00 603.00 0.0%
FP2017speed/638.imagick_s/638.imagick_s 4074.00 4054.00 -0.5% 10335.00 10452.00 1.1%
FP2017rate/538.imagick_r/538.imagick_r 4074.00 4054.00 -0.5% 10335.00 10452.00 1.1%
INT2017rat...31.deepsjeng_r/531.deepsjeng_r 344.00 341.00 -0.9% 690.00 691.00 0.1%
INT2017spe...31.deepsjeng_s/631.deepsjeng_s 344.00 341.00 -0.9% 690.00 691.00 0.1%
Geomean difference -0.1% 0.1%
So I guess that answers my worry that increasing LMUL 8 weights might increase smaller LMUL spills, i.e. it looks like it doesn't!
Thanks for being patient on this for so long, the code changes + test diff LGTM :)
LGTM
(Make sure you rebase carefully when landing - there are enough test changes that I suspect you'll easily get conflicts.)
Currently, the spill weight is only determined by isDef/isUse and block frequency. However, for registers with different register classes, the costs of spilling them are different. For example, for `LMUL>1` registers (in which several physical registers compound a bigger logical register), the costs are larger than in the `LMUL=1` case (in which there is only one physical register). To solve this problem, a new target hook `getSpillWeightScaleFactor` is added. Targets can override the default factor (which is `1.0`) according to the register class. For RISC-V, the factors are set to the `RegClassWeight`, which is used to track register pressure. The values of `RegClassWeight` happen to be the number of register units. I believe all of the targets with compounded registers can benefit from this change, but only RISC-V is customized in this patch since it has widely been agreed to do so. The other targets need more performance data to go further. Partially fixes llvm#113489.
I still think we should be making an active effort to just make this the default. I believe this is another case where the target hook is just being used to paper over issues in other areas (I'm not objecting to this as an incremental step). In particular, in the previous revision of the patch, the AMDGPU behavior showed rematerialization was totally broken. I also think that something is wrong with whatever TableGen is doing to compute the default class weight. We have quite a lot of code scattered in the allocator and TableGen that was never written to account for subregisters. The defaults are wrong for the allocation priorities (which RISCV also does not look like it's trying to set). We also lack a splitting strategy to evict only a subregister when it would help, and spilling induces new liveness to dead lanes.
Thanks for your valuable review and suggestions! I also believe this should be made the default. But I am not an expert on all targets and not aware of every detail in other targets, so I can only say the current approach is good for RISC-V.
Yes, I felt there are some problems when dealing with register pressure. I think AMDGPU/X86 can start to enable this as well so that we can have more separate data to support making it the default (and of course uncover the issues you mentioned). Please feel free to add me as a reviewer when other targets are going to customize the spill weight.
I agree with this, but FYI, your last clause had gotten lost in previous review comments, at least for me. I do think it's important to highlight that the thing being papered over is the targets keeping the old behavior.
I've been looking into some of this, and trying to find ways to split things apart into some actionable pieces. Incrementalism here is critical for keeping the problem vaguely understandable. However, it sounds like you have a couple of cases identified that I haven't yet stumbled into. Could we compare notes offline?
Thanks for catching this, I hadn't been aware of that mechanism. Adding it in #131176, though it doesn't seem to have too much effect.
Sure, I have a long list of regalloc issues (and each one takes forever to work through).
The cost of a vector spill/reload may vary widely depending on the size of the vector register being spilled, i.e. LMUL, so the usual regalloc.NumSpills/regalloc.NumReloads statistics may not be an accurate reflection of the total cost. This adds two new statistics for RISCVInstrInfo that collect the total LMUL for vector register spills and reloads. It can be used to get a better idea of regalloc changes in e.g. llvm#131176 llvm#113675
The cost of a vector spill/reload may vary highly depending on the size of the vector register being spilled, i.e. LMUL, so the usual regalloc.NumSpills/regalloc.NumReloads statistics may not be an accurate reflection of the total cost. This adds two new statistics for RISCVInstrInfo that collects the total number of vector registers spilled/reloaded within groups. It can be used to get a better idea of regalloc changes in e.g. #131176 #113675
Currently, the spill weight is only determined by isDef/isUse and block frequency. However, for registers with different register classes, the costs of spilling them are different.

For example, for `LMUL>1` registers (in which several physical registers compound a bigger logical register), the costs are larger than in the `LMUL=1` case (in which there is only one physical register).

To solve this problem, a new target hook `getSpillWeightScaleFactor` is added. Targets can override the default factor (which is `1.0`) according to the register class.

For RISC-V, the factors are set to the `RegClassWeight`, which is used to track register pressure. The values of `RegClassWeight` happen to be the number of register units.

I believe all of the targets with compounded registers can benefit from this change, but only RISC-V is customized in this patch since it has widely been agreed to do so. The other targets need more performance data to go further.