Commit 9fa4578

[RISCV][RegAlloc] Add getCSRFirstUseCost for RISC-V
This is based off of 63efd8e.

The following table shows the percent change to the dynamic instruction count when the function in this patch returns 0 (default) versus other values.

| benchmark | % speedup 1 over 0 | % speedup 4 over 0 | % speedup 16 over 0 | % speedup 64 over 0 | % speedup 128 over 0 |
| --------------- | ---------------------- | --------------------- | --------------------- | -------------------- | -------------------- |
| 500.perlbench_r | 0.001018570165 | 0.001049508358 | 0.001001106529 | 0.03382582818 | 0.03395354577 |
| 502.gcc_r | 0.02850551412 | 0.02170512371 | 0.01453021263 | 0.06011008637 | 0.1215691521 |
| 505.mcf_r | -0.00009506373338 | -0.00009090057642 | -0.0000860991497 | -0.00005027849766 | 0.00001251173791 |
| 520.omnetpp_r | 0.2958940288 | 0.2959715925 | 0.2961141505 | 0.2959823497 | 0.2963124341 |
| 523.xalancbmk_r | -0.0327074721 | -0.01037021046 | -0.3226810542 | 0.02127133714 | 0.02765388389 |
| 525.x264_r | 0.0000001381714403 | -0.00000007041540345 | -0.00000002156399465 | 0.0000002108993364 | 0.0000002463382874 |
| 531.deepsjeng_r | 0.00000000339777238 | 0.000000003874652714 | 0.000000003636212547 | 0.000000003874652714 | 0.000000003159332213 |
| 541.leela_r | 0.0009186059953 | -0.000424159199 | 0.0004984456879 | 0.274948447 | 0.8135521414 |
| 557.xz_r | -0.000000003547118854 | -0.00004896449559 | -0.00004910691576 | -0.0000491109983 | -0.00004895599589 |
| geomean | 0.03265937388 | 0.03424232324 | -0.00107917442 | 0.07629116165 | 0.1439913192 |

The following table shows the percent change to the runtime when the function in this patch returns 0 (default) versus other values.

| benchmark | % speedup 1 over 0 | % speedup 4 over 0 | % speedup 16 over 0 | % speedup 64 over 0 | % speedup 128 over 0 |
| --------------- | ------------------ | ------------------ | ------------------- | ------------------- | ------------------- |
| 500.perlbench_r | 0.1722356761 | 0.2269681109 | 0.2596825578 | 0.361573851 | 1.15041305 |
| 502.gcc_r | -0.548415855 | -0.06187002799 | -0.5553684674 | -0.8876686237 | -0.4668665535 |
| 505.mcf_r | -0.8786414258 | -0.4150938441 | -1.035517726 | -0.1860770377 | -0.01904825648 |
| 520.omnetpp_r | 0.4130256072 | 0.6595976188 | 0.897332171 | 0.6252625622 | 0.3869467278 |
| 523.xalancbmk_r | 1.318132014 | -0.003927574 | 1.025962975 | 1.090320253 | -0.789206202 |
| 525.x264_r | -0.03112871796 | -0.00167557587 | 0.06932423155 | -0.1919840015 | -0.1203585732 |
| 531.deepsjeng_r | -0.259516072 | -0.01973455652 | -0.2723227894 | -0.005417022257 | -0.02222388177 |
| 541.leela_r | -0.3497178495 | -0.3510447393 | 0.1274508001 | 0.6485542452 | 0.2880651727 |
| 557.xz_r | 0.7683565263 | -0.2197509447 | -0.0431183874 | 0.07518130872 | 0.5236853039 |
| geomean | 0.06506952742 | -0.0211865386 | 0.05072694648 | 0.1684530637 | 0.1020533557 |

I chose to set the value to 64 on RISC-V because it improves both the dynamic instruction count and the runtime, and because AMDGPU sets its value to 100 and callee-saved spills are probably less expensive on RISC-V than on AMDGPU.

I looked at some of the resulting diffs, and this patch seems to lead to two things:

1. Less spilling: not spilling the CSR led to better register allocation and helped us avoid spills down the line.
2. Avoiding the CSR spill, but spilling more on paths that static heuristics estimate as cold.

1 parent f4d599c commit 9fa4578

3 files changed: +103 -3 lines changed

llvm/lib/CodeGen/RegAllocGreedy.cpp

Lines changed: 5 additions & 3 deletions
@@ -2375,10 +2375,12 @@ void RAGreedy::aboutToRemoveInterval(const LiveInterval &LI) {
 }
 
 void RAGreedy::initializeCSRCost() {
-  // We use the larger one out of the command-line option and the value report
-  // by TRI.
+  // We use the command-line option if it is explicitly set, otherwise use the
+  // larger one out of the command-line option and the value reported by TRI.
   CSRCost = BlockFrequency(
-      std::max((unsigned)CSRFirstTimeCost, TRI->getCSRFirstUseCost()));
+      CSRFirstTimeCost.getNumOccurrences()
+          ? CSRFirstTimeCost
+          : std::max((unsigned)CSRFirstTimeCost, TRI->getCSRFirstUseCost()));
   if (!CSRCost.getFrequency())
     return;
 
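
The selection rule in this hunk can be sketched outside LLVM as follows. This is a minimal illustration, not the LLVM implementation: `selectCSRFirstUseCost` and its parameters are hypothetical names, with `explicitOption` standing in for "the flag was passed on the command line" (`getNumOccurrences()` is non-zero), `optionDefault` for the flag's default value (assumed here to be 0), and `targetCost` for `TRI->getCSRFirstUseCost()`.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical stand-in for the value selection in
// RAGreedy::initializeCSRCost() after this patch.
constexpr unsigned selectCSRFirstUseCost(std::optional<unsigned> explicitOption,
                                         unsigned optionDefault,
                                         unsigned targetCost) {
  // An explicitly passed flag always wins, even when it is 0.
  if (explicitOption)
    return *explicitOption;
  // Otherwise fall back to the larger of the flag default and the target hook.
  return std::max(optionDefault, targetCost);
}

// No flag given: the target's value (64 on RISC-V in this patch) is used.
static_assert(selectCSRFirstUseCost(std::nullopt, 0, 64) == 64);
// Flag explicitly set to 0: the target's value is overridden.
static_assert(selectCSRFirstUseCost(0u, 0, 64) == 0);
// Flag explicitly set on a target that reports 0.
static_assert(selectCSRFirstUseCost(64u, 0, 0) == 64);
```

The second case is the one the old `std::max` could not express: before this change, passing 0 on a target that reports a non-zero cost would still yield the target's value. The new test relies on this to produce its `ZERO-COST` output.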

llvm/lib/Target/RISCV/RISCVRegisterInfo.h

Lines changed: 7 additions & 0 deletions
@@ -61,6 +61,13 @@ struct RISCVRegisterInfo : public RISCVGenRegisterInfo {
   const uint32_t *getCallPreservedMask(const MachineFunction &MF,
                                        CallingConv::ID) const override;
 
+  unsigned getCSRFirstUseCost() const override {
+    // The cost will be compared against BlockFrequency where entry has the
+    // value of 1 << 14. A value of 64 will choose to spill or split cold
+    // path instead of using a callee-saved register.
+    return 64;
+  }
+
   const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;
 
   const MCPhysReg *getIPRACSRegs(const MachineFunction *MF) const override;
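
To put the value 64 in perspective, the arithmetic behind the comment above can be worked through in a few lines. This is a sketch under stated assumptions: the constant and helper names are hypothetical, the only facts taken from the patch are the reference entry frequency of `1 << 14` and the returned cost of 64, and the plain integer rescaling is a simplification that does not reproduce LLVM's `BlockFrequency` arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// From the comment in the patch: the raw cost is expressed relative to an
// entry-block frequency of 1 << 14.
constexpr std::uint64_t kFixedEntryFreq = std::uint64_t{1} << 14;
// The value returned by getCSRFirstUseCost() on RISC-V in this patch.
constexpr std::uint64_t kRISCVFirstUseCost = 64;

// Hypothetical helper: rescale a raw cost for a function whose actual entry
// frequency differs from the reference value (simplified integer math).
constexpr std::uint64_t scaleCost(std::uint64_t rawCost,
                                  std::uint64_t actualEntry) {
  return rawCost * actualEntry / kFixedEntryFreq;
}

// 64 is 1/256 of the reference entry frequency.
static_assert(kFixedEntryFreq / kRISCVFirstUseCost == 256);
// With the reference entry frequency the cost is unchanged.
static_assert(scaleCost(kRISCVFirstUseCost, kFixedEntryFreq) == 64);
// With a 64x larger entry frequency the cost grows proportionally.
static_assert(scaleCost(kRISCVFirstUseCost, std::uint64_t{1} << 20) == 4096);
```

Under these assumptions, the first use of a callee-saved register is priced at 1/256 of the entry frequency, so it only loses out when the competing spill or split code sits on a path that the static heuristics estimate as cold.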
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc %s -mtriple=riscv64 -regalloc-csr-first-time-cost=0 | FileCheck %s -check-prefix=ZERO-COST
+; RUN: llc %s -mtriple=riscv64 -regalloc-csr-first-time-cost=64 | FileCheck %s -check-prefix=SOME-COST
+
+define fastcc void @Perl_sv_setnv(ptr %.str.54.3682) nounwind {
+; ZERO-COST-LABEL: Perl_sv_setnv:
+; ZERO-COST:       # %bb.0: # %entry
+; ZERO-COST-NEXT:    addi sp, sp, -32
+; ZERO-COST-NEXT:    sd ra, 24(sp) # 8-byte Folded Spill
+; ZERO-COST-NEXT:    sd s0, 16(sp) # 8-byte Folded Spill
+; ZERO-COST-NEXT:    sd s1, 8(sp) # 8-byte Folded Spill
+; ZERO-COST-NEXT:    bnez zero, .LBB0_5
+; ZERO-COST-NEXT:  # %bb.1: # %entry
+; ZERO-COST-NEXT:    li a1, 1
+; ZERO-COST-NEXT:    bnez a1, .LBB0_6
+; ZERO-COST-NEXT:  .LBB0_2: # %entry
+; ZERO-COST-NEXT:    mv s0, a0
+; ZERO-COST-NEXT:    beqz zero, .LBB0_4
+; ZERO-COST-NEXT:  # %bb.3: # %sw.bb34.i
+; ZERO-COST-NEXT:    li s0, 0
+; ZERO-COST-NEXT:  .LBB0_4: # %Perl_sv_reftype.exit
+; ZERO-COST-NEXT:    li s1, 0
+; ZERO-COST-NEXT:    li a0, 0
+; ZERO-COST-NEXT:    li a1, 0
+; ZERO-COST-NEXT:    jalr s1
+; ZERO-COST-NEXT:    li a0, 0
+; ZERO-COST-NEXT:    mv a1, s0
+; ZERO-COST-NEXT:    li a2, 0
+; ZERO-COST-NEXT:    jalr s1
+; ZERO-COST-NEXT:  .LBB0_5: # %entry
+; ZERO-COST-NEXT:    beqz zero, .LBB0_2
+; ZERO-COST-NEXT:  .LBB0_6: # %sw.bb3
+; ZERO-COST-NEXT:    ld ra, 24(sp) # 8-byte Folded Reload
+; ZERO-COST-NEXT:    ld s0, 16(sp) # 8-byte Folded Reload
+; ZERO-COST-NEXT:    ld s1, 8(sp) # 8-byte Folded Reload
+; ZERO-COST-NEXT:    addi sp, sp, 32
+; ZERO-COST-NEXT:    ret
+;
+; SOME-COST-LABEL: Perl_sv_setnv:
+; SOME-COST:       # %bb.0: # %entry
+; SOME-COST-NEXT:    addi sp, sp, -32
+; SOME-COST-NEXT:    sd ra, 24(sp) # 8-byte Folded Spill
+; SOME-COST-NEXT:    sd s0, 16(sp) # 8-byte Folded Spill
+; SOME-COST-NEXT:    bnez zero, .LBB0_5
+; SOME-COST-NEXT:  # %bb.1: # %entry
+; SOME-COST-NEXT:    li a1, 1
+; SOME-COST-NEXT:    bnez a1, .LBB0_6
+; SOME-COST-NEXT:  .LBB0_2: # %entry
+; SOME-COST-NEXT:    sd a0, 8(sp) # 8-byte Folded Spill
+; SOME-COST-NEXT:    beqz zero, .LBB0_4
+; SOME-COST-NEXT:  # %bb.3: # %sw.bb34.i
+; SOME-COST-NEXT:    sd zero, 8(sp) # 8-byte Folded Spill
+; SOME-COST-NEXT:  .LBB0_4: # %Perl_sv_reftype.exit
+; SOME-COST-NEXT:    li s0, 0
+; SOME-COST-NEXT:    li a0, 0
+; SOME-COST-NEXT:    li a1, 0
+; SOME-COST-NEXT:    jalr s0
+; SOME-COST-NEXT:    li a0, 0
+; SOME-COST-NEXT:    ld a1, 8(sp) # 8-byte Folded Reload
+; SOME-COST-NEXT:    li a2, 0
+; SOME-COST-NEXT:    jalr s0
+; SOME-COST-NEXT:  .LBB0_5: # %entry
+; SOME-COST-NEXT:    beqz zero, .LBB0_2
+; SOME-COST-NEXT:  .LBB0_6: # %sw.bb3
+; SOME-COST-NEXT:    ld ra, 24(sp) # 8-byte Folded Reload
+; SOME-COST-NEXT:    ld s0, 16(sp) # 8-byte Folded Reload
+; SOME-COST-NEXT:    addi sp, sp, 32
+; SOME-COST-NEXT:    ret
+entry:
+  switch i8 0, label %Perl_sv_reftype.exit [
+    i8 1, label %sw.bb4
+    i8 12, label %sw.bb34.i
+    i8 3, label %sw.bb3
+    i8 0, label %sw.bb3
+  ]
+
+sw.bb3:                                           ; preds = %entry, %entry
+  ret void
+
+sw.bb4:                                           ; preds = %entry
+  br label %Perl_sv_reftype.exit
+
+sw.bb34.i:                                        ; preds = %entry
+  br label %Perl_sv_reftype.exit
+
+Perl_sv_reftype.exit:                             ; preds = %sw.bb34.i, %sw.bb4, %entry
+  %retval.0.i = phi ptr [ null, %sw.bb34.i ], [ null, %sw.bb4 ], [ %.str.54.3682, %entry ]
+  %call17 = tail call fastcc i64 null(ptr null, i32 0)
+  tail call void (ptr, ...) null(ptr null, ptr %retval.0.i, ptr null)
+  unreachable
+}
