Commit e882246
committed
[X86] Reduce znver3/4 LoopMicroOpBufferSize to practical values
The znver3/4 scheduler models have previously associated the LoopMicroOpBufferSize with the maximum size of their op caches, and when this led to quadratic complexity issues this were reduced to a value of 512 uops, based mainly on compilation time and not its effectiveness on runtime performance.
From a runtime performance POV, a large LoopMicroOpBufferSize leads to a higher number of loop unrolls, meaning the cpu has to rely on the frontend decode rate (4 ins/cy max) for much longer to fill the op cache before looping begins and we make use of the faster op cache rate (8/9 ops/cy).
This patch proposes we instead cap the size of the LoopMicroOpBufferSize based off the maximum rate from the op cache (znver3 = 8op/cy, znver4 = 9op/cy) and the branch misprediction penalty from the opcache (~12cy) as a estimate of the useful number of ops we can unroll a loop by before mispredictions are likely to cause stalls. This isn't a perfect metric, but does try to be closer to the spirit of how we use LoopMicroOpBufferSize in the compiler vs the size of a similar naming buffer in the cpu.1 parent 72a8953 commit e882246
File tree
3 files changed
+52
-255
lines changed- llvm
- lib/Target/X86
- test/Transforms/LoopUnroll/X86
3 files changed
+52
-255
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
33 | 33 | | |
34 | 34 | | |
35 | 35 | | |
36 | | - | |
37 | | - | |
38 | | - | |
39 | | - | |
40 | | - | |
41 | | - | |
42 | | - | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
43 | 40 | | |
44 | 41 | | |
45 | 42 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
28 | 28 | | |
29 | 29 | | |
30 | 30 | | |
31 | | - | |
32 | | - | |
33 | | - | |
34 | | - | |
35 | | - | |
36 | | - | |
37 | | - | |
38 | | - | |
39 | | - | |
40 | | - | |
41 | | - | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
42 | 36 | | |
43 | 37 | | |
44 | 38 | | |
| |||
0 commit comments