Commit 3e0d138
[LoopUnroll] Clamp PartialThreshold for large LoopMicroOpBufferSize

The znver3/znver4 scheduler models are outliers, specifying a very
large LoopMicroOpBufferSize of 512, while typical values for
other subtargets are around 50. Even if this information
is micro-architecturally correct (*), this does not mean that we
want to runtime unroll all loops to a size that completely fills
the loop buffer. Unless this is the single hot loop in the entire
application, the massive code size increase will bust the micro-op
and instruction caches.

Protect against this by clamping to the default PartialThreshold
of 150, which is the same as the default full-unroll threshold
and half the aggressive full-unroll threshold. Allowing more
partial unrolling than full unrolling is certainly nonsensical.
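
The clamp described above can be sketched as follows. This is a minimal, self-contained illustration and not the code from the patch: `UnrollPrefs`, `clampPartialThreshold`, and `makeUnrollPrefs` are hypothetical stand-ins for the real LLVM types and hooks, and only the value 150 and the buffer sizes 512 and 50 come from the commit message.

```cpp
#include <cassert>

// Hypothetical stand-in for the unrolling preferences a target fills in.
// The field names are assumptions for illustration, not the LLVM struct.
struct UnrollPrefs {
  bool Partial = false;          // allow partial unrolling
  bool Runtime = false;          // allow runtime unrolling
  unsigned PartialThreshold = 0; // size budget for a partially unrolled loop
};

// Default PartialThreshold named in the commit message.
constexpr unsigned DefaultPartialThreshold = 150;

// Use the loop buffer size as the budget, but never exceed the default.
constexpr unsigned clampPartialThreshold(unsigned LoopMicroOpBufferSize) {
  return LoopMicroOpBufferSize < DefaultPartialThreshold
             ? LoopMicroOpBufferSize
             : DefaultPartialThreshold;
}

// Hypothetical helper: derive preferences from a scheduling-model value.
inline UnrollPrefs makeUnrollPrefs(unsigned LoopMicroOpBufferSize) {
  UnrollPrefs UP;
  UP.Partial = true;
  UP.Runtime = true;
  UP.PartialThreshold = clampPartialThreshold(LoopMicroOpBufferSize);
  return UP;
}

// A znver3/znver4-style value of 512 is capped; a typical value of 50 is not.
static_assert(clampPartialThreshold(512) == 150, "large buffers are clamped");
static_assert(clampPartialThreshold(50) == 50, "small buffers pass through");
```

Under these assumptions, a subtarget reporting 512 gets the same partial-unroll budget as the default, while subtargets with smaller buffers are unaffected.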

(*) I strongly doubt that this is actually correct -- I believe
this may derive from an incorrect reading of Agner Fog's
micro-architecture guide. The number 4096 that was originally
used here is the size of the general micro-op cache, not that of
a loop buffer. A separate loop buffer is not listed for the Zen
microarchitecture. Comparing this to the listing for Skylake, it
has a 1536 micro-op buffer, but only a 64 micro-op loopback buffer,
with a note that it's rarely fully utilized. Our scheduling model
specifies LoopMicroOpBufferSize of 50 in that case.

1 parent 4d5525e commit 3e0d138
File tree
2 files changed: +30 -742 lines changed
- llvm
  - include/llvm/CodeGen
  - test/Transforms/LoopUnroll/X86

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 575 | 575 | |
| 576 | 576 | |
| 577 | 577 | |
| 578 | | - |
| | 578 | + |
| | 579 | + |
| | 580 | + |
| | 581 | + |
| | 582 | + |
| | 583 | + |
| | 584 | + |
| 579 | 585 | |
| 580 | 586 | |
| 581 | 587 | |