Our current tier-up threshold is 16, which was chosen a while ago because:
- in theory, it gives some of our 16-bit branch counters time to stabilize
- it seemed to work fine in practice
It turns out that we're leaving significant performance and memory improvements on the table by not using higher thresholds. Here are the results of some experiments I ran:
warmup | speedup | memory | traces created | traces executed | uops executed |
---|---|---|---|---|---|
64 | +0.3% | -1.2% | -8.0% | -0.1% | +0.2% |
256 | +1.0% | -2.6% | -22.0% | -0.7% | -1.3% |
1024 | +1.2% | -3.2% | -38.6% | -3.0% | -1.5% |
2048 | +1.1% | -3.3% | -44.9% | -12.4% | -3.8% |
4096 | +2.1% | -3.6% | -52.2% | -11.2% | -3.1% |
8192* | +2.0% | -3.4% | -59.2% | -12.8% | -3.1% |
16384* | +2.0% | -3.6% | -65.2% | -14.5% | -4.7% |
32768* | +1.8% | -3.8% | -73.1% | -18.3% | -7.1% |
65536* | +1.4% | -3.9% | -79.7% | -21.9% | -9.2% |
* For warmups above 4096, exponential backoff is disabled.
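For anyone unfamiliar with the terms, here is a rough sketch of how a warmup threshold and exponential backoff interact. This is a toy model, not the actual counter implementation: the `WarmupCounter` class and the `executions_until_attempts` helper are hypothetical names, and the sketch assumes that backoff simply doubles the wait after each failed tier-up attempt.

```python
class WarmupCounter:
    """Toy model of a tier-up counter (hypothetical; not the real implementation)."""

    def __init__(self, threshold: int, backoff: bool = True) -> None:
        self.threshold = threshold
        self.backoff = backoff
        self.remaining = threshold

    def tick(self) -> bool:
        """Count one execution; return True when a tier-up attempt is due."""
        self.remaining -= 1
        return self.remaining <= 0

    def reset_after_failure(self) -> None:
        """Re-arm the counter, doubling the wait if backoff is enabled (assumed policy)."""
        if self.backoff:
            self.threshold *= 2
        self.remaining = self.threshold


def executions_until_attempts(counter: WarmupCounter, attempts: int) -> list[int]:
    """Cumulative execution counts at which attempts fire, assuming every attempt fails."""
    fired = []
    total = 0
    while len(fired) < attempts:
        total += 1
        if counter.tick():
            fired.append(total)
            counter.reset_after_failure()
    return fired


if __name__ == "__main__":
    # Low threshold with backoff: attempts come early, then spread out.
    print(executions_until_attempts(WarmupCounter(16), 3))
    # High threshold without backoff: attempts are evenly spaced.
    print(executions_until_attempts(WarmupCounter(4096, backoff=False), 3))
```

In this model, a higher threshold means code has to prove itself hot for longer before any trace is created, which is consistent with the drop in "traces created" in the table.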
Based on these numbers, I think 4096 makes sense as the new threshold (about 2% faster and 3.6% less memory, without a significant hit to the amount of work we actually do in JIT code). I'll open a PR.
My next steps are to run similar experiments with higher side-exit warmup values, and then with different JIT_CLEANUP_THRESHOLD values.