Use a higher tier-up threshold for JIT code #126795
Is 'speedup' measured relative to the result for 16? I have no idea how likely significant correlations between parameters are for this problem, but in general, when doing multidimensional optimization one dimension at a time, I would recheck, after going through all dimensions, that the earlier settings are still optimal.
Yes! Sorry if that wasn't clear.
Yeah, that's a good idea. I don't know if I'll do another full sweep, but spot-checking the "neighbors" of the current value over time seems useful.
@brandtbucher I assume that powers of 2 are used because of the exponential backoff?
It's nice, but they aren't needed (our exponential backoff works fine with non-power-of-two initial values). I mainly chose powers of two because it's a pretty efficient way to search a half-open range of possible values. ;)
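To illustrate why exponential backoff does not depend on a power-of-two starting point, here is a minimal sketch of a generic backoff counter. It is not CPython's actual implementation: `BackoffCounter`, `trigger_points`, and the `max_wait` cap are hypothetical names and values chosen for illustration only.

```python
class BackoffCounter:
    """Generic exponential backoff counter (hypothetical, not CPython's)."""

    def __init__(self, initial, max_wait=1 << 16):
        self.wait = initial          # current waiting period
        self.max_wait = max_wait     # assumed cap on the waiting period
        self.remaining = initial     # executions left before the next attempt

    def tick(self):
        """Count one execution; return True when an attempt is due."""
        self.remaining -= 1
        return self.remaining <= 0

    def back_off(self):
        """After a failed attempt, double the waiting period (up to the cap)."""
        self.wait = min(self.wait * 2, self.max_wait)
        self.remaining = self.wait


def trigger_points(initial, attempts):
    """Execution counts at which the first `attempts` attempts happen,
    assuming every attempt fails and triggers a backoff."""
    counter = BackoffCounter(initial)
    points = []
    executions = 0
    while len(points) < attempts:
        executions += 1
        if counter.tick():
            points.append(executions)
            counter.back_off()
    return points


print(trigger_points(16, 4))   # [16, 48, 112, 240]
print(trigger_points(100, 3))  # [100, 300, 700]
```

The doubling works the same way whether the initial value is 16 or 100; only the spacing of the attempts changes.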
I'd like to avoid overfitting to the benchmarks. There's not going to be some "best" number, just a range of values that work well in practice. Being in the right order of magnitude is probably good enough, especially since good warmup values are very sensitive to different workloads and platforms (and as Terry mentioned, we'll probably want to continue tweaking the values over time).
(Plus each benchmarking run takes several hours, and there's a clear plateau near the current chosen value.)
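The search strategy described above (sweep powers of two across a half-open range, then spot-check the "neighbors" of the chosen value) could be sketched roughly as follows. The function names and the example bounds are assumptions for illustration; they are not taken from the actual benchmarking setup.

```python
def power_of_two_candidates(low, high):
    """Return the powers of two in the half-open range [low, high)."""
    value = 1
    while value < low:
        value *= 2
    candidates = []
    while value < high:
        candidates.append(value)
        value *= 2
    return candidates


def neighbors(value, factor=2):
    """Return the values one step below and above `value` for spot-checking."""
    return [value // factor, value * factor]


# Hypothetical sweep bounds: from the old threshold of 16 up to (not including) 2**17.
print(power_of_two_candidates(16, 1 << 17))
print(neighbors(4096))  # [2048, 8192]
```

Since each benchmarking run takes hours, a sweep like this needs only about a dozen runs to cover several orders of magnitude, and later spot-checks need only two.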
The results of similar experiments with the threshold for warming up side-exits (currently set at 64):
The results are less dramatic here, but it does seem like switching to 4096 here too would result in small performance improvements and memory savings, with no real hit to uops executed. Note that these new measurements were taken after the other threshold change to 4096 landed, so they accurately depict the improvements we'd see with the new values.
Last one to tweak, the "cold executor" invalidation threshold (currently set at 100000):
This seems like it's in a good place, though we might consider higher values in the future.
@brandtbucher for which platform are these results?
Maybe in the future, but right now things are changing frequently enough that it's probably fine to stick with a simpler set of "ballpark" numbers, then do fine-tuning per-platform later.
Our current tier-up threshold is 16, which was chosen a while ago because:
It turns out that we're leaving significant performance and memory improvements on the table by not using higher thresholds. Here are the results of some experiments I ran:
* For warmups above 4096, exponential backoff is disabled.
Based on these numbers, I think 4096 as a new threshold makes sense (2% faster and 3% less memory without significant hits to the amount of work we actually do in JIT code). I'll open a PR.
My next steps will be conducting similar experiments with higher side-exit warmup values, and then lastly with different JIT_CLEANUP_THRESHOLD values.