-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Description
Loop cloning is not a good citizen when it comes to PGO. There are two aspects to this:
- Cloning heuristics should take PGO into account (along with other things). The benefit of cloning is in part related to the relative weight of the loop body, which is knowable with PGO. It is also related to the relative trip count of the loop, which is also knowable with PGO. So a loop that executes a lot and iterates a lot is a better cloning candidate than a loop that either doesn't execute a lot or doesn't iterate a lot. Within the loop the relative frequency of the operations that can be bypassed or further optimized with cloning should also be taken into account (eg if there's a bounds check but only under some rarely true condition, the benefit of cloning is reduced).
- Cloning does not make appropriate updates to profile data (see example below). One would imagine that post cloning the profile data should show that we now think the code is more likely to execute the cloned loop; otherwise why would we have cloned in the first place? Exactly how to split up the profile data is perhaps an open question. In cases like ours where the focus is on bounds checks and control flow checks are implicit there is no obvious signal we can measure (short of tracking how often exceptions are thrown) and we may have to simply guess; something like a 0.1/0.9 split seems plausible but we might consider even more biased splits.
If we start cloning to enable explicit control flow optimization (aka "loop unswitching") then we will have data telling us roughly how to divide up the flow. For example if the loop body contains a loop-invariant test and branch (or a test that can be rendered loop invariant with a suitable upfront check) we will know how often the branches were taken from PGO data and that tells us how to split the profile data.
An example of what happens today:
Before | After |
![]() | ![]() |
Note how cloning duplicates the block weights for the loop blocks, doesn't set reasonable weights for the new blocks it adds, and messes up the edge weights even in some "remote" parts of the flow graph (suspect this is because fgUpdateChangedFlowGraph
recomputes the pred lists, which seems a bit odd).
category:cq
theme:loop-opt
skill-level:expert
cost:large
impact:medium
Metadata
Metadata
Assignees
Labels
Type
Projects
Status