Skip to content

loop cloning and pgo #48850

@AndyAyersMS

Description

@AndyAyersMS

Loop cloning is not a good citizen when it comes to PGO. There are two aspects to this:

  1. Cloning heuristics should take PGO into account (along with other things). The benefit of cloning is in part related to the relative weight of the loop body, which is knowable with PGO. It is also related to the relative trip count of the loop, which is also knowable with PGO. So a loop that executes a lot and iterates a lot is a better cloning candidate than a loop that either doesn't execute a lot or doesn't iterate a lot. Within the loop the relative frequency of the operations that can be bypassed or further optimized with cloning should also be taken into account (eg if there's a bounds check but only under some rarely true condition, the benefit of cloning is reduced).
  2. Cloning does not make appropriate updates to profile data (see example below). One would imagine that post cloning the profile data should show that we now think the code is more likely to execute the cloned loop; otherwise why would we have cloned in the first place? Exactly how to split up the profile data is perhaps an open question. In cases like ours where the focus is on bounds checks and control flow checks are implicit there is no obvious signal we can measure (short of tracking how often exceptions are thrown) and we may have to simply guess; something like a 0.1/0.9 split seems plausible but we might consider even more biased splits.

If we start cloning to enable explicit control flow optimization (aka "loop unswitching") then we will have data telling us roughly how to divide up the flow. For example if the loop body contains a loop-invariant test and branch (or a test that can be rendered loop invariant with a suitable upfront check) we will know how often the branches were taken from PGO data and that tells us how to split the profile data.

An example of what happens today:

BeforeAfter

Note how cloning duplicates the block weights for the loop blocks, doesn't set reasonable weights for the new blocks it adds, and messes up the edge weights even in some "remote" parts of the flow graph (suspect this is because fgUpdateChangedFlowGraph recomputes the pred lists, which seems a bit odd).

cc @BruceForstall

category:cq
theme:loop-opt
skill-level:expert
cost:large
impact:medium

Metadata

Metadata

Assignees

No one assigned

    Labels

    area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

    Type

    No type

    Projects

    Status

    Optimizations

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions