-
Notifications
You must be signed in to change notification settings - Fork 14.6k
[LoopSink] Exit loop finding BBs to sink into early when possible (NFC) #101115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
As noted in the comments, findBBsToSinkInto is O(UseBBs.size() * ColdLoopBBs.size()) A very large function with a huge loop was incurring a high compile time in this code. The size of the ColdLoopBBs set was over 14K. There is a limit on the size of the UseBBs set, but not the ColdLoopBBs (and adding a limit for the latter actually slowed down some later passes). This change exits the loop early once we detect that there is no further refinement possible for the BBsToSinkInto set. This is possible because the ColdLoopBBs set is sorted in ascending magnitude of frequency.
@llvm/pr-subscribers-llvm-transforms Author: Teresa Johnson (teresajohnson) ChangesAs noted in the comments, findBBsToSinkInto is A very large function with a huge loop was incurring a high compile time This change exits the loop early once we detect that there is no further Full diff: https://github.com/llvm/llvm-project/pull/101115.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/LoopSink.cpp b/llvm/lib/Transforms/Scalar/LoopSink.cpp
index 6eedf95e7575e..5c6ed8487bbd1 100644
--- a/llvm/lib/Transforms/Scalar/LoopSink.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopSink.cpp
@@ -144,7 +144,23 @@ findBBsToSinkInto(const Loop &L, const SmallPtrSetImpl<BasicBlock *> &UseBBs,
BBsToSinkInto.erase(DominatedBB);
}
BBsToSinkInto.insert(ColdestBB);
+ continue;
}
+ // Otherwise, see if we can stop the search through the cold BBs early.
+ // Since the ColdLoopBBs list is sorted in increasing magnitude of
+ // frequency the cold BB frequencies can only get larger. The
+ // BBsToSinkInto set can only get smaller and have a smaller
+ // adjustedSumFreq, due to the earlier checking. So once we find a cold BB
+ // with a frequency at least as large as the adjustedSumFreq of the
+ // current BBsToSinkInto set, the earlier frequency check can never be
+ // true for a future iteration. Note we could do check this more
+ // aggressively earlier, but in practice this ended up being more
+ // expensive overall (added checking to the critical path through the loop
+ // that often ended up continuing early due to an empty
+ // BBsDominatedByColdestBB set, and the frequency check there was false
+ // most of the time anyway).
+ if (adjustedSumFreq(BBsToSinkInto, BFI) <= BFI.getBlockFreq(ColdestBB))
+ break;
}
// Can't sink into blocks that have no valid insertion point.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/2589 Here is the relevant piece of the build log for the reference:
|
As noted in the comments, findBBsToSinkInto is
O(UseBBs.size() * ColdLoopBBs.size())
A very large function with a huge loop was incurring a high compile time
in this code. The size of the ColdLoopBBs set was over 14K. There is a
limit on the size of the UseBBs set, but not the ColdLoopBBs (and adding
a limit for the latter actually slowed down some later passes).
This change exits the loop early once we detect that there is no further
refinement possible for the BBsToSinkInto set. This is possible because
the ColdLoopBBs set is sorted in ascending magnitude of frequency.
This cut down the LoopSinkPass time by around 33% (78s to just over 50s).