[MachineOutliner] Efficient Implementation of MachineOutliner::findCandidates() #90260
@llvm/pr-subscribers-backend-aarch64

Author: Xuan Zhang (xuanzh-meta)

Changes

This reduces the time complexity of the main loop of the `findCandidates()` method from $O(n^2)$ to $O(n \log n)$. For small $n$, the modification does not regress the build time, but it helps significantly when $n$ is large.

For one application, this reduces the runtime of the main loop from 120 seconds to 28 seconds.

This is the first commit for an enhanced version of machine outliner -- see RFC.

Full diff: https://github.com/llvm/llvm-project/pull/90260.diff

2 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineOutliner.cpp b/llvm/lib/CodeGen/MachineOutliner.cpp
index dc2f5ef15206e8..d553c0e6d24772 100644
--- a/llvm/lib/CodeGen/MachineOutliner.cpp
+++ b/llvm/lib/CodeGen/MachineOutliner.cpp
@@ -593,7 +593,11 @@ void MachineOutliner::findCandidates(
unsigned NumDiscarded = 0;
unsigned NumKept = 0;
#endif
- for (const unsigned &StartIdx : RS.StartIndices) {
+ // Sort the start indices so that we can efficiently check if candidates
+ // overlap with each other in MachineOutliner::findCandidates().
+ SmallVector<unsigned> SortedStartIndices(RS.StartIndices);
+ llvm::sort(SortedStartIndices);
+ for (const unsigned &StartIdx : SortedStartIndices) {
// Trick: Discard some candidates that would be incompatible with the
// ones we've already found for this sequence. This will save us some
// work in candidate selection.
@@ -616,17 +620,15 @@ void MachineOutliner::findCandidates(
// * End before the other starts
// * Start after the other ends
unsigned EndIdx = StartIdx + StringLen - 1;
- auto FirstOverlap = find_if(
- CandidatesForRepeatedSeq, [StartIdx, EndIdx](const Candidate &C) {
- return EndIdx >= C.getStartIdx() && StartIdx <= C.getEndIdx();
- });
- if (FirstOverlap != CandidatesForRepeatedSeq.end()) {
+ if (CandidatesForRepeatedSeq.size() > 0 &&
+ StartIdx <= CandidatesForRepeatedSeq.back().getEndIdx()) {
#ifndef NDEBUG
++NumDiscarded;
- LLVM_DEBUG(dbgs() << " .. DISCARD candidate @ [" << StartIdx
- << ", " << EndIdx << "]; overlaps with candidate @ ["
- << FirstOverlap->getStartIdx() << ", "
- << FirstOverlap->getEndIdx() << "]\n");
+ LLVM_DEBUG(dbgs() << " .. DISCARD candidate @ [" << StartIdx << ", "
+ << EndIdx << "]; overlaps with candidate @ ["
+ << CandidatesForRepeatedSeq.back().getStartIdx()
+ << ", " << CandidatesForRepeatedSeq.back().getEndIdx()
+ << "]\n");
#endif
continue;
}
diff --git a/llvm/test/CodeGen/AArch64/machine-outliner-overlap.mir b/llvm/test/CodeGen/AArch64/machine-outliner-overlap.mir
index 649bb33828c32c..c6bd4c1d04d871 100644
--- a/llvm/test/CodeGen/AArch64/machine-outliner-overlap.mir
+++ b/llvm/test/CodeGen/AArch64/machine-outliner-overlap.mir
@@ -8,27 +8,27 @@
# CHECK-NEXT: Candidates discarded: 0
# CHECK-NEXT: Candidates kept: 2
# CHECK-DAG: Sequence length: 8
-# CHECK-NEXT: .. DISCARD candidate @ [5, 12]; overlaps with candidate @ [12, 19]
+# CHECK-NEXT: .. DISCARD candidate @ [12, 19]; overlaps with candidate @ [5, 12]
# CHECK-NEXT: Candidates discarded: 1
# CHECK-NEXT: Candidates kept: 1
# CHECK-DAG: Sequence length: 9
-# CHECK-NEXT: .. DISCARD candidate @ [4, 12]; overlaps with candidate @ [11, 19]
+# CHECK-NEXT: .. DISCARD candidate @ [11, 19]; overlaps with candidate @ [4, 12]
# CHECK-NEXT: Candidates discarded: 1
# CHECK-NEXT: Candidates kept: 1
# CHECK-DAG: Sequence length: 10
-# CHECK-NEXT: .. DISCARD candidate @ [3, 12]; overlaps with candidate @ [10, 19]
+# CHECK-NEXT: .. DISCARD candidate @ [10, 19]; overlaps with candidate @ [3, 12]
# CHECK-NEXT: Candidates discarded: 1
# CHECK-NEXT: Candidates kept: 1
# CHECK-DAG: Sequence length: 11
-# CHECK-NEXT: .. DISCARD candidate @ [2, 12]; overlaps with candidate @ [9, 19]
+# CHECK-NEXT: .. DISCARD candidate @ [9, 19]; overlaps with candidate @ [2, 12]
# CHECK-NEXT: Candidates discarded: 1
# CHECK-NEXT: Candidates kept: 1
# CHECK-DAG: Sequence length: 12
-# CHECK-NEXT: .. DISCARD candidate @ [1, 12]; overlaps with candidate @ [8, 19]
+# CHECK-NEXT: .. DISCARD candidate @ [8, 19]; overlaps with candidate @ [1, 12]
# CHECK-NEXT: Candidates discarded: 1
# CHECK-NEXT: Candidates kept: 1
# CHECK-DAG: Sequence length: 13
-# CHECK-NEXT: .. DISCARD candidate @ [0, 12]; overlaps with candidate @ [7, 19]
+# CHECK-NEXT: .. DISCARD candidate @ [7, 19]; overlaps with candidate @ [0, 12]
# CHECK-NEXT: Candidates discarded: 1
# CHECK-NEXT: Candidates kept: 1
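The overlap check in the diff above can be illustrated outside of LLVM. The following is a minimal sketch, not the actual `MachineOutliner` code: `Range` and `keepNonOverlapping` are hypothetical names introduced here, and the sketch assumes, as in the patch, that every occurrence of one repeated sequence has the same length.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the outliner's Candidate: an inclusive
// [Start, End] range of instruction indices.
struct Range {
  unsigned Start;
  unsigned End;
};

// Keep a left-to-right set of non-overlapping occurrences of one repeated
// sequence of length Len. Illustrative only; this is not the LLVM code.
std::vector<Range> keepNonOverlapping(std::vector<unsigned> StartIndices,
                                      unsigned Len) {
  assert(Len > 0 && "a repeated sequence has at least one instruction");
  // O(n log n): after sorting, every new start is >= every kept start.
  std::sort(StartIndices.begin(), StartIndices.end());

  std::vector<Range> Kept;
  for (unsigned Start : StartIndices) {
    unsigned End = Start + Len - 1;
    // All ranges share one length, so the last kept range has the largest
    // end index. Comparing against it alone detects any overlap.
    if (!Kept.empty() && Start <= Kept.back().End)
      continue; // Overlaps the most recently kept range: discard.
    Kept.push_back({Start, End});
  }
  return Kept;
}
```

With start indices `{12, 5}` and length 8, this sketch keeps `[5, 12]` and discards `[12, 19]`, which matches the updated CHECK line for sequence length 8 in the test.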
Overall, it looks good to me, but let others review it further.
Hi @ornata, I want to follow up on the review for this PR. I'd appreciate it if you could take a look when you get a chance!
LGTM. However, I'd like to hear other opinions.
@ornata Do you have concerns or comments on this direction?
8522f88 to 632f482
llvm/lib/CodeGen/MachineOutliner.cpp
@@ -593,6 +593,9 @@ void MachineOutliner::findCandidates(
unsigned NumDiscarded = 0;
unsigned NumKept = 0;
#endif
// Sort the start indices so that we can efficiently check if candidates
// overlap with each other in MachineOutliner::findCandidates().
I'm a bit confused about why this function's own name (`MachineOutliner::findCandidates()`) is mentioned here. From the wording, it appears to refer to another function. Perhaps "overlap with each other further down" would be clearer.
Thanks for the comments. Modified!
LGTM
This PR depends on #90260

We changed the order in which functions are outlined in Machine Outliner. The formula for priority is found via a black-box Bayesian optimization toolbox. Using this formula for sorting consistently reduces the uncompressed size of large real-world mobile apps.

We also ran a few benchmarks using LLVM test suites, and showed that sorting by priority consistently reduces the text segment size.

| run (CTMark/)    | baseline (1) | priority (2) | diff (1 -> 2) |
|------------------|--------------|--------------|---------------|
| lencod           | 349624       | 349264       | -0.1030%      |
| SPASS            | 219672       | 219480       | -0.0874%      |
| kc               | 271956       | 251200       | -7.6321%      |
| sqlite3          | 223920       | 223708       | -0.0947%      |
| 7zip-benchmark   | 405364       | 402624       | -0.6759%      |
| bullet           | 139820       | 139500       | -0.2289%      |
| consumer-typeset | 295684       | 290196       | -1.8560%      |
| pairlocalalign   | 72236        | 72092        | -0.1993%      |
| tramp3d-v4       | 189572       | 189292       | -0.1477%      |

This is part of an enhanced version of machine outliner -- see [RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
This reduces the time complexity of the main loop of the `findCandidates()` method from $O(n^2)$ to $O(n \log n)$.
For small $n$, the modification does not regress the build time, but it helps significantly when $n$ is large.
For one application, this reduces the runtime of the main loop from 120 seconds to 28 seconds.
This is the first commit for an enhanced version of machine outliner -- see RFC.
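
A short sketch of why comparing against only the most recently kept candidate suffices. It assumes, as in the patch, that all candidates for one repeated sequence share the same length and are visited in increasing start order. The symbols $s_i$, $e_i$, and $\ell$ are introduced here for illustration and do not appear in the source.

```latex
\begin{align*}
&\text{Kept candidates: } s_1 < s_2 < \dots < s_m, \qquad
  e_i = s_i + \ell - 1 \;\Rightarrow\; e_1 < e_2 < \dots < e_m. \\
&\text{New candidate } [s, e] \text{ with } s > s_m: \quad
  e \ge s_i \text{ holds for every } i. \\
&\text{Overlap with some kept candidate}
  \iff \exists\, i:\ s \le e_i \iff s \le e_m. \\
&\text{Cost: } \underbrace{O(n \log n)}_{\text{sort}}
  + \underbrace{O(n)}_{\text{one comparison per start index}}
  = O(n \log n).
\end{align*}
```

By contrast, the previous `find_if` scan could visit every kept candidate for each start index, which is where the $O(n^2)$ bound comes from.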