[MachineOutliner] Efficient Implementation of MachineOutliner::findCandidates() #90260

Merged
merged 5 commits into from
Jun 3, 2024

Conversation

xuanzhang816
Contributor

@xuanzhang816 xuanzhang816 commented Apr 26, 2024

This reduces the time complexity of the main loop of the findCandidates() method from $O(n^2)$ to $O(n \log n)$.

For small $n$, the modification does not regress the build time, but it helps significantly when $n$ is large.

For one application, this reduces the runtime of the main loop from 120 seconds to 28 seconds.

This is the first commit for an enhanced version of machine outliner -- see RFC.
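
For readers following along, here is a minimal standalone sketch of the idea, assuming plain `std::` containers instead of LLVM's `Candidate` and `SmallVector`. The names `Range`, `keepQuadratic`, and `keepSorted` are hypothetical and do not appear in the patch. It contrasts checking each new range against every kept range with sorting the start indices first, after which only the most recently kept range needs to be checked.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a candidate: an inclusive [Start, End] range.
using Range = std::pair<unsigned, unsigned>;

// Pairwise check: every new range is compared against all kept ranges,
// which is O(n^2) in the worst case.
std::vector<Range> keepQuadratic(const std::vector<unsigned> &Starts,
                                 unsigned Len) {
  assert(Len > 0 && "ranges must be non-empty");
  std::vector<Range> Kept;
  for (unsigned Start : Starts) {
    unsigned End = Start + Len - 1;
    bool Overlaps = std::any_of(Kept.begin(), Kept.end(), [&](const Range &C) {
      return End >= C.first && Start <= C.second;
    });
    if (!Overlaps)
      Kept.push_back({Start, End});
  }
  return Kept;
}

// Sorted sweep: all ranges have the same length, so after sorting the start
// indices, the last kept range has the largest end. Comparing against it
// alone is enough. Sorting costs O(n log n); the sweep itself is O(n).
std::vector<Range> keepSorted(std::vector<unsigned> Starts, unsigned Len) {
  assert(Len > 0 && "ranges must be non-empty");
  std::sort(Starts.begin(), Starts.end());
  std::vector<Range> Kept;
  for (unsigned Start : Starts) {
    if (!Kept.empty() && Start <= Kept.back().second)
      continue; // Overlaps the most recently kept range; discard.
    Kept.push_back({Start, Start + Len - 1});
  }
  return Kept;
}
```

Note that the two functions can keep different ranges for the same input, because the sorted version always prefers the range that starts earliest.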

@xuanzhang816 xuanzhang816 changed the title efficient implementation of MachineOutliner::findCandidates() Efficient Implementation of MachineOutliner::findCandidates() Apr 29, 2024
@xuanzhang816 xuanzhang816 marked this pull request as ready for review May 2, 2024 19:42
@llvmbot
Member

llvmbot commented May 2, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Xuan Zhang (xuanzh-meta)

Changes

This reduces the time complexity of the main loop of the findCandidates() method from $O(n^2)$ to $O(n \log n)$.

For small $n$, the modification does not regress the build time, but it helps significantly when $n$ is large.

For one application, this reduces the runtime of the main loop from 120 seconds to 28 seconds.

This is the first commit for an enhanced version of machine outliner -- see RFC.


Full diff: https://github.com/llvm/llvm-project/pull/90260.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineOutliner.cpp (+12-10)
  • (modified) llvm/test/CodeGen/AArch64/machine-outliner-overlap.mir (+6-6)
diff --git a/llvm/lib/CodeGen/MachineOutliner.cpp b/llvm/lib/CodeGen/MachineOutliner.cpp
index dc2f5ef15206e8..d553c0e6d24772 100644
--- a/llvm/lib/CodeGen/MachineOutliner.cpp
+++ b/llvm/lib/CodeGen/MachineOutliner.cpp
@@ -593,7 +593,11 @@ void MachineOutliner::findCandidates(
     unsigned NumDiscarded = 0;
     unsigned NumKept = 0;
 #endif
-    for (const unsigned &StartIdx : RS.StartIndices) {
+    // Sort the start indices so that we can efficiently check if candidates
+    // overlap with each other in MachineOutliner::findCandidates().
+    SmallVector<unsigned> SortedStartIndices(RS.StartIndices);
+    llvm::sort(SortedStartIndices);
+    for (const unsigned &StartIdx : SortedStartIndices) {
       // Trick: Discard some candidates that would be incompatible with the
       // ones we've already found for this sequence. This will save us some
       // work in candidate selection.
@@ -616,17 +620,15 @@ void MachineOutliner::findCandidates(
       // * End before the other starts
       // * Start after the other ends
       unsigned EndIdx = StartIdx + StringLen - 1;
-      auto FirstOverlap = find_if(
-          CandidatesForRepeatedSeq, [StartIdx, EndIdx](const Candidate &C) {
-            return EndIdx >= C.getStartIdx() && StartIdx <= C.getEndIdx();
-          });
-      if (FirstOverlap != CandidatesForRepeatedSeq.end()) {
+      if (CandidatesForRepeatedSeq.size() > 0 &&
+          StartIdx <= CandidatesForRepeatedSeq.back().getEndIdx()) {
 #ifndef NDEBUG
         ++NumDiscarded;
-        LLVM_DEBUG(dbgs() << "    .. DISCARD candidate @ [" << StartIdx
-                          << ", " << EndIdx << "]; overlaps with candidate @ ["
-                          << FirstOverlap->getStartIdx() << ", "
-                          << FirstOverlap->getEndIdx() << "]\n");
+        LLVM_DEBUG(dbgs() << "    .. DISCARD candidate @ [" << StartIdx << ", "
+                          << EndIdx << "]; overlaps with candidate @ ["
+                          << CandidatesForRepeatedSeq.back().getStartIdx()
+                          << ", " << CandidatesForRepeatedSeq.back().getEndIdx()
+                          << "]\n");
 #endif
         continue;
       }
diff --git a/llvm/test/CodeGen/AArch64/machine-outliner-overlap.mir b/llvm/test/CodeGen/AArch64/machine-outliner-overlap.mir
index 649bb33828c32c..c6bd4c1d04d871 100644
--- a/llvm/test/CodeGen/AArch64/machine-outliner-overlap.mir
+++ b/llvm/test/CodeGen/AArch64/machine-outliner-overlap.mir
@@ -8,27 +8,27 @@
 # CHECK-NEXT:    Candidates discarded: 0
 # CHECK-NEXT:    Candidates kept: 2
 # CHECK-DAG:  Sequence length: 8
-# CHECK-NEXT:    .. DISCARD candidate @ [5, 12]; overlaps with candidate @ [12, 19]
+# CHECK-NEXT:    .. DISCARD candidate @ [12, 19]; overlaps with candidate @ [5, 12]
 # CHECK-NEXT:    Candidates discarded: 1
 # CHECK-NEXT:    Candidates kept: 1
 # CHECK-DAG:   Sequence length: 9
-# CHECK-NEXT:    .. DISCARD candidate @ [4, 12]; overlaps with candidate @ [11, 19]
+# CHECK-NEXT:    .. DISCARD candidate @ [11, 19]; overlaps with candidate @ [4, 12]
 # CHECK-NEXT:    Candidates discarded: 1
 # CHECK-NEXT:    Candidates kept: 1
 # CHECK-DAG:   Sequence length: 10
-# CHECK-NEXT:    .. DISCARD candidate @ [3, 12]; overlaps with candidate @ [10, 19]
+# CHECK-NEXT:    .. DISCARD candidate @ [10, 19]; overlaps with candidate @ [3, 12]
 # CHECK-NEXT:    Candidates discarded: 1
 # CHECK-NEXT:    Candidates kept: 1
 # CHECK-DAG:   Sequence length: 11
-# CHECK-NEXT:    .. DISCARD candidate @ [2, 12]; overlaps with candidate @ [9, 19]
+# CHECK-NEXT:    .. DISCARD candidate @ [9, 19]; overlaps with candidate @ [2, 12]
 # CHECK-NEXT:    Candidates discarded: 1
 # CHECK-NEXT:    Candidates kept: 1
 # CHECK-DAG:   Sequence length: 12
-# CHECK-NEXT:    .. DISCARD candidate @ [1, 12]; overlaps with candidate @ [8, 19]
+# CHECK-NEXT:    .. DISCARD candidate @ [8, 19]; overlaps with candidate @ [1, 12]
 # CHECK-NEXT:    Candidates discarded: 1
 # CHECK-NEXT:    Candidates kept: 1
 # CHECK-DAG:   Sequence length: 13
-# CHECK-NEXT:    .. DISCARD candidate @ [0, 12]; overlaps with candidate @ [7, 19]
+# CHECK-NEXT:    .. DISCARD candidate @ [7, 19]; overlaps with candidate @ [0, 12]
 # CHECK-NEXT:    Candidates discarded: 1
 # CHECK-NEXT:    Candidates kept: 1
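
The flipped DISCARD lines in the test follow from the new visiting order. The old expectation (`[5, 12]` discarded in favor of `[12, 19]`) implies that start index 12 was visited before 5; after sorting, 5 is visited first and `[12, 19]` is the one discarded. A small sketch that reproduces both messages, assuming a hypothetical `discardLog` helper and `Span` struct that are not part of LLVM:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical inclusive range, standing in for a kept candidate.
struct Span {
  unsigned Start, End;
};

// Visit start indices in the given order, keep the first non-overlapping
// ranges, and return one message per discarded range, formatted like the
// debug output checked by the test.
std::vector<std::string> discardLog(const std::vector<unsigned> &VisitOrder,
                                    unsigned Len) {
  assert(Len > 0 && "ranges must be non-empty");
  std::vector<Span> Kept;
  std::vector<std::string> Log;
  for (unsigned Start : VisitOrder) {
    unsigned End = Start + Len - 1;
    const Span *Hit = nullptr;
    for (const Span &C : Kept) {
      if (End >= C.Start && Start <= C.End) {
        Hit = &C;
        break;
      }
    }
    if (Hit) {
      Log.push_back("DISCARD candidate @ [" + std::to_string(Start) + ", " +
                    std::to_string(End) + "]; overlaps with candidate @ [" +
                    std::to_string(Hit->Start) + ", " +
                    std::to_string(Hit->End) + "]");
      continue;
    }
    Kept.push_back({Start, End});
  }
  return Log;
}
```

Calling `discardLog({12, 5}, 8)` yields the old message for sequence length 8, and `discardLog({5, 12}, 8)` yields the new one.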
 

@xuanzhang816 xuanzhang816 changed the title Efficient Implementation of MachineOutliner::findCandidates() [MachineOutliner] Efficient Implementation of MachineOutliner::findCandidates() May 2, 2024
@xuanzhang816
Contributor Author

@kyulee-com

@kyulee-com kyulee-com requested a review from ornata May 3, 2024 14:01
Contributor

@kyulee-com kyulee-com left a comment

Overall, it looks good to me, but let others review it further.

@kyulee-com kyulee-com requested review from plotfi and River707 May 6, 2024 00:18
@xuanzhang816
Contributor Author

Hi @ornata, I want to follow up on the review for this PR. Appreciate it if you could take a look when you get a chance!

Contributor

@kyulee-com kyulee-com left a comment

LGTM. However, I'd like to hear other opinions.
@ornata Do you have concerns or comments on this direction?

@@ -593,6 +593,9 @@ void MachineOutliner::findCandidates(
     unsigned NumDiscarded = 0;
     unsigned NumKept = 0;
 #endif
+    // Sort the start indices so that we can efficiently check if candidates
+    // overlap with each other in MachineOutliner::findCandidates().
Contributor

I'm a bit confused about why this function's own name (MachineOutliner::findCandidates()) is mentioned here. From the wording, it appears to refer to another function. Perhaps "overlap with each other further down" would be clearer.

Contributor Author

Thanks for the comments. Modified!

Contributor

@alx32 alx32 left a comment

LGTM

@kyulee-com kyulee-com merged commit 16c925a into llvm:main Jun 3, 2024
7 checks passed
kyulee-com pushed a commit that referenced this pull request Jun 7, 2024
This PR depends on #90260

We changed the order in which functions are outlined in Machine
Outliner.

The formula for priority is found via a black-box Bayesian optimization
toolbox. Using this formula for sorting consistently reduces the
uncompressed size of large real-world mobile apps. We also ran a few
benchmarks using LLVM test suites, and showed that sorting by priority
consistently reduces the text segment size.

|run (CTMark/)   |baseline (1)|priority (2)|diff (1 -> 2)|
|----------------|------------|------------|-------------|
|lencod          |349624      |349264      |-0.1030%     |
|SPASS           |219672      |219480      |-0.0874%     |
|kc              |271956      |251200      |-7.6321%     |
|sqlite3         |223920      |223708      |-0.0947%     |
|7zip-benchmark  |405364      |402624      |-0.6759%     |
|bullet          |139820      |139500      |-0.2289%     |
|consumer-typeset|295684      |290196      |-1.8560%     |
|pairlocalalign  |72236       |72092       |-0.1993%     |
|tramp3d-v4      |189572      |189292      |-0.1477%     |

This is part of an enhanced version of machine outliner -- see
[RFC](https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-1-fulllto-part-2-thinlto-nolto-to-come/78732).
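
The `diff (1 -> 2)` column in the table above is consistent with the relative change `(priority - baseline) / baseline`, expressed as a percentage. A minimal sketch for checking a row, assuming a hypothetical helper `percentChange` that is not part of the commit:

```cpp
#include <cassert>

// Relative change from Baseline to Priority, in percent.
// A negative result means the priority-sorted build is smaller.
double percentChange(double Baseline, double Priority) {
  assert(Baseline != 0.0 && "baseline size must be non-zero");
  return (Priority - Baseline) / Baseline * 100.0;
}
```

For the `kc` row, `percentChange(271956, 251200)` is approximately -7.632, matching the table to the precision shown.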