[lld]Add lld/Common/BPSectionOrdererBase from MachO for reuse in ELF #117514

Merged
1 commit merged into llvm:main from Colibrow:elf-support-bp on Dec 18, 2024

Conversation

Colibrow
Contributor

@Colibrow Colibrow commented Nov 25, 2024

Add lld/Common/BPSectionOrdererBase from MachO for reuse in ELF


Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository; in that case, you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented Nov 25, 2024

@llvm/pr-subscribers-lld-wasm
@llvm/pr-subscribers-lld-coff
@llvm/pr-subscribers-lld
@llvm/pr-subscribers-lld-macho

@llvm/pr-subscribers-lld-elf

Author: Max (Colibrow)

Changes

Extend the balanced partitioning implementation to support ELF binaries, enabling the same startup-time and compressed-size optimizations previously available for Mach-O.

This allows ELF binaries to benefit from profile-guided function ordering and compression-based section ordering.

Add the lld flags --irpgo-profile-sort=<profile> and --compression-sort={function,data,both}.
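
For readers who haven't seen this machinery, here is a minimal conceptual sketch of how the llvm/Support/BalancedPartitioning.h API is driven. It mirrors the usage in the patch below, but the node data is invented for illustration; it is not code from this PR.

#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/BalancedPartitioning.h"
#include <vector>

using namespace llvm;

void orderTwoSections() {
  // Each section becomes a BPFunctionNode: Id is the section index, and the
  // utility nodes say which startup traces or content hashes it shares with
  // other sections (the values here are made up).
  SmallVector<BPFunctionNode::UtilityNodeT> unsA = {1, 2};
  SmallVector<BPFunctionNode::UtilityNodeT> unsB = {2, 3};
  std::vector<BPFunctionNode> nodes;
  nodes.emplace_back(/*Id=*/0, unsA);
  nodes.emplace_back(/*Id=*/1, unsB);

  // Run balanced partitioning; afterwards the order of `nodes` is the
  // suggested layout, with nodes that share utility nodes placed close
  // together.
  BalancedPartitioningConfig config;
  BalancedPartitioning bp(config);
  bp.run(nodes);
}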

Thanks to @ellishg, @thevinster, and their team for their work.


Patch is 60.47 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/117514.diff

15 Files Affected:

  • (modified) lld/Common/CMakeLists.txt (+1)
  • (added) lld/Common/SectionOrderer.cpp (+383)
  • (added) lld/ELF/BPSectionOrderer.cpp (+50)
  • (added) lld/ELF/BPSectionOrderer.h (+140)
  • (modified) lld/ELF/CMakeLists.txt (+1)
  • (modified) lld/ELF/Config.h (+5)
  • (modified) lld/ELF/Driver.cpp (+21)
  • (modified) lld/ELF/Options.td (+14)
  • (modified) lld/ELF/Writer.cpp (+12-1)
  • (modified) lld/MachO/BPSectionOrderer.cpp (+13-397)
  • (modified) lld/MachO/BPSectionOrderer.h (+136)
  • (added) lld/include/lld/Common/SectionOrderer.h (+75)
  • (added) lld/test/ELF/bp-section-orderer-errs.s (+19)
  • (added) lld/test/ELF/bp-section-orderer-stress.s (+105)
  • (added) lld/test/ELF/bp-section-orderer.s (+123)
diff --git a/lld/Common/CMakeLists.txt b/lld/Common/CMakeLists.txt
index 4f503d04f7844f..bd5a40af41c1bc 100644
--- a/lld/Common/CMakeLists.txt
+++ b/lld/Common/CMakeLists.txt
@@ -31,6 +31,7 @@ add_lld_library(lldCommon
   Filesystem.cpp
   Memory.cpp
   Reproduce.cpp
+  SectionOrderer.cpp
   Strings.cpp
   TargetOptionsCommandFlags.cpp
   Timer.cpp
diff --git a/lld/Common/SectionOrderer.cpp b/lld/Common/SectionOrderer.cpp
new file mode 100644
index 00000000000000..64c78030f3427f
--- /dev/null
+++ b/lld/Common/SectionOrderer.cpp
@@ -0,0 +1,383 @@
+//===- SectionOrderer.cpp---------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "lld/Common/SectionOrderer.h"
+#include "lld/Common/ErrorHandler.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/DenseSet.h"
+#include "llvm/ADT/SetVector.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/ProfileData/InstrProfReader.h"
+#include "llvm/Support/BalancedPartitioning.h"
+#include "llvm/Support/TimeProfiler.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/xxhash.h"
+
+#define DEBUG_TYPE "bp-section-orderer"
+using namespace llvm;
+using UtilityNodes = SmallVector<BPFunctionNode::UtilityNodeT>;
+
+namespace lld {
+
+static SmallVector<std::pair<unsigned, UtilityNodes>> getUnsForCompression(
+    ArrayRef<const BPSectionBase *> sections,
+    const DenseMap<const BPSectionBase *, uint64_t> &sectionToIdx,
+    ArrayRef<unsigned> sectionIdxs,
+    DenseMap<unsigned, SmallVector<unsigned>> *duplicateSectionIdxs,
+    BPFunctionNode::UtilityNodeT &maxUN) {
+  TimeTraceScope timeScope("Build nodes for compression");
+
+  SmallVector<std::pair<unsigned, SmallVector<uint64_t>>> sectionHashes;
+  sectionHashes.reserve(sectionIdxs.size());
+  SmallVector<uint64_t> hashes;
+
+  for (unsigned sectionIdx : sectionIdxs) {
+    const auto *isec = sections[sectionIdx];
+    isec->getSectionHash(hashes, sectionToIdx);
+    sectionHashes.emplace_back(sectionIdx, std::move(hashes));
+    hashes.clear();
+  }
+
+  DenseMap<uint64_t, unsigned> hashFrequency;
+  for (auto &[sectionIdx, hashes] : sectionHashes)
+    for (auto hash : hashes)
+      ++hashFrequency[hash];
+
+  if (duplicateSectionIdxs) {
+    // Merge sections that are nearly identical
+    SmallVector<std::pair<unsigned, SmallVector<uint64_t>>> newSectionHashes;
+    DenseMap<uint64_t, unsigned> wholeHashToSectionIdx;
+    for (auto &[sectionIdx, hashes] : sectionHashes) {
+      uint64_t wholeHash = 0;
+      for (auto hash : hashes)
+        if (hashFrequency[hash] > 5)
+          wholeHash ^= hash;
+      auto [it, wasInserted] =
+          wholeHashToSectionIdx.insert(std::make_pair(wholeHash, sectionIdx));
+      if (wasInserted) {
+        newSectionHashes.emplace_back(sectionIdx, hashes);
+      } else {
+        (*duplicateSectionIdxs)[it->getSecond()].push_back(sectionIdx);
+      }
+    }
+    sectionHashes = newSectionHashes;
+
+    // Recompute hash frequencies
+    hashFrequency.clear();
+    for (auto &[sectionIdx, hashes] : sectionHashes)
+      for (auto hash : hashes)
+        ++hashFrequency[hash];
+  }
+
+  // Filter rare and common hashes and assign each a unique utility node that
+  // doesn't conflict with the trace utility nodes
+  DenseMap<uint64_t, BPFunctionNode::UtilityNodeT> hashToUN;
+  for (auto &[hash, frequency] : hashFrequency) {
+    if (frequency <= 1 || frequency * 2 > sectionHashes.size())
+      continue;
+    hashToUN[hash] = ++maxUN;
+  }
+
+  SmallVector<std::pair<unsigned, UtilityNodes>> sectionUns;
+  for (auto &[sectionIdx, hashes] : sectionHashes) {
+    UtilityNodes uns;
+    for (auto &hash : hashes) {
+      auto it = hashToUN.find(hash);
+      if (it != hashToUN.end())
+        uns.push_back(it->second);
+    }
+    sectionUns.emplace_back(sectionIdx, uns);
+  }
+  return sectionUns;
+}
+
+llvm::DenseMap<const BPSectionBase *, size_t>
+SectionOrderer::reorderSectionsByBalancedPartitioning(
+    size_t &highestAvailablePriority, llvm::StringRef profilePath,
+    bool forFunctionCompression, bool forDataCompression,
+    bool compressionSortStartupFunctions, bool verbose,
+    SmallVector<BPSectionBase *> inputSections) {
+  TimeTraceScope timeScope("Balanced Partitioning");
+  SmallVector<const BPSectionBase *> sections;
+  DenseMap<const BPSectionBase *, uint64_t> sectionToIdx;
+  StringMap<DenseSet<unsigned>> symbolToSectionIdxs;
+
+  // Process input sections
+  for (const auto *isec : inputSections) {
+    if (!isec->hasValidData())
+      continue;
+
+    unsigned sectionIdx = sections.size();
+    sectionToIdx.try_emplace(isec, sectionIdx);
+    sections.push_back(isec);
+
+    for (auto *sym : isec->getSymbols()) {
+      if (auto *d = sym->asDefinedSymbol())
+        symbolToSectionIdxs[d->getName()].insert(sectionIdx);
+    }
+  }
+  StringMap<DenseSet<unsigned>> rootSymbolToSectionIdxs;
+  for (auto &entry : symbolToSectionIdxs) {
+    StringRef name = entry.getKey();
+    auto &sectionIdxs = entry.getValue();
+    name = BPSectionBase::getRootSymbol(name);
+    rootSymbolToSectionIdxs[name].insert(sectionIdxs.begin(),
+                                         sectionIdxs.end());
+    // Linkage names can be prefixed with "_" or "l_" on Mach-O. See
+    // Mangler::getNameWithPrefix() for details.
+    if (name.consume_front("_") || name.consume_front("l_"))
+      rootSymbolToSectionIdxs[name].insert(sectionIdxs.begin(),
+                                           sectionIdxs.end());
+  }
+
+  BPFunctionNode::UtilityNodeT maxUN = 0;
+  DenseMap<unsigned, UtilityNodes> startupSectionIdxUNs;
+  // Used to define the initial order for startup functions.
+  DenseMap<unsigned, size_t> sectionIdxToTimestamp;
+  std::unique_ptr<InstrProfReader> reader;
+  if (!profilePath.empty()) {
+    auto fs = vfs::getRealFileSystem();
+    auto readerOrErr = InstrProfReader::create(profilePath, *fs);
+    lld::checkError(readerOrErr.takeError());
+
+    reader = std::move(readerOrErr.get());
+    for (auto &entry : *reader) {
+      // Read all entries
+      (void)entry;
+    }
+    auto &traces = reader->getTemporalProfTraces();
+
+    DenseMap<unsigned, BPFunctionNode::UtilityNodeT> sectionIdxToFirstUN;
+    for (size_t traceIdx = 0; traceIdx < traces.size(); traceIdx++) {
+      uint64_t currentSize = 0, cutoffSize = 1;
+      size_t cutoffTimestamp = 1;
+      auto &trace = traces[traceIdx].FunctionNameRefs;
+      for (size_t timestamp = 0; timestamp < trace.size(); timestamp++) {
+        auto [Filename, ParsedFuncName] = getParsedIRPGOName(
+            reader->getSymtab().getFuncOrVarName(trace[timestamp]));
+        ParsedFuncName = BPSectionBase::getRootSymbol(ParsedFuncName);
+
+        auto sectionIdxsIt = rootSymbolToSectionIdxs.find(ParsedFuncName);
+        if (sectionIdxsIt == rootSymbolToSectionIdxs.end())
+          continue;
+        auto &sectionIdxs = sectionIdxsIt->getValue();
+        // If the same symbol is found in multiple sections, they might be
+        // identical, so we arbitrarily use the size from the first section.
+        currentSize += sections[*sectionIdxs.begin()]->getSize();
+
+        // Since BalancedPartitioning is sensitive to the initial order, we need
+        // to explicitly define it to be ordered by earliest timestamp.
+        for (unsigned sectionIdx : sectionIdxs) {
+          auto [it, wasInserted] =
+              sectionIdxToTimestamp.try_emplace(sectionIdx, timestamp);
+          if (!wasInserted)
+            it->getSecond() = std::min<size_t>(it->getSecond(), timestamp);
+        }
+
+        if (timestamp >= cutoffTimestamp || currentSize >= cutoffSize) {
+          ++maxUN;
+          cutoffSize = 2 * currentSize;
+          cutoffTimestamp = 2 * cutoffTimestamp;
+        }
+        for (unsigned sectionIdx : sectionIdxs)
+          sectionIdxToFirstUN.try_emplace(sectionIdx, maxUN);
+      }
+      for (auto &[sectionIdx, firstUN] : sectionIdxToFirstUN)
+        for (auto un = firstUN; un <= maxUN; ++un)
+          startupSectionIdxUNs[sectionIdx].push_back(un);
+      ++maxUN;
+      sectionIdxToFirstUN.clear();
+    }
+  }
+
+  SmallVector<unsigned> sectionIdxsForFunctionCompression,
+      sectionIdxsForDataCompression;
+  for (unsigned sectionIdx = 0; sectionIdx < sections.size(); sectionIdx++) {
+    if (startupSectionIdxUNs.count(sectionIdx))
+      continue;
+    const auto *isec = sections[sectionIdx];
+    if (isec->isCodeSection()) {
+      if (forFunctionCompression)
+        sectionIdxsForFunctionCompression.push_back(sectionIdx);
+    } else {
+      if (forDataCompression)
+        sectionIdxsForDataCompression.push_back(sectionIdx);
+    }
+  }
+
+  if (compressionSortStartupFunctions) {
+    SmallVector<unsigned> startupIdxs;
+    for (auto &[sectionIdx, uns] : startupSectionIdxUNs)
+      startupIdxs.push_back(sectionIdx);
+    auto unsForStartupFunctionCompression =
+        getUnsForCompression(sections, sectionToIdx, startupIdxs,
+                             /*duplicateSectionIdxs=*/nullptr, maxUN);
+    for (auto &[sectionIdx, compressionUns] :
+         unsForStartupFunctionCompression) {
+      auto &uns = startupSectionIdxUNs[sectionIdx];
+      uns.append(compressionUns);
+      llvm::sort(uns);
+      uns.erase(std::unique(uns.begin(), uns.end()), uns.end());
+    }
+  }
+
+  // Map a section index (order directly) to a list of duplicate section indices
+  // (not ordered directly).
+  DenseMap<unsigned, SmallVector<unsigned>> duplicateSectionIdxs;
+  auto unsForFunctionCompression = getUnsForCompression(
+      sections, sectionToIdx, sectionIdxsForFunctionCompression,
+      &duplicateSectionIdxs, maxUN);
+  auto unsForDataCompression = getUnsForCompression(
+      sections, sectionToIdx, sectionIdxsForDataCompression,
+      &duplicateSectionIdxs, maxUN);
+
+  std::vector<BPFunctionNode> nodesForStartup, nodesForFunctionCompression,
+      nodesForDataCompression;
+  for (auto &[sectionIdx, uns] : startupSectionIdxUNs)
+    nodesForStartup.emplace_back(sectionIdx, uns);
+  for (auto &[sectionIdx, uns] : unsForFunctionCompression)
+    nodesForFunctionCompression.emplace_back(sectionIdx, uns);
+  for (auto &[sectionIdx, uns] : unsForDataCompression)
+    nodesForDataCompression.emplace_back(sectionIdx, uns);
+
+  // Use the first timestamp to define the initial order for startup nodes.
+  llvm::sort(nodesForStartup, [&sectionIdxToTimestamp](auto &L, auto &R) {
+    return std::make_pair(sectionIdxToTimestamp[L.Id], L.Id) <
+           std::make_pair(sectionIdxToTimestamp[R.Id], R.Id);
+  });
+  // Sort compression nodes by their Id (which is the section index) because the
+  // input linker order tends to be not bad.
+  llvm::sort(nodesForFunctionCompression,
+             [](auto &L, auto &R) { return L.Id < R.Id; });
+  llvm::sort(nodesForDataCompression,
+             [](auto &L, auto &R) { return L.Id < R.Id; });
+
+  {
+    TimeTraceScope timeScope("Balanced Partitioning");
+    BalancedPartitioningConfig config;
+    BalancedPartitioning bp(config);
+    bp.run(nodesForStartup);
+    bp.run(nodesForFunctionCompression);
+    bp.run(nodesForDataCompression);
+  }
+
+  unsigned numStartupSections = 0;
+  unsigned numCodeCompressionSections = 0;
+  unsigned numDuplicateCodeSections = 0;
+  unsigned numDataCompressionSections = 0;
+  unsigned numDuplicateDataSections = 0;
+  SetVector<const BPSectionBase *> orderedSections;
+  // Order startup functions,
+  for (auto &node : nodesForStartup) {
+    const auto *isec = sections[node.Id];
+    if (orderedSections.insert(isec))
+      ++numStartupSections;
+  }
+  // then functions for compression,
+  for (auto &node : nodesForFunctionCompression) {
+    const auto *isec = sections[node.Id];
+    if (orderedSections.insert(isec))
+      ++numCodeCompressionSections;
+
+    auto It = duplicateSectionIdxs.find(node.Id);
+    if (It == duplicateSectionIdxs.end())
+      continue;
+    for (auto dupSecIdx : It->getSecond()) {
+      const auto *dupIsec = sections[dupSecIdx];
+      if (orderedSections.insert(dupIsec))
+        ++numDuplicateCodeSections;
+    }
+  }
+  // then data for compression.
+  for (auto &node : nodesForDataCompression) {
+    const auto *isec = sections[node.Id];
+    if (orderedSections.insert(isec))
+      ++numDataCompressionSections;
+    auto It = duplicateSectionIdxs.find(node.Id);
+    if (It == duplicateSectionIdxs.end())
+      continue;
+    for (auto dupSecIdx : It->getSecond()) {
+      const auto *dupIsec = sections[dupSecIdx];
+      if (orderedSections.insert(dupIsec))
+        ++numDuplicateDataSections;
+    }
+  }
+
+  if (verbose) {
+    unsigned numTotalOrderedSections =
+        numStartupSections + numCodeCompressionSections +
+        numDuplicateCodeSections + numDataCompressionSections +
+        numDuplicateDataSections;
+    dbgs()
+        << "Ordered " << numTotalOrderedSections
+        << " sections using balanced partitioning:\n  Functions for startup: "
+        << numStartupSections
+        << "\n  Functions for compression: " << numCodeCompressionSections
+        << "\n  Duplicate functions: " << numDuplicateCodeSections
+        << "\n  Data for compression: " << numDataCompressionSections
+        << "\n  Duplicate data: " << numDuplicateDataSections << "\n";
+
+    if (!profilePath.empty()) {
+      // Evaluate this function order for startup
+      StringMap<std::pair<uint64_t, uint64_t>> symbolToPageNumbers;
+      const uint64_t pageSize = (1 << 14);
+      uint64_t currentAddress = 0;
+      for (const auto *isec : orderedSections) {
+        for (auto *sym : isec->getSymbols()) {
+          if (auto *d = sym->asDefinedSymbol()) {
+            uint64_t startAddress = currentAddress + d->getValue();
+            uint64_t endAddress = startAddress + d->getSize();
+            uint64_t firstPage = startAddress / pageSize;
+            // I think the kernel might pull in a few pages when one is touched,
+            // so it might be more accurate to force lastPage to be aligned by
+            // 4?
+            uint64_t lastPage = endAddress / pageSize;
+            StringRef rootSymbol = d->getName();
+            rootSymbol = BPSectionBase::getRootSymbol(rootSymbol);
+            symbolToPageNumbers.try_emplace(rootSymbol, firstPage, lastPage);
+            if (rootSymbol.consume_front("_") || rootSymbol.consume_front("l_"))
+              symbolToPageNumbers.try_emplace(rootSymbol, firstPage, lastPage);
+          }
+        }
+        currentAddress += isec->getSize();
+      }
+
+      // The area under the curve F where F(t) is the total number of page
+      // faults at step t.
+      unsigned area = 0;
+      for (auto &trace : reader->getTemporalProfTraces()) {
+        SmallSet<uint64_t, 0> touchedPages;
+        for (unsigned step = 0; step < trace.FunctionNameRefs.size(); step++) {
+          auto traceId = trace.FunctionNameRefs[step];
+          auto [Filename, ParsedFuncName] =
+              getParsedIRPGOName(reader->getSymtab().getFuncOrVarName(traceId));
+          ParsedFuncName = BPSectionBase::getRootSymbol(ParsedFuncName);
+          auto it = symbolToPageNumbers.find(ParsedFuncName);
+          if (it != symbolToPageNumbers.end()) {
+            auto &[firstPage, lastPage] = it->getValue();
+            for (uint64_t i = firstPage; i <= lastPage; i++)
+              touchedPages.insert(i);
+          }
+          area += touchedPages.size();
+        }
+      }
+      dbgs() << "Total area under the page fault curve: " << (float)area
+             << "\n";
+    }
+  }
+
+  DenseMap<const BPSectionBase *, size_t> sectionPriorities;
+  for (const auto *isec : orderedSections)
+    sectionPriorities[isec] = --highestAvailablePriority;
+  return sectionPriorities;
+}
+
+} // namespace lld
diff --git a/lld/ELF/BPSectionOrderer.cpp b/lld/ELF/BPSectionOrderer.cpp
new file mode 100644
index 00000000000000..ac3024a69e681a
--- /dev/null
+++ b/lld/ELF/BPSectionOrderer.cpp
@@ -0,0 +1,50 @@
+//===- BPSectionOrderer.cpp--------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "BPSectionOrderer.h"
+#include "Config.h"
+#include "InputFiles.h"
+#include "InputSection.h"
+#include "lld/Common/CommonLinkerContext.h"
+#include "lld/Common/SectionOrderer.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/Support/BalancedPartitioning.h"
+#include "llvm/Support/TimeProfiler.h"
+
+using namespace llvm;
+using namespace lld::elf;
+
+llvm::DenseMap<const lld::elf::InputSectionBase *, int>
+lld::elf::runBalancedPartitioning(Ctx &ctx, llvm::StringRef profilePath,
+                                  bool forFunctionCompression,
+                                  bool forDataCompression,
+                                  bool compressionSortStartupFunctions,
+                                  bool verbose) {
+  size_t highestAvailablePriority = std::numeric_limits<int>::max();
+  SmallVector<lld::BPSectionBase *> sections;
+  for (auto *isec : ctx.inputSections) {
+    if (!isec || isec->content().empty())
+      continue;
+    sections.push_back(new ELFSection(isec));
+  }
+
+  auto reorderedSections =
+      lld::SectionOrderer::reorderSectionsByBalancedPartitioning(
+          highestAvailablePriority, profilePath, forFunctionCompression,
+          forDataCompression, compressionSortStartupFunctions, verbose,
+          sections);
+
+  DenseMap<const InputSectionBase *, int> result;
+  for (const auto &[BPSectionBase, priority] : reorderedSections) {
+    if (auto *elfSection = dyn_cast<ELFSection>(BPSectionBase)) {
+      result[elfSection->getSection()] = static_cast<int>(priority);
+      delete elfSection;
+    }
+  }
+  return result;
+}
diff --git a/lld/ELF/BPSectionOrderer.h b/lld/ELF/BPSectionOrderer.h
new file mode 100644
index 00000000000000..c24f8d1277c108
--- /dev/null
+++ b/lld/ELF/BPSectionOrderer.h
@@ -0,0 +1,140 @@
+//===- BPSectionOrderer.h ---------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// This file uses Balanced Partitioning to order sections to improve startup
+/// time and compressed size.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLD_ELF_BPSECTION_ORDERER_H
+#define LLD_ELF_BPSECTION_ORDERER_H
+
+#include "InputFiles.h"
+#include "InputSection.h"
+#include "Relocations.h"
+#include "Symbols.h"
+#include "lld/Common/SectionOrderer.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/STLFunctionalExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/BinaryFormat/ELF.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/xxhash.h"
+
+namespace lld::elf {
+
+class InputSection;
+
+class ELFSymbol : public BPSymbol {
+  const Symbol *sym;
+
+public:
+  explicit ELFSymbol(const Symbol *s) : sym(s) {}
+
+  llvm::StringRef getName() const override { return sym->getName(); }
+
+  BPSymbol *asDefinedSymbol() override {
+    if (auto *d = llvm::dyn_cast<Defined>(sym))
+      return this;
+    return nullptr;
+  }
+
+  uint64_t getValue() const override {
+    if (auto *d = llvm::dyn_cast<Defined>(sym))
+      return d->value;
+    return 0;
+  }
+
+  uint64_t getSize() const override {
+    if (auto *d = llvm::dyn_cast<Defined>(sym))
+      return d->size;
+    return 0;
+  }
+
+  const Symbol *getSymbol() const { return sym; }
+};
+
+class ELFSection : public BPSectionBase {
+  const InputSectionBase *isec;
+  mutable std::vector<std::unique_ptr<ELFSymbol>> symbolCache;
+
+public:
+  explicit ELFSection(const InputSectionB...
[truncated]
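
(The diff is cut off above mid-class. Without trying to reproduce the missing code, here is a generic, hedged sketch of what a BPSectionBase adapter over an ELF input section tends to look like; the first few members restate what is visible above, the method names follow the calls made in SectionOrderer.cpp, and everything else is an assumption rather than the PR's actual code.)

// Hypothetical sketch only, not the PR's implementation.
class ELFSection : public BPSectionBase {
  const InputSectionBase *isec;
  mutable std::vector<std::unique_ptr<ELFSymbol>> symbolCache;

public:
  explicit ELFSection(const InputSectionBase *sec) : isec(sec) {}

  uint64_t getSize() const override { return isec->getSize(); }

  bool isCodeSection() const override {
    // SHF_EXECINSTR marks executable sections in ELF.
    return (isec->flags & llvm::ELF::SHF_EXECINSTR) != 0;
  }

  const InputSectionBase *getSection() const { return isec; }
};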

@ellishg
Contributor

ellishg commented Nov 25, 2024

Thanks for the PR! I'll try to take a look, but I might not get to it until next week because Thursday is Thanksgiving. I'll also assign a few reviewers that might be interested in taking a look.

@smithp35
Collaborator

Just a quick drive-by comment on the ELF side; I've not had a chance to go through the code in detail. I think .subsections_via_symbols is Mach-O specific, and I'm not sure what it does in ELF. The closest equivalent in clang is -ffunction-sections (and -fdata-sections for data). For hand-written assembly, the user has to be careful to put each function in its own section.

If I'm right, it could be worth updating the comments and the assembler test.

@Colibrow
Contributor Author

@smithp35 You're correct. In the standard ELF format, we typically use the -ffunction-sections and -fdata-sections flags so that each function and data item gets its own section, which is what lets balanced partitioning reorder them. The original comment was indeed copied from the Mach-O-specific implementation. I've updated the code to reflect this, and if you have any additional suggestions, I'd be happy to hear them. Appreciate your review!
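
To make that concrete, a small illustration of the general clang/GCC behavior (not code from this PR): with -ffunction-sections and -fdata-sections, each definition below is emitted into its own ELF section, which is what gives the section orderer independent units to rearrange.

// example.cpp (compile with: clang++ -ffunction-sections -fdata-sections -c example.cpp)
extern "C" int foo() { return 1; } // placed in its own section: .text.foo
extern "C" int bar() { return 2; } // placed in .text.bar
int counter = 42;                  // placed in .data.counter (via -fdata-sections)
// Without these flags, foo and bar share a single .text section, so the
// linker cannot reorder them relative to each other.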

Member

@MaskRay MaskRay left a comment

.

@Colibrow Colibrow force-pushed the elf-support-bp branch 2 times, most recently from c06ac10 to 9e453fe Compare December 2, 2024 11:23
@@ -0,0 +1,104 @@
# REQUIRES: aarch64
Contributor

I wish we didn't have to duplicate this test in ELF and Mach-O, but I can't think of a better solution right now.

Member

Duplication like this is fine, as sharing the test could make updating either port more difficult.

Contributor

@thevinster thevinster left a comment

Some nits on top of everything else.

@Colibrow
Contributor Author

@ellishg @MaskRay Hi, I've rebased this PR with commit 118594 and addressed the code review comments. Could you please take another look?

Contributor

@ellishg ellishg left a comment

Don't forget to change the flag spelling in the PR summary

Comment on lines 100 to 105
const llvm::DenseMap<const BPSectionBase *, uint64_t>
&sectionToIdx) const override {
Contributor

@Colibrow What do you think of this?

@Colibrow
Contributor Author

@ellishg I deleted the function that passed the map everywhere, and the file lld/ELF/BPSectionOrderer.h has been updated. Does this refer to the previous implementation?

@Colibrow Colibrow requested a review from thevinster December 11, 2024 08:34
Contributor

@ellishg ellishg left a comment

It's looking better! After we resolve the section index issue in getRelocHash() I want to test this on my codebase to make sure we don't have a regression.

// TODO: Calculate relocation hashes.
// Relocations in ELF are complex, but the results without hashing them are
// good enough, so we just use 0 as their hash.
for (const auto &r : isec->relocations) {
Contributor Author

@ellishg @MaskRay Hi, I've updated the relocation hashes using isec->relocations, which seems to be a common interface for accessing relocations. Could you please review?
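
(As a hedged illustration of what hashing by isec->relocations could look like, here is a hypothetical sketch; it is not the PR's implementation, and the exact hashing scheme is an assumption.)

// Fold each relocation's offset and type into the section's hash stream so
// that byte-identical sections with different relocation targets no longer
// hash the same. Requires llvm/Support/xxhash.h and the lld ELF headers.
static void hashRelocations(const lld::elf::InputSection &isec,
                            llvm::SmallVectorImpl<uint64_t> &hashes) {
  for (const lld::elf::Relocation &r : isec.relocations) {
    uint64_t buf[2] = {r.offset, static_cast<uint64_t>(r.type)};
    hashes.push_back(llvm::xxh3_64bits(llvm::ArrayRef<uint8_t>(
        reinterpret_cast<const uint8_t *>(buf), sizeof(buf))));
  }
}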

Contributor

Nice, it seems good. Did you see an uncompressed or compressed size win with this change?

Contributor Author

I'm sorry, I just saw this now. I'll run a test and post the results ASAP.

Contributor Author

Hi Ellis,

I tested this on my project and realized the code I shared earlier was incorrect. When iterating over the window + relocHash, the loop needs to end at the relocation code size, e.g., 2 for functions and 3 for data in Mach-O.

I ran the test on my project, which builds an AArch64 ELF with the following relocation distribution:

Relocation Type        Count
R_AARCH64_ABS64          387
R_AARCH64_GLOB_DAT        34
R_AARCH64_JUMP_SLOT      711
R_AARCH64_RELATIVE      9396

For testing, I hardcoded r.length as 3, and here are the results:

Binary Name                                         Size (bytes)   Gzipped Size (bytes)
libsample-aarch64.so                                   3,181,560              1,512,245
libsample-aarch64-noreloc-compressed-function.so       3,181,560              1,487,043
libsample-aarch64-reloc-compressed-function.so         3,181,560              1,487,032

Since different relType values require various relocation forms and the size optimization is minimal, I decided to revert the relocation-hash commit.

Do you have any ideas on how to proceed?

@ellishg
Contributor

ellishg commented Dec 14, 2024

I built this PR locally; I needed these CMake changes for it to work, and some of the includes are not necessary:

diff --git a/lld/Common/BPSectionOrdererBase.cpp b/lld/Common/BPSectionOrdererBase.cpp
index f7b460d51a16..51d56606ec92 100644
--- a/lld/Common/BPSectionOrdererBase.cpp
+++ b/lld/Common/BPSectionOrdererBase.cpp
@@ -19,7 +19,6 @@
 #include "llvm/Support/BalancedPartitioning.h"
 #include "llvm/Support/TimeProfiler.h"
 #include "llvm/Support/VirtualFileSystem.h"
-#include "llvm/Support/xxhash.h"

 #define DEBUG_TYPE "bp-section-orderer"
 using namespace llvm;
diff --git a/lld/Common/CMakeLists.txt b/lld/Common/CMakeLists.txt
index 2ab5093bf688..43e91b85821d 100644
--- a/lld/Common/CMakeLists.txt
+++ b/lld/Common/CMakeLists.txt
@@ -48,6 +48,7 @@ add_lld_library(lldCommon
   Demangle
   MC
   Option
+  ProfileData
   Support
   Target
   TargetParser
diff --git a/lld/ELF/BPSectionOrderer.cpp b/lld/ELF/BPSectionOrderer.cpp
index 86605b1a7626..1fc6036e5dd9 100644
--- a/lld/ELF/BPSectionOrderer.cpp
+++ b/lld/ELF/BPSectionOrderer.cpp
@@ -11,10 +11,7 @@
 #include "InputFiles.h"
 #include "InputSection.h"
 #include "lld/Common/BPSectionOrdererBase.h"
-#include "lld/Common/CommonLinkerContext.h"
 #include "llvm/ADT/DenseMap.h"
-#include "llvm/Support/BalancedPartitioning.h"
-#include "llvm/Support/TimeProfiler.h"

 #include "SymbolTable.h"
 #include "Symbols.h"
diff --git a/lld/ELF/BPSectionOrderer.h b/lld/ELF/BPSectionOrderer.h
index 7f32c26bdd3e..502b7cfbdd2f 100644
--- a/lld/ELF/BPSectionOrderer.h
+++ b/lld/ELF/BPSectionOrderer.h
@@ -21,7 +21,6 @@
 #include "lld/Common/BPSectionOrdererBase.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/STLExtras.h"
-#include "llvm/ADT/STLFunctionalExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/BinaryFormat/ELF.h"
 #include "llvm/Support/ErrorHandling.h"
diff --git a/lld/ELF/CMakeLists.txt b/lld/ELF/CMakeLists.txt
index 298443cd6ea4..ec3f6382282b 100644
--- a/lld/ELF/CMakeLists.txt
+++ b/lld/ELF/CMakeLists.txt
@@ -73,6 +73,7 @@ add_lld_library(lldELF
   Object
   Option
   Passes
+  ProfileData
   Support
   TargetParser
   TransformUtils
diff --git a/lld/MachO/BPSectionOrderer.cpp b/lld/MachO/BPSectionOrderer.cpp
index 5e9fd5248b2e..0ffbf16007fd 100644
--- a/lld/MachO/BPSectionOrderer.cpp
+++ b/lld/MachO/BPSectionOrderer.cpp
@@ -8,11 +8,7 @@

 #include "BPSectionOrderer.h"
 #include "InputSection.h"
-#include "lld/Common/ErrorHandler.h"
 #include "llvm/ADT/DenseMap.h"
-#include "llvm/ADT/StringMap.h"
-#include "llvm/Support/BalancedPartitioning.h"
-#include "llvm/Support/TimeProfiler.h"

 #define DEBUG_TYPE "bp-section-orderer"

diff --git a/lld/MachO/BPSectionOrderer.h b/lld/MachO/BPSectionOrderer.h
index fa71f739ecb3..29b20c781c6b 100644
--- a/lld/MachO/BPSectionOrderer.h
+++ b/lld/MachO/BPSectionOrderer.h
@@ -20,7 +20,6 @@
 #include "lld/Common/BPSectionOrdererBase.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/StringRef.h"
-#include "llvm/ADT/TinyPtrVector.h"

 namespace lld::macho {

diff --git a/lld/include/lld/Common/BPSectionOrdererBase.h b/lld/include/lld/Common/BPSectionOrdererBase.h
index 78c51cd9786b..a22bfcafd79c 100644
--- a/lld/include/lld/Common/BPSectionOrdererBase.h
+++ b/lld/include/lld/Common/BPSectionOrdererBase.h
@@ -14,7 +14,6 @@
 #ifndef LLD_COMMON_BP_SECTION_ORDERER_BASE_H
 #define LLD_COMMON_BP_SECTION_ORDERER_BASE_H

-#include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"

Contributor

@ellishg ellishg left a comment

LGTM! Thanks for ensuring this works for both targets! I tested that MachO/bp-section-orderer-stress.s produced the same order before and after that change, so it gives me some confidence that this won't regress anything. And I'm curious what size results you get for ELF, so please publish that when it is available.

Member

@MaskRay MaskRay left a comment

I need time to look at the ELF port changes.

If you drop the ELF changes from this PR and just move the MachO code to Common for reuse, this PR will be good to go and I'll approve.

You'd then need a dependent ELF patch.


@Colibrow Colibrow changed the title [lld][ELF] Extend profile guided function ordering to ELF binaries [lld]Add lld/Common/BPSectionOrdererBase from MachO for reuse in ELF Dec 18, 2024
@Colibrow
Contributor Author

@MaskRay I dropped the ELF parts and will post the ELF patch as a new PR once this PR is merged.

@Colibrow
Contributor Author

@MaskRay Hi, can you merge this PR? I don't have a merge button for this repository.

@MaskRay MaskRay merged commit 79e859e into llvm:main Dec 18, 2024
8 checks passed

@Colibrow Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@lerno
Contributor

lerno commented Jan 13, 2025

Could this have been causing issue #122655? I've been hoping it would be discovered and fixed, but it's been broken for almost a month now.

@MaskRay
Member

MaskRay commented Jan 26, 2025

I've spent more time on BP. I believe the virtual classes introduced by this PR, BPSectionBase/BPSymbol, are quite wasteful: they are thin wrappers that each contain just one pointer, and one is allocated per section and per symbol. I understand that the performance difference isn't critical, but the memory consumption is still important.

Perhaps we should treat Symbol and InputSection as void * in BPSectionOrderer.cpp and use the following class hierarchy instead.

class BPBase {
  virtual uint64_t getSectionSize(void *);
  virtual bool isCodeSection(void *);
  
  virtual StringRef getSymbolName(void *);
  virtual uint64_t getSymbolValue(void *); // optional<uint64_t> is unnecessary as we use value_or(0)
};
class BPMachO;
class BPELF; // future

Then we can avoid allocating a large `sections` array of wrapper objects alongside `inputSections` (expensive for large ELF applications).

We could also leverage the Curiously Recurring Template Pattern (CRTP) if we want to retain the type information of symbols and sections.

template <class D>
class BPBase {
  uint64_t getSize(typename D::Section *);
  bool isCodeSection(typename D::Section *);

  StringRef getName(typename D::Symbol *);
  uint64_t getSymbolValue(typename D::Symbol *); // optional<uint64_t> is unnecessary as we use value_or(0)
};
class BPMachO : public BPBase<BPMachO> { /* ... */ };
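
For readers unfamiliar with CRTP, here is a minimal, self-contained sketch of the idea; the names are invented for illustration and are not the eventual lld API. The base class dispatches to the derived class statically, so there are no virtual calls and no per-section wrapper allocations.

#include <cstdint>

// Generic ordering logic lives in the base; each port supplies typed hooks.
template <class D> struct BPOrdererBase {
  template <class Sec> uint64_t sectionSize(const Sec &sec) {
    return static_cast<D *>(this)->getSize(sec); // resolved at compile time
  }
  template <class Sec> bool isCode(const Sec &sec) {
    return static_cast<D *>(this)->isCodeSection(sec);
  }
};

// Stand-in for a port's real section type (e.g. an ELF input section).
struct FakeSection {
  uint64_t size;
  bool executable;
};

struct FakePortOrderer : BPOrdererBase<FakePortOrderer> {
  uint64_t getSize(const FakeSection &s) const { return s.size; }
  bool isCodeSection(const FakeSection &s) const { return s.executable; }
};

// Usage: FakePortOrderer().sectionSize(FakeSection{128, true}) returns 128,
// with no heap-allocated wrapper objects in between.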

@MaskRay
Member

MaskRay commented Jan 26, 2025

@Colibrow I am taking a stab at the CRTP refactoring.

@Colibrow
Contributor Author

I had been trying a template-based approach to implement this myself, but I got stuck. In my first version, I placed the base template in lld/Common, and during compilation the lld/ELF and lld/MachO instantiations couldn't find the symbols. I may have made a mistake in my implementation, and I'm hoping to learn how to refactor this correctly. Nice work.

MaskRay added a commit that referenced this pull request Jan 28, 2025
PR #117514 refactored BPSectionOrderer to be used by the ELF port
but introduced some inefficiency:

* BPSectionBase/BPSymbol are wrappers around a single pointer.
  The numbers of sections and symbols could be huge, and the extra
  allocations are memory inefficient.
* Reconstructing the returned DenseMap (since BPSectionBase != InputSection)
  is wasteful.

This patch refactors BPSectionOrderer with Curiously Recurring Template
Pattern and eliminates the inefficiency. In addition,
`symbolToSectionIdxs` is removed and `rootSymbolToSectionIdxs` building
is moved to lld/MachO: while getting sections for symbols is cheap in
Mach-O, it is awkward and inefficient in the ELF port.

While here, add a file-level comment and replace some `StringMap<*>`
(which copies strings) with `DenseMap<CachedHashStringRef, *>`.

Pull Request: #124482
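
As an aside on that last point, a small hedged illustration of the StringMap vs. DenseMap<CachedHashStringRef, ...> trade-off mentioned in the commit message (generic LLVM ADT usage, not code from the referenced patch):

#include "llvm/ADT/CachedHashString.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"

void recordSymbol(llvm::StringRef rootSymbol, unsigned sectionIdx) {
  // StringMap copies the key bytes into its own allocator on every insert.
  llvm::StringMap<llvm::DenseSet<unsigned>> byCopy;
  byCopy[rootSymbol].insert(sectionIdx);

  // CachedHashStringRef stores only a pointer, a length, and a precomputed
  // hash, so the key is not copied; the referenced string (e.g. a name in
  // the object file's string table) must outlive the map.
  llvm::DenseMap<llvm::CachedHashStringRef, llvm::DenseSet<unsigned>> byRef;
  byRef[llvm::CachedHashStringRef(rootSymbol)].insert(sectionIdx);
}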