[ELF] Add BPSectionOrderer options (#120514) #125559


Merged
merged 1 commit into llvm:main on Feb 4, 2025

Conversation

MaskRay (Member) commented Feb 3, 2025

Reland #120514 after 2f6e3df fixed an iteration-order issue and libstdc++/libc++ differences.


Both options instruct the linker to optimize section layout with the following goals:

  • --bp-compression-sort=[data|function|both]: Improve Lempel-Ziv compression by grouping similar sections together, resulting in a smaller compressed app size.
  • --bp-startup-sort=function --irpgo-profile=<file>: Utilize a temporal profile file to reduce page faults during program startup.
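The two can be combined in a single link; for example, the tests below run ld.lld a.o --irpgo-profile=a.profdata --bp-startup-sort=function --bp-compression-sort=both, which orders startup-critical functions by the profile and groups the remaining functions and data for compression.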

The linker determines the section order by considering three groups:

  • Function sections ordered according to the temporal profile (--irpgo-profile=), prioritizing early-accessed and frequently accessed functions.
  • Function sections. Sections containing similar functions are placed together, maximizing compression opportunities.
  • Data sections. Similar data sections are placed together.

Within each group, the sections are ordered using the Balanced Partitioning algorithm.
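(Balanced Partitioning recursively bisects the list of sections, at each level trying to keep sections that share utility vertices, described next, on the same side of the cut, so that similar sections end up near each other in the final order.)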

The linker constructs a bipartite graph with two sets of vertices: sections and utility vertices.

  • For profile-guided function sections:
    • The number of utility vertices is determined by the symbol order within the profile file.
    • If --bp-compression-sort-startup-functions is specified, extra utility vertices are allocated to prioritize nearby function similarity.
  • For sections ordered for compression: Utility vertices are determined by analyzing k-mers of the section content and relocations.
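To make the second bullet concrete, here is a rough standalone sketch (plain C++ for illustration only; the PR's BPOrdererELF::getSectionHashes in the diff below does the equivalent with LLVM types, and relocation targets contribute additional hashes omitted here) of how section bytes become deduplicated k-mer hashes, each of which acts as one utility vertex:

    #include <algorithm>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Hash every 4-byte window (k-mer) of the section contents, plus the last
    // 3 bytes individually, then sort and deduplicate. Sections that produce
    // the same hash attach to the same utility vertex.
    std::vector<uint64_t> kmerHashes(const uint8_t *data, size_t size) {
      constexpr size_t windowSize = 4;
      std::vector<uint64_t> hashes;
      if (size >= windowSize)
        for (size_t i = 0; i <= size - windowSize; ++i) {
          uint32_t window;
          std::memcpy(&window, data + i, sizeof(window)); // read32le in the real code
          hashes.push_back(window);
        }
      size_t tailStart = size > windowSize - 1 ? size - (windowSize - 1) : 0;
      for (size_t i = tailStart; i < size; ++i)
        hashes.push_back(data[i]);
      std::sort(hashes.begin(), hashes.end());
      hashes.erase(std::unique(hashes.begin(), hashes.end()), hashes.end());
      return hashes;
    }

Two sections that share many k-mers therefore attach to many of the same utility vertices, which is what pulls them next to each other in the final layout and improves compression.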

The call graph profile is disabled during this optimization.

When --symbol-ordering-file= is specified, sections described in that file are placed earlier.
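For example, in the bp-section-orderer.s test below, adding --symbol-ordering-file a.txt (which lists A, F, E, D, s2, s1, d3, d2) on top of --bp-startup-sort=function moves those symbols to the front of their groups: the function order changes from A,B,C,F,E,D to A,F,E,D,B,C and the data order from d4,d3,d2,d1 to d3,d2,d4,d1.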

Add new ELF linker options for profile-guided section ordering
optimizations:

- `--irpgo-profile=<file>`: Read IRPGO profile data for use with startup
and compression optimizations
- `--bp-startup-sort={none,function}`: Order sections based on profile
data to improve startup time
- `--bp-compression-sort={none,function,data,both}`: Order sections
using balanced partitioning to improve compressed size
- `--bp-compression-sort-startup-functions`: Additionally optimize
startup functions for compression
- `--verbose-bp-section-orderer`: Print statistics about balanced
partitioning section ordering

Thanks to @ellishg, @thevinster, and their team for their work.

---------

Reland after 2f6e3df fixed an iteration-order issue and libstdc++/libc++ differences.

Co-authored-by: Fangrui Song <[email protected]>
MaskRay (Member, Author) commented Feb 3, 2025

@Colibrow @ellishg

llvmbot (Member) commented Feb 3, 2025

@llvm/pr-subscribers-lld-elf

@llvm/pr-subscribers-lld

Author: Fangrui Song (MaskRay)

Changes

Add new ELF linker options for profile-guided section ordering optimizations:

  • --irpgo-profile=<file>: Read IRPGO profile data for use with startup and compression optimizations
  • --bp-startup-sort={none,function}: Order sections based on profile data to improve startup time
  • --bp-compression-sort={none,function,data,both}: Order sections using balanced partitioning to improve compressed size
  • --bp-compression-sort-startup-functions: Additionally optimize startup functions for compression
  • --verbose-bp-section-orderer: Print statistics about balanced partitioning section ordering

Thanks to @ellishg, @thevinster, and their team for their work.


Reland after 2f6e3df fixed an iteration-order issue and libstdc++/libc++ differences.


Patch is 25.78 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/125559.diff

10 Files Affected:

  • (added) lld/ELF/BPSectionOrderer.cpp (+95)
  • (added) lld/ELF/BPSectionOrderer.h (+37)
  • (modified) lld/ELF/CMakeLists.txt (+2)
  • (modified) lld/ELF/Config.h (+6)
  • (modified) lld/ELF/Driver.cpp (+48)
  • (modified) lld/ELF/Options.td (+13)
  • (modified) lld/ELF/Writer.cpp (+12-1)
  • (modified) lld/include/lld/Common/BPSectionOrdererBase.inc (+6-1)
  • (added) lld/test/ELF/bp-section-orderer-stress.s (+104)
  • (added) lld/test/ELF/bp-section-orderer.s (+335)
diff --git a/lld/ELF/BPSectionOrderer.cpp b/lld/ELF/BPSectionOrderer.cpp
new file mode 100644
index 00000000000000..01f77b33926f75
--- /dev/null
+++ b/lld/ELF/BPSectionOrderer.cpp
@@ -0,0 +1,95 @@
+//===- BPSectionOrderer.cpp -----------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "BPSectionOrderer.h"
+#include "InputFiles.h"
+#include "InputSection.h"
+#include "SymbolTable.h"
+#include "Symbols.h"
+#include "lld/Common/BPSectionOrdererBase.inc"
+#include "llvm/Support/Endian.h"
+
+using namespace llvm;
+using namespace lld::elf;
+
+namespace {
+struct BPOrdererELF;
+}
+template <> struct lld::BPOrdererTraits<struct BPOrdererELF> {
+  using Section = elf::InputSectionBase;
+  using Defined = elf::Defined;
+};
+namespace {
+struct BPOrdererELF : lld::BPOrderer<BPOrdererELF> {
+  DenseMap<const InputSectionBase *, Defined *> secToSym;
+
+  static uint64_t getSize(const Section &sec) { return sec.getSize(); }
+  static bool isCodeSection(const Section &sec) {
+    return sec.flags & ELF::SHF_EXECINSTR;
+  }
+  ArrayRef<Defined *> getSymbols(const Section &sec) {
+    auto it = secToSym.find(&sec);
+    if (it == secToSym.end())
+      return {};
+    return ArrayRef(it->second);
+  }
+
+  static void
+  getSectionHashes(const Section &sec, SmallVectorImpl<uint64_t> &hashes,
+                   const DenseMap<const void *, uint64_t> &sectionToIdx) {
+    constexpr unsigned windowSize = 4;
+
+    // Calculate content hashes: k-mers and the last k-1 bytes.
+    ArrayRef<uint8_t> data = sec.content();
+    if (data.size() >= windowSize)
+      for (size_t i = 0; i <= data.size() - windowSize; ++i)
+        hashes.push_back(support::endian::read32le(data.data() + i));
+    for (uint8_t byte : data.take_back(windowSize - 1))
+      hashes.push_back(byte);
+
+    llvm::sort(hashes);
+    hashes.erase(std::unique(hashes.begin(), hashes.end()), hashes.end());
+  }
+
+  static StringRef getSymName(const Defined &sym) { return sym.getName(); }
+  static uint64_t getSymValue(const Defined &sym) { return sym.value; }
+  static uint64_t getSymSize(const Defined &sym) { return sym.size; }
+};
+} // namespace
+
+DenseMap<const InputSectionBase *, int> elf::runBalancedPartitioning(
+    Ctx &ctx, StringRef profilePath, bool forFunctionCompression,
+    bool forDataCompression, bool compressionSortStartupFunctions,
+    bool verbose) {
+  // Collect candidate sections and associated symbols.
+  SmallVector<InputSectionBase *> sections;
+  DenseMap<CachedHashStringRef, std::set<unsigned>> rootSymbolToSectionIdxs;
+  BPOrdererELF orderer;
+
+  auto addSection = [&](Symbol &sym) {
+    auto *d = dyn_cast<Defined>(&sym);
+    if (!d)
+      return;
+    auto *sec = dyn_cast_or_null<InputSectionBase>(d->section);
+    if (!sec || sec->size == 0 || !orderer.secToSym.try_emplace(sec, d).second)
+      return;
+    rootSymbolToSectionIdxs[CachedHashStringRef(getRootSymbol(sym.getName()))]
+        .insert(sections.size());
+    sections.emplace_back(sec);
+  };
+
+  for (Symbol *sym : ctx.symtab->getSymbols())
+    addSection(*sym);
+  for (ELFFileBase *file : ctx.objectFiles)
+    for (Symbol *sym : file->getLocalSymbols())
+      addSection(*sym);
+  return orderer.computeOrder(profilePath, forFunctionCompression,
+                              forDataCompression,
+                              compressionSortStartupFunctions, verbose,
+                              sections, rootSymbolToSectionIdxs);
+}
diff --git a/lld/ELF/BPSectionOrderer.h b/lld/ELF/BPSectionOrderer.h
new file mode 100644
index 00000000000000..a0cb1360005a6b
--- /dev/null
+++ b/lld/ELF/BPSectionOrderer.h
@@ -0,0 +1,37 @@
+//===- BPSectionOrderer.h -------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// This file uses Balanced Partitioning to order sections to improve startup
+/// time and compressed size.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLD_ELF_BPSECTION_ORDERER_H
+#define LLD_ELF_BPSECTION_ORDERER_H
+
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/StringRef.h"
+
+namespace lld::elf {
+struct Ctx;
+class InputSectionBase;
+
+/// Run Balanced Partitioning to find the optimal function and data order to
+/// improve startup time and compressed size.
+///
+/// It is important that -ffunction-sections and -fdata-sections compiler flags
+/// are used to ensure functions and data are in their own sections and thus
+/// can be reordered.
+llvm::DenseMap<const InputSectionBase *, int>
+runBalancedPartitioning(Ctx &ctx, llvm::StringRef profilePath,
+                        bool forFunctionCompression, bool forDataCompression,
+                        bool compressionSortStartupFunctions, bool verbose);
+
+} // namespace lld::elf
+
+#endif
diff --git a/lld/ELF/CMakeLists.txt b/lld/ELF/CMakeLists.txt
index 83d816ddb0601e..ec3f6382282b1f 100644
--- a/lld/ELF/CMakeLists.txt
+++ b/lld/ELF/CMakeLists.txt
@@ -37,6 +37,7 @@ add_lld_library(lldELF
   Arch/X86.cpp
   Arch/X86_64.cpp
   ARMErrataFix.cpp
+  BPSectionOrderer.cpp
   CallGraphSort.cpp
   DWARF.cpp
   Driver.cpp
@@ -72,6 +73,7 @@ add_lld_library(lldELF
   Object
   Option
   Passes
+  ProfileData
   Support
   TargetParser
   TransformUtils
diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index c2aadb2cef5200..a1ac0973bcc402 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -264,6 +264,12 @@ struct Config {
   bool armBe8 = false;
   BsymbolicKind bsymbolic = BsymbolicKind::None;
   CGProfileSortKind callGraphProfileSort;
+  llvm::StringRef irpgoProfilePath;
+  bool bpStartupFunctionSort = false;
+  bool bpCompressionSortStartupFunctions = false;
+  bool bpFunctionOrderForCompression = false;
+  bool bpDataOrderForCompression = false;
+  bool bpVerboseSectionOrderer = false;
   bool checkSections;
   bool checkDynamicRelocs;
   std::optional<llvm::DebugCompressionType> compressDebugSections;
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index acbc97b331b0e1..a14d3629f6eec0 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1121,6 +1121,53 @@ static CGProfileSortKind getCGProfileSortKind(Ctx &ctx,
   return CGProfileSortKind::None;
 }
 
+static void parseBPOrdererOptions(Ctx &ctx, opt::InputArgList &args) {
+  if (auto *arg = args.getLastArg(OPT_bp_compression_sort)) {
+    StringRef s = arg->getValue();
+    if (s == "function") {
+      ctx.arg.bpFunctionOrderForCompression = true;
+    } else if (s == "data") {
+      ctx.arg.bpDataOrderForCompression = true;
+    } else if (s == "both") {
+      ctx.arg.bpFunctionOrderForCompression = true;
+      ctx.arg.bpDataOrderForCompression = true;
+    } else if (s != "none") {
+      ErrAlways(ctx) << arg->getSpelling()
+                     << ": expected [none|function|data|both]";
+    }
+    if (s != "none" && args.hasArg(OPT_call_graph_ordering_file))
+      ErrAlways(ctx) << "--bp-compression-sort is incompatible with "
+                        "--call-graph-ordering-file";
+  }
+  if (auto *arg = args.getLastArg(OPT_bp_startup_sort)) {
+    StringRef s = arg->getValue();
+    if (s == "function") {
+      ctx.arg.bpStartupFunctionSort = true;
+    } else if (s != "none") {
+      ErrAlways(ctx) << arg->getSpelling() << ": expected [none|function]";
+    }
+    if (s != "none" && args.hasArg(OPT_call_graph_ordering_file))
+      ErrAlways(ctx) << "--bp-startup-sort=function is incompatible with "
+                        "--call-graph-ordering-file";
+  }
+
+  ctx.arg.bpCompressionSortStartupFunctions =
+      args.hasFlag(OPT_bp_compression_sort_startup_functions,
+                   OPT_no_bp_compression_sort_startup_functions, false);
+  ctx.arg.bpVerboseSectionOrderer = args.hasArg(OPT_verbose_bp_section_orderer);
+
+  ctx.arg.irpgoProfilePath = args.getLastArgValue(OPT_irpgo_profile);
+  if (ctx.arg.irpgoProfilePath.empty()) {
+    if (ctx.arg.bpStartupFunctionSort)
+      ErrAlways(ctx) << "--bp-startup-sort=function must be used with "
+                        "--irpgo-profile";
+    if (ctx.arg.bpCompressionSortStartupFunctions)
+      ErrAlways(ctx)
+          << "--bp-compression-sort-startup-functions must be used with "
+             "--irpgo-profile";
+  }
+}
+
 static DebugCompressionType getCompressionType(Ctx &ctx, StringRef s,
                                                StringRef option) {
   DebugCompressionType type = StringSwitch<DebugCompressionType>(s)
@@ -1262,6 +1309,7 @@ static void readConfigs(Ctx &ctx, opt::InputArgList &args) {
       ctx.arg.bsymbolic = BsymbolicKind::All;
   }
   ctx.arg.callGraphProfileSort = getCGProfileSortKind(ctx, args);
+  parseBPOrdererOptions(ctx, args);
   ctx.arg.checkSections =
       args.hasFlag(OPT_check_sections, OPT_no_check_sections, true);
   ctx.arg.chroot = args.getLastArgValue(OPT_chroot);
diff --git a/lld/ELF/Options.td b/lld/ELF/Options.td
index c31875305952fb..80032490da0de4 100644
--- a/lld/ELF/Options.td
+++ b/lld/ELF/Options.td
@@ -141,6 +141,19 @@ def call_graph_profile_sort: JJ<"call-graph-profile-sort=">,
 def : FF<"no-call-graph-profile-sort">, Alias<call_graph_profile_sort>, AliasArgs<["none"]>,
   Flags<[HelpHidden]>;
 
+defm irpgo_profile: EEq<"irpgo-profile",
+  "Read a temporary profile file for use with --bp-startup-sort=">;
+def bp_compression_sort: JJ<"bp-compression-sort=">, MetaVarName<"[none,function,data,both]">,
+  HelpText<"Improve Lempel-Ziv compression by grouping similar sections together, resulting in a smaller compressed app size">;
+def bp_startup_sort: JJ<"bp-startup-sort=">, MetaVarName<"[none,function]">,
+  HelpText<"Utilize a temporal profile file to reduce page faults during program startup">;
+
+// Auxiliary options related to balanced partition
+defm bp_compression_sort_startup_functions: BB<"bp-compression-sort-startup-functions",
+  "When --irpgo-profile is pecified, prioritize function similarity for compression in addition to startup time", "">;
+def verbose_bp_section_orderer: FF<"verbose-bp-section-orderer">,
+  HelpText<"Print information on balanced partitioning">;
+
 // --chroot doesn't have a help text because it is an internal option.
 def chroot: Separate<["--"], "chroot">;
 
diff --git a/lld/ELF/Writer.cpp b/lld/ELF/Writer.cpp
index 3ba1cdbce572b7..49077055e84fbf 100644
--- a/lld/ELF/Writer.cpp
+++ b/lld/ELF/Writer.cpp
@@ -9,6 +9,7 @@
 #include "Writer.h"
 #include "AArch64ErrataFix.h"
 #include "ARMErrataFix.h"
+#include "BPSectionOrderer.h"
 #include "CallGraphSort.h"
 #include "Config.h"
 #include "InputFiles.h"
@@ -1080,8 +1081,18 @@ static void maybeShuffle(Ctx &ctx,
 // that don't appear in the order file.
 static DenseMap<const InputSectionBase *, int> buildSectionOrder(Ctx &ctx) {
   DenseMap<const InputSectionBase *, int> sectionOrder;
-  if (!ctx.arg.callGraphProfile.empty())
+  if (ctx.arg.bpStartupFunctionSort || ctx.arg.bpFunctionOrderForCompression ||
+      ctx.arg.bpDataOrderForCompression) {
+    TimeTraceScope timeScope("Balanced Partitioning Section Orderer");
+    sectionOrder = runBalancedPartitioning(
+        ctx, ctx.arg.bpStartupFunctionSort ? ctx.arg.irpgoProfilePath : "",
+        ctx.arg.bpFunctionOrderForCompression,
+        ctx.arg.bpDataOrderForCompression,
+        ctx.arg.bpCompressionSortStartupFunctions,
+        ctx.arg.bpVerboseSectionOrderer);
+  } else if (!ctx.arg.callGraphProfile.empty()) {
     sectionOrder = computeCallGraphProfileOrder(ctx);
+  }
 
   if (ctx.arg.symbolOrderingFile.empty())
     return sectionOrder;
diff --git a/lld/include/lld/Common/BPSectionOrdererBase.inc b/lld/include/lld/Common/BPSectionOrdererBase.inc
index 6c7c4a188d1320..83b87f4c7817db 100644
--- a/lld/include/lld/Common/BPSectionOrdererBase.inc
+++ b/lld/include/lld/Common/BPSectionOrdererBase.inc
@@ -64,6 +64,10 @@ template <class D> struct BPOrderer {
                     const DenseMap<CachedHashStringRef, std::set<unsigned>>
                         &rootSymbolToSectionIdxs)
       -> llvm::DenseMap<const Section *, int>;
+
+  std::optional<StringRef> static getResolvedLinkageName(StringRef name) {
+    return {};
+  }
 };
 } // namespace lld
 
@@ -98,10 +102,11 @@ static SmallVector<std::pair<unsigned, UtilityNodes>> getUnsForCompression(
     // Merge sections that are nearly identical
     SmallVector<std::pair<unsigned, SmallVector<uint64_t>>> newSectionHashes;
     DenseMap<uint64_t, unsigned> wholeHashToSectionIdx;
+    unsigned threshold = sectionHashes.size() > 10000 ? 5 : 0;
     for (auto &[sectionIdx, hashes] : sectionHashes) {
       uint64_t wholeHash = 0;
       for (auto hash : hashes)
-        if (hashFrequency[hash] > 5)
+        if (hashFrequency[hash] > threshold)
           wholeHash ^= hash;
       auto [it, wasInserted] =
           wholeHashToSectionIdx.insert(std::make_pair(wholeHash, sectionIdx));
diff --git a/lld/test/ELF/bp-section-orderer-stress.s b/lld/test/ELF/bp-section-orderer-stress.s
new file mode 100644
index 00000000000000..da9670933949f9
--- /dev/null
+++ b/lld/test/ELF/bp-section-orderer-stress.s
@@ -0,0 +1,104 @@
+# REQUIRES: aarch64
+
+## Generate a large test case and check that the output is deterministic.
+
+# RUN: %python %s %t.s %t.proftext
+
+# RUN: llvm-mc -filetype=obj -triple=aarch64 %t.s -o %t.o
+# RUN: llvm-profdata merge %t.proftext -o %t.profdata
+
+# RUN: ld.lld --icf=all -o %t1.o %t.o --irpgo-profile=%t.profdata --bp-startup-sort=function --bp-compression-sort-startup-functions --bp-compression-sort=both
+# RUN: ld.lld --icf=all -o %t2.o %t.o --irpgo-profile=%t.profdata --bp-startup-sort=function --bp-compression-sort-startup-functions --bp-compression-sort=both
+# RUN: cmp %t1.o %t2.o
+
+import random
+import sys
+
+assembly_filepath = sys.argv[1]
+proftext_filepath = sys.argv[2]
+
+random.seed(1234)
+num_functions = 1000
+num_data = 100
+num_traces = 10
+
+function_names = [f"f{n}" for n in range(num_functions)]
+data_names = [f"d{n}" for n in range(num_data)]
+profiled_functions = function_names[: int(num_functions / 2)]
+
+function_contents = [
+    f"""
+{name}:
+  add w0, w0, #{i % 4096}
+  add w1, w1, #{i % 10}
+  add w2, w0, #{i % 20}
+  adrp x3, {name}
+  ret
+"""
+    for i, name in enumerate(function_names)
+]
+
+data_contents = [
+      f"""
+{name}:
+  .ascii "s{i % 2}-{i % 3}-{i % 5}"
+  .xword {name}
+"""
+    for i, name in enumerate(data_names)
+]
+
+trace_contents = [
+    f"""
+# Weight
+1
+{", ".join(random.sample(profiled_functions, len(profiled_functions)))}
+"""
+    for i in range(num_traces)
+]
+
+profile_contents = [
+    f"""
+{name}
+# Func Hash:
+{i}
+# Num Counters:
+1
+# Counter Values:
+1
+"""
+    for i, name in enumerate(profiled_functions)
+]
+
+with open(assembly_filepath, "w") as f:
+    f.write(
+        f"""
+.text
+.globl _start
+
+_start:
+  ret
+
+{"".join(function_contents)}
+
+.data
+{"".join(data_contents)}
+
+"""
+    )
+
+with open(proftext_filepath, "w") as f:
+    f.write(
+        f"""
+:ir
+:temporal_prof_traces
+
+# Num Traces
+{num_traces}
+# Trace Stream Size:
+{num_traces}
+
+{"".join(trace_contents)}
+
+{"".join(profile_contents)}
+"""
+    )
diff --git a/lld/test/ELF/bp-section-orderer.s b/lld/test/ELF/bp-section-orderer.s
new file mode 100644
index 00000000000000..2e18107c02ca30
--- /dev/null
+++ b/lld/test/ELF/bp-section-orderer.s
@@ -0,0 +1,335 @@
+# REQUIRES: aarch64
+# RUN: rm -rf %t && split-file %s %t && cd %t
+
+## Check for incompatible cases
+# RUN: not ld.lld %t --irpgo-profile=/dev/null --bp-startup-sort=function --call-graph-ordering-file=/dev/null 2>&1 | FileCheck %s --check-prefix=BP-STARTUP-CALLGRAPH-ERR
+# RUN: not ld.lld --bp-compression-sort=function --call-graph-ordering-file /dev/null 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-CALLGRAPH-ERR
+# RUN: not ld.lld --bp-startup-sort=function 2>&1 | FileCheck %s --check-prefix=BP-STARTUP-ERR
+# RUN: not ld.lld --bp-compression-sort-startup-functions 2>&1 | FileCheck %s --check-prefix=BP-STARTUP-COMPRESSION-ERR
+# RUN: not ld.lld --bp-startup-sort=invalid --bp-compression-sort=invalid 2>&1 | FileCheck %s --check-prefix=BP-INVALID
+
+# BP-STARTUP-CALLGRAPH-ERR: error: --bp-startup-sort=function is incompatible with --call-graph-ordering-file
+# BP-COMPRESSION-CALLGRAPH-ERR: error: --bp-compression-sort is incompatible with --call-graph-ordering-file
+# BP-STARTUP-ERR: error: --bp-startup-sort=function must be used with --irpgo-profile
+# BP-STARTUP-COMPRESSION-ERR: error: --bp-compression-sort-startup-functions must be used with --irpgo-profile
+
+# BP-INVALID: error: --bp-compression-sort=: expected [none|function|data|both]
+# BP-INVALID: error: --bp-startup-sort=: expected [none|function]
+
+# RUN: llvm-mc -filetype=obj -triple=aarch64 a.s -o a.o
+# RUN: llvm-profdata merge a.proftext -o a.profdata
+# RUN: ld.lld a.o --irpgo-profile=a.profdata --bp-startup-sort=function --verbose-bp-section-orderer --icf=all 2>&1 | FileCheck %s --check-prefix=STARTUP-FUNC-ORDER
+
+# STARTUP-FUNC-ORDER: Ordered 3 sections using balanced partitioning
+# STARTUP-FUNC-ORDER: Total area under the page fault curve: 3.
+
+# RUN: ld.lld -o out.s a.o --irpgo-profile=a.profdata --bp-startup-sort=function
+# RUN: llvm-nm -jn out.s | tr '\n' , | FileCheck %s --check-prefix=STARTUP
+# STARTUP: s5,s4,s3,s2,s1,A,B,C,F,E,D,_start,d4,d3,d2,d1,{{$}}
+
+# RUN: ld.lld -o out.os a.o --irpgo-profile=a.profdata --bp-startup-sort=function --symbol-ordering-file a.txt
+# RUN: llvm-nm -jn out.os | tr '\n' , | FileCheck %s --check-prefix=ORDER-STARTUP
+# ORDER-STARTUP: s2,s1,s5,s4,s3,A,F,E,D,B,C,_start,d3,d2,d4,d1,{{$}}
+
+# RUN: ld.lld -o out.cf a.o --verbose-bp-section-orderer --bp-compression-sort=function 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-FUNC
+# RUN: llvm-nm -jn out.cf | tr '\n' , | FileCheck %s --check-prefix=CFUNC
+# CFUNC: s5,s4,s3,s2,s1,F,C,E,D,B,A,_start,d4,d3,d2,d1,{{$}}
+
+# RUN: ld.lld -o out.cd a.o --verbose-bp-section-orderer --bp-compression-sort=data 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-DATA
+# RUN: llvm-nm -jn out.cd | tr '\n' , | FileCheck %s --check-prefix=CDATA
+# CDATA: s5,s3,s4,s2,s1,F,C,E,D,B,A,_start,d4,d1,d3,d2,{{$}}
+
+# RUN: ld.lld -o out.cb a.o --verbose-bp-section-orderer --bp-compression-sort=both 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-BOTH
+# RUN: llvm-nm -jn out.cb | tr '\n' , | FileCheck %s --check-prefix=CDATA
+
+# RUN: ld.lld -o out.cbs a.o --verbose-bp-section-orderer --bp-compression-sort=both --irpgo-profile=a.profdata --bp-startup-sort=function 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-BOTH
+# RUN: llvm-nm -jn out.cbs | tr '\n' , | FileCheck %s --check-prefix=CBOTH-STARTUP
+# CBOTH-STARTUP: s5,s3,s4,s2,s1,A,B,C,F,E,D,_start,d4,d1,d3,d2,{{$}}
+
+# BP-COMPRESSION-FUNC: Ordered 7 sections using balanced partitioning
+# BP-COMPRESSION-DATA: Ordered 9 sections using balanced partitioning
+# BP-COMPRESSION-BOTH: Ordered 16 sections using balanced partitioning
+
+#--- a.proftext
+:ir
+:temporal_prof_traces
+# Num Traces
+1
+# Trace Stream Size:
+1
+# Weight
+1
+A, B, C
+
+A
+# Func Hash:
+1111
+# Num Counters:
+1
+# Counter Values:
+1
+
+B
+# Func Hash:
+2222
+# Num Counters:
+1
+# Counter Values:
+1
+
+C
+# Func Hash:
+3333
+# Num Counters:
+1
+# Counter Values:
+1
+
+D
+# Func Hash:
+4444
+# Num Counters:
+1
+# Counter Values:
+1
+
+#--- a.txt
+A
+F
+E
+D
+s2
+s1
+d3
+d2
+
+#--- a.c
+const char s5[] = "engineering";
+const char s4[] = "computer program";
+const char s3[] = "hardware engineer";
+const char s2[] = "computer software";
+const char s1[] = "hello world program";
+int d4[] = {1,2,3,4,5,6};
+int d3[] = {5,6,7,8};
+int d2[] = {7,8,9,10};
+int d1[] = {3,4,5,6};
+
+int C(int a);
+int B(int a);
+void A();
+
+int F(int a) { return C(a + 3); }
+int E(int a) { return C(a + 2); }
+int D(int a) { return B(a + 2); }
+int C(int a) { A(); return a + 2; }
+int B(int a) { A(); return a + 1; }
+void A() {}
+
+int _start() { return 0; }
+
+#--- gen
+clang --target=aarch64-linux-gnu -O0 -ffunction-sect...
[truncated]

@@ -98,10 +102,11 @@ static SmallVector<std::pair<unsigned, UtilityNodes>> getUnsForCompression(
// Merge sections that are nearly identical
SmallVector<std::pair<unsigned, SmallVector<uint64_t>>> newSectionHashes;
DenseMap<uint64_t, unsigned> wholeHashToSectionIdx;
unsigned threshold = sectionHashes.size() > 10000 ? 5 : 0;

MaskRay (Member, Author):
@ellishg With the existing threshold 5, wholeHash is almost always 0 for smaller applications. I guess 5 was picked to improve the performance of larger applications.

Contributor:
Yeah, 5 was picked because we saw good results for large binaries. If we wanted to generalize this, we could compute something like the P90 of the hash frequencies, i.e., treat as common only the hashes that are repeated more often than 90% of all hashes. But I would like to experiment with this.
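A rough sketch of that idea (purely hypothetical, not part of this patch; the function name and signature are invented for illustration): derive the threshold from the observed distribution of hash frequencies rather than a hard-coded constant.

    #include <algorithm>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // Return the frequency value at the given percentile (e.g. 0.9 for P90).
    // Hashes whose frequency exceeds this would count toward wholeHash.
    unsigned percentileThreshold(
        const std::unordered_map<uint64_t, unsigned> &hashFrequency,
        double percentile = 0.9) {
      if (hashFrequency.empty())
        return 0;
      std::vector<unsigned> counts;
      counts.reserve(hashFrequency.size());
      for (const auto &[hash, count] : hashFrequency)
        counts.push_back(count);
      size_t idx = static_cast<size_t>(percentile * (counts.size() - 1));
      std::nth_element(counts.begin(), counts.begin() + idx, counts.end());
      return counts[idx];
    }

A data-dependent threshold like this would adapt between small and large inputs instead of relying on a fixed cutoff such as sectionHashes.size() > 10000.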

MaskRay (Member, Author):
I think it's important for tests to show that --bp-compression-sort=data actually does something. Therefore, the threshold has to be lowered so that data sections will not all be considered duplicates....

I think sectionHashes.size() > 10000 will work for large applications like yours.

(In the test, --bp-compression-sort=both and --bp-compression-sort=data exhibit the same behavior. Ideally 'both' should actually do something with functions. Reserved for a future patch.)

Contributor:
I built some large binaries with this change and I didn't see any large differences, so I think this is ok. I also realized that there is a bug. If wholeHash is zero, we should not insert into duplicateSectionIdxs because they have no hashes in common.

The more I think about this, the more I realize it could be improved quite a bit. I can imagine a set of small outlined functions that differ in one instruction. Ideally these should be in duplicateSectionIdxs, but this code won't catch that case.
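A minimal sketch of the suggested guard (an assumption about a possible follow-up, not code from this patch; the helper name and signature are invented for illustration): compute the whole-section hash from the frequent content hashes, and report "nothing" when none are frequent enough, so such sections never get grouped into duplicateSectionIdxs together.

    #include <cstdint>
    #include <optional>
    #include <unordered_map>
    #include <vector>

    // XOR together the hashes that clear the frequency threshold. A nullopt
    // result means the section shares no frequent hashes with anything, so it
    // should be kept as a unique section rather than merged as a duplicate.
    std::optional<uint64_t>
    wholeHashOf(const std::vector<uint64_t> &hashes,
                const std::unordered_map<uint64_t, unsigned> &hashFrequency,
                unsigned threshold) {
      uint64_t wholeHash = 0;
      for (uint64_t hash : hashes) {
        auto it = hashFrequency.find(hash);
        if (it != hashFrequency.end() && it->second > threshold)
          wholeHash ^= hash;
      }
      if (wholeHash == 0)
        return std::nullopt;
      return wholeHash;
    }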

@MaskRay MaskRay merged commit 6ab034b into llvm:main Feb 4, 2025
11 checks passed
@MaskRay MaskRay deleted the lld-bp branch February 4, 2025 17:12
llvm-ci (Collaborator) commented Feb 4, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-aarch64-darwin running on doug-worker-5 while building lld at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/14070

Here is the relevant piece of the build log for reference:
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: Analysis/live-stmts.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/clang -cc1 -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/build/lib/clang/21/include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer -w -analyzer-checker=debug.DumpLiveExprs /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp 2>&1   | /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp
+ /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/clang -cc1 -internal-isystem /Users/buildbot/buildbot-root/aarch64-darwin/build/lib/clang/21/include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer -w -analyzer-checker=debug.DumpLiveExprs /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp
+ /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp
/Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp:239:16: error: CHECK-EMPTY: is not on the line after the previous match
// CHECK-EMPTY:
               ^
<stdin>:180:1: note: 'next' match was here

^
<stdin>:177:1: note: previous match ended here

^
<stdin>:178:1: note: non-matching line after previous match is here
ImplicitCastExpr 0x13a113b78 '_Bool' <LValueToRValue>
^

Input file: <stdin>
Check file: /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/clang/test/Analysis/live-stmts.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1:  
           2: [ B0 (live expressions at block exit) ]
check:21      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           3:  
empty:22      ^
           4:  
empty:23      ^
           5: [ B1 (live expressions at block exit) ]
check:24      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           6:  
empty:25      ^
           7:  
empty:26      ^
           8: [ B2 (live expressions at block exit) ]
check:27      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           9:  
empty:28      ^
          10: DeclRefExpr 0x13a1116e0 'int' lvalue ParmVar 0x13a0f9070 'y' 'int'
next:29       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          11:  
empty:30      ^
          12: DeclRefExpr 0x13a111700 'int' lvalue ParmVar 0x13a0f90f0 'z' 'int'
...

llvm-ci (Collaborator) commented Feb 4, 2025

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building lld at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/21880

Here is the relevant piece of the build log for reference:
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
...
PASS: UBSan-Standalone-lld-x86_64 :: TestCases/Integer/div-overflow.cpp (94076 of 98120)
PASS: UBSan-Standalone-lld-x86_64 :: TestCases/Integer/bit-int.c (94077 of 98120)
PASS: UBSan-Standalone-lld-x86_64 :: TestCases/Integer/mul-overflow.cpp (94078 of 98120)
UNSUPPORTED: UBSan-Standalone-lld-x86_64 :: TestCases/Integer/summary.cpp (94079 of 98120)
PASS: UBSan-Minimal-x86_64 :: TestCases/local_bounds.cpp (94080 of 98120)
PASS: UBSan-MemorySanitizer-lld-x86_64 :: TestCases/Pointer/nullptr-and-nonzero-offset-variable.cpp (94081 of 98120)
PASS: UBSan-MemorySanitizer-lld-x86_64 :: TestCases/ImplicitConversion/signed-integer-truncation-ignorelist.c (94082 of 98120)
PASS: UBSan-Standalone-lld-x86_64 :: TestCases/Integer/add-overflow.cpp (94083 of 98120)
PASS: UBSan-MemorySanitizer-x86_64 :: TestCases/Misc/coverage-levels.cpp (94084 of 98120)
TIMEOUT: MLIR :: Examples/standalone/test.toy (94085 of 98120)
******************** TEST 'MLIR :: Examples/standalone/test.toy' FAILED ********************
Exit Code: 1
Timeout: Reached timeout of 60 seconds

Command Output (stdout):
--
# RUN: at line 1
"/etc/cmake/bin/cmake" "/build/buildbot/premerge-monolithic-linux/llvm-project/mlir/examples/standalone" -G "Ninja"  -DCMAKE_CXX_COMPILER=/usr/bin/clang++  -DCMAKE_C_COMPILER=/usr/bin/clang   -DLLVM_ENABLE_LIBCXX=OFF -DMLIR_DIR=/build/buildbot/premerge-monolithic-linux/build/lib/cmake/mlir  -DLLVM_USE_LINKER=lld  -DPython3_EXECUTABLE="/usr/bin/python3.10"
# executed command: /etc/cmake/bin/cmake /build/buildbot/premerge-monolithic-linux/llvm-project/mlir/examples/standalone -G Ninja -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang -DLLVM_ENABLE_LIBCXX=OFF -DMLIR_DIR=/build/buildbot/premerge-monolithic-linux/build/lib/cmake/mlir -DLLVM_USE_LINKER=lld -DPython3_EXECUTABLE=/usr/bin/python3.10
# .---command stdout------------
# | -- The CXX compiler identification is Clang 16.0.6
# | -- The C compiler identification is Clang 16.0.6
# | -- Detecting CXX compiler ABI info
# | -- Detecting CXX compiler ABI info - done
# | -- Check for working CXX compiler: /usr/bin/clang++ - skipped
# | -- Detecting CXX compile features
# | -- Detecting CXX compile features - done
# | -- Detecting C compiler ABI info
# | -- Detecting C compiler ABI info - done
# | -- Check for working C compiler: /usr/bin/clang - skipped
# | -- Detecting C compile features
# | -- Detecting C compile features - done
# | -- Looking for histedit.h
# | -- Looking for histedit.h - found
# | -- Found LibEdit: /usr/include (found version "2.11") 
# | -- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11") 
# | -- Found LibXml2: /usr/lib/x86_64-linux-gnu/libxml2.so (found version "2.9.13") 
# | -- Using MLIRConfig.cmake in: /build/buildbot/premerge-monolithic-linux/build/lib/cmake/mlir
# | -- Using LLVMConfig.cmake in: /build/buildbot/premerge-monolithic-linux/build/lib/cmake/llvm
# | -- Linker detection: unknown
# | -- Performing Test LLVM_LIBSTDCXX_MIN
# | -- Performing Test LLVM_LIBSTDCXX_MIN - Success
# | -- Performing Test LLVM_LIBSTDCXX_SOFT_ERROR
# | -- Performing Test LLVM_LIBSTDCXX_SOFT_ERROR - Success
# | -- Performing Test CXX_SUPPORTS_CUSTOM_LINKER
# | -- Performing Test CXX_SUPPORTS_CUSTOM_LINKER - Success
# | -- Performing Test C_SUPPORTS_FPIC
# | -- Performing Test C_SUPPORTS_FPIC - Success
# | -- Performing Test CXX_SUPPORTS_FPIC

Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
Reland llvm#120514 after 2f6e3df fixed an iteration-order issue and libstdc++/libc++ differences.

---

Both options instruct the linker to optimize section layout with the
following goals:

* `--bp-compression-sort=[data|function|both]`: Improve Lempel-Ziv
compression by grouping similar sections together, resulting in a
smaller compressed app size.
* `--bp-startup-sort=function --irpgo-profile=<file>`: Utilize a
temporal profile file to reduce page faults during program startup.

The linker determines the section order by considering three groups:

* Function sections ordered according to the temporal profile
(`--irpgo-profile=`), prioritizing early-accessed and frequently
accessed functions.
* Function sections. Sections containing similar functions are placed
together, maximizing compression opportunities.
* Data sections. Similar data sections are placed together.

Within each group, the sections are ordered using the Balanced
Partitioning algorithm.

The linker constructs a bipartite graph with two sets of vertices:
sections and utility vertices.

* For profile-guided function sections:
  + The number of utility vertices is determined by the symbol order
within the profile file.
  + If `--bp-compression-sort-startup-functions` is specified, extra
utility vertices are allocated to prioritize nearby function similarity.
* For sections ordered for compression: Utility vertices are determined
by analyzing k-mers of the section content and relocations.

The call graph profile is disabled during this optimization.

When `--symbol-ordering-file=` is specified, sections described in that
file are placed earlier.

Co-authored-by: Pengying Xu <[email protected]>