-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[ELF] Add BPSectionOrderer options (#120514) #125559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add new ELF linker options for profile-guided section ordering optimizations: - `--irpgo-profile=<file>`: Read IRPGO profile data for use with startup and compression optimizations - `--bp-startup-sort={none,function}`: Order sections based on profile data to improve star tup time - `--bp-compression-sort={none,function,data,both}`: Order sections using balanced partitioning to improve compressed size - `--bp-compression-sort-startup-functions`: Additionally optimize startup functions for compression - `--verbose-bp-section-orderer`: Print statistics about balanced partitioning section ordering Thanks to the @ellishg, @thevinster, and their team's work. --------- Reland after 2f6e3df fixed iteration order issue and libstdc++/libc++ differences. Co-authored-by: Fangrui Song <[email protected]>
@llvm/pr-subscribers-lld-elf @llvm/pr-subscribers-lld Author: Fangrui Song (MaskRay) ChangesAdd new ELF linker options for profile-guided section ordering optimizations:
Thanks to the @ellishg, @thevinster, and their team's work. Reland after 2f6e3df fixed iteration order issue and libstdc++/libc++ differences. Patch is 25.78 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/125559.diff 10 Files Affected:
diff --git a/lld/ELF/BPSectionOrderer.cpp b/lld/ELF/BPSectionOrderer.cpp
new file mode 100644
index 00000000000000..01f77b33926f75
--- /dev/null
+++ b/lld/ELF/BPSectionOrderer.cpp
@@ -0,0 +1,95 @@
+//===- BPSectionOrderer.cpp -----------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "BPSectionOrderer.h"
+#include "InputFiles.h"
+#include "InputSection.h"
+#include "SymbolTable.h"
+#include "Symbols.h"
+#include "lld/Common/BPSectionOrdererBase.inc"
+#include "llvm/Support/Endian.h"
+
+using namespace llvm;
+using namespace lld::elf;
+
+namespace {
+struct BPOrdererELF;
+}
+template <> struct lld::BPOrdererTraits<struct BPOrdererELF> {
+ using Section = elf::InputSectionBase;
+ using Defined = elf::Defined;
+};
+namespace {
+struct BPOrdererELF : lld::BPOrderer<BPOrdererELF> {
+ DenseMap<const InputSectionBase *, Defined *> secToSym;
+
+ static uint64_t getSize(const Section &sec) { return sec.getSize(); }
+ static bool isCodeSection(const Section &sec) {
+ return sec.flags & ELF::SHF_EXECINSTR;
+ }
+ ArrayRef<Defined *> getSymbols(const Section &sec) {
+ auto it = secToSym.find(&sec);
+ if (it == secToSym.end())
+ return {};
+ return ArrayRef(it->second);
+ }
+
+ static void
+ getSectionHashes(const Section &sec, SmallVectorImpl<uint64_t> &hashes,
+ const DenseMap<const void *, uint64_t> §ionToIdx) {
+ constexpr unsigned windowSize = 4;
+
+ // Calculate content hashes: k-mers and the last k-1 bytes.
+ ArrayRef<uint8_t> data = sec.content();
+ if (data.size() >= windowSize)
+ for (size_t i = 0; i <= data.size() - windowSize; ++i)
+ hashes.push_back(support::endian::read32le(data.data() + i));
+ for (uint8_t byte : data.take_back(windowSize - 1))
+ hashes.push_back(byte);
+
+ llvm::sort(hashes);
+ hashes.erase(std::unique(hashes.begin(), hashes.end()), hashes.end());
+ }
+
+ static StringRef getSymName(const Defined &sym) { return sym.getName(); }
+ static uint64_t getSymValue(const Defined &sym) { return sym.value; }
+ static uint64_t getSymSize(const Defined &sym) { return sym.size; }
+};
+} // namespace
+
+DenseMap<const InputSectionBase *, int> elf::runBalancedPartitioning(
+ Ctx &ctx, StringRef profilePath, bool forFunctionCompression,
+ bool forDataCompression, bool compressionSortStartupFunctions,
+ bool verbose) {
+ // Collect candidate sections and associated symbols.
+ SmallVector<InputSectionBase *> sections;
+ DenseMap<CachedHashStringRef, std::set<unsigned>> rootSymbolToSectionIdxs;
+ BPOrdererELF orderer;
+
+ auto addSection = [&](Symbol &sym) {
+ auto *d = dyn_cast<Defined>(&sym);
+ if (!d)
+ return;
+ auto *sec = dyn_cast_or_null<InputSectionBase>(d->section);
+ if (!sec || sec->size == 0 || !orderer.secToSym.try_emplace(sec, d).second)
+ return;
+ rootSymbolToSectionIdxs[CachedHashStringRef(getRootSymbol(sym.getName()))]
+ .insert(sections.size());
+ sections.emplace_back(sec);
+ };
+
+ for (Symbol *sym : ctx.symtab->getSymbols())
+ addSection(*sym);
+ for (ELFFileBase *file : ctx.objectFiles)
+ for (Symbol *sym : file->getLocalSymbols())
+ addSection(*sym);
+ return orderer.computeOrder(profilePath, forFunctionCompression,
+ forDataCompression,
+ compressionSortStartupFunctions, verbose,
+ sections, rootSymbolToSectionIdxs);
+}
diff --git a/lld/ELF/BPSectionOrderer.h b/lld/ELF/BPSectionOrderer.h
new file mode 100644
index 00000000000000..a0cb1360005a6b
--- /dev/null
+++ b/lld/ELF/BPSectionOrderer.h
@@ -0,0 +1,37 @@
+//===- BPSectionOrderer.h -------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// This file uses Balanced Partitioning to order sections to improve startup
+/// time and compressed size.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLD_ELF_BPSECTION_ORDERER_H
+#define LLD_ELF_BPSECTION_ORDERER_H
+
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/StringRef.h"
+
+namespace lld::elf {
+struct Ctx;
+class InputSectionBase;
+
+/// Run Balanced Partitioning to find the optimal function and data order to
+/// improve startup time and compressed size.
+///
+/// It is important that -ffunction-sections and -fdata-sections compiler flags
+/// are used to ensure functions and data are in their own sections and thus
+/// can be reordered.
+llvm::DenseMap<const InputSectionBase *, int>
+runBalancedPartitioning(Ctx &ctx, llvm::StringRef profilePath,
+ bool forFunctionCompression, bool forDataCompression,
+ bool compressionSortStartupFunctions, bool verbose);
+
+} // namespace lld::elf
+
+#endif
diff --git a/lld/ELF/CMakeLists.txt b/lld/ELF/CMakeLists.txt
index 83d816ddb0601e..ec3f6382282b1f 100644
--- a/lld/ELF/CMakeLists.txt
+++ b/lld/ELF/CMakeLists.txt
@@ -37,6 +37,7 @@ add_lld_library(lldELF
Arch/X86.cpp
Arch/X86_64.cpp
ARMErrataFix.cpp
+ BPSectionOrderer.cpp
CallGraphSort.cpp
DWARF.cpp
Driver.cpp
@@ -72,6 +73,7 @@ add_lld_library(lldELF
Object
Option
Passes
+ ProfileData
Support
TargetParser
TransformUtils
diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index c2aadb2cef5200..a1ac0973bcc402 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -264,6 +264,12 @@ struct Config {
bool armBe8 = false;
BsymbolicKind bsymbolic = BsymbolicKind::None;
CGProfileSortKind callGraphProfileSort;
+ llvm::StringRef irpgoProfilePath;
+ bool bpStartupFunctionSort = false;
+ bool bpCompressionSortStartupFunctions = false;
+ bool bpFunctionOrderForCompression = false;
+ bool bpDataOrderForCompression = false;
+ bool bpVerboseSectionOrderer = false;
bool checkSections;
bool checkDynamicRelocs;
std::optional<llvm::DebugCompressionType> compressDebugSections;
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index acbc97b331b0e1..a14d3629f6eec0 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1121,6 +1121,53 @@ static CGProfileSortKind getCGProfileSortKind(Ctx &ctx,
return CGProfileSortKind::None;
}
+static void parseBPOrdererOptions(Ctx &ctx, opt::InputArgList &args) {
+ if (auto *arg = args.getLastArg(OPT_bp_compression_sort)) {
+ StringRef s = arg->getValue();
+ if (s == "function") {
+ ctx.arg.bpFunctionOrderForCompression = true;
+ } else if (s == "data") {
+ ctx.arg.bpDataOrderForCompression = true;
+ } else if (s == "both") {
+ ctx.arg.bpFunctionOrderForCompression = true;
+ ctx.arg.bpDataOrderForCompression = true;
+ } else if (s != "none") {
+ ErrAlways(ctx) << arg->getSpelling()
+ << ": expected [none|function|data|both]";
+ }
+ if (s != "none" && args.hasArg(OPT_call_graph_ordering_file))
+ ErrAlways(ctx) << "--bp-compression-sort is incompatible with "
+ "--call-graph-ordering-file";
+ }
+ if (auto *arg = args.getLastArg(OPT_bp_startup_sort)) {
+ StringRef s = arg->getValue();
+ if (s == "function") {
+ ctx.arg.bpStartupFunctionSort = true;
+ } else if (s != "none") {
+ ErrAlways(ctx) << arg->getSpelling() << ": expected [none|function]";
+ }
+ if (s != "none" && args.hasArg(OPT_call_graph_ordering_file))
+ ErrAlways(ctx) << "--bp-startup-sort=function is incompatible with "
+ "--call-graph-ordering-file";
+ }
+
+ ctx.arg.bpCompressionSortStartupFunctions =
+ args.hasFlag(OPT_bp_compression_sort_startup_functions,
+ OPT_no_bp_compression_sort_startup_functions, false);
+ ctx.arg.bpVerboseSectionOrderer = args.hasArg(OPT_verbose_bp_section_orderer);
+
+ ctx.arg.irpgoProfilePath = args.getLastArgValue(OPT_irpgo_profile);
+ if (ctx.arg.irpgoProfilePath.empty()) {
+ if (ctx.arg.bpStartupFunctionSort)
+ ErrAlways(ctx) << "--bp-startup-sort=function must be used with "
+ "--irpgo-profile";
+ if (ctx.arg.bpCompressionSortStartupFunctions)
+ ErrAlways(ctx)
+ << "--bp-compression-sort-startup-functions must be used with "
+ "--irpgo-profile";
+ }
+}
+
static DebugCompressionType getCompressionType(Ctx &ctx, StringRef s,
StringRef option) {
DebugCompressionType type = StringSwitch<DebugCompressionType>(s)
@@ -1262,6 +1309,7 @@ static void readConfigs(Ctx &ctx, opt::InputArgList &args) {
ctx.arg.bsymbolic = BsymbolicKind::All;
}
ctx.arg.callGraphProfileSort = getCGProfileSortKind(ctx, args);
+ parseBPOrdererOptions(ctx, args);
ctx.arg.checkSections =
args.hasFlag(OPT_check_sections, OPT_no_check_sections, true);
ctx.arg.chroot = args.getLastArgValue(OPT_chroot);
diff --git a/lld/ELF/Options.td b/lld/ELF/Options.td
index c31875305952fb..80032490da0de4 100644
--- a/lld/ELF/Options.td
+++ b/lld/ELF/Options.td
@@ -141,6 +141,19 @@ def call_graph_profile_sort: JJ<"call-graph-profile-sort=">,
def : FF<"no-call-graph-profile-sort">, Alias<call_graph_profile_sort>, AliasArgs<["none"]>,
Flags<[HelpHidden]>;
+defm irpgo_profile: EEq<"irpgo-profile",
+ "Read a temporary profile file for use with --bp-startup-sort=">;
+def bp_compression_sort: JJ<"bp-compression-sort=">, MetaVarName<"[none,function,data,both]">,
+ HelpText<"Improve Lempel-Ziv compression by grouping similar sections together, resulting in a smaller compressed app size">;
+def bp_startup_sort: JJ<"bp-startup-sort=">, MetaVarName<"[none,function]">,
+ HelpText<"Utilize a temporal profile file to reduce page faults during program startup">;
+
+// Auxiliary options related to balanced partition
+defm bp_compression_sort_startup_functions: BB<"bp-compression-sort-startup-functions",
+ "When --irpgo-profile is pecified, prioritize function similarity for compression in addition to startup time", "">;
+def verbose_bp_section_orderer: FF<"verbose-bp-section-orderer">,
+ HelpText<"Print information on balanced partitioning">;
+
// --chroot doesn't have a help text because it is an internal option.
def chroot: Separate<["--"], "chroot">;
diff --git a/lld/ELF/Writer.cpp b/lld/ELF/Writer.cpp
index 3ba1cdbce572b7..49077055e84fbf 100644
--- a/lld/ELF/Writer.cpp
+++ b/lld/ELF/Writer.cpp
@@ -9,6 +9,7 @@
#include "Writer.h"
#include "AArch64ErrataFix.h"
#include "ARMErrataFix.h"
+#include "BPSectionOrderer.h"
#include "CallGraphSort.h"
#include "Config.h"
#include "InputFiles.h"
@@ -1080,8 +1081,18 @@ static void maybeShuffle(Ctx &ctx,
// that don't appear in the order file.
static DenseMap<const InputSectionBase *, int> buildSectionOrder(Ctx &ctx) {
DenseMap<const InputSectionBase *, int> sectionOrder;
- if (!ctx.arg.callGraphProfile.empty())
+ if (ctx.arg.bpStartupFunctionSort || ctx.arg.bpFunctionOrderForCompression ||
+ ctx.arg.bpDataOrderForCompression) {
+ TimeTraceScope timeScope("Balanced Partitioning Section Orderer");
+ sectionOrder = runBalancedPartitioning(
+ ctx, ctx.arg.bpStartupFunctionSort ? ctx.arg.irpgoProfilePath : "",
+ ctx.arg.bpFunctionOrderForCompression,
+ ctx.arg.bpDataOrderForCompression,
+ ctx.arg.bpCompressionSortStartupFunctions,
+ ctx.arg.bpVerboseSectionOrderer);
+ } else if (!ctx.arg.callGraphProfile.empty()) {
sectionOrder = computeCallGraphProfileOrder(ctx);
+ }
if (ctx.arg.symbolOrderingFile.empty())
return sectionOrder;
diff --git a/lld/include/lld/Common/BPSectionOrdererBase.inc b/lld/include/lld/Common/BPSectionOrdererBase.inc
index 6c7c4a188d1320..83b87f4c7817db 100644
--- a/lld/include/lld/Common/BPSectionOrdererBase.inc
+++ b/lld/include/lld/Common/BPSectionOrdererBase.inc
@@ -64,6 +64,10 @@ template <class D> struct BPOrderer {
const DenseMap<CachedHashStringRef, std::set<unsigned>>
&rootSymbolToSectionIdxs)
-> llvm::DenseMap<const Section *, int>;
+
+ std::optional<StringRef> static getResolvedLinkageName(StringRef name) {
+ return {};
+ }
};
} // namespace lld
@@ -98,10 +102,11 @@ static SmallVector<std::pair<unsigned, UtilityNodes>> getUnsForCompression(
// Merge sections that are nearly identical
SmallVector<std::pair<unsigned, SmallVector<uint64_t>>> newSectionHashes;
DenseMap<uint64_t, unsigned> wholeHashToSectionIdx;
+ unsigned threshold = sectionHashes.size() > 10000 ? 5 : 0;
for (auto &[sectionIdx, hashes] : sectionHashes) {
uint64_t wholeHash = 0;
for (auto hash : hashes)
- if (hashFrequency[hash] > 5)
+ if (hashFrequency[hash] > threshold)
wholeHash ^= hash;
auto [it, wasInserted] =
wholeHashToSectionIdx.insert(std::make_pair(wholeHash, sectionIdx));
diff --git a/lld/test/ELF/bp-section-orderer-stress.s b/lld/test/ELF/bp-section-orderer-stress.s
new file mode 100644
index 00000000000000..da9670933949f9
--- /dev/null
+++ b/lld/test/ELF/bp-section-orderer-stress.s
@@ -0,0 +1,104 @@
+# REQUIRES: aarch64
+
+## Generate a large test case and check that the output is deterministic.
+
+# RUN: %python %s %t.s %t.proftext
+
+# RUN: llvm-mc -filetype=obj -triple=aarch64 %t.s -o %t.o
+# RUN: llvm-profdata merge %t.proftext -o %t.profdata
+
+# RUN: ld.lld --icf=all -o %t1.o %t.o --irpgo-profile=%t.profdata --bp-startup-sort=function --bp-compression-sort-startup-functions --bp-compression-sort=both
+# RUN: ld.lld --icf=all -o %t2.o %t.o --irpgo-profile=%t.profdata --bp-startup-sort=function --bp-compression-sort-startup-functions --bp-compression-sort=both
+# RUN: cmp %t1.o %t2.o
+
+import random
+import sys
+
+assembly_filepath = sys.argv[1]
+proftext_filepath = sys.argv[2]
+
+random.seed(1234)
+num_functions = 1000
+num_data = 100
+num_traces = 10
+
+function_names = [f"f{n}" for n in range(num_functions)]
+data_names = [f"d{n}" for n in range(num_data)]
+profiled_functions = function_names[: int(num_functions / 2)]
+
+function_contents = [
+ f"""
+{name}:
+ add w0, w0, #{i % 4096}
+ add w1, w1, #{i % 10}
+ add w2, w0, #{i % 20}
+ adrp x3, {name}
+ ret
+"""
+ for i, name in enumerate(function_names)
+]
+
+data_contents = [
+ f"""
+{name}:
+ .ascii "s{i % 2}-{i % 3}-{i % 5}"
+ .xword {name}
+"""
+ for i, name in enumerate(data_names)
+]
+
+trace_contents = [
+ f"""
+# Weight
+1
+{", ".join(random.sample(profiled_functions, len(profiled_functions)))}
+"""
+ for i in range(num_traces)
+]
+
+profile_contents = [
+ f"""
+{name}
+# Func Hash:
+{i}
+# Num Counters:
+1
+# Counter Values:
+1
+"""
+ for i, name in enumerate(profiled_functions)
+]
+
+with open(assembly_filepath, "w") as f:
+ f.write(
+ f"""
+.text
+.globl _start
+
+_start:
+ ret
+
+{"".join(function_contents)}
+
+.data
+{"".join(data_contents)}
+
+"""
+ )
+
+with open(proftext_filepath, "w") as f:
+ f.write(
+ f"""
+:ir
+:temporal_prof_traces
+
+# Num Traces
+{num_traces}
+# Trace Stream Size:
+{num_traces}
+
+{"".join(trace_contents)}
+
+{"".join(profile_contents)}
+"""
+ )
diff --git a/lld/test/ELF/bp-section-orderer.s b/lld/test/ELF/bp-section-orderer.s
new file mode 100644
index 00000000000000..2e18107c02ca30
--- /dev/null
+++ b/lld/test/ELF/bp-section-orderer.s
@@ -0,0 +1,335 @@
+# REQUIRES: aarch64
+# RUN: rm -rf %t && split-file %s %t && cd %t
+
+## Check for incompatible cases
+# RUN: not ld.lld %t --irpgo-profile=/dev/null --bp-startup-sort=function --call-graph-ordering-file=/dev/null 2>&1 | FileCheck %s --check-prefix=BP-STARTUP-CALLGRAPH-ERR
+# RUN: not ld.lld --bp-compression-sort=function --call-graph-ordering-file /dev/null 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-CALLGRAPH-ERR
+# RUN: not ld.lld --bp-startup-sort=function 2>&1 | FileCheck %s --check-prefix=BP-STARTUP-ERR
+# RUN: not ld.lld --bp-compression-sort-startup-functions 2>&1 | FileCheck %s --check-prefix=BP-STARTUP-COMPRESSION-ERR
+# RUN: not ld.lld --bp-startup-sort=invalid --bp-compression-sort=invalid 2>&1 | FileCheck %s --check-prefix=BP-INVALID
+
+# BP-STARTUP-CALLGRAPH-ERR: error: --bp-startup-sort=function is incompatible with --call-graph-ordering-file
+# BP-COMPRESSION-CALLGRAPH-ERR: error: --bp-compression-sort is incompatible with --call-graph-ordering-file
+# BP-STARTUP-ERR: error: --bp-startup-sort=function must be used with --irpgo-profile
+# BP-STARTUP-COMPRESSION-ERR: error: --bp-compression-sort-startup-functions must be used with --irpgo-profile
+
+# BP-INVALID: error: --bp-compression-sort=: expected [none|function|data|both]
+# BP-INVALID: error: --bp-startup-sort=: expected [none|function]
+
+# RUN: llvm-mc -filetype=obj -triple=aarch64 a.s -o a.o
+# RUN: llvm-profdata merge a.proftext -o a.profdata
+# RUN: ld.lld a.o --irpgo-profile=a.profdata --bp-startup-sort=function --verbose-bp-section-orderer --icf=all 2>&1 | FileCheck %s --check-prefix=STARTUP-FUNC-ORDER
+
+# STARTUP-FUNC-ORDER: Ordered 3 sections using balanced partitioning
+# STARTUP-FUNC-ORDER: Total area under the page fault curve: 3.
+
+# RUN: ld.lld -o out.s a.o --irpgo-profile=a.profdata --bp-startup-sort=function
+# RUN: llvm-nm -jn out.s | tr '\n' , | FileCheck %s --check-prefix=STARTUP
+# STARTUP: s5,s4,s3,s2,s1,A,B,C,F,E,D,_start,d4,d3,d2,d1,{{$}}
+
+# RUN: ld.lld -o out.os a.o --irpgo-profile=a.profdata --bp-startup-sort=function --symbol-ordering-file a.txt
+# RUN: llvm-nm -jn out.os | tr '\n' , | FileCheck %s --check-prefix=ORDER-STARTUP
+# ORDER-STARTUP: s2,s1,s5,s4,s3,A,F,E,D,B,C,_start,d3,d2,d4,d1,{{$}}
+
+# RUN: ld.lld -o out.cf a.o --verbose-bp-section-orderer --bp-compression-sort=function 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-FUNC
+# RUN: llvm-nm -jn out.cf | tr '\n' , | FileCheck %s --check-prefix=CFUNC
+# CFUNC: s5,s4,s3,s2,s1,F,C,E,D,B,A,_start,d4,d3,d2,d1,{{$}}
+
+# RUN: ld.lld -o out.cd a.o --verbose-bp-section-orderer --bp-compression-sort=data 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-DATA
+# RUN: llvm-nm -jn out.cd | tr '\n' , | FileCheck %s --check-prefix=CDATA
+# CDATA: s5,s3,s4,s2,s1,F,C,E,D,B,A,_start,d4,d1,d3,d2,{{$}}
+
+# RUN: ld.lld -o out.cb a.o --verbose-bp-section-orderer --bp-compression-sort=both 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-BOTH
+# RUN: llvm-nm -jn out.cb | tr '\n' , | FileCheck %s --check-prefix=CDATA
+
+# RUN: ld.lld -o out.cbs a.o --verbose-bp-section-orderer --bp-compression-sort=both --irpgo-profile=a.profdata --bp-startup-sort=function 2>&1 | FileCheck %s --check-prefix=BP-COMPRESSION-BOTH
+# RUN: llvm-nm -jn out.cbs | tr '\n' , | FileCheck %s --check-prefix=CBOTH-STARTUP
+# CBOTH-STARTUP: s5,s3,s4,s2,s1,A,B,C,F,E,D,_start,d4,d1,d3,d2,{{$}}
+
+# BP-COMPRESSION-FUNC: Ordered 7 sections using balanced partitioning
+# BP-COMPRESSION-DATA: Ordered 9 sections using balanced partitioning
+# BP-COMPRESSION-BOTH: Ordered 16 sections using balanced partitioning
+
+#--- a.proftext
+:ir
+:temporal_prof_traces
+# Num Traces
+1
+# Trace Stream Size:
+1
+# Weight
+1
+A, B, C
+
+A
+# Func Hash:
+1111
+# Num Counters:
+1
+# Counter Values:
+1
+
+B
+# Func Hash:
+2222
+# Num Counters:
+1
+# Counter Values:
+1
+
+C
+# Func Hash:
+3333
+# Num Counters:
+1
+# Counter Values:
+1
+
+D
+# Func Hash:
+4444
+# Num Counters:
+1
+# Counter Values:
+1
+
+#--- a.txt
+A
+F
+E
+D
+s2
+s1
+d3
+d2
+
+#--- a.c
+const char s5[] = "engineering";
+const char s4[] = "computer program";
+const char s3[] = "hardware engineer";
+const char s2[] = "computer software";
+const char s1[] = "hello world program";
+int d4[] = {1,2,3,4,5,6};
+int d3[] = {5,6,7,8};
+int d2[] = {7,8,9,10};
+int d1[] = {3,4,5,6};
+
+int C(int a);
+int B(int a);
+void A();
+
+int F(int a) { return C(a + 3); }
+int E(int a) { return C(a + 2); }
+int D(int a) { return B(a + 2); }
+int C(int a) { A(); return a + 2; }
+int B(int a) { A(); return a + 1; }
+void A() {}
+
+int _start() { return 0; }
+
+#--- gen
+clang --target=aarch64-linux-gnu -O0 -ffunction-sect...
[truncated]
|
@@ -98,10 +102,11 @@ static SmallVector<std::pair<unsigned, UtilityNodes>> getUnsForCompression( | |||
// Merge sections that are nearly identical | |||
SmallVector<std::pair<unsigned, SmallVector<uint64_t>>> newSectionHashes; | |||
DenseMap<uint64_t, unsigned> wholeHashToSectionIdx; | |||
unsigned threshold = sectionHashes.size() > 10000 ? 5 : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ellishg With the existing threshold 5, wholeHash is almost always 0 for smaller applications. I guess 5 was picked to improve the performance of larger applications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, 5 was picked because we saw good results for large binaries. If we wanted to generalize this we could compute something like the P90 of the hash frequency, which would be the set of hashes that are repeated more than 90% of the hashes. But I would like to experiment with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's important for tests to show that --bp-compression-sort=data
actually does something. Therefore, the threshold has to be lowered so that data sections will not all be considered duplicates....
I think sectionHashes.size() > 10000
will work for large applications like yours.
(In the test, --bp-compression-sort=both and --bp-compression-sort=data exhibit the same behavior. Ideally 'both' should actually do something with functions. Reserved for a future patch.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I built some large binaries with this change and I didn't see any large differences, so I think this is ok. I also realized that there is a bug. If wholeHash
is zero, we should not insert into duplicateSectionIdxs
because they have no hashes in common.
The more I think about this the more I realize this might be able to be improved quite a bit. I can imagine a set of small outlined functions that differ in one instructions. Ideally these should be in duplicateSectionIdxs
, but this code won't catch that case.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/14070 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/21880 Here is the relevant piece of the build log for the reference
|
Reland llvm#120514 after 2f6e3df fixed iteration order issue and libstdc++/libc++ differences. --- Both options instruct the linker to optimize section layout with the following goals: * `--bp-compression-sort=[data|function|both]`: Improve Lempel-Ziv compression by grouping similar sections together, resulting in a smaller compressed app size. * `--bp-startup-sort=function --irpgo-profile=<file>`: Utilize a temporal profile file to reduce page faults during program startup. The linker determines the section order by considering three groups: * Function sections ordered according to the temporal profile (`--irpgo-profile=`), prioritizing early-accessed and frequently accessed functions. * Function sections. Sections containing similar functions are placed together, maximizing compression opportunities. * Data sections. Similar data sections are placed together. Within each group, the sections are ordered using the Balanced Partitioning algorithm. The linker constructs a bipartite graph with two sets of vertices: sections and utility vertices. * For profile-guided function sections: + The number of utility vertices is determined by the symbol order within the profile file. + If `--bp-compression-sort-startup-functions` is specified, extra utility vertices are allocated to prioritize nearby function similarity. * For sections ordered for compression: Utility vertices are determined by analyzing k-mers of the section content and relocations. The call graph profile is disabled during this optimization. When `--symbol-ordering-file=` is specified, sections described in that file are placed earlier. Co-authored-by: Pengying Xu <[email protected]>
Reland #120514 after 2f6e3df fixed
iteration order issue and libstdc++/libc++ differences.
Both options instruct the linker to optimize section layout with the following goals:
--bp-compression-sort=[data|function|both]
: Improve Lempel-Ziv compression by grouping similar sections together, resulting in a smaller compressed app size.--bp-startup-sort=function --irpgo-profile=<file>
: Utilize a temporal profile file to reduce page faults during program startup.The linker determines the section order by considering three groups:
--irpgo-profile=
), prioritizing early-accessed and frequently accessed functions.Within each group, the sections are ordered using the Balanced Partitioning algorithm.
The linker constructs a bipartite graph with two sets of vertices: sections and utility vertices.
--bp-compression-sort-startup-functions
is specified, extra utility vertices are allocated to prioritize nearby function similarity.The call graph profile is disabled during this optimization.
When
--symbol-ordering-file=
is specified, sections described in that file are placed earlier.