[lld]Add lld/Common/BPSectionOrdererBase from MachO for reuse in ELF #117514
@llvm/pr-subscribers-lld-wasm @llvm/pr-subscribers-lld-elf
Author: Max (Colibrow)
Changes
Extend the balanced partitioning implementation to support ELF binaries, enabling the same startup-time and compressed-size optimizations previously available for Mach-O. This allows ELF binaries to benefit from profile-guided function ordering and compression-based section ordering. Add the lld flags
Thanks to @ellishg, @thevinster, and their team for their work.
Patch is 60.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117514.diff
15 Files Affected:
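The compression-based ordering in the patch below gives balanced partitioning one utility node per content hash that is shared by some, but not too many, sections. The following is a minimal standalone sketch of that filtering step, assuming each section has already been reduced to a list of 64-bit content hashes. `buildCompressionUNs`, `Hashes`, and `UtilityNodes` are hypothetical names, standard containers stand in for the LLVM ADTs, and the near-duplicate merging done in getUnsForCompression() is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical simplification of getUnsForCompression(): each section is
// described only by the content hashes it contains.
using Hashes = std::vector<uint64_t>;
using UtilityNodes = std::vector<uint32_t>;

// Returns the utility nodes (UNs) for each section. New UN ids start after
// `maxUN`, which is updated so later callers do not reuse them.
std::vector<UtilityNodes>
buildCompressionUNs(std::vector<Hashes> sectionHashes, uint32_t &maxUN) {
  // Count, for each hash, how many sections contain it (duplicates within a
  // single section are removed first).
  std::map<uint64_t, std::size_t> hashFrequency;
  for (Hashes &hashes : sectionHashes) {
    std::sort(hashes.begin(), hashes.end());
    hashes.erase(std::unique(hashes.begin(), hashes.end()), hashes.end());
    for (uint64_t hash : hashes)
      ++hashFrequency[hash];
  }

  // Drop hashes that are unique (nothing to share) or present in more than
  // half of the sections (too common to tell sections apart); give every
  // remaining hash its own UN.
  std::map<uint64_t, uint32_t> hashToUN;
  for (const auto &[hash, frequency] : hashFrequency) {
    if (frequency <= 1 || frequency * 2 > sectionHashes.size())
      continue;
    hashToUN[hash] = ++maxUN;
  }

  // Connect each section to the UNs of the hashes it contains.
  std::vector<UtilityNodes> sectionUNs;
  for (const Hashes &hashes : sectionHashes) {
    UtilityNodes uns;
    for (uint64_t hash : hashes) {
      auto it = hashToUN.find(hash);
      if (it != hashToUN.end())
        uns.push_back(it->second);
    }
    sectionUNs.push_back(uns);
  }
  return sectionUNs;
}
```

With input sections {1, 2, 3}, {2, 3, 4}, {5}, and {2, 9}, hash 3 (shared by two of the four sections) gets a utility node, while hash 2 is dropped because it appears in more than half of the sections.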
diff --git a/lld/Common/CMakeLists.txt b/lld/Common/CMakeLists.txt
index 4f503d04f7844f..bd5a40af41c1bc 100644
--- a/lld/Common/CMakeLists.txt
+++ b/lld/Common/CMakeLists.txt
@@ -31,6 +31,7 @@ add_lld_library(lldCommon
Filesystem.cpp
Memory.cpp
Reproduce.cpp
+ SectionOrderer.cpp
Strings.cpp
TargetOptionsCommandFlags.cpp
Timer.cpp
diff --git a/lld/Common/SectionOrderer.cpp b/lld/Common/SectionOrderer.cpp
new file mode 100644
index 00000000000000..64c78030f3427f
--- /dev/null
+++ b/lld/Common/SectionOrderer.cpp
@@ -0,0 +1,383 @@
+//===- SectionOrderer.cpp---------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "lld/Common/SectionOrderer.h"
+#include "lld/Common/ErrorHandler.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/DenseSet.h"
+#include "llvm/ADT/SetVector.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringMap.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/ProfileData/InstrProfReader.h"
+#include "llvm/Support/BalancedPartitioning.h"
+#include "llvm/Support/TimeProfiler.h"
+#include "llvm/Support/VirtualFileSystem.h"
+#include "llvm/Support/xxhash.h"
+
+#define DEBUG_TYPE "bp-section-orderer"
+using namespace llvm;
+using UtilityNodes = SmallVector<BPFunctionNode::UtilityNodeT>;
+
+namespace lld {
+
+static SmallVector<std::pair<unsigned, UtilityNodes>> getUnsForCompression(
+ ArrayRef<const BPSectionBase *> sections,
+ const DenseMap<const BPSectionBase *, uint64_t> &sectionToIdx,
+ ArrayRef<unsigned> sectionIdxs,
+ DenseMap<unsigned, SmallVector<unsigned>> *duplicateSectionIdxs,
+ BPFunctionNode::UtilityNodeT &maxUN) {
+ TimeTraceScope timeScope("Build nodes for compression");
+
+ SmallVector<std::pair<unsigned, SmallVector<uint64_t>>> sectionHashes;
+ sectionHashes.reserve(sectionIdxs.size());
+ SmallVector<uint64_t> hashes;
+
+ for (unsigned sectionIdx : sectionIdxs) {
+ const auto *isec = sections[sectionIdx];
+ isec->getSectionHash(hashes, sectionToIdx);
+ sectionHashes.emplace_back(sectionIdx, std::move(hashes));
+ hashes.clear();
+ }
+
+ DenseMap<uint64_t, unsigned> hashFrequency;
+ for (auto &[sectionIdx, hashes] : sectionHashes)
+ for (auto hash : hashes)
+ ++hashFrequency[hash];
+
+ if (duplicateSectionIdxs) {
+ // Merge section that are nearly identical
+ SmallVector<std::pair<unsigned, SmallVector<uint64_t>>> newSectionHashes;
+ DenseMap<uint64_t, unsigned> wholeHashToSectionIdx;
+ for (auto &[sectionIdx, hashes] : sectionHashes) {
+ uint64_t wholeHash = 0;
+ for (auto hash : hashes)
+ if (hashFrequency[hash] > 5)
+ wholeHash ^= hash;
+ auto [it, wasInserted] =
+ wholeHashToSectionIdx.insert(std::make_pair(wholeHash, sectionIdx));
+ if (wasInserted) {
+ newSectionHashes.emplace_back(sectionIdx, hashes);
+ } else {
+ (*duplicateSectionIdxs)[it->getSecond()].push_back(sectionIdx);
+ }
+ }
+ sectionHashes = newSectionHashes;
+
+ // Recompute hash frequencies
+ hashFrequency.clear();
+ for (auto &[sectionIdx, hashes] : sectionHashes)
+ for (auto hash : hashes)
+ ++hashFrequency[hash];
+ }
+
+ // Filter rare and common hashes and assign each a unique utility node that
+ // doesn't conflict with the trace utility nodes
+ DenseMap<uint64_t, BPFunctionNode::UtilityNodeT> hashToUN;
+ for (auto &[hash, frequency] : hashFrequency) {
+ if (frequency <= 1 || frequency * 2 > sectionHashes.size())
+ continue;
+ hashToUN[hash] = ++maxUN;
+ }
+
+ SmallVector<std::pair<unsigned, UtilityNodes>> sectionUns;
+ for (auto &[sectionIdx, hashes] : sectionHashes) {
+ UtilityNodes uns;
+ for (auto &hash : hashes) {
+ auto it = hashToUN.find(hash);
+ if (it != hashToUN.end())
+ uns.push_back(it->second);
+ }
+ sectionUns.emplace_back(sectionIdx, uns);
+ }
+ return sectionUns;
+}
+
+llvm::DenseMap<const BPSectionBase *, size_t>
+SectionOrderer::reorderSectionsByBalancedPartitioning(
+ size_t &highestAvailablePriority, llvm::StringRef profilePath,
+ bool forFunctionCompression, bool forDataCompression,
+ bool compressionSortStartupFunctions, bool verbose,
+ SmallVector<BPSectionBase *> inputSections) {
+ TimeTraceScope timeScope("Balanced Partitioning");
+ SmallVector<const BPSectionBase *> sections;
+ DenseMap<const BPSectionBase *, uint64_t> sectionToIdx;
+ StringMap<DenseSet<unsigned>> symbolToSectionIdxs;
+
+ // Process input sections
+ for (const auto *isec : inputSections) {
+ if (!isec->hasValidData())
+ continue;
+
+ unsigned sectionIdx = sections.size();
+ sectionToIdx.try_emplace(isec, sectionIdx);
+ sections.push_back(isec);
+
+ for (auto *sym : isec->getSymbols()) {
+ if (auto *d = sym->asDefinedSymbol())
+ symbolToSectionIdxs[d->getName()].insert(sectionIdx);
+ }
+ }
+ StringMap<DenseSet<unsigned>> rootSymbolToSectionIdxs;
+ for (auto &entry : symbolToSectionIdxs) {
+ StringRef name = entry.getKey();
+ auto &sectionIdxs = entry.getValue();
+ name = BPSectionBase::getRootSymbol(name);
+ rootSymbolToSectionIdxs[name].insert(sectionIdxs.begin(),
+ sectionIdxs.end());
+ // Linkage names can be prefixed with "_" or "l_" on Mach-O. See
+ // Mangler::getNameWithPrefix() for details.
+ if (name.consume_front("_") || name.consume_front("l_"))
+ rootSymbolToSectionIdxs[name].insert(sectionIdxs.begin(),
+ sectionIdxs.end());
+ }
+
+ BPFunctionNode::UtilityNodeT maxUN = 0;
+ DenseMap<unsigned, UtilityNodes> startupSectionIdxUNs;
+ // Used to define the initial order for startup functions.
+ DenseMap<unsigned, size_t> sectionIdxToTimestamp;
+ std::unique_ptr<InstrProfReader> reader;
+ if (!profilePath.empty()) {
+ auto fs = vfs::getRealFileSystem();
+ auto readerOrErr = InstrProfReader::create(profilePath, *fs);
+ lld::checkError(readerOrErr.takeError());
+
+ reader = std::move(readerOrErr.get());
+ for (auto &entry : *reader) {
+ // Read all entries
+ (void)entry;
+ }
+ auto &traces = reader->getTemporalProfTraces();
+
+ DenseMap<unsigned, BPFunctionNode::UtilityNodeT> sectionIdxToFirstUN;
+ for (size_t traceIdx = 0; traceIdx < traces.size(); traceIdx++) {
+ uint64_t currentSize = 0, cutoffSize = 1;
+ size_t cutoffTimestamp = 1;
+ auto &trace = traces[traceIdx].FunctionNameRefs;
+ for (size_t timestamp = 0; timestamp < trace.size(); timestamp++) {
+ auto [Filename, ParsedFuncName] = getParsedIRPGOName(
+ reader->getSymtab().getFuncOrVarName(trace[timestamp]));
+ ParsedFuncName = BPSectionBase::getRootSymbol(ParsedFuncName);
+
+ auto sectionIdxsIt = rootSymbolToSectionIdxs.find(ParsedFuncName);
+ if (sectionIdxsIt == rootSymbolToSectionIdxs.end())
+ continue;
+ auto &sectionIdxs = sectionIdxsIt->getValue();
+ // If the same symbol is found in multiple sections, they might be
+ // identical, so we arbitrarily use the size from the first section.
+ currentSize += sections[*sectionIdxs.begin()]->getSize();
+
+ // Since BalancedPartitioning is sensitive to the initial order, we need
+ // to explicitly define it to be ordered by earliest timestamp.
+ for (unsigned sectionIdx : sectionIdxs) {
+ auto [it, wasInserted] =
+ sectionIdxToTimestamp.try_emplace(sectionIdx, timestamp);
+ if (!wasInserted)
+ it->getSecond() = std::min<size_t>(it->getSecond(), timestamp);
+ }
+
+ if (timestamp >= cutoffTimestamp || currentSize >= cutoffSize) {
+ ++maxUN;
+ cutoffSize = 2 * currentSize;
+ cutoffTimestamp = 2 * cutoffTimestamp;
+ }
+ for (unsigned sectionIdx : sectionIdxs)
+ sectionIdxToFirstUN.try_emplace(sectionIdx, maxUN);
+ }
+ for (auto &[sectionIdx, firstUN] : sectionIdxToFirstUN)
+ for (auto un = firstUN; un <= maxUN; ++un)
+ startupSectionIdxUNs[sectionIdx].push_back(un);
+ ++maxUN;
+ sectionIdxToFirstUN.clear();
+ }
+ }
+
+ SmallVector<unsigned> sectionIdxsForFunctionCompression,
+ sectionIdxsForDataCompression;
+ for (unsigned sectionIdx = 0; sectionIdx < sections.size(); sectionIdx++) {
+ if (startupSectionIdxUNs.count(sectionIdx))
+ continue;
+ const auto *isec = sections[sectionIdx];
+ if (isec->isCodeSection()) {
+ if (forFunctionCompression)
+ sectionIdxsForFunctionCompression.push_back(sectionIdx);
+ } else {
+ if (forDataCompression)
+ sectionIdxsForDataCompression.push_back(sectionIdx);
+ }
+ }
+
+ if (compressionSortStartupFunctions) {
+ SmallVector<unsigned> startupIdxs;
+ for (auto &[sectionIdx, uns] : startupSectionIdxUNs)
+ startupIdxs.push_back(sectionIdx);
+ auto unsForStartupFunctionCompression =
+ getUnsForCompression(sections, sectionToIdx, startupIdxs,
+ /*duplicateSectionIdxs=*/nullptr, maxUN);
+ for (auto &[sectionIdx, compressionUns] :
+ unsForStartupFunctionCompression) {
+ auto &uns = startupSectionIdxUNs[sectionIdx];
+ uns.append(compressionUns);
+ llvm::sort(uns);
+ uns.erase(std::unique(uns.begin(), uns.end()), uns.end());
+ }
+ }
+
+ // Map a section index (order directly) to a list of duplicate section indices
+ // (not ordered directly).
+ DenseMap<unsigned, SmallVector<unsigned>> duplicateSectionIdxs;
+ auto unsForFunctionCompression = getUnsForCompression(
+ sections, sectionToIdx, sectionIdxsForFunctionCompression,
+ &duplicateSectionIdxs, maxUN);
+ auto unsForDataCompression = getUnsForCompression(
+ sections, sectionToIdx, sectionIdxsForDataCompression,
+ &duplicateSectionIdxs, maxUN);
+
+ std::vector<BPFunctionNode> nodesForStartup, nodesForFunctionCompression,
+ nodesForDataCompression;
+ for (auto &[sectionIdx, uns] : startupSectionIdxUNs)
+ nodesForStartup.emplace_back(sectionIdx, uns);
+ for (auto &[sectionIdx, uns] : unsForFunctionCompression)
+ nodesForFunctionCompression.emplace_back(sectionIdx, uns);
+ for (auto &[sectionIdx, uns] : unsForDataCompression)
+ nodesForDataCompression.emplace_back(sectionIdx, uns);
+
+ // Use the first timestamp to define the initial order for startup nodes.
+ llvm::sort(nodesForStartup, [&sectionIdxToTimestamp](auto &L, auto &R) {
+ return std::make_pair(sectionIdxToTimestamp[L.Id], L.Id) <
+ std::make_pair(sectionIdxToTimestamp[R.Id], R.Id);
+ });
+ // Sort compression nodes by their Id (which is the section index) because the
+ // input linker order tends to be not bad.
+ llvm::sort(nodesForFunctionCompression,
+ [](auto &L, auto &R) { return L.Id < R.Id; });
+ llvm::sort(nodesForDataCompression,
+ [](auto &L, auto &R) { return L.Id < R.Id; });
+
+ {
+ TimeTraceScope timeScope("Balanced Partitioning");
+ BalancedPartitioningConfig config;
+ BalancedPartitioning bp(config);
+ bp.run(nodesForStartup);
+ bp.run(nodesForFunctionCompression);
+ bp.run(nodesForDataCompression);
+ }
+
+ unsigned numStartupSections = 0;
+ unsigned numCodeCompressionSections = 0;
+ unsigned numDuplicateCodeSections = 0;
+ unsigned numDataCompressionSections = 0;
+ unsigned numDuplicateDataSections = 0;
+ SetVector<const BPSectionBase *> orderedSections;
+ // Order startup functions,
+ for (auto &node : nodesForStartup) {
+ const auto *isec = sections[node.Id];
+ if (orderedSections.insert(isec))
+ ++numStartupSections;
+ }
+ // then functions for compression,
+ for (auto &node : nodesForFunctionCompression) {
+ const auto *isec = sections[node.Id];
+ if (orderedSections.insert(isec))
+ ++numCodeCompressionSections;
+
+ auto It = duplicateSectionIdxs.find(node.Id);
+ if (It == duplicateSectionIdxs.end())
+ continue;
+ for (auto dupSecIdx : It->getSecond()) {
+ const auto *dupIsec = sections[dupSecIdx];
+ if (orderedSections.insert(dupIsec))
+ ++numDuplicateCodeSections;
+ }
+ }
+ // then data for compression.
+ for (auto &node : nodesForDataCompression) {
+ const auto *isec = sections[node.Id];
+ if (orderedSections.insert(isec))
+ ++numDataCompressionSections;
+ auto It = duplicateSectionIdxs.find(node.Id);
+ if (It == duplicateSectionIdxs.end())
+ continue;
+ for (auto dupSecIdx : It->getSecond()) {
+ const auto *dupIsec = sections[dupSecIdx];
+ if (orderedSections.insert(dupIsec))
+ ++numDuplicateDataSections;
+ }
+ }
+
+ if (verbose) {
+ unsigned numTotalOrderedSections =
+ numStartupSections + numCodeCompressionSections +
+ numDuplicateCodeSections + numDataCompressionSections +
+ numDuplicateDataSections;
+ dbgs()
+ << "Ordered " << numTotalOrderedSections
+ << " sections using balanced partitioning:\n Functions for startup: "
+ << numStartupSections
+ << "\n Functions for compression: " << numCodeCompressionSections
+ << "\n Duplicate functions: " << numDuplicateCodeSections
+ << "\n Data for compression: " << numDataCompressionSections
+ << "\n Duplicate data: " << numDuplicateDataSections << "\n";
+
+ if (!profilePath.empty()) {
+ // Evaluate this function order for startup
+ StringMap<std::pair<uint64_t, uint64_t>> symbolToPageNumbers;
+ const uint64_t pageSize = (1 << 14);
+ uint64_t currentAddress = 0;
+ for (const auto *isec : orderedSections) {
+ for (auto *sym : isec->getSymbols()) {
+ if (auto *d = sym->asDefinedSymbol()) {
+ uint64_t startAddress = currentAddress + d->getValue();
+ uint64_t endAddress = startAddress + d->getSize();
+ uint64_t firstPage = startAddress / pageSize;
+ // I think the kernel might pull in a few pages when one it touched,
+ // so it might be more accurate to force lastPage to be aligned by
+ // 4?
+ uint64_t lastPage = endAddress / pageSize;
+ StringRef rootSymbol = d->getName();
+ rootSymbol = BPSectionBase::getRootSymbol(rootSymbol);
+ symbolToPageNumbers.try_emplace(rootSymbol, firstPage, lastPage);
+ if (rootSymbol.consume_front("_") || rootSymbol.consume_front("l_"))
+ symbolToPageNumbers.try_emplace(rootSymbol, firstPage, lastPage);
+ }
+ }
+ currentAddress += isec->getSize();
+ }
+
+ // The area under the curve F where F(t) is the total number of page
+ // faults at step t.
+ unsigned area = 0;
+ for (auto &trace : reader->getTemporalProfTraces()) {
+ SmallSet<uint64_t, 0> touchedPages;
+ for (unsigned step = 0; step < trace.FunctionNameRefs.size(); step++) {
+ auto traceId = trace.FunctionNameRefs[step];
+ auto [Filename, ParsedFuncName] =
+ getParsedIRPGOName(reader->getSymtab().getFuncOrVarName(traceId));
+ ParsedFuncName = BPSectionBase::getRootSymbol(ParsedFuncName);
+ auto it = symbolToPageNumbers.find(ParsedFuncName);
+ if (it != symbolToPageNumbers.end()) {
+ auto &[firstPage, lastPage] = it->getValue();
+ for (uint64_t i = firstPage; i <= lastPage; i++)
+ touchedPages.insert(i);
+ }
+ area += touchedPages.size();
+ }
+ }
+ dbgs() << "Total area under the page fault curve: " << (float)area
+ << "\n";
+ }
+ }
+
+ DenseMap<const BPSectionBase *, size_t> sectionPriorities;
+ for (const auto *isec : orderedSections)
+ sectionPriorities[isec] = --highestAvailablePriority;
+ return sectionPriorities;
+}
+
+} // namespace lld
diff --git a/lld/ELF/BPSectionOrderer.cpp b/lld/ELF/BPSectionOrderer.cpp
new file mode 100644
index 00000000000000..ac3024a69e681a
--- /dev/null
+++ b/lld/ELF/BPSectionOrderer.cpp
@@ -0,0 +1,50 @@
+//===- BPSectionOrderer.cpp--------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "BPSectionOrderer.h"
+#include "Config.h"
+#include "InputFiles.h"
+#include "InputSection.h"
+#include "lld/Common/CommonLinkerContext.h"
+#include "lld/Common/SectionOrderer.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/Support/BalancedPartitioning.h"
+#include "llvm/Support/TimeProfiler.h"
+
+using namespace llvm;
+using namespace lld::elf;
+
+llvm::DenseMap<const lld::elf::InputSectionBase *, int>
+lld::elf::runBalancedPartitioning(Ctx &ctx, llvm::StringRef profilePath,
+ bool forFunctionCompression,
+ bool forDataCompression,
+ bool compressionSortStartupFunctions,
+ bool verbose) {
+ size_t highestAvailablePriority = std::numeric_limits<int>::max();
+ SmallVector<lld::BPSectionBase *> sections;
+ for (auto *isec : ctx.inputSections) {
+ if (!isec || isec->content().empty())
+ continue;
+ sections.push_back(new ELFSection(isec));
+ }
+
+ auto reorderedSections =
+ lld::SectionOrderer::reorderSectionsByBalancedPartitioning(
+ highestAvailablePriority, profilePath, forFunctionCompression,
+ forDataCompression, compressionSortStartupFunctions, verbose,
+ sections);
+
+ DenseMap<const InputSectionBase *, int> result;
+ for (const auto &[BPSectionBase, priority] : reorderedSections) {
+ if (auto *elfSection = dyn_cast<ELFSection>(BPSectionBase)) {
+ result[elfSection->getSection()] = static_cast<int>(priority);
+ delete elfSection;
+ }
+ }
+ return result;
+}
diff --git a/lld/ELF/BPSectionOrderer.h b/lld/ELF/BPSectionOrderer.h
new file mode 100644
index 00000000000000..c24f8d1277c108
--- /dev/null
+++ b/lld/ELF/BPSectionOrderer.h
@@ -0,0 +1,140 @@
+//===- BPSectionOrderer.h ---------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// This file uses Balanced Partitioning to order sections to improve startup
+/// time and compressed size.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLD_ELF_BPSECTION_ORDERER_H
+#define LLD_ELF_BPSECTION_ORDERER_H
+
+#include "InputFiles.h"
+#include "InputSection.h"
+#include "Relocations.h"
+#include "Symbols.h"
+#include "lld/Common/SectionOrderer.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/STLFunctionalExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/BinaryFormat/ELF.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/xxhash.h"
+
+namespace lld::elf {
+
+class InputSection;
+
+class ELFSymbol : public BPSymbol {
+ const Symbol *sym;
+
+public:
+ explicit ELFSymbol(const Symbol *s) : sym(s) {}
+
+ llvm::StringRef getName() const override { return sym->getName(); }
+
+ BPSymbol *asDefinedSymbol() override {
+ if (auto *d = llvm::dyn_cast<Defined>(sym))
+ return this;
+ return nullptr;
+ }
+
+ uint64_t getValue() const override {
+ if (auto *d = llvm::dyn_cast<Defined>(sym))
+ return d->value;
+ return 0;
+ }
+
+ uint64_t getSize() const override {
+ if (auto *d = llvm::dyn_cast<Defined>(sym))
+ return d->size;
+ return 0;
+ }
+
+ const Symbol *getSymbol() const { return sym; }
+};
+
+class ELFSection : public BPSectionBase {
+ const InputSectionBase *isec;
+ mutable std::vector<std::unique_ptr<ELFSymbol>> symbolCache;
+
+public:
+ explicit ELFSection(const InputSectionB...
[truncated]
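For startup ordering, the patch turns each temporal profile trace into nested "checkpoint" utility nodes whose spacing doubles in both trace steps and cumulative section size. Below is a minimal sketch of that step for a single trace, assuming the trace has already been mapped from function names to section indices (one section per function). `buildStartupUNs` is a hypothetical name, and the timestamp bookkeeping that sets the initial node order is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical simplification of the startup part of
// reorderSectionsByBalancedPartitioning(): `trace` lists section indices in
// the order their functions were first executed.
std::map<unsigned, std::vector<uint32_t>>
buildStartupUNs(const std::vector<unsigned> &trace,
                const std::vector<uint64_t> &sectionSizes, uint32_t &maxUN) {
  std::map<unsigned, uint32_t> firstUN;
  uint64_t currentSize = 0, cutoffSize = 1;
  std::size_t cutoffTimestamp = 1;
  for (std::size_t timestamp = 0; timestamp < trace.size(); ++timestamp) {
    unsigned sectionIdx = trace[timestamp];
    currentSize += sectionSizes[sectionIdx];
    // Open a new checkpoint once either the step count or the cumulative
    // size reaches its cutoff; both cutoffs grow geometrically.
    if (timestamp >= cutoffTimestamp || currentSize >= cutoffSize) {
      ++maxUN;
      cutoffSize = 2 * currentSize;
      cutoffTimestamp = 2 * cutoffTimestamp;
    }
    firstUN.try_emplace(sectionIdx, maxUN);
  }
  // UN k is shared by every section first executed by checkpoint k, so a
  // section joins its first checkpoint and all later ones.
  std::map<unsigned, std::vector<uint32_t>> startupUNs;
  for (const auto &[sectionIdx, un] : firstUN)
    for (uint32_t u = un; u <= maxUN; ++u)
      startupUNs[sectionIdx].push_back(u);
  // Mirror the patch's ++maxUN after each trace.
  ++maxUN;
  return startupUNs;
}
```

For a trace touching sections 0 to 3 in order, each 4 bytes long, section 0 ends up in UNs 1 to 3 and section 3 only in UN 3, so the earliest functions share the most nodes.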
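In verbose mode the patch scores the resulting order by the area under the page-fault curve: after each trace step, it adds the number of distinct 16 KiB pages touched so far. Below is a minimal sketch of that metric, assuming each trace step has already been mapped to the address range its function occupies in the final layout. `AddressRange` and `pageFaultArea` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical simplification: one [start, end] address range per trace
// step, taken from the final layout.
struct AddressRange {
  uint64_t start;
  uint64_t end;
};

// Area under F(t), where F(t) is the number of distinct pages touched by
// steps 0..t. Lower is better: fewer pages are faulted in early.
uint64_t pageFaultArea(const std::vector<AddressRange> &trace,
                       uint64_t pageSize) {
  std::set<uint64_t> touchedPages;
  uint64_t area = 0;
  for (const AddressRange &r : trace) {
    // Like the patch, the end address is included when computing lastPage.
    for (uint64_t page = r.start / pageSize; page <= r.end / pageSize; ++page)
      touchedPages.insert(page);
    area += touchedPages.size();
  }
  return area;
}
```

For the same trace f, g, h, placing g right after f on the first page scores 4, while moving g to the second page (and h next to f) scores 5.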
Thanks for the PR! I'll try to take a look, but I might not get to it until next week because Thursday is Thanksgiving. I'll also assign a few reviewers that might be interested in taking a look. |
Branch force-pushed from 7c17fff to 7d04f92.
Just a quick drive-by comment on the ELF side; I've not had a chance to go through the code in detail. I think .subsections_via_symbols is Mach-O specific, and I'm not quite sure what it does in ELF. I think the equivalent in clang, at least, is -ffunction-sections for functions and -fdata-sections for data. For assembler input, the user has to be careful to put each function in its own section. If I'm right, it could be worth updating the comments and the assembler test.
@smithp35 You're correct. In the standard ELF format, we typically use the -ffunction-sections and -fdata-sections flags to give each function and data object its own section, which is what balanced section partitioning needs in order to reorder them. The original comment was copied from the Mach-O-specific implementation. I've updated the code to reflect this; if you have any additional suggestions, I'd be happy to hear them. I appreciate your review!
Branch force-pushed from c06ac10 to 9e453fe.
@@ -0,0 +1,104 @@
# REQUIRES: aarch64
I wish we didn't have to duplicate this test in ELF and Mach-O, but I can't think of a better solution right now.
Duplication like this is fine, as sharing the test could make updating either port more difficult.
Some nits on top of everything else.
Branch force-pushed from 858c648 to 7355aae.
Don't forget to change the flag spelling in the PR summary
lld/ELF/BPSectionOrderer.h (outdated)
const llvm::DenseMap<const BPSectionBase *, uint64_t>
&sectionToIdx) const override {
@Colibrow What do you think of this?
@ellishg I deleted the function that passed the map everywhere and updated lld/ELF/BPSectionOrderer.h accordingly. Does your comment refer to the previous implementation?
It's looking better! After we resolve the section index issue in getRelocHash(), I want to test this on my codebase to make sure we don't have a regression.
lld/ELF/BPSectionOrderer.h (outdated)
// TODO: Calculate relocation hashes.
// Since in ELF, relocations are complex, but the effect without them are
// good enough, we just use 0 as their hash.
for (const auto &r : isec->relocations) {
Nice, it seems good. Did you see an uncompressed or compressed size win with this change?
Sorry, I only just saw this. I'll run a test and post the results ASAP.
Hi Ellis,
I tested this on my project and realized the code I shared earlier was incorrect. When iterating over the window + relocHash, the loop needs to end at the relocation code size, e.g., 2 for functions and 3 for data in Mach-O.
I ran the test on my project, which builds an AArch64 ELF with the following relocation distribution:
Relocation Type | Count
---|---
R_AARCH64_ABS64 | 387
R_AARCH64_GLOB_DAT | 34
R_AARCH64_JUMP_SLOT | 711
R_AARCH64_RELATIVE | 9396
For testing, I hardcoded r.length as 3, and here are the results:
Binary Name | Size (bytes) | Gzipped Size (bytes)
---|---|---
libsample-aarch64.so | 3,181,560 | 1,512,245
libsample-aarch64-noreloc-compressed-function.so | 3,181,560 | 1,487,043
libsample-aarch64-reloc-compressed-function.so | 3,181,560 | 1,487,032
Since different relType values require different relocation forms and the additional size improvement is minimal, I decided to revert the relocation-hash commit.
Do you have any ideas on how to proceed?
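To put the gzipped sizes in the table above into perspective, the arithmetic can be checked at compile time. This is only a worked calculation on the reported numbers, not part of the patch: function ordering without relocation hashing saves 25,202 bytes (about 1.67%), and relocation hashing saves only 11 more bytes.

```cpp
#include <cassert>
#include <cstdint>

// Gzipped sizes in bytes, copied from the results table.
constexpr int64_t kBaseline = 1'512'245;  // libsample-aarch64.so
constexpr int64_t kNoReloc = 1'487'043;   // libsample-aarch64-noreloc-compressed-function.so
constexpr int64_t kWithReloc = 1'487'032; // libsample-aarch64-reloc-compressed-function.so

// Bytes saved relative to the baseline layout.
constexpr int64_t savedBytes(int64_t size) { return kBaseline - size; }

// Savings in basis points (hundredths of a percent), rounded down.
constexpr int64_t savedBasisPoints(int64_t size) {
  return savedBytes(size) * 10'000 / kBaseline;
}

static_assert(savedBytes(kNoReloc) == 25'202, "ordering alone");
static_assert(savedBasisPoints(kNoReloc) == 166, "1.66% when rounded down");
static_assert(kNoReloc - kWithReloc == 11, "extra gain from reloc hashing");
```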
I built this PR locally and needed these CMake changes for it to work; some of the includes are also unnecessary:
diff --git a/lld/Common/BPSectionOrdererBase.cpp b/lld/Common/BPSectionOrdererBase.cpp
index f7b460d51a16..51d56606ec92 100644
--- a/lld/Common/BPSectionOrdererBase.cpp
+++ b/lld/Common/BPSectionOrdererBase.cpp
@@ -19,7 +19,6 @@
#include "llvm/Support/BalancedPartitioning.h"
#include "llvm/Support/TimeProfiler.h"
#include "llvm/Support/VirtualFileSystem.h"
-#include "llvm/Support/xxhash.h"
#define DEBUG_TYPE "bp-section-orderer"
using namespace llvm;
diff --git a/lld/Common/CMakeLists.txt b/lld/Common/CMakeLists.txt
index 2ab5093bf688..43e91b85821d 100644
--- a/lld/Common/CMakeLists.txt
+++ b/lld/Common/CMakeLists.txt
@@ -48,6 +48,7 @@ add_lld_library(lldCommon
Demangle
MC
Option
+ ProfileData
Support
Target
TargetParser
diff --git a/lld/ELF/BPSectionOrderer.cpp b/lld/ELF/BPSectionOrderer.cpp
index 86605b1a7626..1fc6036e5dd9 100644
--- a/lld/ELF/BPSectionOrderer.cpp
+++ b/lld/ELF/BPSectionOrderer.cpp
@@ -11,10 +11,7 @@
#include "InputFiles.h"
#include "InputSection.h"
#include "lld/Common/BPSectionOrdererBase.h"
-#include "lld/Common/CommonLinkerContext.h"
#include "llvm/ADT/DenseMap.h"
-#include "llvm/Support/BalancedPartitioning.h"
-#include "llvm/Support/TimeProfiler.h"
#include "SymbolTable.h"
#include "Symbols.h"
diff --git a/lld/ELF/BPSectionOrderer.h b/lld/ELF/BPSectionOrderer.h
index 7f32c26bdd3e..502b7cfbdd2f 100644
--- a/lld/ELF/BPSectionOrderer.h
+++ b/lld/ELF/BPSectionOrderer.h
@@ -21,7 +21,6 @@
#include "lld/Common/BPSectionOrdererBase.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/STLExtras.h"
-#include "llvm/ADT/STLFunctionalExtras.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/BinaryFormat/ELF.h"
#include "llvm/Support/ErrorHandling.h"
diff --git a/lld/ELF/CMakeLists.txt b/lld/ELF/CMakeLists.txt
index 298443cd6ea4..ec3f6382282b 100644
--- a/lld/ELF/CMakeLists.txt
+++ b/lld/ELF/CMakeLists.txt
@@ -73,6 +73,7 @@ add_lld_library(lldELF
Object
Option
Passes
+ ProfileData
Support
TargetParser
TransformUtils
diff --git a/lld/MachO/BPSectionOrderer.cpp b/lld/MachO/BPSectionOrderer.cpp
index 5e9fd5248b2e..0ffbf16007fd 100644
--- a/lld/MachO/BPSectionOrderer.cpp
+++ b/lld/MachO/BPSectionOrderer.cpp
@@ -8,11 +8,7 @@
#include "BPSectionOrderer.h"
#include "InputSection.h"
-#include "lld/Common/ErrorHandler.h"
#include "llvm/ADT/DenseMap.h"
-#include "llvm/ADT/StringMap.h"
-#include "llvm/Support/BalancedPartitioning.h"
-#include "llvm/Support/TimeProfiler.h"
#define DEBUG_TYPE "bp-section-orderer"
diff --git a/lld/MachO/BPSectionOrderer.h b/lld/MachO/BPSectionOrderer.h
index fa71f739ecb3..29b20c781c6b 100644
--- a/lld/MachO/BPSectionOrderer.h
+++ b/lld/MachO/BPSectionOrderer.h
@@ -20,7 +20,6 @@
#include "lld/Common/BPSectionOrdererBase.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/StringRef.h"
-#include "llvm/ADT/TinyPtrVector.h"
namespace lld::macho {
diff --git a/lld/include/lld/Common/BPSectionOrdererBase.h b/lld/include/lld/Common/BPSectionOrdererBase.h
index 78c51cd9786b..a22bfcafd79c 100644
--- a/lld/include/lld/Common/BPSectionOrdererBase.h
+++ b/lld/include/lld/Common/BPSectionOrdererBase.h
@@ -14,7 +14,6 @@
#ifndef LLD_COMMON_BP_SECTION_ORDERER_BASE_H
#define LLD_COMMON_BP_SECTION_ORDERER_BASE_H
-#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"
LGTM! Thanks for making sure this works for both targets! I tested that MachO/bp-section-orderer-stress.s produced the same order before and after this change, which gives me some confidence that it won't regress anything. I'm also curious what size results you get for ELF, so please publish them when they are available.
I need time to look at the ELF port changes.
If you drop the ELF changes from this PR and just move the Mach-O code to Common for reuse, this PR will be good to go and I'll approve.
The ELF port should go in a separate, dependent patch.
Branch force-pushed from 82d28ac to fb24028.
@MaskRay I dropped the ELF parts and will post the ELF patch as a new PR once this PR is merged.
Hi @MaskRay, could you merge this PR? I don't have a merge button for this repository.
@Colibrow Congratulations on having your first Pull Request (PR) merged into the LLVM Project!
Could this be causing issue #122655? I've been hoping it would be discovered and fixed, but it has been broken for almost a month now.
Spent more time on BP. I believe that the virtual classes as done by this PR [...] Perhaps we should treat Symbol and InputSection as [...] Then we can avoid allocating a large section and [...] We could also leverage the Curiously Recurring Template Pattern if we want to retain type information of symbols and sections.
@Colibrow I am taking a stab on the CRTP refactoring. |
I've been using the template method to implement the project, but I'm stuck. In the first version, I placed the base template in [truncated]
PR #117514 refactored BPSectionOrderer to be used by the ELF port but introduced some inefficiency:
* BPSectionBase/BPSymbol are wrappers around a single pointer. The numbers of sections and symbols could be huge, and the extra allocations are memory inefficient.
* Reconstructing the returned DenseMap (since BPSectionBase != InputSection) is wasteful.
This patch refactors BPSectionOrderer with the Curiously Recurring Template Pattern and eliminates the inefficiency. In addition, `symbolToSectionIdxs` is removed and `rootSymbolToSectionIdxs` building is moved to lld/MachO: while getting sections for symbols is cheap in Mach-O, it is awkward and inefficient in the ELF port.
While here, add a file-level comment and replace some `StringMap<*>` (which copies strings) with `DenseMap<CachedHashStringRef, *>`.
Pull Request: #124482
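The follow-up refactoring (#124482) replaces the virtual BPSectionBase/BPSymbol wrappers with the Curiously Recurring Template Pattern. Below is a minimal sketch of the idea with hypothetical names (`BPOrdererBase`, `ToySection`, `ToyOrderer`, `computePriorities`) and a placeholder ordering rule (code before data) standing in for balanced partitioning. The shared algorithm calls port-specific hooks through the `Derived` template parameter, so no per-section wrapper objects are allocated and the result is keyed directly by the port's own section type.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical CRTP base holding the port-independent ordering logic.
// Derived must provide: static bool isCodeSection(const Section &).
template <class Derived> struct BPOrdererBase {
  // Placeholder ordering (not balanced partitioning): code sections first,
  // then data, each group in input order. Priorities count down from
  // highestAvailablePriority.
  template <class Section>
  static std::map<const Section *, int>
  computePriorities(const std::vector<const Section *> &sections,
                    int &highestAvailablePriority) {
    std::vector<const Section *> ordered;
    for (const Section *sec : sections)
      if (Derived::isCodeSection(*sec))
        ordered.push_back(sec);
    for (const Section *sec : sections)
      if (!Derived::isCodeSection(*sec))
        ordered.push_back(sec);

    // Keyed by the port's own section pointers: no wrapper-to-section map
    // has to be rebuilt afterwards.
    std::map<const Section *, int> priorities;
    for (const Section *sec : ordered)
      priorities[sec] = --highestAvailablePriority;
    return priorities;
  }
};

// Hypothetical port-specific section type; in lld this role would be played
// by the port's own input section class.
struct ToySection {
  bool isCode;
};

// The port plugs in its hooks by deriving from the base with itself.
struct ToyOrderer : BPOrdererBase<ToyOrderer> {
  static bool isCodeSection(const ToySection &sec) { return sec.isCode; }
};
```

Because the hooks are resolved at compile time, each port gets its own instantiation of the shared code, trading some code size for avoiding virtual calls and wrapper allocations.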
Add lld/Common/BPSectionOrdererBase from MachO for reuse in ELF