-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[lld][COFF] Remove duplicate strtab entries #141197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
String table size is too big for large binary when symbol table is enabled. Some strings in strtab is same so it can be reused.
@llvm/pr-subscribers-llvm-binary-utilities @llvm/pr-subscribers-lld-coff Author: Haohai Wen (HaohaiWen) ChangesString table size is too big for large binary when symbol table is Full diff: https://github.com/llvm/llvm-project/pull/141197.diff 1 Files Affected:
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index db6133e20a037..6ec83a5f77e5a 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -285,6 +285,7 @@ class Writer {
std::unique_ptr<FileOutputBuffer> &buffer;
std::map<PartialSectionKey, PartialSection *> partialSections;
std::vector<char> strtab;
+ StringMap<size_t> strtabMap;
std::vector<llvm::object::coff_symbol16> outputSymtab;
std::vector<ECCodeMapEntry> codeMap;
IdataContents idata;
@@ -1439,10 +1440,13 @@ void Writer::assignOutputSectionIndices() {
size_t Writer::addEntryToStringTable(StringRef str) {
assert(str.size() > COFF::NameSize);
- size_t offsetOfEntry = strtab.size() + 4; // +4 for the size field
- strtab.insert(strtab.end(), str.begin(), str.end());
- strtab.push_back('\0');
- return offsetOfEntry;
+ size_t newOffsetOfEntry = strtab.size() + 4; // +4 for the size field
+ auto res = strtabMap.try_emplace(str, newOffsetOfEntry);
+ if (res.second) {
+ strtab.insert(strtab.end(), str.begin(), str.end());
+ strtab.push_back('\0');
+ }
+ return res.first->getValue();
}
std::optional<coff_symbol16> Writer::createSymbol(Defined *def) {
|
@llvm/pr-subscribers-lld Author: Haohai Wen (HaohaiWen) ChangesString table size is too big for large binary when symbol table is Full diff: https://github.com/llvm/llvm-project/pull/141197.diff 1 Files Affected:
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index db6133e20a037..6ec83a5f77e5a 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -285,6 +285,7 @@ class Writer {
std::unique_ptr<FileOutputBuffer> &buffer;
std::map<PartialSectionKey, PartialSection *> partialSections;
std::vector<char> strtab;
+ StringMap<size_t> strtabMap;
std::vector<llvm::object::coff_symbol16> outputSymtab;
std::vector<ECCodeMapEntry> codeMap;
IdataContents idata;
@@ -1439,10 +1440,13 @@ void Writer::assignOutputSectionIndices() {
size_t Writer::addEntryToStringTable(StringRef str) {
assert(str.size() > COFF::NameSize);
- size_t offsetOfEntry = strtab.size() + 4; // +4 for the size field
- strtab.insert(strtab.end(), str.begin(), str.end());
- strtab.push_back('\0');
- return offsetOfEntry;
+ size_t newOffsetOfEntry = strtab.size() + 4; // +4 for the size field
+ auto res = strtabMap.try_emplace(str, newOffsetOfEntry);
+ if (res.second) {
+ strtab.insert(strtab.end(), str.begin(), str.end());
+ strtab.push_back('\0');
+ }
+ return res.first->getValue();
}
std::optional<coff_symbol16> Writer::createSymbol(Defined *def) {
|
Test needed. In ELF we can use |
Instead of just using This was attempted before in 9ffeaaa, but I had to revert it in 4d2eda2; see the revert commit for an explanation of the issues I ran into, including a suggestion on what one could do to reapply it. Anyway, using a Do you have any numbers on how much binary size you save with this? It could be interesting to compare with the |
Thanks for reminding. |
How about using two StringTableBuilder, one for long section name and another for others? |
I don't plan on reviving it myself right now, but if you're working in this area, I would appreciate if you'd try to reapply that patch (revert the revert) and see if it can be made to work again on current git main. Then with that in place, it would be interesting to see if you see any further significant size reductions on your blender testcase. If it doesn't really reduce the binary size any much further (perhaps cases where tail merging makes any difference are rare?), then perhaps it's best to just go with the much simpler approach of your current patch here. |
Blender strtab size for three methods.
stringmap reduced -40.82% comparing to unoptimized strtab. |
Thanks for looking into these numbers! So using the StringTableBuilder does make a notable extra benefit on top of this, but using it would require a bit more extra effort than what I tried before. With that in mind, I think this PR in itself looks like a straightforward improvement. We could consider switching to StringTableBuilder, if someone is willing to make the effort to handle the section name strings separately in one way or another - but it is a smaller improvement than this, and probably comes at a bigger runtime cost.
Not sure if I think a test strictly is needed here; whether the strings are deduplicated or not is mostly a nonfunctional difference. (The previous similar change in 9ffeaaa didn't have any test.) I'm not aware of any current |
Let's first support --string-table for COFF. |
Test will be added after landing of this PR. |
strtab size after this PR: 342,999,676 |
Any comments? |
I think this looks reasonable, thanks! I'd prefer to have an ack from @MaskRay for the additions to the shared MC level StringTableBuilder class though, but if he doesn't follow up in a few days I guess I can approve this as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MC looks good, modulo some comments. LGTM once @mstorsjo approves.
llvm/lib/MC/StringTableBuilder.cpp
Outdated
std::vector<StringPair *> Strings; | ||
Strings.reserve(StringIndexMap.size()); | ||
for (StringPair &P : StringIndexMap) | ||
Strings.push_back(&P); | ||
|
||
multikeySort(Strings, 0); | ||
auto getStringPriority = [this](const CachedHashStringRef Str) -> uint8_t { | ||
if (StringPriorityMap.contains(Str)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StringPriorityMap.lookup(Str)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lambda can be moved inside if (StringPriorityMap.size()) {
/// table is finalized. Priority is only useful with reordering. Strings with | ||
/// same priority will be put together. Strings with higher priority are | ||
/// placed closer to the begin of string table. | ||
LLVM_ABI size_t add(CachedHashStringRef S, uint8_t Priority = 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should mention what if a string is added with different priorities. (std::max below)
llvm/lib/MC/StringTableBuilder.cpp
Outdated
}); | ||
// Make sure ArrayRef is valid. Although std::sort implementaion is | ||
// normally in-place , it is not guaranteed by standard. | ||
StringsRef = Strings; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unnecessary. std::sort will sort the elements within the range. If StringRef
applies to the whole range, it will not be invalidated even if std::sort requires external storage (internally).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for the lld side.
ping @MaskRay |
llvm/lib/ObjCopy/COFF/COFFWriter.cpp
Outdated
@@ -121,7 +121,7 @@ void COFFWriter::layoutSections() { | |||
Expected<size_t> COFFWriter::finalizeStringTable() { | |||
for (const auto &S : Obj.getSections()) | |||
if (S.Name.size() > COFF::NameSize) | |||
StrTabBuilder.add(S.Name); | |||
StrTabBuilder.add(S.Name, /*Priority=*/UINT8_MAX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like an unrelated change. Is it even necessary? If it is important, should it have a test and should it be in its own PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same potential issue like 4d2eda2.
We need to put long section name in the front of string table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine, but a) the PR title implies this PR is for lld, so it needs updating or this bit splitting into a second PR and b) llvm-objcopy needs a test that shows that this bit of code has been added.
Address comments Co-authored-by: James Henderson <[email protected]>
@MaskRay, any more comments? |
@@ -0,0 +1,29 @@ | |||
# RUN: llvm-mc -triple=x86_64-windows-msvc %s -filetype=obj -o %t.obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work if x86 is not configured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have marked the test as requiring x86 in 757c80d.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I didn't aware of that.
String table size is too big for large binary when symbol table is enabled. Some strings in strtab is same so it can be reused. This patch revives 9ffeaaa authored by mstorsjo with the prioritized string table builder to fix debug section name issue (see 4d2eda2 for more details). --------- Co-authored-by: Wen Haohai <[email protected]> Co-authored-by: James Henderson <[email protected]>
String table size is too big for large binary when symbol table is
enabled. Some strings in strtab is same so it can be reused.
This patch revives 9ffeaaa authored by mstorsjo with the prioritized
string table builder to fix debug section name issue (see 4d2eda2
for more details).