[lld/mac] Crash with thinlto and --start-lib / --end-lib #59162

Closed
nico opened this issue Nov 23, 2022 · 14 comments
Labels
crash Prefer [crash-on-valid] or [crash-on-invalid] lld:MachO

Comments

@nico
Contributor

nico commented Nov 23, 2022

https://drive.google.com/file/d/1ju1O-uXLu4JGq_FbVb3wu4ugiZ_MAWPR/view?usp=sharing

Assertion failed: (file.lazy), function extract, file InputFiles.cpp, line 2224.
Process 12725 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = hit program assert
    frame #4: 0x00000001007dae48 ld64.lld`lld::macho::extract(lld::macho::InputFile&, llvm::StringRef) + 300
ld64.lld`lld::macho::extract:
->  0x1007dae48 <+300>: bl     0x106b3f788               ; symbol stub for: __asan_handle_no_return
    0x1007dae4c <+304>: adrp   x0, 27829
    0x1007dae50 <+308>: add    x0, x0, #0xb80            ; __func__._ZN4llvm4castIN3lld5macho18ConcatInputSectionENS2_12InputSectionEEEDcPT0_
    0x1007dae54 <+312>: adrp   x1, 27829
Target 0: (ld64.lld) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = hit program assert
    frame #0: 0x0000000185632d98 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000185667ee0 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x00000001855a2340 libsystem_c.dylib`abort + 168
    frame #3: 0x00000001855a1754 libsystem_c.dylib`__assert_rtn + 272
  * frame #4: 0x00000001007dae48 ld64.lld`lld::macho::extract(lld::macho::InputFile&, llvm::StringRef) + 300
    frame #5: 0x0000000100835dc0 ld64.lld`lld::macho::SymbolTable::addUndefined(llvm::StringRef, lld::macho::InputFile*, bool) + 1096
    frame #6: 0x00000001007da798 ld64.lld`lld::macho::BitcodeFile::parse() + 2056
    frame #7: 0x0000000100835dc0 ld64.lld`lld::macho::SymbolTable::addUndefined(llvm::StringRef, lld::macho::InputFile*, bool) + 1096
    frame #8: 0x00000001007da798 ld64.lld`lld::macho::BitcodeFile::parse() + 2056
    frame #9: 0x0000000100837168 ld64.lld`lld::macho::SymbolTable::addLazyObject(llvm::StringRef, lld::macho::InputFile&) + 592
    frame #10: 0x00000001007d9ecc ld64.lld`lld::macho::BitcodeFile::parseLazy() + 504
    frame #11: 0x00000001007d967c ld64.lld`lld::macho::BitcodeFile::BitcodeFile(llvm::MemoryBufferRef, llvm::StringRef, unsigned long long, bool, bool) + 3960
    frame #12: 0x000000010078109c ld64.lld`addFile(llvm::StringRef, LoadType, bool, bool, bool, bool) + 2464
    frame #13: 0x0000000100777da8 ld64.lld`lld::macho::link(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, bool, bool) + 64400
    frame #14: 0x00000001000022fc ld64.lld`lldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&, bool) + 3068
    frame #15: 0x00000001000013d0 ld64.lld`lld_main(int, char**) + 696
    frame #16: 0x000000010b1c108c dyld`start + 520

If you link with -t -why_load, it prints

...
obj/content/shell/content_shell_app/shell_crash_reporter_client.o
__ZN7content24ShellCrashReporterClientC1Ev forced load of obj/content/shell/content_shell_app/shell_crash_reporter_client.o
obj/content/shell/content_shell_app/shell_crash_reporter_client.o
__ZTVN7content24ShellCrashReporterClientE forced load of obj/content/shell/content_shell_app/shell_crash_reporter_client.o

So the file forces a load of itself!

We should definitely port bd448f0 to the MachO port (see also D106293), but doing just that isn't enough: we then hit a different assert later on.

Need to understand what's going on first and make a reduced repro.

@nico
Contributor Author

nico commented Nov 23, 2022

I'm not an LTO expert, but it looks surprising to me that shell_crash_reporter_client.o has __ZTVN7content24ShellCrashReporterClientE twice in its symbol table:

% ~/src/llvm-project/out/gn/bin/llvm-nm -m obj/content/shell/content_shell_app/shell_crash_reporter_client.o | rg __ZTVN7content24ShellCrashReporterClientE
---------------- (LTO,RODATA) private external __ZTVN7content24ShellCrashReporterClientE
                 (undefined) external __ZTVN7content24ShellCrashReporterClientE

@nico
Contributor Author

nico commented Nov 23, 2022

(On the third hand, this same link completes without problems when using thin archives instead of --start-lib / --end-lib.)

@nico
Contributor Author

nico commented Nov 23, 2022

adding @MaskRay since he implemented --start-lib in lld/MachO.

@EugeneZelenko added the lld:MachO and crash (Prefer [crash-on-valid] or [crash-on-invalid]) labels and removed the new issue label Nov 23, 2022
@llvmbot
Member

llvmbot commented Nov 23, 2022

@llvm/issue-subscribers-lld-macho

@nico
Contributor Author

nico commented Nov 23, 2022

From llvm-modextract -n 0 -o - obj/content/shell/content_shell_app/shell_crash_reporter_client.o | llvm-dis -o -:

@_ZTVN7content24ShellCrashReporterClientE = external hidden unnamed_addr constant { [13 x ptr] }, align 8

From llvm-modextract -n 1 -o - obj/content/shell/content_shell_app/shell_crash_reporter_client.o | llvm-dis -o -:

@_ZTVN7content24ShellCrashReporterClientE = hidden unnamed_addr constant { [13 x ptr] } { [13 x ptr] [ptr null, ...

As far as I know, -n 0 is the actual bitcode, while -n 1 is the thinlto index of the file.

So the vtable is an undef external in the actual bitcode, but it's defined in the index (?)

That seems surprising, but maybe it's normal? (@pcc too because thinlto)

Anyways, llvm-nm uses SymbolicFile, and IRObjectFile just collects the symbols from all of the file's modules, which explains why llvm-nm prints the symbol twice: once defined and once undefined.

@nico
Contributor Author

nico commented Nov 23, 2022

llvm-lto2 dump-symtab obj/content/shell/content_shell_app/shell_crash_reporter_client.o uses the same codepath for dumping the symbol table as the one that lld uses:

% llvm-lto2 dump-symtab obj/content/shell/content_shell_app/shell_crash_reporter_client.o | rg __ZTVN7content24ShellCrashReporterClientE
HU------ __ZTVN7content24ShellCrashReporterClientE
H------- __ZTVN7content24ShellCrashReporterClientE

Other symbols in that file are also present twice, but for others the defined symbol comes before the undefined one (i.e. the definition is in the main bitcode file, instead of in the index.)

@nico
Contributor Author

nico commented Nov 23, 2022

Note to self: clang calls llvm/include/llvm/Transforms/IPO/ThinLTOBitcodeWriter.h from clang/lib/CodeGen/BackendUtil.cpp to write a thinlto'd bitcode output file, with separate index module.

lld calls into llvm/lib/LTO/ThinLTOCodeGenerator.cpp to link those thin bitcodes.

@nico
Contributor Author

nico commented Nov 23, 2022

Looks like it's normal that the vtable def goes in the index:

% cat vtable.cc                                                           
struct S {
  S();

  virtual void f();
};

S::S() {}

void S::f() {}
% clang -c vtable.cc -flto=thin -fsplit-lto-unit

% out/gn/bin/llvm-modextract -n 0 -o - vtable.o | out/gn/bin/llvm-dis -o - | rg _ZTV1S
@_ZTV1S = external unnamed_addr constant { [3 x i8*] }, align 8
  store i32 (...)** bitcast (i8** getelementptr inbounds ({ [3 x i8*] }, { [3 x i8*] }* @_ZTV1S, i32 0, inrange i32 0, i32 2) to i32 (...)**), i32 (...)*** %4, align 8

% out/gn/bin/llvm-modextract -n 1 -o - vtable.o | out/gn/bin/llvm-dis -o - | rg _ZTV1S
@_ZTV1S = unnamed_addr constant { [3 x i8*] } { [3 x i8*] [i8* null, i8* bitcast ({ i8*, i8* }* @_ZTI1S to i8*), i8* bitcast (void ()* @_ZN1S1fEv to i8*)] }, align 8, !type !0, !type !1

@nico
Contributor Author

nico commented Nov 23, 2022

Here's a reduced repro:

% cat vtable.cc 
struct S {
  S();

  virtual void f();
};

S::S() {}

void S::f() {}

% cat vtable_use.cc 
struct S {
  S();

  virtual void f();
};

int main() {
  S s;
}

% out/gn/bin/clang -c vtable_use.cc vtable.cc -flto=thin -fsplit-lto-unit          

% out/gn/bin/clang++ -Wl,--start-lib vtable.o -Wl,--end-lib vtable_use.o -isysroot $(xcrun -show-sdk-path) -fuse-ld=lld -fsplit-lto-unit                            
Assertion failed: (file.lazy), function extract, file InputFiles.cpp, line 2224.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /Users/thakis/src/llvm-project/out/gn/bin/ld64.lld -demangle -dynamic -arch arm64 -platform_version macos 12.0.0 13.0 -syslibroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -o a.out --start-lib vtable.o --end-lib vtable_use.o -lc++ -lSystem
clang: error: unable to execute command: Abort trap: 6
clang: error: linker command failed due to signal (use -v to see invocation)

@nico
Contributor Author

nico commented Nov 24, 2022

This is almost certainly the right fix:

% git diff
diff --git a/lld/MachO/InputFiles.cpp b/lld/MachO/InputFiles.cpp
index bedc273c5283..1af42b337065 100644
--- a/lld/MachO/InputFiles.cpp
+++ b/lld/MachO/InputFiles.cpp
@@ -2212,8 +2212,10 @@ void BitcodeFile::parseLazy() {
 }
 
 void macho::extract(InputFile &file, StringRef reason) {
+  if (!file.lazy)
+    return;
+
   printArchiveMemberLoad(reason, &file);
-  assert(file.lazy);
   file.lazy = false;
   if (auto *bitcode = dyn_cast<BitcodeFile>(&file)) {
     bitcode->parse();
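
To illustrate why the early return helps, here is a toy model of the re-entrancy. It is not lld's code: `Sym`, `File`, `SymbolTable`, and `runToyLink` are made-up names for this sketch, and the symbol list only mimics the order that `llvm-lto2 dump-symtab` showed for the vtable (undefined entry first, definition second). Parsing the file hits the undefined entry, finds a lazy symbol that points back at the same file, and calls `extract` again; the guard turns that second call into a no-op.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model only: hypothetical stand-ins, not lld's real classes.
struct Sym {
  std::string name;
  bool undefined;
};

struct File {
  std::string name;
  bool lazy;
  int parseCount;
  std::vector<Sym> syms;
};

struct SymbolTable {
  // Maps a symbol name to the lazy file that could provide it.
  std::map<std::string, File *> lazyProviders;

  // Register every defined symbol of a lazy file.
  void addLazy(File &f) {
    for (const Sym &s : f.syms)
      if (!s.undefined)
        lazyProviders[s.name] = &f;
  }

  void extract(File &f) {
    if (!f.lazy) // Guard: the file is already loaded or is being loaded.
      return;
    f.lazy = false;
    parse(f);
  }

  void parse(File &f) {
    ++f.parseCount;
    for (const Sym &s : f.syms) {
      if (s.undefined) {
        auto it = lazyProviders.find(s.name);
        if (it != lazyProviders.end())
          extract(*it->second); // May point back at f itself.
      } else {
        lazyProviders.erase(s.name); // The symbol is now really defined.
      }
    }
  }
};

// Loads one file whose symbol list has the undefined entry before the
// definition, and reports how many times the file was parsed.
int runToyLink() {
  File vtable{"vtable.o", true, 0,
              {{"_ZTV1S", true}, {"_ZTV1S", false}, {"_ZN1SC1Ev", false}}};
  SymbolTable symtab;
  symtab.addLazy(vtable);
  symtab.extract(vtable);
  return vtable.parseCount;
}
```

In this model `runToyLink()` parses the file exactly once. Without the guard in `extract`, the nested call would run `parse` on a file that is already mid-parse, which is the situation the original `assert(file.lazy)` tripped on.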

@nico
Contributor Author

nico commented Dec 2, 2022

https://reviews.llvm.org/D139199 is that plus a test. The test asserts without the fix but not with it, so that's good.

However, porting bd448f0 instead of that fix also makes it pass, but as mentioned above, porting bd448f0 isn't enough to make my larger, unreduced repro work.

So we should add a more involved test for that too at some point.

@nico
Contributor Author

nico commented Dec 2, 2022

Turns out the reduced repro from the earlier comment in this issue also shows that bd448f0 isn't sufficient. Doing just that:

% out/gn/bin/clang++ -Wl,--start-lib vtable.cc -Wl,--end-lib vtable_use.cc -fuse-ld=lld  -isysroot $(xcrun -show-sdk-path) -arch x86_64 -flto=thin -fsplit-lto-unit
ld64.lld: error: duplicate symbol: typeinfo for S
>>> defined in /var/folders/w6/wpbtszrs7jl9dc9l5qtdkvg00000gn/T/vtable-479c58.o
>>> defined in /var/folders/w6/wpbtszrs7jl9dc9l5qtdkvg00000gn/T/thinlto-809a0f/1.x86_64.lto.o
clang: error: linker command failed with exit code 1 (use -v to see invocation)

(and it works fine with the proposed patch)

I'm hopeful that a minor tweak to https://reviews.llvm.org/D139199 will show this difference too – just need to find the right tweak.

@MaskRay
Member

MaskRay commented Dec 3, 2022

The patch LGTM. Commented there.

@nico nico closed this as completed in 92f8a6e Dec 5, 2022
@nico
Contributor Author

nico commented Dec 5, 2022

This is fixed. Still to do:

  • Improve test case to show that just bd448f0 isn't enough
  • Do port bd448f0 since it's a good change
  • Port some more of the lld/ELF start-lib changes (perf improvements, replace .a codepath with it, etc)

nico added a commit to nico/llvm-project that referenced this issue Sep 26, 2023
Ports https://reviews.llvm.org/D106293 to bitcode, or
llvm@bd448f01a6 from ELF to MachO.

See also llvm#59162 for some vaguely related discussion.
nico added a commit that referenced this issue Sep 26, 2023
…#67445)

Ports https://reviews.llvm.org/D106293 to bitcode, or
bd448f01a6 from ELF to
MachO.

See also #59162 for some vaguely related discussion.
legrosbuffle pushed a commit to legrosbuffle/llvm-project that referenced this issue Sep 29, 2023
nico added a commit that referenced this issue Jan 24, 2025
…434 (#124294)

This is a follow-up to #120452 in a way.

Since lld/COFF does not yet insert all defined symbols of an obj file
before all undefined ones (ELF and MachO do this, see #67445 and things
linked from there), it's possible that:

1. We add an obj file a.obj
2. a.obj contains an undefined that's in b.obj, causing b.obj to be
added
3. b.obj contains an undefined that's in a part of a.obj that's not yet
in the symbol table, causing a recursive load of a.obj, which adds the
symbols in there twice, leading to duplicate symbol errors.

For normal archives, `ArchiveFile::addMember()` has a `seen` check to
prevent this. For start-lib lazy objects, we can just check if the
archive is still lazy at the recursive call.

This bug is similar to issue #59162.

(Eventually, we'll probably want to do what the MachO and ELF ports do.)

Includes a test that caused duplicate symbol diagnostics before this
code change.
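
The ordering idea mentioned in the commit message above (insert a file's defined symbols before its undefined ones) can be sketched as follows. This is a standalone illustration, not code from any lld port: `Sym`, `exampleSymbols`, and `pendingSelfReferences` are hypothetical names. It counts how many undefined entries would be processed while the same file's definition of that name has not been inserted yet, which is the window in which a recursive load of the file can be triggered.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Toy model only: hypothetical names, not lld's data structures.
struct Sym {
  std::string name;
  bool undefined;
};

// Symbol order modeled on the vtable in this issue: undefined first.
std::vector<Sym> exampleSymbols() {
  return {{"_ZTV1S", true}, {"_ZTV1S", false}};
}

// Counts undefined entries that are visited before the same file's own
// definition of that name has been inserted.
int pendingSelfReferences(std::vector<Sym> syms, bool definedFirst) {
  if (definedFirst) {
    // Move all defined symbols to the front, keeping relative order.
    std::stable_partition(syms.begin(), syms.end(),
                          [](const Sym &s) { return !s.undefined; });
  }

  std::set<std::string> definedHere;
  for (const Sym &s : syms)
    if (!s.undefined)
      definedHere.insert(s.name);

  std::set<std::string> inserted;
  int count = 0;
  for (const Sym &s : syms) {
    if (!s.undefined)
      inserted.insert(s.name);
    else if (definedHere.count(s.name) && !inserted.count(s.name))
      ++count; // Would look like a reference to a still-lazy definition.
  }
  return count;
}
```

With the example symbols, source order leaves one pending self-reference and defined-first order leaves none. Note that, per the discussion above, porting only this ordering change was reported as not sufficient for the MachO repro, so the sketch shows the idea rather than a complete fix.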