Runtime Crash at "__swift_instantiateConcreteTypeFromMangledName" #74303

Open
MahdiBM opened this issue Jun 11, 2024 · 30 comments
Labels
  • bug: A deviation from expected or documented behavior. Also: expected but undesirable behavior.
  • crash: Bug: A crash, i.e., an abnormal termination of software.
  • runtime: The Swift Runtime.

Comments

@MahdiBM

MahdiBM commented Jun 11, 2024

Description

DiscordBM crashes at __swift_instantiateConcreteTypeFromMangledName.

Reproduction

  • Clone DiscordBM.
  • On CLI run BOT_TOKEN=aaaaaa swift test --filter IntegrationTests.DiscordClientTests.testGateway.
  • Or on Xcode add BOT_TOKEN as an env var with any value, and run the testGateway() test.

Stack dump

[Image: Screenshot 2024-06-11 at 10 31 57 PM]

Expected behavior

No crash.

Environment

  • macOS 14 + Swift 5.10 + Xcode 15.4.
  • macOS 15 + Swift 6 + Xcode 16.

Not reproducible on Linux.

Additional information

No response

@MahdiBM added the bug, crash, and triage needed labels Jun 11, 2024
@MarkVillacampa

I believe this could be the same issue: #74333

@MahdiBM
Author

MahdiBM commented Jun 13, 2024

I tried a bunch of things and noticed that if I comment out this line in the PartialApplication declaration, it fixes the problem, at least partially (I tried these on the be95b07e3cc9db075d201f14051589676dd99785 commit):

    public var flags: IntBitField<DiscordApplication.Flag>?

So IntBitField<DiscordApplication.Flag>? must be causing the problem.

  • Using typealiases doesn't help.
  • Removing the optional sign doesn't help.
  • Adding explicit Codable conformance doesn't help, but Xcode will show a line in the decoder as the source of the crash instead: self.flags = try container.decodeIfPresent(IntBitField<DiscordApplication.Flag>.self, forKey: PartialApplication.CodingKeys.flags)
  • Doing stuff like let uint = try container.decodeIfPresent(UInt.self, forKey: PartialApplication.CodingKeys.flags); self.flags = uint.map { .init(rawValue: $0) } in the decoder does not help.

@MahdiBM
Author

MahdiBM commented Jul 10, 2024

This issue is still present in Xcode 16 beta 3.
I hope it gets resolved before the RC since a few users of DiscordBM have reported it to me 🙁.

@tbkka
Contributor

tbkka commented Jul 10, 2024

CC: @al45tair @mikeash

@mikeash
Contributor

mikeash commented Jul 10, 2024

Which commit should I be using with beta 3? I tried 7865b015f9135d8e1799a0f406be0174e4b04d85 (which is current ToT for main) and got a bunch of "error: pattern that the region based isolation checker does not understand how to check. Please file a bug." (Filed rdar://131471942 internally for that.) I tried ff804c560d14e5993261ad1bed923766c1b52265 which is just before mentions of Swift 6 and that crashes with a stack overflow. Then tried be95b07e3cc9db075d201f14051589676dd99785 and that seems to work fine. It's possible I need to more closely match the macOS build, but before I do that I want to make sure I'm targeting a commit that's known to build and crash (in the appropriate way!) with beta 3.

@hborla added the runtime label and removed the triage needed label Jul 14, 2024
@MahdiBM
Author

MahdiBM commented Jul 16, 2024

I would also really appreciate it if there are any workarounds I can apply so that earlier Swift versions can also continue to work properly 🙂.

@MahdiBM
Author

MahdiBM commented Jul 28, 2024

Still an issue in Xcode 16 beta 4.

@MahdiBM
Author

MahdiBM commented Oct 6, 2024

Still an issue on Xcode 16 beta 2.
And still getting frequent user reports about this issue which I have 0 idea how to even work-around.

Pretty frustrating.

I don't even know if looking around in the Swift repo is any good considering this only happens on macOS and Xcode contains a fork of the Swift repo, not the repo itself :/
Not that I'm any good at compiler work, but could have tried.

cc @tbkka @mikeash @hborla

@MahdiBM
Author

MahdiBM commented Oct 6, 2024

@mikeash not sure what "commit" but any of the toolchains bundled with the Xcode versions I've mentioned should not have any trouble reproducing the issue.

@mikeash
Contributor

mikeash commented Oct 6, 2024

Sorry, I meant which commit in the DiscordBM repository should I use to try to reproduce this?

@MahdiBM
Author

MahdiBM commented Oct 6, 2024

Ah hmm, really any recent commit. Just check out the main branch.

@mikeash
Contributor

mikeash commented Oct 7, 2024

I don't recall what trouble I had previously, but I tried again (with the latest main and Xcode 16.1b2) and I think I've reproduced the issue. The crash is a bit different from what you show, but it seems to be the same basic thing. The problem is that the init(from:) method on Gateway.Event uses an enormous amount of stack space, enough to run out of stack and then crash on the stack guard page. I'm getting a crash in ___chkstk_darwin, but I think your crash in __swift_instantiateConcreteTypeFromMangledName is the same thing, it just happens to be that call which hits the stack guard page in your run.

The cause of the enormous stack frame appears to be the enum Payload type. There are issues with stack usage by large enums with many cases that have associated values. Marking it as indirect, as in indirect public enum Payload: Sendable, mitigates that and allows the tests to run without crashing for me. Hopefully that can work for you as well.
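As a rough illustration of why the indirect workaround shrinks the value being moved around, here is a minimal sketch. BigPayload, DirectPayload, and IndirectPayload are hypothetical stand-ins, not DiscordBM's real types, and the exact sizes depend on the platform.

```swift
// Hypothetical stand-in for a large payload struct (not a DiscordBM type).
struct BigPayload {
    var a: (Int, Int, Int, Int, Int, Int, Int, Int) = (0, 0, 0, 0, 0, 0, 0, 0)
    var b: (Int, Int, Int, Int, Int, Int, Int, Int) = (0, 0, 0, 0, 0, 0, 0, 0)
}

// Payloads are stored inline, so the enum is at least as large as its
// largest payload.
enum DirectPayload {
    case first(BigPayload)
    case second(BigPayload)
    case none
}

// Payloads are boxed on the heap, so the enum value itself only holds a
// reference to the box.
indirect enum IndirectPayload {
    case first(BigPayload)
    case second(BigPayload)
    case none
}

let directSize = MemoryLayout<DirectPayload>.size
let indirectSize = MemoryLayout<IndirectPayload>.size
print("direct: \(directSize) bytes, indirect: \(indirectSize) bytes")
```

The trade-off is an extra heap allocation per value, which is usually acceptable for decoded API payloads but is worth measuring in hot paths.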

@MahdiBM
Author

MahdiBM commented Oct 7, 2024

@mikeash thank you, I'll take that workaround.

@MahdiBM
Author

MahdiBM commented Oct 7, 2024

To be clear, that enum's size is 100% legitimate. It's an enum with a case for each type of Discord API payload, so the size of the enum is not really under my control.

Also would be nice to have a compile error instead of a runtime crash, at least.

@MahdiBM
Author

MahdiBM commented Oct 7, 2024

Also ... why crash on macOS but not Linux? Just curious.

@mikeash
Contributor

mikeash commented Oct 7, 2024

Yeah, not casting blame here, just describing the root of the issue. Something about how the stack frame layout code works causes it to make a separate area for each enum case's payload, rather than using the same space for all of them. This is especially troublesome in debug builds; release builds may stand a better chance at using stack space more efficiently.

Stack space is an odd thing where most systems just sort of handwave the fact that the stack is finite and not particularly large. We just sort of assume it's going to be big enough, without any particular guarantee that it is. Usually you need a lot more recursion to run into the limit, though. This function is a really extreme case where a single function call eats up the whole stack.

I'd guess your Linux is running with a large stack, and so it manages not to run off the end.
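If stack size really is the difference between the platforms, one way to test that guess is to run the stack-hungry work on a thread with an explicitly larger stack. This is only a sketch and was not proposed in the thread: runWithStack and ResultBox are hypothetical helpers, and the 16 MB figure is an arbitrary assumption.

```swift
import Foundation

// Hypothetical box used to hand the result back from the worker thread.
final class ResultBox<T>: @unchecked Sendable {
    var value: T?
}

// Hypothetical helper: runs `work` on a dedicated thread with the requested
// stack size and blocks until it finishes.
func runWithStack<T: Sendable>(bytes: Int, _ work: @escaping @Sendable () -> T) -> T {
    let box = ResultBox<T>()
    let done = DispatchSemaphore(value: 0)
    let thread = Thread {
        box.value = work()
        done.signal()
    }
    thread.stackSize = bytes  // must be set before start()
    thread.start()
    done.wait()               // the write to box.value happens before signal()
    return box.value!
}

let answer = runWithStack(bytes: 16 * 1024 * 1024) { 21 * 2 }
print(answer)  // prints 42
```

In a real test one would put the decoding call inside the closure. If the crash disappears with a larger stack, that supports the stack-exhaustion explanation.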

@tbkka
Contributor

tbkka commented Oct 7, 2024

Is this in debug builds? I had to deal with a similar problem working with enums that had many payload cases: The debug builds would use an enormous amount of stack space, but release builds were quite reasonable. Here's the basic problem:

Consider this:

func f(e: MyEnum) {
  switch e {
  case .a(let aa):
     let ab = aa
     let ac = ab
  case .b(let bb):
     let bc = bb
     let bd = bc
  case .c(let cc):
     ... etc ...
  }
}

In a debug build, the stack frame for this function will have a separate stack slot for every one of the above variables -- aa, ab, ac, bb, bc, bd, etc. That is necessary to ensure that you can inspect any one of them at any time in the debugger. In a release build, the compiler will do some additional optimizations: it will overlay the variables for each case so that aa/ab/ac use the same stack area as bb/bc/bd, and so on. This dramatically reduces the amount of stack space required, but means that none of the variables in the switch can be inspected after the end of the switch (because the debugger doesn't know which overlaid variables are really valid).

One way I found to work around this is to rewrite the above in a somewhat roundabout way:

func f(e: MyEnum) {
  switch e {
  case .a(let aa):
    {
       let ab = aa
       let ac = ab
    }()
  case .b(let bb):
    {
       let bc = bb
       let bd = bc
    }()
  case .c(let cc):
     ... etc ...
  }
}

That is, each complex case handler gets put into a separate closure that's invoked immediately. In Debug builds, these closures end up as separate functions with separate stack frames. That avoids the main function having to reserve stack space for every path in the switch. In release builds, these closures get inlined and then optimized, so the final result in that case is the same as in the first version above.

Does this help?

@MahdiBM
Author

MahdiBM commented Oct 7, 2024

@mikeash

Yeah, not casting blame here, just describing the root of the issue.

No worries, I didn't notice any 🙂.

@MahdiBM
Author

MahdiBM commented Oct 8, 2024

@tbkka

Is this in debug builds?

Hmm, I don't recall trying on macOS with a release build, but macOS debug builds would crash while Ubuntu debug and release builds were both fine.

@fmoraes74

Is this the same crash?

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff206b190e __pthread_kill + 10
1 libsystem_pthread.dylib 0x00007fff206e05bd pthread_kill + 263
2 libsystem_c.dylib 0x00007fff20635406 abort + 125
3 libcoreclr.dylib 0x0000000103ac9f7e mono_post_native_crash_handler + 14
4 libcoreclr.dylib 0x0000000103a6dffa mono_handle_native_crash + 458
5 libcoreclr.dylib 0x00000001039c46cf mono_sigsegv_signal_handler_debug + 335
6 libsystem_platform.dylib 0x00007fff20725d7d _sigtramp + 29
7 ??? 000000000000000000 0 + 0
8 libswiftCore.dylib 0x00007fff2cdf42dd swift::Demangle::__runtime::Demangler::demangleSymbolicReference(unsigned char) + 141
9 libswiftCore.dylib 0x00007fff2cdf12a8 swift::Demangle::__runtime::Demangler::demangleType(__swift::__runtime::llvm::StringRef, std::__1::function<swift::Demangle::__runtime::Node* (swift::Demangle::__runtime::SymbolicReferenceKind, swift::Demangle::__runtime::Directness, int, void const*)>) + 168
10 libswiftCore.dylib 0x00007fff2cdd75a4 swift_getTypeByMangledNameImpl(swift::MetadataRequest, __swift::__runtime::llvm::StringRef, void const* const*, std::__1::function<swift::TargetMetadata<swift::InProcess> const* (unsigned int, unsigned int)>, std::__1::function<swift::TargetWitnessTable<swift::InProcess> const* (swift::TargetMetadata<swift::InProcess> const*, unsigned int)>) + 516
11 libswiftCore.dylib 0x00007fff2cdd4d6d swift::swift_getTypeByMangledName(swift::MetadataRequest, __swift::__runtime::llvm::StringRef, void const* const*, std::__1::function<swift::TargetMetadata<swift::InProcess> const* (unsigned int, unsigned int)>, std::__1::function<swift::TargetWitnessTable<swift::InProcess> const* (swift::TargetMetadata<swift::InProcess> const*, unsigned int)>) + 477
12 libswiftCore.dylib 0x00007fff2cdd4f9b swift_getTypeByMangledNameInContext + 171
13  HSTracker                       0x000000010231aa81  __swift_instantiateConcreteTypeFromMangledName (in HSTracker) (/<compiler-generated>:0)
14  HSTracker                       0x00000001025b8a9e  closure #1 in BattlegroundsSession.battlegroundsGameMode.setter (in HSTracker) (<stdin>:0)
15  HSTracker                       0x000000010263b259  thunk for @escaping @callee_guaranteed () -> () (in HSTracker) (/<compiler-generated>:0)
16 libdispatch.dylib 0x00007fff20535623 _dispatch_call_block_and_release + 12
17 libdispatch.dylib 0x00007fff20536806 _dispatch_client_callout + 8
18 libdispatch.dylib 0x00007fff20542b4f _dispatch_main_queue_callback_4CF + 940
19 com.apple.CoreFoundation 0x00007fff208158d8 CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE + 9
20 com.apple.CoreFoundation 0x00007fff207d7b32 __CFRunLoopRun + 2755
21 com.apple.CoreFoundation 0x00007fff207d69ac CFRunLoopRunSpecific + 563
22 com.apple.HIToolbox 0x00007fff28a211f3 RunCurrentEventLoopInMode + 292
23 com.apple.HIToolbox 0x00007fff28a20f55 ReceiveNextEventCommon + 587
24 com.apple.HIToolbox 0x00007fff28a20cf3 _BlockUntilNextEventMatchingListInModeWithFilter + 70
25 com.apple.AppKit 0x00007fff22fe0ad2 _DPSNextEvent + 864
26 com.apple.AppKit 0x00007fff22fdf2a5 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 1364
27 com.apple.AppKit 0x00007fff22fd15c9 -[NSApplication run] + 586
28 com.apple.AppKit 0x00007fff22fa57cc NSApplicationMain + 816
29  HSTracker                       0x0000000102644369  main (in HSTracker) (AppDelegate.swift:19)
30 libdyld.dylib 0x00007fff206fbf3d start + 1

Reported only on macOS 11.7.0, as far as I have heard so far. Most other users do not seem to be affected. Code was compiled with Xcode 16.1.

@mikeash
Contributor

mikeash commented Nov 21, 2024

Hard to tell without more info. The VM Region Info from an Apple crash log would show pretty definitively if we ran out of stack space or had some other problem. Looking at this backtrace, I think this is something else. The only stack frame that could potentially use a huge amount of stack would be #14 (BattlegroundsSession.battlegroundsGameMode.setter) and it's very unlikely that a setter would be complicated enough to have this problem.

Given the crash is in demangleSymbolicReference on an older OS, I suspect this is the use of a weak-linked symbol which isn't present on that OS version. If you can see what type it's trying to use in frame #14, you can check to see if that type or any of its generic parameters were introduced in a later OS version.
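For reference, the usual guard around a type that only exists on newer OS versions looks roughly like the sketch below. concurrencyTypesUsable is a hypothetical function, and the version numbers are the commonly cited back-deployment floor for the concurrency library, assumed here rather than taken from the thread. Note that this guards against a type being absent on the running OS; it does not help if a back-deployment library that should be bundled fails to load.

```swift
// Hypothetical helper: touches a concurrency type only behind an
// availability check.
func concurrencyTypesUsable() -> Bool {
    if #available(macOS 10.15, iOS 13.0, tvOS 13.0, watchOS 6.0, *) {
        // Referencing TaskPriority? here is what makes the compiler emit a
        // type metadata lookup for it.
        let priority: TaskPriority? = .background
        return priority != nil
    }
    return false
}

print(concurrencyTypesUsable())
```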

@fmoraes74

The block of code in frame #14 is the one seen here:

            let modified = _battlegroundsGameMode != newValue
            _battlegroundsGameMode = newValue
            if modified {
                DispatchQueue.main.async {
                    self.updateSectionsVisibilities()
                    if #available(macOS 10.15, *) {
                        Task.detached {
                            await self.updateCompositionStatsVisibility()
                        }
                    }
                    self.update()
                    self.updateScaling()
                }
            }

I decompiled the code and matched the address. It was inside the block being dispatched on the main thread, and it is roughly this:

int _$s9HSTracker20BattlegroundsSessionC21battlegroundsGameModeAA08SelectedbeF0OvsyyScMYccfU_(int arg0, int arg1, int arg2, int arg3) {
    rcx = arg3;
    r13 = arg0;
    HSTracker.BattlegroundsSession.updateSectionsVisibilities(arg0, arg1, arg2, rcx);
    rdi = 0xa;
    rsi = 0xf;
    rdx = 0x0;
    if ((Swift._stdlib_isOSVersionAtLeast(rdi, rsi, rdx) & 0x1) != 0x0) {
            r14 = &stack[-56] - (*(*(___swift_instantiateConcreteTypeFromMangledName(_$sScPSgMD, 0xf, 0x0, rcx) - 0x8) + 0x40) + 0xf & 0xfffffffffffffff0);

@fmoraes74

VM region near:

VM Regions Near 0:
-->
__TEXT 102315000-1030fa000 [ 13.9M] r-x/r-x SM=COW /Applications/HSTracker.app/Contents/MacOS/HSTracker

and

VM Region Summary:
ReadOnly portion of Libraries: Total=905.7M resident=0K(0%) swapped_out_or_unallocated=905.7M(100%)
Writable regions: Total=673.9M written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=673.9M(100%)

                                 VIRTUAL   REGION
REGION TYPE                         SIZE    COUNT (non-coalesced)
===========                      =======  =======
Accelerate framework                256K        2
Activity Tracing                    256K        1
CG backing stores                  3240K        6
CG image                           4100K        1
CoreAnimation                      3904K       35
CoreGraphics                         12K        2
CoreUI image data                   340K        3
Foundation                           16K        1
Image IO                             48K        6
Kernel Alloc Once                     8K        1
MALLOC                            231.1M       82
MALLOC guard page                    48K       12
MALLOC_LARGE (reserved)             128K        1  reserved VM address space (unallocated)
MALLOC_NANO (reserved)            384.0M        1  reserved VM address space (unallocated)
SQLite page cache                   128K        2
STACK GUARD                        56.1M       26
Stack                              23.7M       26
VM_ALLOCATE                        78.1M       53
VM_ALLOCATE (reserved)             8192K        1  reserved VM address space (unallocated)
__DATA                             21.3M      448
__DATA_CONST                       24.5M      279
__DATA_DIRTY                       1462K      175
__FONT_DATA                           4K        1
__LINKEDIT                        502.7M       13
__OBJC_RO                          70.3M        1
__OBJC_RW                          2496K        3
__TEXT                            403.0M      449
__UNICODE                           588K        1
libnetwork                          128K        8
mapped file                       416.9M       48
shared memory                       752K       13
===========                      =======  =======
TOTAL                               2.2G     1701
TOTAL, minus reserved VM space      1.8G     1701

@fmoraes74

I also forgot to say that this crash appears to have started after I upgraded from Xcode 15.3 to 16.1.

@mikeash
Contributor

mikeash commented Nov 21, 2024

$sScPSgMD is the demangling cache variable for type metadata for Swift.TaskPriority?, and that seems to be the type it's failing on. macOS 11 requires a back-deployment copy of libswift_Concurrency.dylib to be embedded in the app, which is where that type lives. I suspect that dylib has gone missing from your app bundle, and this is the first thing that tries to use it.
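A quick way to check for the embedded library from inside the app is sketched below. findConcurrencyLibrary is a hypothetical helper, and the assumption that the back-deployment copy sits in the bundle's private frameworks directory (Contents/Frameworks in a macOS app) should be verified against an actual build.

```swift
import Foundation

// Hypothetical helper: returns the URL of libswift_Concurrency.dylib inside
// the given directory, or nil when no such file exists there.
func findConcurrencyLibrary(inFrameworksDirectory directory: URL) -> URL? {
    let candidate = directory.appendingPathComponent("libswift_Concurrency.dylib")
    return FileManager.default.fileExists(atPath: candidate.path) ? candidate : nil
}

// Assumption: the app embeds Swift back-deployment libraries in its private
// frameworks directory.
if let frameworks = Bundle.main.privateFrameworksURL {
    let found = findConcurrencyLibrary(inFrameworksDirectory: frameworks)
    print(found?.path ?? "libswift_Concurrency.dylib not found in \(frameworks.path)")
}
```

The same check can be done by hand by listing the Frameworks folder of the built app bundle.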

@fmoraes74

@mikeash Thanks for pointing it out. I compared the previous application bundles and it was present until the most recent version, so I need to see what went wrong.

@fmoraes74

I reviewed my project.pbxproj changes and I don't see anything that should have prevented the library from being included. I have added a manual link with the library marked as optional, per a suggestion in a Stack Overflow post, and it seems to be packaging now. I wonder if this is an Xcode bug.

@fmoraes74

I rebuilt the application with Xcode 15.3 and the problem went away. I tried building with 16.1 and 16.2 beta 3 and embedding libswift_Concurrency.dylib, but the problem persisted until I built with 15.3 again.

Reported the issue to Apple via https://feedbackassistant.apple.com/feedback/15937972

@mikeash
Contributor

mikeash commented Dec 5, 2024

Thank you for the feedback report. That made its way to me and I've located the issue. The problem is an incorrect path referenced for libswift_Concurrency.dylib. When targeting OS versions that need the back-deployment library, the executable needs to use an @rpath relative path to refer to it. There are some magic symbols which tell the linker to do this. But Xcode 16 accidentally broke this when targeting versions earlier than 10.15.

This will fix it: #77980. Of course, I can't comment as to when that might find its way into an Xcode release. The good news is that it should be pretty easy to work around. Add a postprocessing step to your builds (a shell script build phase would work nicely) to fix the path by running this command:

install_name_tool -change /usr/lib/swift/libswift_Concurrency.dylib @rpath/libswift_Concurrency.dylib /path/to/your/executable
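To see which copy of the library a running process actually loaded, a diagnostic along these lines could be logged at startup. This is a sketch: loadedImagePaths is a hypothetical helper built on the dyld image-listing functions from the MachO module, and on non-Apple platforms it simply returns an empty list.

```swift
import Foundation
#if canImport(MachO)
import MachO
#endif

// Hypothetical helper: returns the paths of all images loaded into the
// current process whose path contains `needle`.
func loadedImagePaths(containing needle: String) -> [String] {
    var matches: [String] = []
    #if canImport(MachO)
    for index in 0..<_dyld_image_count() {
        if let name = _dyld_get_image_name(index) {
            let path = String(cString: name)
            if path.contains(needle) {
                matches.append(path)
            }
        }
    }
    #endif
    return matches
}

for path in loadedImagePaths(containing: "libswift_Concurrency") {
    print(path)
}
```

An empty result on an affected machine would suggest the library was never loaded, while a path inside the app bundle would suggest the @rpath reference resolved to the embedded copy.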

@fmoraes74

@mikeash I downgraded to Xcode 15.3 and verified with otool -l that the @rpath is now correct, but I am still getting a crash:

8 libswiftCore.dylib 0x00007fff2cc6c2dd swift::Demangle::__runtime::Demangler::demangleSymbolicReference(unsigned char) + 141
9 libswiftCore.dylib 0x00007fff2cc692a8 swift::Demangle::__runtime::Demangler::demangleType(__swift::__runtime::llvm::StringRef, std::__1::function<swift::Demangle::__runtime::Node* (swift::Demangle::__runtime::SymbolicReferenceKind, swift::Demangle::__runtime::Directness, int, void const*)>) + 168
10 libswiftCore.dylib 0x00007fff2cc4f5a4 swift_getTypeByMangledNameImpl(swift::MetadataRequest, __swift::__runtime::llvm::StringRef, void const* const*, std::__1::function<swift::TargetMetadata<swift::InProcess> const* (unsigned int, unsigned int)>, std::__1::function<swift::TargetWitnessTable<swift::InProcess> const* (swift::TargetMetadata<swift::InProcess> const*, unsigned int)>) + 516
11 libswiftCore.dylib 0x00007fff2cc4cd6d swift::swift_getTypeByMangledName(swift::MetadataRequest, __swift::__runtime::llvm::StringRef, void const* const*, std::__1::function<swift::TargetMetadata<swift::InProcess> const* (unsigned int, unsigned int)>, std::__1::function<swift::TargetWitnessTable<swift::InProcess> const* (swift::TargetMetadata<swift::InProcess> const*, unsigned int)>) + 477
12 libswiftCore.dylib 0x00007fff2cc4cf9b swift_getTypeByMangledNameInContext + 171
__swift_instantiateConcreteTypeFromMangledName (in HSTracker) (/<compiler-generated>:0)
closure #1 in BattlegroundsSession.battlegroundsGameMode.setter (in HSTracker) (<stdin>:0)
thunk for @escaping @callee_guaranteed () -> () (in HSTracker) (/<compiler-generated>:0)

Here's the output of otool -l:

Load command 49
          cmd LC_LOAD_WEAK_DYLIB
      cmdsize 64
         name @rpath/libswift_Concurrency.dylib (offset 24)
   time stamp 2 Wed Dec 31 19:00:02 1969
      current version 0.0.0
compatibility version 1.0.0

Any ideas about what else might be wrong? I don't know how to confirm I have fixed the crash without access to a system with the issue.

6 participants