Skip to content

Conversation

janvorli
Copy link
Member

There were some GC holes discovered caused by the fact that GC can be triggered during 2nd pass of EH in-between calls to finally handlers and catch handler. After considering options, moving the 2nd pass to native code seems the most reasonable solution.

There were some GC holes discovered caused by the fact that GC can be
triggered during 2nd pass of EH in-between calls to finally handlers and
catch handler. After considering options, moving the 2nd pass to native
code seems the most reasonable solution.
@janvorli janvorli added this to the 10.0.0 milestone Sep 18, 2025
@janvorli janvorli self-assigned this Sep 18, 2025
@Copilot Copilot AI review requested due to automatic review settings September 18, 2025 20:04
@janvorli janvorli added NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) area-ExceptionHandling-coreclr labels Sep 18, 2025
@janvorli
Copy link
Member Author

The current state is that it almost works. A catch handler is correctly invoked, but then something crashes.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR moves the CoreCLR exception handling second pass from managed to native code to fix GC holes that can occur when GC is triggered between finally handlers and catch handlers during the second pass of exception handling.

  • Splits existing RhThrowEx/RhThrowHwEx methods into separate first-pass handler finding methods and second-pass execution
  • Moves the second pass execution logic (DispatchExPass2) from managed to native code
  • Removes QCALL exports for CallCatchFunclet and CallFinallyFunclet as they're now called directly from native code

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/coreclr/vm/qcallentrypoints.cpp Removes QCALL exports for CallCatchFunclet and CallFinallyFunclet
src/coreclr/vm/metasig.h Updates method signatures to include handlingFrameSP and pCatchHandler out parameters
src/coreclr/vm/exinfo.h Adds ContainsCodeOffset helper method to RhEHClause
src/coreclr/vm/exceptionhandlingqcalls.h Removes QCALL declarations for catch/finally funclets
src/coreclr/vm/exceptionhandling.h Adds declaration for new native DispatchExPass2 function
src/coreclr/vm/exceptionhandling.cpp Major refactoring: converts QCALL functions to native, adds new DispatchExPass2 implementation
src/coreclr/vm/excep.cpp Updates managed fault handling to use new method signatures
src/coreclr/vm/corelib.h Updates method name mappings for exception handling entry points
src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/ExceptionHandling.cs Splits exception handling into separate first/second pass methods

@janvorli
Copy link
Member Author

Also, please note that the collided unwind detection is not working yet, as I cannot rely on the pinvokes from the managed EH code anymore. I am working on a fix.
With the contract and EH clauses enumeration fixes based on David's feedback, many EH tests now pass until I hit one with collided unwind. I am reusing the interpreter test ran without interpreter for now.

@janvorli
Copy link
Member Author

And one more thing not done yet is rethrow.

* Reflect PR feedback
* Implement rethrow
* Implement new way of collided unwind detection now that the
  CallCatchFunclet is not called via pinvoke
* Remove forced reporting of EH code from stack frame iterator, as we
  now cannot have that code on the stack during 2nd pass
@jkotas jkotas added the arch-wasm WebAssembly architecture label Sep 19, 2025
Copy link
Contributor

Tagging subscribers to 'arch-wasm': @lewing, @pavelsavara
See info in area-owners.md if you want to be subscribed.

@janvorli
Copy link
Member Author

I am considering reducing the number of changes in the exceptionhandling.cs by a slightly different approach - keeping only the methods we had and adding an extra argument "firstPassOnly" to them. Then if that argument is set to true, in DispatchEx store the handlingFrameSP/PC, _pReversePInvokePropagationCallback/Context and the pCatchHandler into the ExInfo and return;
The native side would then take these from the ExInfo. On NativeAOT, this argument would be ignored.

The current state is that, with some additional changes that I am going to commit after I fix the offsets in StackFrameIterator for some targets, it passes all diagnostics tests and coreclr tests (except ForeignThreadExceptionTest where it ends up messing GC mode when calling managed callback from native code catch)

I also need to add proper contracts here and there.

@janvorli janvorli removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Sep 22, 2025
@janvorli
Copy link
Member Author

/azp run runtime-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@janvorli
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

This is done to ensure that no GC is allowed between the scanned stack
range is extended and a funclet for the current frame is called.
@janvorli
Copy link
Member Author

@jkotas, @davidwrighton the current state of this PR is final from my point of view. Can you please review it?

Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is correct and will fix the gc issue. I wish this invariant could be better documented/tested.

@janvorli janvorli merged commit bf4802d into dotnet:main Sep 29, 2025
98 checks passed
@janvorli
Copy link
Member Author

/backport to release/10.0

Copy link
Contributor

Started backporting to release/10.0: https://github.com/dotnet/runtime/actions/runs/18106203717

Copy link
Contributor

@janvorli backporting to "release/10.0" failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: [WIP] Move coreclr EH second pass to native code
Applying: Several fixes
Applying: Fix arm64 build and remove now useless stuff from stackwalk
Applying: Fix build break
Applying: Fix offsets, exception interception and MUSL build break
Applying: Alternative change with minimalistic managed code differences
error: sha1 information is lacking or useless (src/coreclr/System.Private.CoreLib/src/System/Runtime/ExceptionServices/AsmOffsets.cs).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0006 Alternative change with minimalistic managed code differences
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants