Skip to content

Finalizer can crash process when native libgit2 failed to load #1436

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
AArnott opened this issue Apr 16, 2017 · 18 comments
Closed

Finalizer can crash process when native libgit2 failed to load #1436

AArnott opened this issue Apr 16, 2017 · 18 comments

Comments

@AArnott
Copy link
Contributor

AArnott commented Apr 16, 2017

I've hit this myself and worked around it by helping libgit2sharp find the native binary. But for those times when it still fails it should be permissible for the caller to catch the exception regarding not finding the native binary and move on. But in this case, libgit2sharp has a finalizer which ends up re-throwing the exception. And when finalizers throw, the process goes down:

[ERROR] FATAL UNHANDLED EXCEPTION: System.TypeInitializationException: The type initializer for 'LibGit2Sharp.Core.NativeMethods' threw an exception. ---> System.DllNotFoundException: git2-1196807
  at (wrapper managed-to-native) LibGit2Sharp.Core.NativeMethods:git_libgit2_init ()
  at LibGit2Sharp.Core.NativeMethods+LibraryLifetimeObject..ctor () [0x00006] in <48079e5da5364f5db01fd1de5fb12b0d>:0 
  at LibGit2Sharp.Core.NativeMethods..cctor () [0x00054] in <48079e5da5364f5db01fd1de5fb12b0d>:0 
   --- End of inner exception stack trace ---
  at LibGit2Sharp.Core.NativeMethods+LibraryLifetimeObject.Finalize () [0x00000] in <48079e5da5364f5db01fd1de5fb12b0d

By the way, this is too much work for a finalizer to do. The only safe thing for a finalizer (at least during appdomain shutdown) is access structs. Reference types like LibraryLifetimeObject aren't safe to assume they can be used.

So at least the Finalize method should catch this exception and swallow it. But preferably, it should invoke native methods and access its own struct fields only.

@ethomson
Copy link
Member

Hmm, interesting. Am I reading correctly that the "inner" exception is the TypeInitializationException (from git_libgit2_init), which caused the finalizer to run and it also threw a TypeInitializationException?

Putting aside the correctness of the utility of the finalizer, I'm not sure that I understand why the Finalize would be trying to do anything though, handlesCount should never have been incremented since the ctor failed before AddHandle was called. So the finalizer should decrement handlesCount and return -1...

Obviously I'm not understanding something. Sorry, this is just musing, I haven't stepped through this to understand why I'm wrong. But I will unless unless somebody hits me with the clue stick before I get to this.

@whoisj
Copy link

whoisj commented Apr 17, 2017

It appears the static finalizer is getting invoked when the app domain is coming down, however since the static constructor was never invoked, the runtime is "helpfully" invoking it in the call to the finalizer.

If I'm correct, this is seriously putting the cart before the horse.

@AArnott the finalizer is only closing a handle, this is exactly what they're for. It is the runtime's decision to call the c'tor during finalization that is mind-blowing.

One bit I would add here is that the method is attributed with [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)], which has the following implications:

(from: https://msdn.microsoft.com/en-us/library/ms228973(v=vs.110).aspx)

A constrained execution region (CER) is part of a mechanism for authoring reliable managed code. A CER defines an area in which the common language runtime (CLR) is constrained from throwing out-of-band exceptions that would prevent the code in the area from executing in its entirety. Within that region, user code is constrained from executing code that would result in the throwing of out-of-band exceptions. The PrepareConstrainedRegions method must immediately precede a try block and marks catch, finally, and fault blocks as constrained execution regions. Once marked as a constrained region, code must only call other code with strong reliability contracts, and code should not allocate or make virtual calls to unprepared or unreliable methods unless the code is prepared to handle failures. The CLR delays thread aborts for code that is executing in a CER.

Constrained execution regions are used in different forms in the CLR in addition to an annotated try block, notably critical finalizers executing in classes derived from the CriticalFinalizerObject class and code executed using the ExecuteCodeWithGuaranteedCleanup method.

Given the CLR rules about CER, the finalizer will always be called and should never throw. Seems like there might be a bug...

@AArnott
Copy link
Contributor Author

AArnott commented Apr 17, 2017

It appears the static finalizer is getting invoked when the app domain is coming down, however since the static constructor was never invoked, the runtime is "helpfully" invoking it in the call to the finalizer.

I don't think there's any such thing as a "static finalizer". It's an (instance) finalizer. And this finalizer breaks the rules of avoiding all other reference types that I mentioned initially. One problem with breaking this rule is during AppDomain shutdown, other reference types are in an unknown state (since appdomain finalization does not happen in dependency order so the ref types you reference may have already been destroyed). This finalizer calls RemoveHandle which is defined in the nesting outer type. This requires that the outer type's static constructor be invoked, so as happens regularly, the CLR does that. Note too that a static constructor can only be invoked once per CLR thread safety rules. So this exception may be a replayed exception from an earlier attempt at running the .cctor, as documented:

  • If a static constructor throws an exception, the runtime will not invoke it a second time, and the type will remain uninitialized for the lifetime of the application domain in which your program is running.

Now particularly interesting is that this LibraryLifetimeObject instance can only have been created from within the NativeMethods.cctor, and it's the very last instruction there, so the finalizer would only run on it if the LibraryLifetimeObject.ctor had happened before. So I think what happened (plus, it fits well into the original repro scenario) is that someone first tried to use LibGit2Sharp, it threw this TypeInitializationException and the caller gave up. But as part of throwing this, because it was thrown from within the ctor of a CriticalFinalizerObject, the CLR (I guess) decided it was still worth finalizing even though the ctor did not successfully complete. And the instance's finalizer isn't prepared to deal with finalizing that will fail for the same reason.

BTW, in my experience I find it's really bad for a cctor to throw. The documented CLR rules guarantee that a cctor has executed before any static or instance members are invoked on it, but it doesn't document how much earlier it runs than when the last moment it must run. In particular, from this documentation:

  • A static constructor is called automatically to initialize the class before the first instance is created or any static members are referenced.
  • The user has no control on when the static constructor is executed in the program.

In other words, the cctor's could be run long before they are needed (or perhaps before other state like env vars that they implicitly depend on has occurred). So I find it's best to keep cctor's very simple, if present at all.

So my advice for fixing this would be any or all of the following:

  1. Simplify the cctor, or ensure it never throws by catching any exceptions it throws and simply storing state (perhaps the exception itself) in a static field so others can observe the exception later.
  2. Move the RemoveHandle() method and all transitive references of the outer class into the nested type so that it is self contained.
  3. Set a Boolean field in LibraryLifetimeObject to true as the last line in its ctor, and then code the finalizer to only execute RemoveHandle if the field is set to true.

Any one of these should be sufficient to solve the problem. And the last one looks by far the easiest and wouldn't introduce a breaking change.

@whoisj
Copy link

whoisj commented Apr 17, 2017

The pattern is generally more like:

~TypeName() { Dispose(); }
void Dispose() {
    IntPtr handle;
    if ((handle = Interlocked.Exchange(ref memberHandle, IntPtr.Zero)) != IntPtr.Zero) {
        CloseHandle(handle);
    }
    GC.SuppressFinalize(this);
}

Not sure how we get from here to there, but making all of this stuff non-static could be a start.

@AArnott
Copy link
Contributor Author

AArnott commented Apr 17, 2017

The official dispose pattern is closer to this:

~TypeName() { Dispose(false); }

public void Dispose()
{
    Dispose(true);
    GC.SuppressFinalize(this);
}

protected virtual void Dispose(bool disposing) {
    if (disposing)
    {
        // dispose of managed resources.
    } 

    // Dispose of native resources.
    IntPtr handle;
    if ((handle = Interlocked.Exchange(ref memberHandle, IntPtr.Zero)) != IntPtr.Zero) {
        CloseHandle(handle); // where CloseHandle is a direct p/invoke call that doesn't require ref types.
    }
}

@ethomson
Copy link
Member

It seems to me that there are two issues here; the static constructor shouldn't be throwing and there are issues with the finalizer. Let's tackle the cctor first and squirrel away the load failures, or simply catch them and let subsequent pinvokes fail to load also.

@ethomson
Copy link
Member

That usual pattern is how to make IDisposable interact nicely with finalizers, so that you don't wind up invoking the native cleanup in both. We don't have anything IDisposable here, we only do finalizers, so having Dispose methods wouldn't have utility for us, and we should just do our native cleanup code.

However I do want to point out that I think that the title of this issue is misleading - despite being in the stack trace, it's not the finalizer that's throwing here, it's the static constructor. Which will fail with a TypeInitializationException since there was an exception in the static constructor. That the finalizer is in the stack trace is a bit odd, I concede. I suppose what's actually going on is that the finalizer just rethrows the initializer's exception?

You can make the finalizer trivial (int i = 1+1;) to convince yourself that it's not doing anything unsafe and it will still be blamed in the stack trace.

Now having said that, we don't need this reference counting magic anymore. I'm not entirely clear why this would have been useful in the first place. It was so that the SafeHandleBase could ensure that it ran finalizers before calling git_libgit2_shutdown... but I don't understand how the LibraryLifetimeObject's finalizer could be called before NativeMethods is "finalized", which is not deterministic but definitely cannot happen before it's used last. IOW, nothing that calls NativeMethods can outlast this finalizer... So we don't need to refcount. (And in fact we don't; we don't have objects that extend SafeHandle and do this reference counting anymore, so it's only ever incremented in the cctor and decremented in the finalizer.)

I'll simplify the static constructor and the finalizer and just make then call git_libgit2_init and git_libgit2_shtudown, which is all they're doing anyway.

@ethomson
Copy link
Member

I suppose what's actually going on is that the finalizer just rethrows the initializer's exception?

Oh, that's exactly what you said, @AArnott :

But in this case, libgit2sharp has a finalizer which ends up re-throwing the exception. And when finalizers throw, the process goes down:

Whoops. Reading comprehension fail on my part.

@whoisj
Copy link

whoisj commented Apr 18, 2017

but I don't understand how the LibraryLifetimeObject's finalizer could be called before NativeMethods is "finalized", which is not deterministic but definitely cannot happen before it's used last.

CriticalFinalizer types will always be finalized. I'm not sure the run-time knows for sure when a static instance has been finalized, and the act of touching the type invokes its c'tor. It seems to be the events are as follows:

  1. AppDomain is coming down (process exit most likely).
  2. Run-time is finalizing all CriticalFinalizer objects.
  3. The static instances gets touched, and its c'tor invoked.
  4. The c'tor cannot allocate (see Change License  #1) and throws, which is interpreted as a TypeInitializationException.

@AArnott
Copy link
Contributor Author

AArnott commented Apr 18, 2017

You're confusing ctor with cctor I think. The exception isn't thrown due to an allocation error. Look at the Inner exception: it throws because it cannot find the native binary.

@ethomson
Copy link
Member

I don't think that @whoisj meant allocation in terms of malloc, I think that he meant that the constructor generally failed and the object could not be initialized.

@AArnott
Copy link
Contributor Author

AArnott commented Apr 18, 2017

In @whoisj's event sequence, the finalizer invokes a ctor. It does not. The callstack appears to suggest the cctor is invoked, but that doesn't happen either. In fact the Finalizer never really runs, because the CLR sees that it would need a cctor to have run but it failed in the past so it just rethrows right then and there without having tried anything. I'd be interested in what @whoisj means by "static instance" as well -- that sounds like a contradiction in terms and I'm not sure whether you mean the NativeMethods static class (which has no instance) or the nested class (which seems to have exactly one instance, but is not itself static).

@KirillOsenkov
Copy link

Just to remind you folks that the failure we're seeing is in Mono Runtime, not the CLR. I'm sure most of the above still holds, just another caveat to keep in mind.

@ethomson
Copy link
Member

@KirillOsenkov Sorry, which failure are you talking about?

@KirillOsenkov
Copy link

The one in this bug - the crash Andrew reported is from Mono runtime. This is when trying to build a project on a Mac that uses NerdBank.GitVersioning, which in turn uses libgit2sharp, which in turn fails to find or load the native git binary (git2-1196807).

@ethomson
Copy link
Member

Oh, I see what you're saying. It seemed to behave identically in the CLR and in Mono for me when the native library couldn't be loaded. (And #1438 similarly seems to correct it in both.)

@KirillOsenkov
Copy link

Oh, how wonderful, there is a fix pending? You guys don't disappoint!

@ethomson
Copy link
Member

Fixed via #1438

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants