Add noexcept(false) to destructors for gil_scoped_release #2215
Comments
This is always a bug, which makes me wonder if |
This is kind of an ugly situation but sometimes unavoidable in larger projects. The |
I'm just thinking that what is undoubtedly a bug should make a big and loud crash, so that detecting such a bug is easier. |
I think it is very, very difficult to avoid these situations in some cases. Remember that we're not purposely trying to acquire the GIL when Python is dead: it is the destructor for a I am generally in favor of clean destruction, especially when it makes running tools like valgrind and sanitizers easier, but in this particular case the payoff doesn't seem worth it. |
I'm still not convinced that
|
In some implementations, yes. For example: https://stackoverflow.com/questions/11452546/why-does-pthread-exit-throw-something-caught-by-ellipsis It is indeed pretty naughty to throw an exception from a C API, but this is the sort of thing a compiler is allowed to do, because it's the one defining the ABI in the first place.
I can test if this would solve our problem. I don't like it either, since if you don't throw an exception from the destructor you'll exit out of the destructor into a context where you ought to have had the GIL, but you don't actually have it, and so the failure happens at a later point in time. Forcing an unwinding is probably preferable since it makes the situation more obvious. But I am not too sure. |
But Python is dead, at that moment, right? No one is supposed to be touching the GIL or anything else in Python, at that point!
I think this is @bstaletic's argument on why to just have the whole program crash directly anyway, and not maybe catch some exception thrown by |
In our particular case, we know everything's going to be all right, because our top level loop looks like:
But I think I would agree that in general it's dangerous. @colesbury do you find yourself convinced by the arguments here too? |
Here is a self contained example, which I think will be a better reference than complicated PyTorch internals: https://github.com/colesbury/pybind-exit-test I'd like to emphasize that the standard The proposed change isn't perfect (I think nested sequences of
Python is shutting down (usually). Re-acquiring the GIL from another thread is pretty common and behaves well when directly using the Python C API. Comments in the CPython code base suggest that this is explicitly supported.
I don't think the proposed change is likely to hide any additional bugs. The called functions don't throw C++ exceptions on error.
That won't work. The thread will return into the Python interpreter without having acquired the GIL.
Where do you think the bug lies? What change could the pybind-exit-test make to reliably avoid crashing? |
The
Again, it segfaults if there's another exception already in flight.
Calling EDIT: Just executing |
As you point out, the example doesn't use C++ exceptions so it doesn't handle them. You can change it to handle C++ exceptions if you like, but I don't think it's relevant to this issue.
Neither of those are reliable fixes. In real programs, the interpreter shutdown ( |
There are still too many unknowns for me to seriously consider this PR. What I've understood from this discussion: accessing the GIL when Python has shut down causes it to call pthread_exit(). This seems very bad to me, but it's in CPython and not something that we can control here. Then GCC/libc raises __forced_unwind (a C-style exception?), which C++ seems to understand and be able to catch/propagate (this strikes me as something heavily compiler/platform-dependent). And you want it to be propagated rather than crashing in the destructor, where exceptions are forbidden. But that is just one very specific platform: what about Clang with libc++/libc++abi (it doesn't even know about abi::forced_unwind), what about macOS, what about MSVC? How is the unwind happening there? To be honest, this seems to me like a case where it would be good to take a step back and see whether there isn't perhaps an alternative architecture that avoids these kinds of situations altogether (which reek of undefined behavior even with this hypothetical change). If you really cannot avoid it at all, then I would suggest that you create your own flavor of gil_scoped_release in PyTorch that prods Python to see if it is still alive before trying to release the GIL (and thereby involving nasty C-style exceptions thrown from pthread_exit).
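The "own flavor of gil_scoped_release" idea from the comment above could look roughly like the sketch below. Everything in it is hypothetical: `fake_interpreter` and `checked_gil_release` are illustrative names, not pybind11 or PyTorch types, and the interpreter is faked so the sketch compiles without Python. In a real build the three marked spots would presumably forward to `Py_IsInitialized()`, `PyEval_SaveThread()` and `PyEval_RestoreThread()`.

```cpp
#include <cassert>

// Hypothetical stand-in for the interpreter state (assumption, not a real API).
struct fake_interpreter {
    bool alive = true;    // would be Py_IsInitialized() in a real build
    int releases = 0;     // counts simulated PyEval_SaveThread() calls
    int reacquires = 0;   // counts simulated PyEval_RestoreThread() calls
};

// Hypothetical guard: releases on construction and reacquires on destruction,
// but only if the interpreter still reports itself alive.
class checked_gil_release {
public:
    explicit checked_gil_release(fake_interpreter& interp) : interp_(interp) {
        ++interp_.releases;  // stands in for PyEval_SaveThread()
    }

    ~checked_gil_release() {
        if (interp_.alive) {
            ++interp_.reacquires;  // stands in for PyEval_RestoreThread()
        }
        // If the interpreter is gone, the reacquire is skipped entirely.
    }

    checked_gil_release(const checked_gil_release&) = delete;
    checked_gil_release& operator=(const checked_gil_release&) = delete;

private:
    fake_interpreter& interp_;
};

// Simulates the interpreter shutting down while the guard is active.
int reacquires_after_shutdown() {
    fake_interpreter interp;
    {
        checked_gil_release guard(interp);
        interp.alive = false;
    }
    return interp.reacquires;
}

// Simulates the normal case where the interpreter outlives the guard.
int reacquires_while_alive() {
    fake_interpreter interp;
    {
        checked_gil_release guard(interp);
    }
    return interp.reacquires;
}
```

Two caveats raised elsewhere in this thread still apply to a guard like this: the liveness check and the reacquire are not atomic, so the interpreter can shut down between them, and skipping the reacquire means the caller continues without holding the GIL.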
When a thread attempts to acquire the GIL but the Python interpreter has already destructed, Python will attempt to terminate the thread using pthread_exit. In many implementations of pthread_exit, this will trigger a stack unwinding, which will immediately call std::terminate if you are inside a destructor with noexcept(true). This is the case for the destructor of gil_scoped_release, which will attempt to acquire the GIL on reentry. The net effect of this is that any code which uses gil_scoped_release and is called from a daemon thread from Python is likely to cause your process to unceremoniously exit.
The fix seems to be quite simple:
Do you agree with this change? If so I can submit a PR for it. More context: pytorch/pytorch#38228
It would be reasonably simple to produce a repro that doesn't involve PyTorch, please let me know if that would be helpful.
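As an illustration of the mechanism described in this issue (not the actual pybind11 patch), here is a minimal sketch of how `noexcept(false)` changes what happens when unwinding passes through a destructor. `throwing_guard` and `default_guard` are hypothetical types, and the `throw` only simulates the forced unwind that pthread_exit can start; no Python or pthread calls are involved.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical guard whose destructor permits exceptions to leave it.
// The throw simulates an unwind starting inside the destructor.
struct throwing_guard {
    bool simulate_unwind;

    explicit throwing_guard(bool s) : simulate_unwind(s) {}

    ~throwing_guard() noexcept(false) {
        if (simulate_unwind) {
            throw std::runtime_error("simulated forced unwind");
        }
    }
};

// Same shape, but with the implicit noexcept(true) destructors get by default.
// An exception escaping this destructor would call std::terminate.
struct default_guard {
    ~default_guard() {}
};

static_assert(!std::is_nothrow_destructible<throwing_guard>::value,
              "noexcept(false) destructor may propagate exceptions");
static_assert(std::is_nothrow_destructible<default_guard>::value,
              "destructors are noexcept(true) unless declared otherwise");

// Returns true if the exception leaving the destructor reached the caller.
bool unwind_reaches_caller() {
    try {
        throwing_guard guard(true);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}

// Small self-check that can be called from any entry point.
void check_guard_behavior() {
    assert(unwind_reaches_caller());
}
```

Note that this only helps when no other exception is already propagating: if a `noexcept(false)` destructor throws during stack unwinding for another exception, the C++ runtime still calls std::terminate.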