Avoid thread termination in scoped_released #2657
Do not call `PyEval_RestoreThread()` from `~gil_scoped_release()` if the Python runtime is finalizing, as this terminates the calling thread on Python versions newer than 3.6, as documented in https://docs.python.org/3/c-api/init.html#c.PyEval_RestoreThread
Similarly, do not call `PyThreadState_DeleteCurrent` from `~gil_scoped_acquire()` if the runtime is finalizing.
Discovered while debugging a PyTorch crash with Python 3.9, described in pytorch/pytorch#47776
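A minimal sketch of the pattern described above, using a stubbed runtime rather than the real CPython API. `FakeRuntime`, `fake_save_thread`, `fake_restore_thread`, `ScopedRelease`, and `run_scope` are hypothetical stand-ins (for the interpreter state, `PyEval_SaveThread`, `PyEval_RestoreThread`, and `gil_scoped_release`); this is not the actual pybind11 patch.

```cpp
// Stand-in for interpreter state; real code would query the CPython runtime.
struct FakeRuntime {
    bool finalizing = false;  // stand-in for _Py_IsFinalizing()
    bool gil_held = true;     // whether "this thread" holds the lock
    int restore_calls = 0;    // how often the restore function ran
};

// Stand-in for PyEval_SaveThread(): gives up the lock.
inline void fake_save_thread(FakeRuntime &rt) { rt.gil_held = false; }

// Stand-in for PyEval_RestoreThread(): takes the lock back.
inline void fake_restore_thread(FakeRuntime &rt) {
    ++rt.restore_calls;
    rt.gil_held = true;
}

// RAII guard that releases on construction and re-acquires on destruction,
// unless the runtime has started finalizing in the meantime.
class ScopedRelease {
public:
    explicit ScopedRelease(FakeRuntime &rt) : rt_(rt) { fake_save_thread(rt_); }
    ~ScopedRelease() {
        if (!rt_.finalizing)
            fake_restore_thread(rt_);
    }
    ScopedRelease(const ScopedRelease &) = delete;
    ScopedRelease &operator=(const ScopedRelease &) = delete;

private:
    FakeRuntime &rt_;
};

// Runs one guarded scope and optionally starts "finalization" inside it.
// Returns how many times the restore function was called.
inline int run_scope(bool finalize_inside) {
    FakeRuntime rt;
    {
        ScopedRelease guard(rt);
        if (finalize_inside)
            rt.finalizing = true;
    }
    return rt.restore_calls;
}
```

With this stub, `run_scope(false)` restores once, while `run_scope(true)` leaves the scope without calling the restore function at all.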
@henryiii please let me know if the change makes sense to you (and suggest who could review it)
Looks good to me. @rwgk & @YannickJadoul might want to check/verify.
If that's what the docs say, definitely!
2 small things:
- What about 3.6 and before? Seems the behavior was just not documented? Do we need to always call it, or could we use the post-3.6 code for 3.6 and earlier as well?
- I guess there's no point in mixing compile-time and runtime checks to collapse this into `if (PY_VERSION_HEX < 0x03070000 || !_Py_IsFinalizing())`? I guess it just removes one duplicated line in each of the two places, but that's probably not worth the potential static analysis warnings?
Thanks for catching/fixing this, btw, @malfet! :-)
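For reference, the collapsed condition from the second point can be tried outside of Python. In this sketch `make_version_hex` and `should_restore` are hypothetical helpers; the bit layout follows the documented `PY_VERSION_HEX` encoding (major, minor, and micro in the top three bytes, release level and serial in the last byte), and the `finalizing` argument stands in for `_Py_IsFinalizing()`.

```cpp
#include <cstdint>

// Builds a value laid out like PY_VERSION_HEX:
// major, minor, micro, then release level (0xA/0xB/0xC/0xF) and serial.
constexpr std::uint32_t make_version_hex(unsigned major, unsigned minor, unsigned micro,
                                         unsigned level = 0xF, unsigned serial = 0) {
    return (major << 24) | (minor << 16) | (micro << 8) | (level << 4) | serial;
}

// The mixed check: anything older than 3.7 always restores; otherwise
// restore only when the runtime is not finalizing.
constexpr bool should_restore(std::uint32_t version_hex, bool finalizing) {
    return version_hex < 0x03070000u || !finalizing;
}

static_assert(make_version_hex(3, 9, 0) == 0x030900F0u, "3.9.0 final");
static_assert(should_restore(make_version_hex(3, 6, 12), true), "3.6 always restores");
static_assert(!should_restore(make_version_hex(3, 9, 0), true), "3.9 skips while finalizing");
static_assert(should_restore(make_version_hex(3, 9, 0), false), "3.9 restores otherwise");
```

Note that the threshold needs all eight hex digits (`0x03070000`); with a digit missing the comparison would be false for every real Python version.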
How much trouble is it to add a minimal unit test that is expected to work reliably only with this change? (I'm OK without adding a test if the setup is too much trouble.)
include/pybind11/pybind11.h
Outdated
// See https://docs.python.org/3/c-api/init.html#c.PyEval_RestoreThread
if (!_Py_IsFinalizing())
    PyEval_RestoreThread(tstate);
#else
Minor: typo: finilizing
More significantly: I think the code could be made more readable and maintainable by centralizing the #if #else #endif in a, e.g., detail::finalization_guard() inline helper function, so that there is a formal link to the comment from both places, and less preprocessor clutter.
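One possible shape for such a helper, sketched with stand-ins so that it compiles without Python headers. `DEMO_PY_VERSION_HEX`, `demo_runtime_finalizing`, `demo::detail::may_call_python_api`, and the two call-site functions are all hypothetical; the helper's real name and location were still under discussion at this point.

```cpp
// Assumption: stand-in for PY_VERSION_HEX so the sketch builds without Python.h.
#ifndef DEMO_PY_VERSION_HEX
#define DEMO_PY_VERSION_HEX 0x030900F0
#endif

// Stand-in for the runtime's finalization state (_Py_IsFinalizing() in CPython).
inline bool &demo_runtime_finalizing() {
    static bool flag = false;
    return flag;
}

namespace demo {
namespace detail {

// Single home for the version switch and its explanatory comment.
// Per the PR description, re-acquiring the GIL while the runtime finalizes
// terminates the thread on Python newer than 3.6, see
// https://docs.python.org/3/c-api/init.html#c.PyEval_RestoreThread
inline bool may_call_python_api() {
#if DEMO_PY_VERSION_HEX >= 0x03070000
    return !demo_runtime_finalizing();
#else
    return true;
#endif
}

} // namespace detail

// Both destructors would consult the same helper instead of repeating the #if.
inline bool release_guard_would_restore() { return detail::may_call_python_api(); }
inline bool acquire_guard_would_delete_state() { return detail::may_call_python_api(); }

} // namespace demo
```

Anyone arriving at either call site is one jump away from the comment, which is the link the review comment asks for.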
Are you sure that's not just going to add more confusion, by having another indirection? As long as it's just these two (very related) spots, this seems reasonably fine to me? (We currently have these kinds of things inline in other places as well, and this should make it easy to remove once we drop <= 3.6?)
The practical version of DRY/SPOT that I've seen states that exactly one duplication is allowed, if it's clearly more complex to combine it. Trying to take DRY/SPOT to the extreme of truly one point of truth for all code can impose an unmanageable level of complexity, so allowing it to be relaxed by one at programmer discretion balances the ideal with practicality. (and if there's more than one duplication, the extra complexity is always worth it).
In short, I'm fine either way.
Thanks Henry! I hadn't heard of DRY/SPOT before, good to know there is something to refer to.
In this particular case I'm not so much concerned about the code duplication, but mostly about the comment only appearing in one place. For someone arriving at the second place from some completely different context, they will not even know the comment exists in the other place, and may miss it. But if they get curious about detail::finalization_guard(), they are guaranteed to find the comment.
The comment is true, though! I completely agree there. There should at least be a "see a few lines up/down, at ...".
Apart from that, I don't feel too strongly, so it's up to @malfet AFAIC, but I just thought the #if SOME_PYTHON_VERSION pattern already exists inline in lots of places in the code.
And it's kind of hard to capture what it's doing (i.e., finalization_guard doesn't really describe what's happening)?
Thank you for the feedback. Fixed typo and moved and extended runtime finalization check to pybind11/detail/internals.h
If it only happens on interpreter finalization, it would have to be a test that spawns a Python instance. Probably not trivial, but not awful either (we have some already like that).
Looks good to me, thanks!
Super minor: in two places you have "runtime is finalizing". It feels like a "the" is missing: "the runtime is finalizing".
I don't know to tell the truth, but there is a |
Suppose we have two threads. One is holding the GIL and started to finalize the interpreter. The other wants to use a CPython API and so first calls Any thoughts regarding that? |
And rename it to `is_finalizing`
Yup, this already makes me happier to see detail::is_finalizing and not in internals.h :-) Thanks!
I do still want to see @bstaletic's comment addressed though. I don't know about all the GIL/threading internals of Python, but @bstaletic's point seems to be a valid concern, where we might be breaking other code?
@bstaletic This scenario is unrelated to the change, as it does not change the behavior of |
Right, my bad. What about a thread that's doing something like this?
#include <pybind11/pybind11.h>
auto the_other_thread() {
    {
        pybind11::gil_scoped_release g;
        while(!_Py_IsFinalizing());
    }
    pybind11::list l;
    for(auto i = 0; i < 1000000; ++i) l.append(pybind11::str("something long"));
}
PYBIND11_MODULE(foo, m) {
    m.def("the_other_thread", the_other_thread);
}
Obviously, this snippet shouldn't pass any kind of code review, but the point is that the thread might attempt to do anything with the CPython API. Before this PR, it would try to acquire the GIL and would get killed. That makes the problem manifest as close as possible to its cause. How do things change with this PR? If I understand things correctly, since the destructor of
Hmm. In that case, would throwing a runtime_error exception result in better behaviour?
We're talking about Now my first thought was "bad things are already happening, it's no worse than right now", but that might not be true. If we allow
To be clear, throwing destructors still make me very anxious. However, this might be a non-evil exception. If we are absolutely 100% positive that only the code that was already horribly broken (such as my snippet above) would stay horribly broken, this throwing destructor might be okay. To whomever is still reading, when thinking about this, consider a really bad situation:
#2215 left one thing unanswered. In the bad scenario described in #2215, we know the behaviour of POSIX systems, but we have no idea how Windows would behave, as that question was left unanswered. Which, to be honest, makes me even more wary of throwing an exception from Anyway, that was me trying to convince myself that a throwing destructor might be a viable option. A completely different solution, though not great, could be a sort of an escape hatch. Something like And yes, I'm fully aware of the downsides that are involved in actually using such a mechanism. I'm just thinking out loud. None of the ideas I've thought of are perfect and I'm absolutely open for ideas. I'm also open to pybind11 adopting a least bad solution as long as we understand what the implications are. |
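The unease about throwing destructors comes from standard C++ rules rather than anything specific to pybind11: a destructor is implicitly `noexcept`, so an exception escaping it calls `std::terminate` unless the destructor is declared `noexcept(false)`, and even then throwing while another exception is unwinding the stack terminates the program. A small illustration with hypothetical guard types (`QuietGuard`, `ThrowingGuard`) and a hypothetical `scope_exit_throws` driver:

```cpp
#include <stdexcept>
#include <type_traits>

// Ordinary guard: its destructor is implicitly noexcept.
struct QuietGuard {
    ~QuietGuard() {}
};

// Guard whose destructor is explicitly allowed to throw.
struct ThrowingGuard {
    bool finalizing;  // stand-in for "the runtime is finalizing"
    explicit ThrowingGuard(bool f) : finalizing(f) {}
    ~ThrowingGuard() noexcept(false) {
        if (finalizing)
            throw std::runtime_error("runtime is finalizing");
    }
};

static_assert(std::is_nothrow_destructible<QuietGuard>::value, "implicit noexcept");
static_assert(!std::is_nothrow_destructible<ThrowingGuard>::value, "noexcept(false)");

// Returns true if leaving the guarded scope raised the guard's exception.
// Throwing is only safe here because no other exception is unwinding.
inline bool scope_exit_throws(bool finalizing) {
    try {
        ThrowingGuard guard(finalizing);
    } catch (const std::runtime_error &) {
        return true;
    }
    return false;
}
```

If the same guard were destroyed while a different exception was already propagating, the second throw would end the process, which is the kind of situation the comment above is worried about.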
How about
#include <pybind11/pybind11.h>
auto the_other_thread() {
    {
        pybind11::gil_scoped_release g;
        while(!_Py_IsFinalizing());
        if(_Py_IsFinalizing())
            g.deactivate();
    }
    // Now it's obvious that you may not have the gil here, and you could add an if !_Py_IsFinalizing() protection.
    pybind11::list l;
    for(auto i = 0; i < 1000000; ++i) l.append(pybind11::str("something long"));
}
Looking at it, I like
And, of course, for PyTorch, you could then subclass these two guards, and add a conditional that calls this in the destructor.
I think we decided on Should |
Cool, this is pretty slim/clean!
2 more things:
- Do we need a test? As far as I'm concerned, we're following the Python docs and a test would probably be more hazardous than the actual straightforward code changes we have here. Also the GIL RAII classes are already tested.
- Can we confirm this solution also works for @malfet? In principle, you could now still wrap this in a PyTorch-specific RAII wrapper, right? It's just that at least disarm will show the danger and not do dangerous things by default, but functionally, the same should be possible.
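A sketch of how such a wrapper might look, again with a stubbed runtime so it compiles on its own. `StubRuntime`, `ReleaseGuard`, `FinalizationAwareRelease`, and `restores_after_scope` are hypothetical; only the name `disarm` comes from the discussion above, and the real guards operate on CPython thread state instead.

```cpp
// Stand-in for the interpreter; not the CPython API.
struct StubRuntime {
    bool finalizing = false;
    int restore_calls = 0;
};

// Guard modelled on the idea of a release guard with an explicit opt-out.
class ReleaseGuard {
public:
    explicit ReleaseGuard(StubRuntime &rt) : rt_(rt) {}
    // After disarm(), the destructor no longer touches the runtime.
    void disarm() { active_ = false; }
    ~ReleaseGuard() {
        if (active_)
            ++rt_.restore_calls;  // stands in for re-acquiring the GIL
    }
    ReleaseGuard(const ReleaseGuard &) = delete;
    ReleaseGuard &operator=(const ReleaseGuard &) = delete;

protected:
    StubRuntime &rt_;

private:
    bool active_ = true;
};

// Project-specific wrapper: disarms the base guard if finalization has begun
// by the time the scope ends. The derived destructor runs before the base one.
class FinalizationAwareRelease : public ReleaseGuard {
public:
    using ReleaseGuard::ReleaseGuard;
    ~FinalizationAwareRelease() {
        if (rt_.finalizing)
            disarm();
    }
};

// Returns the number of restore calls made when the scope closes.
inline int restores_after_scope(bool finalize_inside) {
    StubRuntime rt;
    {
        FinalizationAwareRelease guard(rt);
        rt.finalizing = finalize_inside;
    }
    return rt.restore_calls;
}
```

The base guard still restores by default; skipping the restore is something the caller has to ask for, which matches the point that `disarm` makes the danger visible.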
No test, unless we want to just call it once to make sure it doesn't vanish into thin air someday. Trying to run a test on interpreter shutdown is likely very tricky.
Agreed!
I tried and that's an understatement.
Suggested changelog entry:
Avoid unwanted termination if `gil_scoped_release()` is destroyed while Python runtime is finalizing