Skip to content

[BUG] GIL hang in multi-threaded situation #2888

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
daltairwalter opened this issue Mar 4, 2021 · 7 comments
Open

[BUG] GIL hang in multi-threaded situation #2888

daltairwalter opened this issue Mar 4, 2021 · 7 comments

Comments

@daltairwalter
Copy link

I am seeing a gil hang issue in multi-threaded situations when a worker thread initializes pybind11 (2.6.2) while operating inside of a PyGILState_Ensure/PyGILState_Release block. The PyGILState_Release is deleting the PyThreadState and the internal tstate ends up holding a dangling pointer.

This seems to be a partially known issue within pybind11 from the following comment I see in pybind11.h:
/* Check if the GIL was acquired using the PyGILState_* API instead (e.g. if
calling from a Python thread). Since we use a different key, this ensures
we don't create a new thread state and deadlock in PyEval_AcquireThread
below. Note we don't save this state with internals.tstate, since we don't
create it we would fail to clear it (its reference count should be > 0). */
tstate = PyGILState_GetThisThreadState();

This works for all threads that don’t initialize pybind11 because they don’t start off with a stored tstate.

I have tried changing internals.h to store nullptr instead of tstate and this fixes the dangling pointer problems that I am seeing. I imagine that historically saving this tstate during initialization was somehow thought of as a performance advantage, though today as gil_scoped_acquire always either creates a new tstate or calls detail::get_thread_state_unchecked(), it is unclear how any performance advantage is achieved. If performance is a significant concern here, it would also be advisable to reduce the redundancy of calling both PyGILState_GetThisThreadState and detail::get_thread_state_unchecked.

This dangling tstate problem can happen in both officially supported and not officially supported situations. As an officially supported example would be far more complicated, I am going to start with a non-officially supported example. If necessary, examples can be made using multiple pyd’s and using the python executable rather than embedding this – though I see no reason to make life more difficult for everyone this way.

`#include <Python.h>
#include
#include <pybind11/pybind11.h>
#include

namespace py = pybind11;

void causeHang()
{
auto state = PyGILState_Ensure();
{
py::gil_scoped_acquire thisHangsTheSecondTime;
}
PyGILState_Release(state);
}

void runIt()
{
for (size_t i = 0; i < 100; ++i)
{
std::cout << "trying " << i << std::endl;
causeHang();
}
}

int main()
{
Py_Initialize();
PyEval_InitThreads();
auto saved = PyEval_SaveThread();
std::thread(runIt).join();
PyEval_RestoreThread(saved);
Py_Finalize();
}
`

@daltairwalter daltairwalter changed the title GIL hang in multi-threaded situation [BUG] GIL hang in multi-threaded situation Mar 10, 2021
@Skylion007
Copy link
Collaborator

@daltairwalter This situation should be improved in the latest PyBind11 release

@daltairwalter
Copy link
Author

I will test again with 2.8.0, but I didn't see any changes that I expect to make a difference. It still looks like the tstate that is stored in the TLS during the initial creation of internals is a dangling pointer.

If there is a specific change with this, I would be happy to look at it.

@Skylion007
Copy link
Collaborator

Was referring to this PR: #3237 and it's subsequent PRs, although it sounds like it may not solve this issue.

@jonatanklosko
Copy link

We are embedding Python and we have a few threads that we re-use for code evaluations. We were wrapping each evaluation in PyGILState_Ensure and PyGILState_Release, which I believe is a correct way to do this. We run into a segfault while using matplotlib and it comes from here:

PyEval_AcquireThread(tstate);

I believe this matches what @daltairwalter described. During the first PyGILState_Ensure/PyGILState_Release cycle, gil_scoped_acquire stores the thread state, which is then freed with PyGILState_Release, leaving a dangling pointer. On the next cycle gil_scoped_acquire re-uses the pointer, leading to a segfault.

The workaround we ended up is to create a thread state for each of our threads using PyThreadState_New. Then we wrap each evaluation in PyEval_RestoreThread and PyEval_SaveThread instead. This way each thread has a permanent state, so the pointer that pybind keeps is always valid.

@tonybaloney
Copy link

I have the exact same crash with several pybind11 packages when using CPython as an embedded library as @jonatanklosko explained.

Debugging this, pybind11 is calling PyEval_RestoreThread with an invalid pointer. From the gil_scoped_acquire RAII class.

* thread #1, name = 'QuickSnakesDemo', stop reason = signal SIGSEGV: invalid address (fault address: 0xddddddddddddde45)
    frame #0: 0x0000ffbef2d21610 libpython3.12d.so`_PyThreadState_MustExit at pycore_interp.h:243
(lldb) up
frame #1: 0x0000ffbef2d2160c libpython3.12d.so`_PyThreadState_MustExit(tstate=0x0000aaaaaac71230) at pystate.c:3053
(lldb) up
frame #2: 0x0000ffbef2cdd5c0 libpython3.12d.so`take_gil(tstate=0x0000aaaaaac71230) at ceval_gil.c:346
(lldb) fr v
(PyThreadState *) tstate = 0x0000aaaaaac71230

(lldb) up (a few times)
frame #12: 0x0000ffbef2b25ba4 libpython3.12d.so`_PyObject_MakeTpCall(tstate=0x0000aaaaaad0eca0, callable=0x0000ffbf0079ef90, args=0x0000fffff714c090, nargs=<unavailable>, keywords=0x0000000000000000) at call.c:240
(lldb) fr v
(PyThreadState *) tstate = 0x0000aaaaaad0eca0
(PyObject *) callable = 0x0000ffbf0079ef90
(PyObject *const *) args = 0x0000fffff714c090
(Py_ssize_t) nargs = <Could not evaluate DW_OP_GNU_entry_value.>

PyGILState_Release will only call _PyThreadState_SwapNoGIL(NULL) if the gil state counter is beyond 0.
PyBind11 is looking for this value, otherwise following the dangling pointer. I don't think you'll see this in non-embedded scenarios because the main process thread will still hold the GIL

@daltairwalter
Copy link
Author

When I originally created this ticket, it was possible to make this happen in non-embedded scenarios. I think it required a well behaved non-pybind11 pyd that did specific legitimate things in order to make this fail though.

@daltairwalter
Copy link
Author

As stated in the initial message, there is a relatively straightforward fix for this in pybind11. In internals.h:

Just remove the line that stores the initial tstate that is dangling:
PYBIND11_TLS_REPLACE_VALUE(internals_ptr->tstate, tstate);

The performance cost for this is that pybind11 would perform two TLS lookups in gil acquisition instead of just one for the cases where the main thread is being used. If TLS lookups are really expensive and this main thread is really the primary use case, it would also be possible to reorder the TLS lookups in gil acquire.

It would be nice to get a fix for this into pybind11, but the process for this seems fairly difficult from what I have seen with the smart holder branch and other important features - I am not sure where the process would even begin for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants