[BUG] GIL hang in multi-threaded situation

I am seeing a gil hang issue in multi-threaded situations when a worker thread initializes pybind11 (2.6.2) while operating inside of a PyGILState_Ensure/PyGILState_Release block.  The PyGILState_Release is deleting the PyThreadState and the internal tstate ends up holding a dangling pointer.

This seems to be a partially known issue within pybind11 from the following comment I see in pybind11.h:
            /* Check if the GIL was acquired using the PyGILState_* API instead (e.g. if
               calling from a Python thread). Since we use a different key, this ensures
               we don't create a new thread state and deadlock in PyEval_AcquireThread
               below. Note we don't save this state with internals.tstate, since we don't
               create it we would fail to clear it (its reference count should be > 0). */
            tstate = PyGILState_GetThisThreadState();

This works for all threads that don’t initialize pybind11 because they don’t start off with a stored tstate.

I have tried changing internals.h to store nullptr instead of tstate and this fixes the dangling pointer problems that I am seeing.  I imagine that historically saving this tstate during initialization was somehow thought of as a performance advantage, though today as gil_scoped_acquire always either creates a new tstate or calls detail::get_thread_state_unchecked(), it is unclear how any performance advantage is achieved.  If performance is a significant concern here, it would also be advisable to reduce the redundancy of calling both PyGILState_GetThisThreadState and detail::get_thread_state_unchecked.

This dangling tstate problem can happen in both officially supported and not officially supported situations.  As an officially supported example would be far more complicated, I am going to start with a non-officially supported example.  If necessary, examples can be made using multiple pyd’s and using the python executable rather than embedding this – though I see no reason to make life more difficult for everyone this way.

`#include <Python.h>
#include <iostream>
#include <pybind11/pybind11.h>
#include <thread>

namespace py = pybind11;

void causeHang()
{
    auto state = PyGILState_Ensure();
    {
        py::gil_scoped_acquire thisHangsTheSecondTime;
    }
    PyGILState_Release(state);
}

void runIt()
{
    for (size_t i = 0; i < 100; ++i)
    {
        std::cout << "trying " << i << std::endl;
        causeHang();
    }
}

int main()
{
    Py_Initialize();
    PyEval_InitThreads();
    auto saved = PyEval_SaveThread();
    std::thread(runIt).join();
    PyEval_RestoreThread(saved);
    Py_Finalize();
}
`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[BUG] GIL hang in multi-threaded situation #2888

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[BUG] GIL hang in multi-threaded situation #2888

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions