Blocking destructors and the GIL

## Issue description

pybind11 allows releasing the GIL for almost any bound function, including constructors, but not for destructors. Besides being a missed opportunity to reduce GIL contention, this can easily cause deadlocks. Whenever a destructor waits for another thread, and that thread tries to acquire the GIL (because it needs to run Python code or otherwise work with Python objects), a deadlock occurs.

The sample program at the bottom demonstrates this problem. Destroying the dictionary triggers the destructor of the `Worker`, which deadlocks more often than not. `~Worker()` does not need to hold the GIL, and explicitly releasing it before calling `join()` resolves the deadlock. However, this is not always a desirable solution, because it means inserting Python-specific calls invasively into a codebase (essentially into every destructor that may block).
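
To make the "release before `join()`" fix concrete without depending on Python, here is a minimal model in plain C++. It is only a sketch: `model_gil`, `ModelScopedRelease`, `ModelWorker` and `destroy_worker_while_locked` are hypothetical names, and a `std::mutex` stands in for the GIL. In the real sample, the guard in the destructor would presumably be a `pybind11::gil_scoped_release`.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the GIL: a single global mutex.
std::mutex model_gil;

// Unlocks a mutex the calling thread holds and re-locks it on scope exit,
// modelling what pybind11::gil_scoped_release does for the real GIL.
class ModelScopedRelease {
public:
    explicit ModelScopedRelease(std::mutex& m) : mutex_(m) { mutex_.unlock(); }
    ~ModelScopedRelease() { mutex_.lock(); }
    ModelScopedRelease(const ModelScopedRelease&) = delete;
    ModelScopedRelease& operator=(const ModelScopedRelease&) = delete;

private:
    std::mutex& mutex_;
};

struct ModelWorker {
    ModelWorker() {
        thread = std::thread([this] {
            while (keepRunning) {
                {
                    // The worker needs the lock for each unit of work.
                    std::lock_guard<std::mutex> lock(model_gil);
                    ++iterations;
                }
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }

    // Precondition: the calling thread holds model_gil.
    ~ModelWorker() {
        keepRunning = false;
        if (thread.joinable()) {
            // Without this guard, a worker blocked on model_gil would never
            // get back to the keepRunning check, and join() would wait forever.
            ModelScopedRelease release(model_gil);
            thread.join();
        }
    }

    std::atomic<bool> keepRunning{true};
    std::atomic<int> iterations{0};
    std::thread thread;
};

// Destroys a worker while holding the lock, as happens when a bound object
// is deallocated with the GIL held. Returns true once destruction has
// completed and the lock is held again.
bool destroy_worker_while_locked() {
    std::unique_lock<std::mutex> lock(model_gil);
    {
        ModelWorker worker;  // its thread will block on model_gil
    }                        // ~ModelWorker() runs here with the lock held
    return lock.owns_lock();
}
```

Calling `destroy_worker_while_locked()` returns instead of hanging. Removing the `ModelScopedRelease` line brings back the deadlock whenever the worker thread is already waiting for the lock.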

Are there any agreed-upon strategies to deal with this problem?

## Possible solutions

If there isn't a common solution to this deadlock, I would like to propose a couple of options.

### delete_without_gil
Add a new option to the `class_` template, `delete_without_gil`. While deallocating objects of such classes, pybind11 will release the GIL.
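
Until such an option exists, a similar effect might be approximated on the user side with a holder whose deleter releases the GIL. The sketch below is a pybind11-free model of that idea: `ReleasingDeleter`, `ModelGilRelease`, `Probe` and the `model_gil_held` flag are all hypothetical, and the flag merely records whether the current thread "holds the GIL".

```cpp
#include <memory>

// Hypothetical model: a flag records whether the current thread "holds the GIL".
bool model_gil_held = true;

// Models pybind11::gil_scoped_release: drops the GIL for the guard's lifetime.
struct ModelGilRelease {
    ModelGilRelease() { model_gil_held = false; }
    ~ModelGilRelease() { model_gil_held = true; }
};

// Deleter that constructs a ReleaseGuard before destroying the object, so the
// object's destructor (and those of its members) run without the GIL.
template <class T, class ReleaseGuard>
struct ReleasingDeleter {
    void operator()(T* ptr) const {
        ReleaseGuard release;
        (void)release;
        delete ptr;
    }
};

// Test type: its destructor records whether the GIL was held while it ran.
struct Probe {
    explicit Probe(bool* out) : held_in_destructor(out) {}
    ~Probe() { *held_in_destructor = model_gil_held; }
    bool* held_in_destructor;
};

// Returns true if Probe was destroyed without the GIL and the GIL is held
// again afterwards.
bool probe_destroyed_without_gil() {
    bool held = true;
    {
        std::unique_ptr<Probe, ReleasingDeleter<Probe, ModelGilRelease>> holder(
            new Probe(&held));
    }  // the deleter runs here
    return !held && model_gil_held;
}
```

With pybind11 itself, the corresponding binding would presumably look like `pybind11::class_<Worker, std::unique_ptr<Worker, ReleasingDeleter<Worker, pybind11::gil_scoped_release>>>`. This assumes that a `std::unique_ptr` with a custom deleter is accepted as holder type and that the GIL is held when the holder is destroyed; neither has been verified against the sample program here.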

[EDIT 2024-01-18: This was implemented under https://github.com/google/pywrapcc/pull/30088]

This is a straightforward, but not a complete solution. The "blocking" property of destructors is transitive through a class's members. When pybind11 destroys an object of type `A`, and this object has a member of type `B` whose destructor blocks, `A` also has to be marked `delete_without_gil`. What's worse, if `~B()` starts out as non-blocking but is later changed to be blocking, all classes that have a `B` member need to be retroactively marked `delete_without_gil`. Not to mention the case where `B` is polymorphic, and someone unwittingly implements a new subclass with a blocking destructor.

In short, bindings for complex codebases may need to *always* specify `delete_without_gil` to be on the safe side.

[EDIT 2024-01-18: This is exactly how PyCLIF works. The new PyCLIF-pybind11 version will have the same behavior.]

### Always release the GIL during deallocation

This would prevent the deadlock pretty decisively, but objects holding Python objects (e.g. `pybind11::dict`) as members will have to take care to reacquire the GIL before destroying them. Furthermore, the GIL may thrash during destruction of a complex object hierarchy, introducing a performance penalty.
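
One subtlety is that members are destroyed after the destructor body has finished, so a guard placed in the body does not cover them automatically. The following pybind11-free sketch models this. All names are hypothetical: `ModelPyObject` stands in for something like a `pybind11::dict` member, and `ModelGilAcquire` for `pybind11::gil_scoped_acquire`.

```cpp
#include <memory>

// Hypothetical model: a flag records whether the current thread "holds the GIL".
bool model_gil_held = true;
// Counts model Python objects that were destroyed without the GIL.
int unsafe_releases = 0;

struct ModelGilRelease {
    ModelGilRelease() : previous(model_gil_held) { model_gil_held = false; }
    ~ModelGilRelease() { model_gil_held = previous; }
    bool previous;
};

struct ModelGilAcquire {
    ModelGilAcquire() : previous(model_gil_held) { model_gil_held = true; }
    ~ModelGilAcquire() { model_gil_held = previous; }
    bool previous;
};

// Stand-in for a Python object handle: destroying it requires the GIL.
struct ModelPyObject {
    ~ModelPyObject() {
        if (!model_gil_held) ++unsafe_releases;
    }
};

// Relies on the implicit member destructor, which runs without the GIL.
struct Careless {
    ModelPyObject object;
};

// Reacquires the GIL and destroys the member explicitly inside the body,
// while the guard is still alive.
struct Careful {
    std::unique_ptr<ModelPyObject> object{new ModelPyObject()};
    ~Careful() {
        ModelGilAcquire gil;
        object.reset();
    }
};

// Models a deallocation that always releases the GIL and reports how many
// model Python objects were destroyed without it.
template <class T>
int unsafe_releases_when_deallocating() {
    unsafe_releases = 0;
    T* instance = new T();
    {
        ModelGilRelease release;
        delete instance;
    }
    return unsafe_releases;
}
```

In this model, `unsafe_releases_when_deallocating<Careless>()` reports one unsafe release and the `Careful` variant reports none. The price is one extra acquire/release pair per such object, which is the thrashing concern mentioned above.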

It may be prudent to allow toggling this option through a preprocessor flag. Bindings that require it and can live with the additional GIL overhead can enable it, while simpler modules can leave it as is.
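
Such a toggle could be as simple as selecting the guard type at compile time. A rough sketch, again modelled without pybind11: `SKETCH_RELEASE_GIL_ON_DEALLOC` is a made-up macro name, not an existing pybind11 option, and `model_dealloc` only stands in for the library's deallocation routine.

```cpp
// Hypothetical switch; not an existing pybind11 macro.
#ifndef SKETCH_RELEASE_GIL_ON_DEALLOC
#define SKETCH_RELEASE_GIL_ON_DEALLOC 1
#endif

// Hypothetical model: a flag records whether the current thread "holds the GIL".
bool model_gil_held = true;

struct ModelGilRelease {
    ModelGilRelease() { model_gil_held = false; }
    ~ModelGilRelease() { model_gil_held = true; }
};

// Used when the option is off: deallocation keeps the GIL.
struct NoOpGuard {};

#if SKETCH_RELEASE_GIL_ON_DEALLOC
using DeallocGuard = ModelGilRelease;
#else
using DeallocGuard = NoOpGuard;
#endif

// Stand-in for the deallocation routine of the binding layer.
template <class T>
void model_dealloc(T* ptr) {
    DeallocGuard guard;
    (void)guard;
    delete ptr;
}

// Test type: its destructor records whether the GIL was held while it ran.
struct Probe {
    explicit Probe(bool* out) : held_in_destructor(out) {}
    ~Probe() { *held_in_destructor = model_gil_held; }
    bool* held_in_destructor;
};

// Returns whether the GIL was held while the object was destroyed.
bool gil_held_during_dealloc() {
    bool held = true;
    model_dealloc(new Probe(&held));
    return held;
}
```

With the default setting, `gil_held_during_dealloc()` returns `false`. Compiling with `-DSKETCH_RELEASE_GIL_ON_DEALLOC=0` keeps the current behavior at no extra cost.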

## Reproducible example code
This sample will start a worker executing some Python code (simple print statements) in a separate thread, which it needs the GIL for. Upon destruction of the worker, the thread is joined. If, as is the case here, the worker is destroyed while the GIL is locked, a deadlock occurs.

```cpp
#include <pybind11/pybind11.h>
#include <pybind11/embed.h>

#include <atomic>
#include <chrono>
#include <thread>

using namespace std::chrono_literals;

// A worker that runs some Python code in a separate thread
struct Worker {

    Worker() {
        thread = std::thread([this] {
            while (keepRunning) {
                pybind11::gil_scoped_acquire gil;
                pybind11::print("Working");
                std::this_thread::sleep_for(10ms);
            }
        });
    }

    ~Worker() {
        keepRunning = false;
        if (thread.joinable()) {
            thread.join();
        }
    }

    std::thread thread;

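    // Initialized explicitly: a default-constructed std::atomic<bool> is
    // uninitialized before C++20 and false from C++20 on.
    std::atomic<bool> keepRunning{true};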

};

PYBIND11_EMBEDDED_MODULE(deadlock, mod) {
    pybind11::class_<Worker>(mod, "Worker");
}

int main() {
    pybind11::scoped_interpreter interpreter;
    pybind11::module::import("deadlock");
    {
        pybind11::dict dict;
        dict["worker"] = pybind11::cast(new Worker(), pybind11::return_value_policy::take_ownership);
        {
            // Let the worker run for a while
            pybind11::gil_scoped_release release;
            std::this_thread::sleep_for(100ms);
        }
    }
    // This line will rarely be reached due to a deadlock when destroying dict
    pybind11::print("No deadlock");
}

```
