-
Notifications
You must be signed in to change notification settings - Fork 304
Fix for deadlock in python callback #3073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for deadlock in python callback #3073
Conversation
4b7cd4f to
6d3fa90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a deadlock issue in Python callbacks by ensuring proper GIL (Global Interpreter Lock) management during callback lifecycle. The fix addresses the problem of Python objects being destroyed without holding the GIL, which can cause deadlocks in multi-threaded scenarios.
Key changes:
- Updated callback signature to use
py::strinstead ofstd::stringfor better Python integration - Added custom deleters to shared pointers that acquire GIL before destroying Python objects
- Improved error handling in UTF-8 string decoding with null pointer checks
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/python/py_utils.hpp | Changed callback signature from std::string to py::str parameter |
| src/python/py_utils.cpp | Added GIL-aware custom deleters for Python callbacks and improved UTF-8 decoding error handling |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
4ffbb64 to
06d0693
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| m_total_size(total_size), m_mutable_data(mutable_data), m_torch_tensor(torch_tensor) { } | ||
|
|
||
| ~TorchTensorAllocator() { | ||
| if (m_torch_tensor && Py_IsInitialized()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m_torch_tensor is always set in constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but when we're using move it may be an empty Python object as far as i understand. So it's a little bit defensive here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is TorchTensorAllocator movable? It defines a constructor which should disable default move constructot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was movable before i've added destructor (constructor doesn't disable generation of default move methods). When i've added destructor it became unmovable, so i've had to specify default move constructor and copy/assign methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess copy constructor copies a ref to m_torch_generator. Copy methods should be deleted to avoid this problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but that's the old behaviour so i haven't touched it. Also all python objects have ref counter, so it's okay to copy them if needed. The only potential problem i see here is that we have to explicitly get GIL lock before calling copy methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have to explicitly get GIL lock before calling copy methods.
Good catch! Apply that
f6d86e1 to
12877e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
22a6979
Fix update from release branch