Closed as not planned
Description
I've been seeing segmentation faults (SIGSEGV) when calling TensorFlow ops through PyO3. I haven't been able to find a solid pattern for when they occur. As far as I can tell they're pretty random, although sometimes I can make a crash go away by rearranging or splitting up some calls.
The backtrace consistently starts with the following frames:
#0 0x00007f9d45b5e55f in PyObject_GC_UnTrack () from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
#1 0x00007f9c9affb8df in EagerTensor_dealloc () from /usr/local/lib/python3.8/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#2 0x00005555b0f8b27f in pyo3::ffi::object::Py_DECREF (op=0x7f9c8dfe3b40) at /opt/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.13.2/src/ffi/object.rs:825
#3 0x00005555b0f83396 in pyo3::gil::ReferencePool::update_counts (self=0x5555b1a65eb0 <pyo3::gil::POOL>, _py=...)
at /opt/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.13.2/src/gil.rs:357
#4 0x00005555b0f834c9 in pyo3::gil::GILPool::new () at /opt/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.13.2/src/gil.rs:386
#5 0x00005555b0f82cff in pyo3::gil::GILGuard::acquire () at /opt/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.13.2/src/gil.rs:267
#6 0x00005555b0f83b99 in pyo3::gil::ensure_gil () at /opt/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.13.2/src/gil.rs:490
#7 0x00005555b00e1be1 in pyo3::python::Python::with_gil (f=...) at /opt/.cargo/registry/src/github.com-1ecc6299db9ec823/pyo3-0.13.2/src/python.rs:157
For reference, that EagerTensor_dealloc function is defined here:
// tp_dealloc for EagerTensor.
void EagerTensor_dealloc(EagerTensor* self) {
  // Unhook the object from python's GC so that the weakref deleter doesn't
  // try to re-delete this.
  PyObject_GC_UnTrack((PyObject*)self);
  // Clear weak references to self.
  // Needs to happen before any actual destruction.
  PyObject_ClearWeakRefs((PyObject*)self);
  Py_DECREF(self->handle_data);
  Py_DECREF(self->tensor_shape);
  // If an attribute dictionary has been created, release it. Note that this
  // is only ever created by CPython's attribute setting methods; we don't
  // create it ourselves.
  Py_CLEAR(self->dict);
  if (self->handle != nullptr) {
    TFE_DeleteTensorHandle(self->handle);
    self->handle = nullptr;
  }
  // Decref context after deleting the tensor handle.
  Py_XDECREF(self->context);
  // We have the global interpreter lock, so use this chance to perform delayed
  // refcount decrements.
  tensorflow::ClearDecrefCache();
  auto id = self->id;
  Py_TYPE(self)->tp_free(self);
  TFE_Py_TapeSetDeleteTrace(id);
}

This may not be the right spot to file this issue, so don't feel obligated to help with this if it doesn't seem related to PyO3. I just thought I'd check with you guys to see if these snippets raise any red flags.
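Frames #2 to #5 of the backtrace suggest that the crashing Py_DECREF was not issued at the point where the tensor reference was dropped, but was queued and then applied when the GIL was next acquired (ReferencePool::update_counts called from GILPool::new). To make that pattern concrete, here is a minimal std-only sketch of a deferred-decrement pool. It is a toy model under stated assumptions, not PyO3's implementation: `ToyObject`, `DeferredPool`, and this `with_gil` are hypothetical names, and holding `&mut heap` merely stands in for holding the GIL.

```rust
use std::sync::Mutex;

/// Toy stand-in for a Python object: a refcount plus a flag set by "dealloc".
/// (Hypothetical type, for illustration only.)
#[derive(Debug)]
struct ToyObject {
    refcount: usize,
    deallocated: bool,
}

/// Hypothetical model of a pool that queues decrements requested
/// while the GIL is not held.
struct DeferredPool {
    pending: Mutex<Vec<usize>>, // indices into the toy heap
}

impl DeferredPool {
    fn new() -> Self {
        DeferredPool { pending: Mutex::new(Vec::new()) }
    }

    /// A reference was dropped without the GIL: only record the decrement.
    fn register_decref(&self, id: usize) {
        self.pending.lock().unwrap().push(id);
    }

    /// Apply every queued decrement. Requires `&mut heap`, i.e. the "GIL".
    /// Returns how many decrements were applied.
    fn update_counts(&self, heap: &mut [ToyObject]) -> usize {
        let queued = std::mem::take(&mut *self.pending.lock().unwrap());
        for &id in &queued {
            let obj = &mut heap[id];
            obj.refcount -= 1;
            if obj.refcount == 0 {
                // In the real crash this is where the type's dealloc would run.
                obj.deallocated = true;
            }
        }
        queued.len()
    }
}

/// Toy `with_gil`: flush the queued decrements first, then run user code.
fn with_gil<R>(
    pool: &DeferredPool,
    heap: &mut [ToyObject],
    f: impl FnOnce(&mut [ToyObject]) -> R,
) -> R {
    pool.update_counts(heap);
    f(heap)
}

fn main() {
    let pool = DeferredPool::new();
    let mut heap = vec![ToyObject { refcount: 1, deallocated: false }];

    // Dropped "without the GIL": nothing happens to the object yet.
    pool.register_decref(0);
    assert!(!heap[0].deallocated);

    // The next acquisition runs the dealloc before the closure body.
    let seen = with_gil(&pool, &mut heap, |h| h[0].deallocated);
    println!("deallocated before user code ran: {}", seen);
}
```

In this model the dealloc runs at whichever acquisition happens next, which may be far from the code that dropped the reference. If the real behavior is similar, that could be consistent with the crash moving around when calls are rearranged, though the sketch does not establish the cause.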
Environment
Docker image: nvidia/cuda11.0-base-ubuntu20.04
Python 3.8.5
PyO3 v0.13.2