-
Notifications
You must be signed in to change notification settings - Fork 223
Add (failing) tests demonstrating that errors in Buffer.close are not raised #1117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/ok to test 845fbd4 |
|
845fbd4 to
adfb7e5
Compare
|
/ok to test 0986f5e |
0986f5e to
78a4815
Compare
|
/ok to test 1d5248e |
1d5248e to
bbdbbcd
Compare
6e9c283 to
fea6f8a
Compare
| mr.close() | ||
|
|
||
|
|
||
| @pytest.mark.xfail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work here?
@pytest.mark.xfail(reason="Issue #1118", strict=True)
The important part is strict=True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's good to know.
leofang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HUGE 👎 in testing this.
| def test_error_in_close_memory_resource(ipc_memory_resource): | ||
| """Test that errors when closing a memory resource are raised.""" | ||
| mr = ipc_memory_resource | ||
| driver.cuMemPoolDestroy(mr.handle) | ||
| with pytest.raises(CUDAError, match=".*CUDA_ERROR_INVALID_VALUE.*"): | ||
| mr.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is illegal and I disagree we need to test this. This is Python and we can't possibly guard against all kinds of bizarre ways of trying to mutate the state of our Python objects behind our back. In particular, as noted in both #1074 (comment) and offline discussion, errors like CUDA_ERROR_INVALID_VALUE are due to multiple frees. I thought we've moved on?
This test is just another instance of the same class of errors: We free the handle of an object through a direct C API call, bypassing our safeguard mechanism (under the hood we do check if the handle is already null before freeing, and then after free we set the handle to null to avoid double free), so our destructor kicks in, either through an explicit close() call or implicitly when going out of scope, and causes another free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this test is asserting that different levels of APIs offer the same guarantees, which would make developing layers of APIs really really difficult.
It seems roughly analogous to calling into the Python C API through ctypes, and expecting Python to somehow know you didn't mean to cause a segmentation violation:
❯ python3.13 -q
>>> x = 1
>>> import ctypes
>>> ctypes.pythonapi.Py_DecRef(x)
zsh: segmentation fault (core dumped) python3.13 -q
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would guess there's probably another way to write this test such that the behavior is buggy without crossing into the C abyss of naked bindings.
Would it be enough to just call close() twice? That seems like something we should perhaps be robust to if we're not already:
❯ python -q
>>> f = open('/tmp/x', 'w')
>>> f.close()
>>> f.close()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a misunderstanding. The point of this test is to check that errors occurring in close are raised rather than suppressed. The exact mechanism I chose to get the driver to generate an error is beside the point, If anyone sees a simpler mechanism, please point it out.
I ran into this issue (close not raising errors) when working with the driver bug 5570902. I don't want to rely on that behavior for the test because if the driver team ever fixes the bug, the test would then break.
Let's not confuse the issue: we are not testing our robustness in the face of nonsense behavior, i.e., someone stomping around in the driver API directly. This test just does something otherwise dumb and unsupported to trigger an error, which is totally reasonable for this test.
@cpcloud our code already tolerates double-calls to close and simply ignores the second call without error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no misunderstanding. Undefined behavior should not be tested. It is just confusing and sends a wrong signal.
re: on allowing the behavior that calling .close() is allowed to raise while invoking the destructor is not, I did raise a discussion with @pciolkosz on the Tuesday meeting. It is indeed nice to have, but it is unclear semantically what we can do about the exception, and technically it does not seem possible to implement in both C++ and Python (buffer.destroy(stream) from cccl-runtime is also noexcept). I suggest we table this discussion for later, and close this PR.
| driver.cuMemFree(buffer.handle) | ||
| with pytest.raises(CUDAError, match=".*CUDA_ERROR_INVALID_VALUE.*"): | ||
| buffer.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| try: | ||
| driver.cuMemPoolDestroy(self.mr.handle) | ||
| except Exception: # noqa: S110 | ||
| pass | ||
| else: | ||
| self.mr.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
|
FWIW cccl-runtime, C++ stdandad library, or any high-level frameworks/libraries have the same issue. We give you the access to the underlying handle of a container, does not mean that you can free it through |
Respectfully, I think this misses the point. When implicitly closing resources via |
…ple) are not raised.
06c8d2d to
c92fb6c
Compare
Errors occurring during
Buffer.closeare not raised. This change adds tests demonstrating the issue. See #1118.