-
Notifications
You must be signed in to change notification settings - Fork 2.2k
test_eigen.py test_nonunit_stride_to_python bug fix (ASAN failure) #4217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
df58dda
to
6a0e8ca
Compare
@hawkinsp, could you please review this PR, including the description? |
tests/test_eigen.cpp
Outdated
// returning the Eigen::Ref returned by x.block() will lead to heap-use-after-free, | ||
// because the block references the copy, which is destroyed when this function | ||
// returns. Therefore the block needs to be returned by value. | ||
return Eigen::MatrixXd{x.block(start_row, start_col, block_rows, block_cols)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something I would double check: is this test still actually testing what it is supposed to? The idea of the test is that a subblock of a larger matrix has non-trivial striding. If you copy that block, it may not: you may end up with a contiguous matrix once more.
This is achieved without * reaching into internals, * making test_eigen.cpp depend on pybind11/numpy.h.
2fc9229
to
537574e
Compare
I want back to square one (undid the previous change). Here is a completely different way to deal with the situation: 537574e It's basically just @hawkinsp's original idea (PR description) slightly more fancy: - assert np.all(m.block(ref, 2, 1, 3, 3) == ref[2:5, 1:4])
- assert np.all(m.block(ref, 1, 4, 4, 2) == ref[1:, 4:])
- assert np.all(m.block(ref, 1, 4, 3, 2) == ref[1:4, 4:])
+ # Must be order="F", otherwise the type_caster will make a copy and
+ # m.block() will return a dangling reference (heap-use-after-free).
+ rof = np.asarray(ref, order="F")
+ assert np.all(m.block(rof, 2, 1, 3, 3) == rof[2:5, 1:4])
+ assert np.all(m.block(rof, 1, 4, 4, 2) == rof[1:, 4:])
+ assert np.all(m.block(rof, 1, 4, 3, 2) == rof[1:4, 4:]) I believe that's the best we can do to preserve the original intent of the test, but what if someone makes the same mistake again? The answer is the rest of that commit. It is clearly on the crazy side, but was kind of fun to whip up and maybe entertaining / educational? Keep or drop? Other thoughts:
|
For completeness: I ran the core fix ( |
I deleted the needs changelog label, thinking mentioning this fix in a test will be more distracting than helpful there. Mentioning the root problem, that it is not obvious at all that a dangling reference is created, seems inappropriate for the changelog. |
Description
The bug is in the implementation of "block" in test_eigen.cpp, which returns a dangling reference.
Initial analysis provided by @hawkinsp:
We are passing a C-contiguous array, but pybind11 ends up copying it to a temporary F-contiguous array so it can pass it to Eigen. The eigen reference returned by "block" points into this temporary array, but the temporary array does not outlive the call.
e.g., this works around the problem by avoiding the temporary copy:
Note that this bug was discovered only after patching the numpy sources Google-internally, to turn off caching when testing with sanitizers. The valgrind tests in the pybind11 github CI use the original numpy sources, therefore valgrind is blind to this bug.
Suggested changelog entry: