Skip to content

test_eigen.py test_nonunit_stride_to_python bug fix (ASAN failure) #4217

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 7, 2022

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Oct 5, 2022

Description

The bug is in the implementation of "block" in test_eigen.cpp, which returns a dangling reference.

Initial analysis provided by @hawkinsp:


We are passing a C-contiguous array, but pybind11 ends up copying it to a temporary F-contiguous array so it can pass it to Eigen. The eigen reference returned by "block" points into this temporary array, but the temporary array does not outlive the call.

e.g., this works around the problem by avoiding the temporary copy:

    rf = np.asarray(ref, order='F')
    assert np.all(m.block(rf, 2, 1, 3, 3) == ref[2:5, 1:4])

Note that this bug was discovered only after patching the numpy sources Google-internally, to turn off caching when testing with sanitizers. The valgrind tests in the pybind11 github CI use the original numpy sources, therefore valgrind is blind to this bug.

Suggested changelog entry:

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 6, 2022

@hawkinsp, could you please review this PR, including the description?

// returning the Eigen::Ref returned by x.block() will lead to heap-use-after-free,
// because the block references the copy, which is destroyed when this function
// returns. Therefore the block needs to be returned by value.
return Eigen::MatrixXd{x.block(start_row, start_col, block_rows, block_cols)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I would double check: is this test still actually testing what it is supposed to? The idea of the test is that a subblock of a larger matrix has non-trivial striding. If you copy that block, it may not: you may end up with a contiguous matrix once more.

@rwgk rwgk force-pushed the test_eigen_asan_fix branch from 2fc9229 to 537574e Compare October 7, 2022 00:00
@rwgk
Copy link
Collaborator Author

rwgk commented Oct 7, 2022

I want back to square one (undid the previous change).

Here is a completely different way to deal with the situation: 537574e

It's basically just @hawkinsp's original idea (PR description) slightly more fancy:

-    assert np.all(m.block(ref, 2, 1, 3, 3) == ref[2:5, 1:4])
-    assert np.all(m.block(ref, 1, 4, 4, 2) == ref[1:, 4:])
-    assert np.all(m.block(ref, 1, 4, 3, 2) == ref[1:4, 4:])
+    # Must be order="F", otherwise the type_caster will make a copy and
+    # m.block() will return a dangling reference (heap-use-after-free).
+    rof = np.asarray(ref, order="F")
+    assert np.all(m.block(rof, 2, 1, 3, 3) == rof[2:5, 1:4])
+    assert np.all(m.block(rof, 1, 4, 4, 2) == rof[1:, 4:])
+    assert np.all(m.block(rof, 1, 4, 3, 2) == rof[1:4, 4:])

I believe that's the best we can do to preserve the original intent of the test, but what if someone makes the same mistake again?

The answer is the rest of that commit. It is clearly on the crazy side, but was kind of fun to whip up and maybe entertaining / educational?

Keep or drop?

Other thoughts:

  • I cannot think of an easy way to defuse the time bomb that 1. the caster sometimes makes copies and 2. it is possible to return references. I fear this will continue to blow up in people's faces, unfortunately. And even if the caster didn't make copies, without keep_alive (added) it is still unsafe (I think but didn't verify).

  • I need the ASAN error to go away.

  • But I don't have the time or background to fix the inherently unsafe behavior.

@rwgk rwgk marked this pull request as ready for review October 7, 2022 00:54
@rwgk rwgk requested review from henryiii and Skylion007 October 7, 2022 00:54
@rwgk rwgk changed the title WIP: test_eigen.py test_nonunit_stride_to_python bug fix (ASAN failure) test_eigen.py test_nonunit_stride_to_python bug fix (ASAN failure) Oct 7, 2022
@rwgk
Copy link
Collaborator Author

rwgk commented Oct 7, 2022

For completeness: I ran the core fix (rof in test_eigen.py) by @hawkinsp and he found it acceptable.

@rwgk rwgk merged commit 4a42156 into pybind:master Oct 7, 2022
@rwgk rwgk deleted the test_eigen_asan_fix branch October 7, 2022 16:20
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Oct 7, 2022
@rwgk rwgk removed the needs changelog Possibly needs a changelog entry label Oct 7, 2022
@rwgk
Copy link
Collaborator Author

rwgk commented Oct 7, 2022

I deleted the needs changelog label, thinking mentioning this fix in a test will be more distracting than helpful there. Mentioning the root problem, that it is not obvious at all that a dangling reference is created, seems inappropriate for the changelog.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants