test_eigen.py test_nonunit_stride_to_python bug fix (ASAN failure) #4217

rwgk · 2022-10-05T17:02:39Z

Description

The bug is in the implementation of "block" in test_eigen.cpp, which returns a dangling reference.

Initial analysis provided by @hawkinsp:

We are passing a C-contiguous array, but pybind11 ends up copying it to a temporary F-contiguous array so that it can pass it to Eigen. The Eigen reference returned by "block" points into this temporary array, but the temporary array does not outlive the call.

For example, this works around the problem by avoiding the temporary copy:

    rf = np.asarray(ref, order='F')
    assert np.all(m.block(rf, 2, 1, 3, 3) == ref[2:5, 1:4])
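To make the copy behavior concrete without building the extension module, here is a minimal NumPy-only sketch. The 5x6 `ref` matrix is a hypothetical stand-in for the one in the test, and the sketch says nothing about pybind11 internals; it only shows that a C-contiguous array needs a copy to become F-contiguous, while an array that is already F-contiguous is passed through unchanged.

```python
import numpy as np

# Hypothetical stand-in for the test's `ref` matrix (C-contiguous by default).
ref = np.arange(30, dtype=np.float64).reshape(5, 6)

# Converting to Fortran (column-major) order requires a new buffer here.
rf = np.asarray(ref, order="F")

c_flags = (ref.flags["C_CONTIGUOUS"], ref.flags["F_CONTIGUOUS"])
f_flags = (rf.flags["C_CONTIGUOUS"], rf.flags["F_CONTIGUOUS"])

# Same values, different memory: the conversion made a copy.
same_values = bool(np.array_equal(ref, rf))
shares_memory = bool(np.shares_memory(ref, rf))

# Asking again for order="F" on an already F-contiguous array is a no-op,
# which is why passing `rf` avoids a further temporary on the NumPy side.
no_copy_needed = np.asarray(rf, order="F") is rf

print("ref flags (C, F):", c_flags)
print("rf  flags (C, F):", f_flags)
print("same values:", same_values, "| shares memory:", shares_memory)
print("second conversion returns the same object:", no_copy_needed)
```

The important point is that `rf` is owned by the Python caller, so it stays alive for as long as the test holds on to it, unlike a temporary created inside the call.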

Note that this bug was discovered only after patching the numpy sources Google-internally, to turn off caching when testing with sanitizers. The valgrind tests in the pybind11 github CI use the original numpy sources, therefore valgrind is blind to this bug.

Suggested changelog entry:

rwgk · 2022-10-06T06:01:29Z

@hawkinsp, could you please review this PR, including the description?

hawkinsp · 2022-10-06T15:14:45Z

tests/test_eigen.cpp

+              // returning the Eigen::Ref returned by x.block() will lead to heap-use-after-free,
+              // because the block references the copy, which is destroyed when this function
+              // returns. Therefore the block needs to be returned by value.
+              return Eigen::MatrixXd{x.block(start_row, start_col, block_rows, block_cols)};


Something I would double check: is this test still actually testing what it is supposed to? The idea of the test is that a subblock of a larger matrix has non-trivial striding. If you copy that block, it may not: you may end up with a contiguous matrix once more.
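A rough NumPy analogy of this concern, as a sketch only: the 5x6 array and the names `rof`, `view`, and `copy` are illustrative, and Eigen's own stride handling is not modeled here. A sub-block taken as a view keeps the strides of the parent array, whereas copying that block produces a densely packed array again.

```python
import numpy as np

# Hypothetical 5x6 column-major matrix, standing in for the test data.
rof = np.asfortranarray(np.arange(30, dtype=np.float64).reshape(5, 6))

# A 3x3 sub-block as a view: it steps through the parent's buffer, so the
# column stride still corresponds to 5 rows even though the block has 3.
view = rof[2:5, 1:4]

# Copying the block packs it densely; the non-trivial striding is gone.
copy = np.array(view, order="F")

print("view strides:", view.strides, "F-contiguous:", view.flags["F_CONTIGUOUS"])
print("copy strides:", copy.strides, "F-contiguous:", copy.flags["F_CONTIGUOUS"])
print("view aliases rof:", view.base is rof)
```

If the test is meant to exercise non-unit strides, only the view-like case does so; the copied block would take the contiguous path.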


rwgk · 2022-10-07T00:16:01Z

I went back to square one (undid the previous change).

Here is a completely different way to deal with the situation: 537574e

It's basically just @hawkinsp's original idea (PR description), made slightly fancier:

-    assert np.all(m.block(ref, 2, 1, 3, 3) == ref[2:5, 1:4])
-    assert np.all(m.block(ref, 1, 4, 4, 2) == ref[1:, 4:])
-    assert np.all(m.block(ref, 1, 4, 3, 2) == ref[1:4, 4:])
+    # Must be order="F", otherwise the type_caster will make a copy and
+    # m.block() will return a dangling reference (heap-use-after-free).
+    rof = np.asarray(ref, order="F")
+    assert np.all(m.block(rof, 2, 1, 3, 3) == rof[2:5, 1:4])
+    assert np.all(m.block(rof, 1, 4, 4, 2) == rof[1:, 4:])
+    assert np.all(m.block(rof, 1, 4, 3, 2) == rof[1:4, 4:])

I believe that's the best we can do to preserve the original intent of the test, but what if someone makes the same mistake again?

The answer is the rest of that commit. It is clearly on the crazy side, but was kind of fun to whip up and maybe entertaining / educational?

Keep or drop?

Other thoughts:

* I cannot think of an easy way to defuse the time bomb: (1) the caster sometimes makes copies, and (2) it is possible to return references. I fear this will continue to blow up in people's faces, unfortunately. And even if the caster didn't make copies, without keep_alive (added) it is still unsafe (I think, but didn't verify).
* I need the ASAN error to go away.
* But I don't have the time or background to fix the inherently unsafe behavior.

rwgk · 2022-10-07T16:20:27Z

For completeness: I ran the core fix (rof in test_eigen.py) by @hawkinsp and he found it acceptable.

rwgk · 2022-10-07T16:24:40Z

I deleted the "needs changelog" label, thinking that mentioning a fix to a test would be more distracting than helpful there. Mentioning the root problem, that it is not at all obvious that a dangling reference is created, seems inappropriate for the changelog.

rwgk force-pushed the test_eigen_asan_fix branch from df58dda to 6a0e8ca Compare October 5, 2022 17:07

rwgk mentioned this pull request Oct 5, 2022

Add Eigen::Tensor & Eigen::TensorMap support #4201

Merged

hawkinsp reviewed Oct 6, 2022


rwgk added 4 commits October 6, 2022 14:03

Disable test triggering ASAN failure (to pin-point where the problem is).

38f4106

Fix unsafe "block" implementation in test_eigen.cpp

a5462ed

Undo changes (i.e. revert back to master).

2589283

Detect "type_caster for Eigen::Ref made a copy."

537574e

This is achieved without
* reaching into internals,
* making test_eigen.cpp depend on pybind11/numpy.h.

rwgk force-pushed the test_eigen_asan_fix branch from 2fc9229 to 537574e Compare October 7, 2022 00:00

rwgk marked this pull request as ready for review October 7, 2022 00:54

rwgk requested review from henryiii and Skylion007 October 7, 2022 00:54

Skylion007 approved these changes Oct 7, 2022


Add comment pointing to PR, for easy reference.

630f2c5

rwgk changed the title ~~WIP: test_eigen.py test_nonunit_stride_to_python bug fix (ASAN failure)~~ test_eigen.py test_nonunit_stride_to_python bug fix (ASAN failure) Oct 7, 2022

rwgk merged commit 4a42156 into pybind:master Oct 7, 2022

rwgk deleted the test_eigen_asan_fix branch October 7, 2022 16:20

github-actions bot added the "needs changelog" label (Possibly needs a changelog entry) Oct 7, 2022

rwgk removed the "needs changelog" label (Possibly needs a changelog entry) Oct 7, 2022

rwgk mentioned this pull request Feb 11, 2023

FWD pybind11 google/pybind11clif#4217

Closed
