Skip to content

undefined behavior: tstate->datastack_top == NULL #96569

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
matthiasgoergens opened this issue Sep 5, 2022 · 27 comments
Closed

undefined behavior: tstate->datastack_top == NULL #96569

matthiasgoergens opened this issue Sep 5, 2022 · 27 comments
Assignees
Labels
3.11 only security fixes 3.12 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@matthiasgoergens
Copy link
Contributor

I was chasing down some C trouble in code I had been experimenting. I used all the debug options I could find:

export CC="clang"
configure --with-assertions --with-address-sanitizer --with-trace-refs --with-undefined-behavior-sanitizer --with-pydebug
nice make -j8

For sanity checking, I ran this on current main. I got:

../../Python/pystate.c:2199:27: runtime error: applying non-zero offset 112 to null pointer
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../Python/pystate.c:2199:27 in 
../../Python/pystate.c:2199:27: runtime error: applying non-zero offset 112 to null pointer
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../Python/pystate.c:2199:27 in 
../../Python/pystate.c:2199:27: runtime error: applying non-zero offset 112 to null pointer
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../Python/pystate.c:2199:27 in 

For a minimal reproducible example, have a look at my example PR that adds this check and fails to build:

diff --git a/Python/pystate.c b/Python/pystate.c
index a11f1622ecd..09543add9dd 100644
--- a/Python/pystate.c
+++ b/Python/pystate.c
@@ -2196,6 +2196,7 @@ _PyThreadState_PushFrame(PyThreadState *tstate, size_t size)
 {
     assert(size < INT_MAX/sizeof(PyObject *));
     PyObject **base = tstate->datastack_top;
+    assert(base != NULL);
     PyObject **top = base + size;
     if (top >= tstate->datastack_limit) {
         base = push_chunk(tstate, (int)size);

Error messages

Enter any relevant error message caused by the crash, including a core dump if there is one.

I already pasted the error message I get from the sanitizers above. Here's the error message I get from my assertion instead (and building with just sequential make):

./Programs/_freeze_module zipimport ../../Lib/zipimport.py Python/frozen_modules/zipimport.h
./_bootstrap_python ../../Programs/_freeze_module.py abc ../../Lib/abc.py Python/frozen_modules/abc.h
_bootstrap_python: ../../Python/pystate.c:2199: _PyInterpreterFrame *_PyThreadState_PushFrame(PyThreadState *, size_t): Assertion `base != NULL' failed.
make: *** [Makefile:1238: Python/frozen_modules/abc.h] Aborted (core dumped)

Your environment

I tested this on Archlinux against latest main. You can also see it in action on the failed test run for my PR on github.

@matthiasgoergens matthiasgoergens added the type-crash A hard crash of the interpreter, possibly with a core dump label Sep 5, 2022
@sweeneyde
Copy link
Member

cc @markshannon

It looks like this happens at the very first call to PyEval_EvalCode, when executing <module> for importlib.

It goes: PyImport_ImportFrozenmoduleObject > exec_code_in_module > PyEval_EvalCode > _PyEval_Vector > _PyEvalFramePushAndInit > _PyThreadState_PushFrame

@matthiasgoergens
Copy link
Contributor Author

matthiasgoergens commented Sep 5, 2022

git bisect puts the finger an 42a64c0 but that one looks fairly innocuous as far as C is concerned?

42a64c03ec5c443f2a5c2ee4284622f5d1f5326c is the first bad commit
commit 42a64c03ec5c443f2a5c2ee4284622f5d1f5326c
Author: Victor Stinner <[email protected]>
Date:   Mon Jan 17 13:58:40 2022 +0100

    Revert "bpo-40066:  [Enum] update str() and format() output (GH-30582)" (GH-30632)
    
    This reverts commit acf7403f9baea3ae1119fc6b4a3298522188bf96.

 Doc/howto/enum.rst                                 |  272 +-
 Doc/library/enum.rst                               |  265 +-
 Doc/library/ssl.rst                                |    4 +-
 Lib/enum.py                                        |  603 ++--
 Lib/inspect.py                                     |   30 +-
 Lib/plistlib.py                                    |    3 +-
 Lib/re.py                                          |    2 -
 Lib/ssl.py                                         |    1 +
 Lib/test/test_enum.py                              | 2916 ++++++++++----------
 Lib/test/test_signal.py                            |    2 +-
 Lib/test/test_socket.py                            |   12 +-
 Lib/test/test_ssl.py                               |    8 +-
 Lib/test/test_unicode.py                           |    6 +-
 .../2022-01-13-11-41-24.bpo-40066.1QuVli.rst       |    2 -
 14 files changed, 2030 insertions(+), 2096 deletions(-)
 delete mode 100644 Misc/NEWS.d/next/Library/2022-01-13-11-41-24.bpo-40066.1QuVli.rst

In any case here's my bisecting log:

$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [80f5eee5a8b8720583f8dc5be3be0fda1aebba28] Hot fix
# git bisect good 80f5eee5a8b8720583f8dc5be3be0fda1aebba28
# status: waiting for bad commit, 1 good commit known
# good: [ae0a2b756255629140efcbe57fc2e714f0267aa3] bpo-44590: Lazily allocate frame objects (GH-27077)
git bisect good ae0a2b756255629140efcbe57fc2e714f0267aa3
# status: waiting for bad commit, 2 good commits known
# bad: [c9f0f41f2556aeb61a17efbaf26bc4dc15214e0d] Assertion to avoid undefined behaviour
git bisect bad c9f0f41f2556aeb61a17efbaf26bc4dc15214e0d
# bad: [c9f0f41f2556aeb61a17efbaf26bc4dc15214e0d] Assertion to avoid undefined behaviour
git bisect bad c9f0f41f2556aeb61a17efbaf26bc4dc15214e0d
# bad: [3c4abfab0d3e2a3b1e626a5eb185ad1f5436b532] Fix EncodingWarning in libregrtest (GH-31654)
git bisect bad 3c4abfab0d3e2a3b1e626a5eb185ad1f5436b532
# bad: [3c4abfab0d3e2a3b1e626a5eb185ad1f5436b532] Fix EncodingWarning in libregrtest (GH-31654)
git bisect bad 3c4abfab0d3e2a3b1e626a5eb185ad1f5436b532
# good: [ee49484c0f0d0d79e8fc40835da10b78f89ae503] [doc] Clarify MRO precedence in descriptor super binding section (GH-29539)
git bisect good ee49484c0f0d0d79e8fc40835da10b78f89ae503
# bad: [7c0914d35eaaab2f323260ba5fe8884732533888] bpo-45535: [Enum] include special dunders in dir() (GH-30677)
git bisect bad 7c0914d35eaaab2f323260ba5fe8884732533888
# good: [1cbb88736c32ac30fd530371adf53fe7554be0a5] bpo-46059: Clarify pattern-matching example in "control flow" docs (GH-30079)
git bisect good 1cbb88736c32ac30fd530371adf53fe7554be0a5
# good: [e5894ca8fd05e6a6df1033025b9093b68baa718d] bpo-46266:  Add calendar day of week constants to __all__  (GH-30412)
git bisect good e5894ca8fd05e6a6df1033025b9093b68baa718d
# good: [e34c9367f8e0068ca4bcad9fb5c2c1024d02a77d] bpo-40280: Allow to compile _testcapi as builtin module (GH-30559)
git bisect good e34c9367f8e0068ca4bcad9fb5c2c1024d02a77d
# good: [cfbde65df318eea243706ff876e5ef834c085e5f] bpo-46383: Fix signature of zoneinfo module_free function (GH-30607)
git bisect good cfbde65df318eea243706ff876e5ef834c085e5f
# bad: [d6c6e6ba739ee714e5706144853008f1eed446ba] Skip signing side-loadable MSIX for Windows (GH-30644)
git bisect bad d6c6e6ba739ee714e5706144853008f1eed446ba
# good: [7f4b69b9076bdbcea31f6ad16eb125ee99cf0175] bpo-40280: Change subprocess imports for cleaner error on wasm32 (GH-30620)
git bisect good 7f4b69b9076bdbcea31f6ad16eb125ee99cf0175
# bad: [83d544b9292870eb44f6fca37df0aa351c4ef83a] bpo-40066: [Enum] skip failing doc test (GH-30637)
git bisect bad 83d544b9292870eb44f6fca37df0aa351c4ef83a
# bad: [ad6e640f910787e73fd00f59117fbd22cdf88c78] bpo-13886: Skip PTY non-ASCII tests if readline is loaded (GH-30631)
git bisect bad ad6e640f910787e73fd00f59117fbd22cdf88c78
# bad: [42a64c03ec5c443f2a5c2ee4284622f5d1f5326c] Revert "bpo-40066:  [Enum] update str() and format() output (GH-30582)" (GH-30632)
git bisect bad 42a64c03ec5c443f2a5c2ee4284622f5d1f5326c
# first bad commit: [42a64c03ec5c443f2a5c2ee4284622f5d1f5326c] Revert "bpo-40066:  [Enum] update str() and format() output (GH-30582)" (GH-30632)

(Ignore the hot-fix one at the start..)

@sweeneyde
Copy link
Member

It looks like this is happening in the 3.11 branch @pablogsal

@sweeneyde
Copy link
Member

sweeneyde commented Sep 5, 2022

I'm suspicious of 45e62a2, where _PyEvalFramePushAndInit's call to _PyThreadState_BumpFramePointer (checks for base==NULL) got transformed into a usage of _PyThreadState_PushFrame (does not check for base==NULL).

@sweeneyde
Copy link
Member

sweeneyde commented Sep 5, 2022

One option would be to never let datastack_top be NULL, but that could interfere with the efforts at 77195cd to be compatible with greenlet and other frame-swapping extensions.

So by my shallow understanding, the easiest solution is probably just to check for NULL when pushing frames.

@sweeneyde sweeneyde changed the title Null pointer trouble undefined behavior: tstate->datastack_top == NULL Sep 5, 2022
@markshannon
Copy link
Member

Given that:

  • Not allowing datastack_top to be NULL fixes the root cause.
  • greenlets is incompatible with 3.11 anyway, and fixing it will require more than a simple code change.

I think we should go back to ensuring that datastack_top is never NULL, at least in the long term.

In the short term, we could add the NULL check for 3.11, then make the larger change (including #32303) for 3.12

@pablogsal thoughts?

@kumaraditya303
Copy link
Contributor

@markshannon The latest version of greenlet is compatible with 3.11 https://pypi.org/project/greenlet/#files

@kumaraditya303 kumaraditya303 added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.11 only security fixes 3.12 only security fixes labels Sep 5, 2022
@pablogsal
Copy link
Member

pablogsal commented Sep 5, 2022

Given that:

  • Not allowing datastack_top to be NULL fixes the root cause.
  • greenlets is incompatible with 3.11 anyway, and fixing it will require more than a simple code change.

I think we should go back to ensuring that datastack_top is never NULL, at least in the long term.

In the short term, we could add the NULL check for 3.11, then make the larger change (including #32303) for 3.12

@pablogsal thoughts?

As mentioned previously, greenlet has been released being compatible with 3.11 so whatever we do we should try to not break it.

What consequences ensuring that datastack_top is never NULL will have?

I agree that for 3.11 (and given that the last RC release is today) we should add a null check as long as we understand the consequences of that because we really don't want to learn about side effects later.

@markshannon
Copy link
Member

Is greenlet compatible with 3.11? There has been no work on it since March

@AlexWaygood
Copy link
Member

AlexWaygood commented Sep 5, 2022

Is greenlet compatible with 3.11? There has been no work on it since March

You're looking at the master branch — there were commits eleven days ago on the 1.1 branch https://github.com/python-greenlet/greenlet/tree/maint/1.1

And there was a new release on August 25, adding compatibility with 3.11: https://pypi.org/project/greenlet/

@markshannon
Copy link
Member

What consequences ensuring that datastack_top is never NULL will have?

  • No need to check for NULL, removing a whole class of potential failures.
  • Maintains both the API and the ABI (not just the stable one, the full one)
  • It will break greenlets. Probably. Who knows?

@pablogsal
Copy link
Member

pablogsal commented Sep 5, 2022

No need to check for NULL, removing a whole class of potential failures.

What potential failures are we talking about? I don't feel comfortable making a patch that we can regret later because we discover that it fails in a bunch of unknown scenarios

@markshannon
Copy link
Member

I don't feel comfortable making a patch that we can regret later because we discover that it fails in a bunch of unknown scenarios

What patch? Ensuring that datastack_top is never NULL? Or adding NULL checks.

@pablogsal
Copy link
Member

What patch? Ensuring that datastack_top is never NULL? Or adding NULL checks.

Adding NULL checks if we don't know the consequences of adding NULL checks. What you referred to:

removing a whole class of potential failures.

@markshannon
Copy link
Member

Adding NULL checks do not remove a whole class of potential failures.

Making sure that datastack_top is never NULL eliminates both NULL checks and NULL errors.

@pablogsal
Copy link
Member

I understand that, that's what I am saying that I don't feel comfortable just adding NULL checks because we don't know if we are introducing a whole class of issues or not fixing enough.

@markshannon
Copy link
Member

The NULL checks can't add any issues. If the pointer is never NULL, they are just a bit wasteful.
As for missing cases, that is trickier.

The comparison NULL + N < NULL is undefined according to the spec, even though it appears to always be false and probably is for most (all?) compilers.
So, if there are similar comparisons, then we could easily miss them.

There isn't much we can do about that, apart from better static analysis.

@pablogsal
Copy link
Member

Ok I see.

How does making sure that datastack_top is never NULL break greenlets?

@matthiasgoergens
Copy link
Contributor Author

matthiasgoergens commented Sep 5, 2022

More specifically, in this case the arithmetic is already undefined.

Null + anything is undefined.

You don't even need to use the pointer in any way afterwards or compare it etc.

Undefined behaviour also 'travels backwards in time'. Ie it removes any obligation on the compiler for the entire execution, not just for what happens after the UB occurs.

(I haven't seen this instance here cause any problems in practice, but I also didn't look for problems. I stopped at getting a reproduction, and only looked into the causes a bit.)

So the potential problem is not so much that your comparison will give an unexpected result, but that an optimisting compiler will remove the entire branch that leads up to the UB (because compilers like to treat UB by just pretending the circumstances in which it occurs can never happen, and this lets them remove code, and thus win speed benchmarks.)

@markshannon
Copy link
Member

How does making sure that datastack_top is never NULL break greenlets?

Greenlets sets datastack_top to NULL.
Technically, greenlets would be breaking a CPython invariant, but the effect is the same.

@pablogsal
Copy link
Member

How confident are you adding NULL checks for 3.11 and fixing this in 3.12 properly? Are we going to find more problems around this area just because the patch is insufficient?

@markshannon
Copy link
Member

Fairly confident
datastack_top only occurs 15 or so times in the code, and I've checked them all.

@vstinner
Copy link
Member

vstinner commented Sep 6, 2022

Greenlets sets datastack_top to NULL.

Oh, that's surpising. With my change, co-authored by Miro Hrončok (@hroncok), greenlet only saves/restores datastack_top value: python-greenlet/greenlet@31ccde2

@pablogsal
Copy link
Member

pablogsal commented Sep 6, 2022

Fairly confident
datastack_top only occurs 15 or so times in the code, and I've checked them all.

Ok, let's go with the NULL check then for 3.11 unless there is a better alternative

@markshannon
Copy link
Member

Oh, that's surpising. With my change, co-authored by Miro Hrončok (@hroncok), greenlet only saves/restores datastack_top value: python-greenlet/greenlet@31ccde2

A new greenlet has a NULL framestack, so when it swaps the greenlet with the current frame, it sets the framestack to NULL. At least, that is my understanding.
python-greenlet/greenlet@31ccde2#diff-98aec0a05657427241b330f6514931482276b068dc3b15d2b4102a1871b9a9a0R628

@gvanrossum
Copy link
Member

Does this still need to stay open? @pablogsal @markshannon

@matthiasgoergens
Copy link
Contributor Author

I think we can close this, because we have merged fixes for both main (a while ago) and 3.11 (more recently).

I am closing this issue. But please feel free to reopen, if necessary.

Repository owner moved this from Todo to Done in Release and Deferred blockers 🚫 Sep 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker type-crash A hard crash of the interpreter, possibly with a core dump
Projects
Development

No branches or pull requests

8 participants