Skip to content

int.from_bytes crashes if byteorder is a string subclass #98783

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sobolevn opened this issue Oct 27, 2022 · 6 comments
Closed

int.from_bytes crashes if byteorder is a string subclass #98783

sobolevn opened this issue Oct 27, 2022 · 6 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@sobolevn
Copy link
Member

Here's a minimal reproduction:

» ./python.exe
Python 3.12.0a1+ (heads/main:bded5edd9a, Oct 27 2022, 23:29:21) [Clang 11.0.0 (clang-1100.0.33.16)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class SubStr(str): ...
... 
>>> int.from_bytes(b"0", SubStr("big"))
Assertion failed: (PyUnicode_CheckExact(str1)), function _PyUnicode_Equal, file Objects/unicodeobject.c, line 10447.
[1]    18466 abort      ./python.exe

I think this happes because of these lines:

cpython/Objects/unicodeobject.c

Lines 10444 to 10453 in bded5ed

int
_PyUnicode_Equal(PyObject *str1, PyObject *str2)
{
assert(PyUnicode_CheckExact(str1));
assert(PyUnicode_CheckExact(str2));
if (str1 == str2) {
return 1;
}
return unicode_compare_eq(str1, str2);
}

_PyUnicode_Equal uses assert(PyUnicode_CheckExact(...)), while many function (including int_from_bytes_impl) use PyUnicode_Check() or just parse str objects from args.

cpython/Objects/longobject.c

Lines 6167 to 6170 in bded5ed

else if (_PyUnicode_Equal(byteorder, &_Py_ID(little)))
little_endian = 1;
else if (_PyUnicode_Equal(byteorder, &_Py_ID(big)))
little_endian = 0;

Probably other functions that use _PyUnicode_Equal are also affected.

I would like to raise a question: shouldn't it be assert(PyUnicode_Check(...)) in _PyUnicode_Equal?

I would like to send a PR with the fix!

@sobolevn sobolevn added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Oct 27, 2022
@sobolevn sobolevn self-assigned this Oct 27, 2022
@sobolevn
Copy link
Member Author

CC @sweeneyde as blame owner

@sweeneyde
Copy link
Member

I would like to raise a question: shouldn't it be assert(PyUnicode_Check(...)) in _PyUnicode_Equal?

_PyUnicode_Equal was originally added as a part of 03768c4, for use in COMPARE_OP_STR_JUMP opcode, where it would have had the incorrect behavior for string subclasses, which could have __eq__ overridden.

So it looks like that made the switch from _PyUnicode_EqualToASCIIId to _PyUnicode_Equal not an exact replacement.

Replacing the assertion seems reasonable to me, since COMPARE_OP_STR_JUMP checks exact types already, but I would want to double-check that that is equivalent to _PyUnicode_EqualToASCIIId.

@sweeneyde
Copy link
Member

Related: faster-cpython/ideas#486 would add back in a hash check for these cases, which would restore the hash check that was in _PyUnicode_EqualToASCIIId.

@sweeneyde
Copy link
Member

There also seems to be a discrepancy between whether or not _PyUnicode_Equal can give an error. I don't think it ever can now, and it seems the lines you mention assume it doesn't, but there's this comment:

/* Equality check. Returns -1 on failure. */
PyAPI_FUNC(int) _PyUnicode_Equal(PyObject *, PyObject *);

and this error check:

cpython/Python/ceval.c

Lines 3422 to 3425 in 3e07f82

int res = _PyUnicode_Equal(left, right);
if (res < 0) {
goto error;
}

At the time it was added, error checks were needed because PyUnicode_READY could fail. But since f9c9354, PyUnicode_READY is no longer necessary, and the check cannot fail.

I think it should be safe to delete the "Returns -1 on failure" comment and remove the NULL check in ceval.c.

@sobolevn
Copy link
Member Author

Other places that crash because of this:

  • int.to_bytes: the same as OP
  • type_repr:
» ./python.exe       
Python 3.12.0a1+ (heads/main:bded5edd9a, Oct 27 2022, 23:29:21) [Clang 11.0.0 (clang-1100.0.33.16)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class A: ...
... 
>>> class SubStr(str): ...
... 
>>> A.__module__ = SubStr('ex')
>>> A
Assertion failed: (PyUnicode_CheckExact(str1)), function _PyUnicode_Equal, file Objects/unicodeobject.c, line 10447.
[1]    21783 abort      ./python.exe
  • type_new_visit_slots and type_new_copy_slots
>>> class SubStr(str): ...
... 
>>> class A:
...    __slots__ = (SubStr('a'),)
... 
Assertion failed: (PyUnicode_CheckExact(str1)), function _PyUnicode_Equal, file Objects/unicodeobject.c, line 10447.
[1]    21853 abort      ./python.exe
  • super_getattro
  • Probably write_unraisable_exc_file, but I cannot reproduce it
  • print_exception_message

sobolevn added a commit to sobolevn/cpython that referenced this issue Oct 28, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 30, 2022
…code_Equal` (pythonGH-98806)

(cherry picked from commit 76f989d)

Co-authored-by: Nikita Sobolev <[email protected]>
sweeneyde added a commit that referenced this issue Oct 30, 2022
…icode_Equal` (GH-98806) (#98871)

* gh-98783: Fix crashes when `str` subclasses are used in `_PyUnicode_Equal` (GH-98806)
(cherry picked from commit 76f989d)

Co-authored-by: Nikita Sobolev <[email protected]>
@sweeneyde
Copy link
Member

Thanks for finding and fixing this!

I went ahead and opened #98879 about the error handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

2 participants