mcepl
Contributor

@mcepl mcepl commented May 19, 2025

(backport of #133944 from 3.13, and originally from #129648)

If the error handler is used, a new bytes object is created to set as the object attribute of UnicodeDecodeError, and that bytes object then replaces the original data. A pointer into the decoded data will become invalid once that temporary bytes object is destroyed, so we need another way to return the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have this issue, because it does not use the error handlers registry, but it should be changed as well for compatibility with _PyUnicode_DecodeUnicodeEscapeInternal().
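
For context, the invalid-escape reporting that these internal functions feed can be observed from Python. The sketch below is only an illustration: the helper name `decode_with_warnings` is hypothetical, and it does not assert the warning category or message, since both differ between Python versions and are part of what this change touches.

```python
import codecs
import warnings


def decode_with_warnings(data):
    """Decode bytes with the unicode_escape codec and collect any warnings.

    Hypothetical helper for illustration; not part of CPython.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones the default filters would hide.
        warnings.simplefilter("always")
        text, consumed = codecs.unicode_escape_decode(data)
    return text, consumed, caught


# b"\\e" is the two bytes backslash + "e", which is not a valid escape.
text, consumed, caught = decode_with_warnings(b"\\e")
print(repr(text), consumed)
for w in caught:
    print(w.category.__name__, w.message)
```

The invalid escape is kept in the decoded text unchanged, and the decoder reports it through a warning rather than an error.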

I still haven't managed to fix it completely; one test is still failing:

[  778s] ======================================================================
[  778s] FAIL: test_warning (test.test_codeop.CodeopTests.test_warning)
[  778s] ----------------------------------------------------------------------
[  778s] Traceback (most recent call last):
[  778s]   File "/home/abuild/rpmbuild/BUILD/python312-3.12.10-build/Python-3.12.10/Lib/test/test_codeop.py", line 283, in test_warning
[  778s]     with warnings_helper.check_warnings(
[  778s]          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[  778s]   File "/home/abuild/rpmbuild/BUILD/python312-3.12.10-build/Python-3.12.10/Lib/contextlib.py", line 144, in __exit__
[  778s]     next(self.gen)
[  778s]   File "/home/abuild/rpmbuild/BUILD/python312-3.12.10-build/Python-3.12.10/Lib/test/support/warnings_helper.py", line 185, in _filterwarnings
[  778s]     raise AssertionError("unhandled warning %s" % reraise[0])
[  778s] AssertionError: unhandled warning {message : SyntaxWarning("'\\e' is an invalid escape sequence. "), category : 'SyntaxWarning', filename : '<input>', lineno : 1, line : None}
[  778s] 
[  778s] ----------------------------------------------------------------------
[  778s] Ran 1 test in 0.001s

Complete build log

Any idea how to fix it?
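
One possible explanation, sketched under assumptions: if the test's expected-warning filter is a regular expression such as `invalid escape sequence` that is matched from the start of the message (both the pattern and the anchored matching are assumptions here, not taken from the log), then the message shown in the failure above would no longer match, because it now begins with the quoted escape.

```python
import re

# Message text as it appears in the AssertionError above
# (the trailing space is part of the logged message).
message = "'\\e' is an invalid escape sequence. "

# Assumed pattern; the real filter in the test may differ.
old_pattern = "invalid escape sequence"

# re.match only succeeds at the start of the string; re.search scans it.
anchored = re.match(old_pattern, message, re.I) is not None
unanchored = re.search(old_pattern, message, re.I) is not None

print("anchored match:", anchored)
print("unanchored search:", unanchored)
```

If that assumption holds, the expected pattern in the test would need to follow the new message wording.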

@bedevere-app bedevere-app bot added topic-unicode type-crash A hard crash of the interpreter, possibly with a core dump type-security A security issue awaiting review labels May 19, 2025
@mcepl
Contributor Author

mcepl commented May 29, 2025

Duplicate of #134337

@mcepl mcepl closed this May 29, 2025