3.11 regression: traceback.format_list raises UnicodeDecodeError in certain scenarios #98744

Closed
sebastinas opened this issue Oct 26, 2022 · 7 comments
Labels: 3.11 (only security fixes), 3.12 (only security fixes), topic-unicode, type-bug (An unexpected behavior, bug, or error)

Comments

sebastinas commented Oct 26, 2022

Bug report

Take the following piece of code:

import sys, traceback
try:
    width
except:
    _, _, tb = sys.exc_info()

tblist = traceback.extract_tb(tb)
print(traceback.format_list(tblist))

With Python 3.10 and earlier versions, executing a file with this code produces:

['  File "/tmp/test.py", line 3, in <module>\n    width\n']

With Python 3.11, a UnicodeDecodeError is raised instead:

Traceback (most recent call last):
  File "/tmp/test.py", line 9, in <module>
    print(traceback.format_list(tblist))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/traceback.py", line 41, in format_list
    return StackSummary.from_list(extracted_list).format()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/traceback.py", line 531, in format
    formatted_frame = self.format_frame_summary(frame_summary)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/traceback.py", line 478, in format_frame_summary
    colno = _byte_offset_to_character_offset(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/traceback.py", line 566, in _byte_offset_to_character_offset
    return len(as_utf8[:offset + 1].decode("utf-8"))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 4: unexpected end of data

Your environment

  • CPython versions tested on: 3.10.8 and 3.11.0
  • Operating system and architecture: Debian unstable, amd64

sebastinas added the type-bug label Oct 26, 2022

sebastinas (Author) commented

Current main (d578aae) also has the same issue.

serhiy-storchaka (Member) commented

cc @ammaraskar, @pablogsal, @isidentical

mdboom (Contributor) commented Oct 28, 2022

Also cc @iritkatriel (related to source positions -- in this case a source position in the middle of a UTF-8 codepoint).
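
To illustrate the failure mode described here, a minimal sketch, assuming a hypothetical source line whose fifth column holds a three-byte UTF-8 character. U+FF57 is chosen only because its first byte is 0xEF; the thread does not show which character the reporter's file contained. The helper name `strict_prefix_length` is made up for this example; its body mirrors the expression shown in the traceback.

```python
def strict_prefix_length(line: str, byte_offset: int) -> int:
    """Mirror the expression from the traceback: strict decode of a byte prefix."""
    as_utf8 = line.encode("utf-8")
    return len(as_utf8[:byte_offset + 1].decode("utf-8"))

# Hypothetical line: four spaces followed by U+FF57, which encodes as EF BD 97.
line = "    \uff57"

try:
    strict_prefix_length(line, 4)
    error = None
except UnicodeDecodeError as exc:
    # The slice ends right after the lead byte 0xEF, so the sequence is incomplete.
    error = exc

print(error)
```

With a pure-ASCII line the same call succeeds, which is presumably why the problem only shows up "in certain scenarios".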

isidentical (Member) commented

It seems like we are not mirroring the behaviour from the C implementation, which uses errors="replace" instead of the default strict. I'll try to send a PR soon. Thanks for the report @sebastinas!
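
A small sketch of the difference between the two error handlers, assuming a hypothetical byte prefix (four spaces plus a lone 0xEF lead byte). It only demonstrates the behaviour of `bytes.decode` and is not the actual patch.

```python
# Byte prefix that stops in the middle of a multi-byte UTF-8 sequence (assumed input).
truncated = b"    \xef"

# errors="replace" substitutes U+FFFD for the undecodable tail instead of raising.
replaced = truncated.decode("utf-8", errors="replace")
print(len(replaced))

# The default handler, errors="strict", raises on the same input.
try:
    truncated.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)
```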

mdboom (Contributor) commented Oct 28, 2022

@isidentical: I was just looking at this as well, and there is also an off-by-one difference between this and the C implementation.

isidentical (Member) commented Oct 28, 2022

@mdboom you are right! That is also an issue (in the C implementation, we do a length check so as not to overflow, although that is not necessary here. Another problem is the caret positions, which might actually differ from the node's offsets). Both should now be resolved with #98824. Thanks for catching it.
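
One way to picture the two points discussed here, as a sketch under assumptions rather than the change made in #98824: the hypothetical helper `chars_before` counts the characters that precede a zero-based byte offset. It slices `[:offset]` instead of the `[:offset + 1]` seen in the traceback, and decodes with `errors="replace"`. Whether this matches the merged fix line for line is not stated in this thread.

```python
def chars_before(line: str, byte_offset: int) -> int:
    """Count characters preceding a zero-based UTF-8 byte offset (hypothetical helper)."""
    as_utf8 = line.encode("utf-8")
    # Slice up to, not past, the offset; tolerate a cut inside a multi-byte sequence.
    return len(as_utf8[:byte_offset].decode("utf-8", errors="replace"))

ascii_line = "    width"
wide_line = "    \uff57idth"  # assumed example: U+FF57 occupies bytes 4..6

print(chars_before(ascii_line, 4))  # 4: only the indentation precedes byte 4
print(chars_before(wide_line, 4))   # 4: the cut falls just before the wide character
print(chars_before(wide_line, 7))   # 5: indentation plus the one 3-byte character
```

An offset that lands inside the wide character (for example byte 5) no longer raises; the partial sequence is counted as a single replacement character.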

isidentical added a commit to isidentical/cpython that referenced this issue Oct 28, 2022
isidentical added a commit to isidentical/cpython that referenced this issue Oct 29, 2022
…back module (pythonGH-98824).

(cherry picked from commit c0f2a5e)

Co-authored-by: Batuhan Taskaya
pablogsal pushed a commit that referenced this issue Oct 29, 2022
isidentical (Member) commented

Should be fixed now, thanks everyone!
