3.11 regression: traceback.format_list raises UnicodeDecodeError in certain scenarios #98744

sebastinas · 2022-10-26T21:12:04Z

Bug report

Take the following piece of code:

import sys, traceback
try:
    ｗｉｄｔｈ
except:
    _, _, tb = sys.exc_info()

tblist = traceback.extract_tb(tb)
print(traceback.format_list(tblist))

With Python 3.10 and earlier versions, executing a file with this code produces:

['  File "/tmp/test.py", line 3, in <module>\n    ｗｉｄｔｈ\n']

With Python 3.11, a UnicodeDecodeError is raised instead:

Traceback (most recent call last):
  File "/tmp/test.py", line 9, in <module>
    print(traceback.format_list(tblist))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/traceback.py", line 41, in format_list
    return StackSummary.from_list(extracted_list).format()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/traceback.py", line 531, in format
    formatted_frame = self.format_frame_summary(frame_summary)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/traceback.py", line 478, in format_frame_summary
    colno = _byte_offset_to_character_offset(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/traceback.py", line 566, in _byte_offset_to_character_offset
    return len(as_utf8[:offset + 1].decode("utf-8"))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 4: unexpected end of data

Your environment

CPython versions tested on: 3.10.8 and 3.11.0
Operating system and architecture: Debian unstable, amd64

sebastinas · 2022-10-27T08:25:50Z

Current main (d578aae) also has the same issue.

serhiy-storchaka · 2022-10-28T06:14:48Z

cc @ammaraskar, @pablogsal, @isidentical

mdboom · 2022-10-28T20:18:13Z

Also cc @iritkatriel (related to source positions -- in this case a source position that falls in the middle of a multi-byte UTF-8 sequence).
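
To make the failure concrete, here is a minimal sketch that mimics the slice-and-decode expression shown in the last frame of the reported traceback. The byte offset of 4 is an assumption, inferred from the four spaces of indentation and from "position 4" in the error message; it is not read from a real frame object.

```python
# Each fullwidth letter occupies three bytes in UTF-8.
line = "    ｗｉｄｔｈ"
as_utf8 = line.encode("utf-8")

offset = 4  # assumed byte offset of the first fullwidth letter

# Same expression as in the traceback: the slice ends one byte into the
# three-byte sequence, so a strict decode cannot complete the character.
error = None
try:
    as_utf8[:offset + 1].decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The slice contains the four spaces plus a lone `0xef` lead byte, which is why the strict decoder reports an unexpected end of data at position 4.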

isidentical · 2022-10-28T20:25:17Z

It seems like we are not mirroring the behaviour from the C implementation, which uses errors="replace" instead of the default strict. I'll try to send a PR soon. Thanks for the report @sebastinas!
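
A rough illustration of the difference between strict and replacement decoding, using only the standard `bytes.decode` method. The five-byte input is an assumed slice matching the report (four spaces plus the first byte of a fullwidth letter); this is not the code from the C implementation.

```python
# Four spaces followed by a lone 0xEF lead byte (an incomplete sequence).
truncated = "    ｗ".encode("utf-8")[:5]

# errors="replace" substitutes U+FFFD for the incomplete sequence instead
# of raising, so a character count can still be computed.
replaced = truncated.decode("utf-8", errors="replace")
char_count = len(replaced)

# The default error handler is "strict", which raises on the same input.
strict_failed = False
try:
    truncated.decode("utf-8")
except UnicodeDecodeError:
    strict_failed = True

print(repr(replaced), char_count, strict_failed)
```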

mdboom · 2022-10-28T20:27:25Z

@isidentical: I was just looking at this as well, and there is also an off-by-one difference between this and the C implementation.

isidentical · 2022-10-28T20:50:39Z

@mdboom you are right! That is also an issue. (In the C implementation we do a length check to avoid overflowing, although that is not necessary here. Another problem is the caret positions, which might actually differ from the node's offsets.) Both should now be resolved with #98824. Thanks for catching it.
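
As a sketch of how the two points raised above interact (a hypothetical helper, not the code merged in #98824): dropping the `+ 1` keeps the slice on a character boundary whenever the byte offset itself sits on one, and `errors="replace"` covers offsets that do not. The function name and the sample offsets are assumptions chosen for illustration.

```python
def byte_offset_to_char_offset(text: str, offset: int) -> int:
    """Hypothetical helper: count the characters before a UTF-8 byte offset."""
    as_utf8 = text.encode("utf-8")
    # Slice up to, not past, the offset; tolerate offsets that split a character.
    return len(as_utf8[:offset].decode("utf-8", errors="replace"))


line = "    ｗｉｄｔｈ"
start = byte_offset_to_char_offset(line, 4)    # byte 4 is where the first letter begins
end = byte_offset_to_char_offset(line, 19)     # byte 19 is the end of the line
middle = byte_offset_to_char_offset(line, 5)   # byte 5 splits the first letter
print(start, end, middle)
```

With offsets on character boundaries the helper returns plain character counts (4 and 9 here); an offset inside a character yields a count that includes one replacement character rather than an exception.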

isidentical · 2022-10-29T14:19:58Z

Should be fixed now, thanks everyone!

sebastinas added the type-bug An unexpected behavior, bug, or error label Oct 26, 2022

mdboom added the topic-unicode label Oct 28, 2022

mdboom added the needs backport to 3.11 only security fixes label Oct 28, 2022

bedevere-bot mentioned this issue Oct 28, 2022

gh-98744: Prevent column-level decoding crashes on traceback module #98824

Merged

isidentical added a commit to isidentical/cpython that referenced this issue Oct 28, 2022

pythongh-98744: Prevent column-level decoding crashes when using the …

764b30f

…Python traceback

isidentical added a commit to isidentical/cpython that referenced this issue Oct 28, 2022

pythongh-98744: Prevent column-level decoding crashes when using the …

d27eda4

…Python traceback

pablogsal pushed a commit that referenced this issue Oct 29, 2022

gh-98744: Prevent column-level decoding crashes on traceback module (#…

c0f2a5e

…98824)

bedevere-bot mentioned this issue Oct 29, 2022

[3.11] gh-98744: Prevent column-level decoding crashes on traceback module #98850

Merged

isidentical added a commit to isidentical/cpython that referenced this issue Oct 29, 2022

[3.11] pythongh-98744: Prevent column-level decoding crashes on trace…

13735ae

…back module (pythonGH-98824). (cherry picked from commit c0f2a5e) Co-authored-by: Batuhan Taskaya <[email protected]>

pablogsal pushed a commit that referenced this issue Oct 29, 2022

[3.11] gh-98744: Prevent column-level decoding crashes on traceback m…

751da28

…odule (#98850) Co-authored-by: Batuhan Taskaya <[email protected]>

isidentical closed this as completed Oct 29, 2022

mdboom mentioned this issue Nov 4, 2022

When computing the anchors on the traceback, results may be wrong if unicode chars are used #99103

Closed

picnixz added 3.11 only security fixes 3.12 only security fixes and removed needs backport to 3.11 only security fixes labels Feb 17, 2025
