
Not closing an f-string leads to a use-after-free #103718

Closed
lysnikolaou opened this issue Apr 23, 2023 · 1 comment
Labels
type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@lysnikolaou
Member

lysnikolaou commented Apr 23, 2023

Bug report

Not closing an f-string in the REPL or in a file leads to a use-after-free. The problem lies in how the f-string buffers are updated when the tokenizer has to reallocate its buffer to make room for more input, and it was introduced in 1ef61cf. Here's an example (the crash only shows up when AddressSanitizer is enabled):

❯ ./python.exe 
Python 3.12.0a7+ (heads/fix-updating-fstring-buffers-tok:0056701aa3, Apr 23 2023, 11:12:49) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> f"
... 
=================================================================
==8991==ERROR: AddressSanitizer: heap-use-after-free on address 0x000104313550 at pc 0x0001006c6d98 bp 0x00016fa28e70 sp 0x00016fa28e68
READ of size 1 at 0x000104313550 thread T0
    #0 0x1006c6d94 in unicode_decode_utf8 unicodeobject.c:4526
    #1 0x1006cc0bc in PyUnicode_DecodeUTF8 unicodeobject.c:4431
    #2 0x10050f1d8 in _syntaxerror_range tokenizer.c:1252
    #3 0x10050da08 in syntaxerror tokenizer.c:1294
    #4 0x100500bbc in _PyTokenizer_Get tokenizer.c:2639
    #5 0x1003d6810 in _PyPegen_fill_token pegen.c:201
    #6 0x100475738 in fstring_replacement_field_rule parser.c:15626
    #7 0x1003eeb98 in fstring_rule parser.c:1334
    #8 0x10044b1fc in strings_rule parser.c:15962
    #9 0x10041b00c in atom_rule parser.c:14388
    #10 0x100427ee4 in t_primary_rule parser.c:18429
    #11 0x1004ef890 in single_subscript_attribute_target_rule parser.c:18319
    #12 0x1004e75a8 in _tmp_13_rule parser.c:25515
    #13 0x1004d8a34 in simple_stmt_rule parser.c:1730
    #14 0x1003f4140 in simple_stmts_rule parser.c:1625
    #15 0x1003e81f4 in _PyPegen_parse parser.c:41238
    #16 0x1003da7b4 in _PyPegen_run_parser pegen.c:825
    #17 0x1003dae14 in _PyPegen_run_parser_from_file_pointer pegen.c:897
    #18 0x1004fdc6c in _PyParser_ASTFromFile peg_api.c:26
    #19 0x1008f0c74 in PyRun_InteractiveOneObjectEx pythonrun.c:240
    #20 0x1008efa04 in _PyRun_InteractiveLoopObject pythonrun.c:137
    #21 0x1008ef6e8 in _PyRun_AnyFileObject pythonrun.c:72
    #22 0x1008f0920 in PyRun_AnyFileExFlags pythonrun.c:104
    #23 0x100943acc in Py_RunMain main.c:689
    #24 0x1009448d4 in pymain_main main.c:719
    #25 0x100944b74 in Py_BytesMain main.c:743
    #26 0x1003d5b7c in main python.c:15
    #27 0x18e8dff24  (<unknown module>)

0x000104313550 is located 16 bytes inside of 28-byte region [0x000104313540,0x00010431355c)
freed by thread T0 here:
    #0 0x101c070ec in wrap_realloc+0x9c (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x430ec) (BuildId: f0a7ac5c49bc3abc851181b6f92b308a32000000200000000100000000000b00)
    #1 0x10065213c in _PyMem_RawRealloc obmalloc.c:64
    #2 0x100653cd0 in _PyMem_DebugRawRealloc obmalloc.c:1957
    #3 0x1006544e4 in _PyMem_DebugRealloc obmalloc.c:2045
    #4 0x100654afc in PyMem_Realloc obmalloc.c:609
    #5 0x10050e950 in tok_reserve_buf tokenizer.c:480
    #6 0x10050c3c4 in tok_nextc tokenizer.c:1198
    #7 0x100500490 in _PyTokenizer_Get tokenizer.c:2639
    #8 0x1003d6810 in _PyPegen_fill_token pegen.c:201
    #9 0x100475738 in fstring_replacement_field_rule parser.c:15626
    #10 0x1003eeb98 in fstring_rule parser.c:1334
    #11 0x10044b1fc in strings_rule parser.c:15962
    #12 0x10041b00c in atom_rule parser.c:14388
    #13 0x100427ee4 in t_primary_rule parser.c:18429
    #14 0x1004ef890 in single_subscript_attribute_target_rule parser.c:18319
    #15 0x1004e75a8 in _tmp_13_rule parser.c:25515
    #16 0x1004d8a34 in simple_stmt_rule parser.c:1730
    #17 0x1003f4140 in simple_stmts_rule parser.c:1625
    #18 0x1003e81f4 in _PyPegen_parse parser.c:41238
    #19 0x1003da7b4 in _PyPegen_run_parser pegen.c:825
    #20 0x1003dae14 in _PyPegen_run_parser_from_file_pointer pegen.c:897
    #21 0x1004fdc6c in _PyParser_ASTFromFile peg_api.c:26
    #22 0x1008f0c74 in PyRun_InteractiveOneObjectEx pythonrun.c:240
    #23 0x1008efa04 in _PyRun_InteractiveLoopObject pythonrun.c:137
    #24 0x1008ef6e8 in _PyRun_AnyFileObject pythonrun.c:72
    #25 0x1008f0920 in PyRun_AnyFileExFlags pythonrun.c:104
    #26 0x100943acc in Py_RunMain main.c:689
    #27 0x1009448d4 in pymain_main main.c:719
    #28 0x100944b74 in Py_BytesMain main.c:743
    #29 0x1003d5b7c in main python.c:15

previously allocated by thread T0 here:
    #0 0x101c06e68 in wrap_malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x42e68) (BuildId: f0a7ac5c49bc3abc851181b6f92b308a32000000200000000100000000000b00)
    #1 0x1006520f8 in _PyMem_RawMalloc obmalloc.c:42
    #2 0x100654144 in _PyMem_DebugMalloc obmalloc.c:2022
    #3 0x100654a04 in PyMem_Malloc obmalloc.c:587
    #4 0x10050b620 in tok_nextc tokenizer.c:1198
    #5 0x100504848 in tok_get_normal_mode tokenizer.c:1619
    #6 0x1005006ac in _PyTokenizer_Get tokenizer.c:2639
    #7 0x1003d6810 in _PyPegen_fill_token pegen.c:201
    #8 0x1003e71f4 in _PyPegen_parse parser.c:41238
    #9 0x1003da7b4 in _PyPegen_run_parser pegen.c:825
    #10 0x1003dae14 in _PyPegen_run_parser_from_file_pointer pegen.c:897
    #11 0x1004fdc6c in _PyParser_ASTFromFile peg_api.c:26
    #12 0x1008f0c74 in PyRun_InteractiveOneObjectEx pythonrun.c:240
    #13 0x1008efa04 in _PyRun_InteractiveLoopObject pythonrun.c:137
    #14 0x1008ef6e8 in _PyRun_AnyFileObject pythonrun.c:72
    #15 0x1008f0920 in PyRun_AnyFileExFlags pythonrun.c:104
    #16 0x100943acc in Py_RunMain main.c:689
    #17 0x1009448d4 in pymain_main main.c:719
    #18 0x100944b74 in Py_BytesMain main.c:743
    #19 0x1003d5b7c in main python.c:15
    #20 0x18e8dff24  (<unknown module>)

SUMMARY: AddressSanitizer: heap-use-after-free unicodeobject.c:4526 in unicode_decode_utf8
Shadow bytes around the buggy address:
  0x007020882650: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x007020882660: fa fa fa fa fa fa fa fa fa fa fa fa 00 00 00 05
  0x007020882670: fa fa 00 00 00 05 fa fa fd fd fd fd fa fa fd fd
  0x007020882680: fd fd fa fa fd fd fd fd fa fa fd fd fd fd fa fa
  0x007020882690: fd fd fd fd fa fa 00 00 00 00 fa fa 00 00 00 00
=>0x0070208826a0: fa fa fd fd fd fd fa fa fd fd[fd]fd fa fa fd fd
  0x0070208826b0: fd fd fa fa fd fd fd fd fa fa fd fd fd fd fa fa
  0x0070208826c0: fd fd fd fd fa fa 00 00 00 00 fa fa 00 00 00 06
  0x0070208826d0: fa fa 00 00 00 03 fa fa 00 00 02 fa fa fa 00 00
  0x0070208826e0: 00 00 fa fa 00 00 02 fa fa fa 00 00 02 fa fa fa
  0x0070208826f0: 00 00 02 fa fa fa 00 00 02 fa fa fa 00 00 00 01
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==8991==ABORTING
zsh: abort      ./python.exe
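
The report attributes the crash to how the f-string buffers are updated when the tokenizer buffer is reallocated. As a rough illustration of that class of bug, and explicitly not CPython's tokenizer code, here is a minimal C sketch built around a hypothetical `struct toy_tok` that keeps one pointer (`fstring_start`) into a growable buffer. It assumes the remedy is to turn such pointers into offsets before `realloc()` and rebuild them afterwards, because `realloc()` may move the block and free the old one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified tokenizer state (not CPython's
   struct tok_state): a growable input buffer plus one pointer that
   marks where the current f-string literal started. */
struct toy_tok {
    char *buf;                 /* start of the input buffer */
    size_t len;                /* bytes in use, excluding the NUL */
    size_t cap;                /* bytes allocated */
    const char *fstring_start; /* points INTO buf, or NULL */
};

/* Make room for `extra` more bytes. realloc() may move the block, so
   fstring_start is saved as an offset before the call and rebuilt
   afterwards. Skipping those two steps would leave it pointing into
   the freed block. */
int toy_reserve(struct toy_tok *tok, size_t extra)
{
    /* Invariant: the f-string pointer always lies inside the used buffer. */
    assert(tok->fstring_start == NULL ||
           (tok->fstring_start >= tok->buf &&
            tok->fstring_start <= tok->buf + tok->len));

    if (tok->len + extra <= tok->cap) {
        return 0;
    }
    /* Remember: pointer -> offset, while the old block is still valid. */
    ptrdiff_t start_off =
        tok->fstring_start ? tok->fstring_start - tok->buf : -1;

    size_t new_cap = (tok->len + extra) * 2;
    char *new_buf = realloc(tok->buf, new_cap);
    if (new_buf == NULL) {
        return -1; /* old block untouched, fstring_start still valid */
    }
    tok->buf = new_buf;
    tok->cap = new_cap;

    /* Restore: offset -> pointer into the (possibly moved) block. */
    tok->fstring_start = start_off >= 0 ? tok->buf + start_off : NULL;
    return 0;
}

/* Append a NUL-terminated chunk of input, growing the buffer as needed. */
int toy_append(struct toy_tok *tok, const char *chunk)
{
    size_t n = strlen(chunk);
    if (toy_reserve(tok, n + 1) < 0) {
        return -1;
    }
    memcpy(tok->buf + tok->len, chunk, n + 1);
    tok->len += n;
    return 0;
}
```

Dropping the remember/restore lines in `toy_reserve` would leave `fstring_start` pointing into the freed block after the first growth, and reading through it would be a heap-use-after-free of the kind ASan reports above.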

@lysnikolaou lysnikolaou added the type-bug An unexpected behavior, bug, or error label Apr 23, 2023
lysnikolaou added a commit to lysnikolaou/cpython that referenced this issue Apr 23, 2023
lysnikolaou added a commit to lysnikolaou/cpython that referenced this issue Apr 23, 2023
@lysnikolaou lysnikolaou added type-crash A hard crash of the interpreter, possibly with a core dump and removed type-bug An unexpected behavior, bug, or error labels Apr 23, 2023
@lysnikolaou lysnikolaou self-assigned this Apr 23, 2023
@pablogsal pablogsal reopened this Apr 24, 2023
@pablogsal
Member

Reopening this, as it is still not fixed:

>>> f"{
... 1
... +
... 2
... }
=================================================================
==25280==ERROR: AddressSanitizer: heap-use-after-free on address 0x0001079165d0 at pc 0x00010471807c bp 0x00016b9392b0 sp 0x00016b9392a8
READ of size 1 at 0x0001079165d0 thread T0
    #0 0x104718078 in unicode_decode_utf8 unicodeobject.c:4526
    #1 0x1045678a4 in _syntaxerror_range tokenizer.c:1266
    #2 0x104565be0 in syntaxerror tokenizer.c:1308
    #3 0x104555208 in _PyTokenizer_Get tokenizer.c:2657
    #4 0x1044c8998 in _PyPegen_fill_token pegen.c:201
    #5 0x10451a2c8 in fstring_replacement_field_rule parser.c:15626
    #6 0x1044dc3f8 in fstring_rule parser.c:1334
    #7 0x10450cb88 in strings_rule parser.c:15962
    #8 0x1044f6734 in atom_rule parser.c:14388
    #9 0x104501a68 in t_primary_rule parser.c:18429
    #10 0x10454ccf0 in single_subscript_attribute_target_rule parser.c:18319
    #11 0x104541cb0 in simple_stmt_rule parser.c:1730
    #12 0x1044e6cec in simple_stmts_rule parser.c:1625
    #13 0x1044d9d90 in _PyPegen_parse parser.c:41238
    #14 0x1044cc328 in _PyPegen_run_parser pegen.c:825
    #15 0x1044cc94c in _PyPegen_run_parser_from_file_pointer pegen.c:897
    #16 0x104961ef4 in PyRun_InteractiveOneObjectEx pythonrun.c:240
    #17 0x104960cf4 in _PyRun_InteractiveLoopObject pythonrun.c:137
    #18 0x104960a18 in _PyRun_AnyFileObject pythonrun.c:72
    #19 0x104961ad4 in PyRun_AnyFileExFlags pythonrun.c:104
    #20 0x1049b0124 in Py_RunMain main.c:689
    #21 0x1049b10a0 in pymain_main main.c:719
    #22 0x1049b15cc in Py_BytesMain main.c:743
    #23 0x1b161fe4c  (<unknown module>)

0x0001079165d0 is located 0 bytes inside of 5-byte region [0x0001079165d0,0x0001079165d5)
freed by thread T0 here:
    #0 0x105936de4 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3ede4)
    #1 0x104563fec in tok_nextc tokenizer.c:1212
    #2 0x1045587fc in tok_get_normal_mode tokenizer.c:1633
    #3 0x1045551ec in _PyTokenizer_Get tokenizer.c:2657
    #4 0x1044c8998 in _PyPegen_fill_token pegen.c:201
    #5 0x10450de54 in yield_expr_rule parser.c:10831
    #6 0x10451a4bc in fstring_replacement_field_rule parser.c:15650
    #7 0x1044dc3f8 in fstring_rule parser.c:1334
    #8 0x10450cb88 in strings_rule parser.c:15962
    #9 0x1044f6734 in atom_rule parser.c:14388
    #10 0x104501a68 in t_primary_rule parser.c:18429
    #11 0x10454ccf0 in single_subscript_attribute_target_rule parser.c:18319
    #12 0x104541cb0 in simple_stmt_rule parser.c:1730
    #13 0x1044e6cec in simple_stmts_rule parser.c:1625
    #14 0x1044d9d90 in _PyPegen_parse parser.c:41238
    #15 0x1044cc328 in _PyPegen_run_parser pegen.c:825
    #16 0x1044cc94c in _PyPegen_run_parser_from_file_pointer pegen.c:897
    #17 0x104961ef4 in PyRun_InteractiveOneObjectEx pythonrun.c:240
    #18 0x104960cf4 in _PyRun_InteractiveLoopObject pythonrun.c:137
    #19 0x104960a18 in _PyRun_AnyFileObject pythonrun.c:72
    #20 0x104961ad4 in PyRun_AnyFileExFlags pythonrun.c:104
    #21 0x1049b0124 in Py_RunMain main.c:689
    #22 0x1049b10a0 in pymain_main main.c:719
    #23 0x1049b15cc in Py_BytesMain main.c:743
    #24 0x1b161fe4c  (<unknown module>)

previously allocated by thread T0 here:
    #0 0x105936ca8 in wrap_malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3eca8)
    #1 0x1045636ec in tok_nextc tokenizer.c:1212
    #2 0x1045587fc in tok_get_normal_mode tokenizer.c:1633
    #3 0x1045551ec in _PyTokenizer_Get tokenizer.c:2657
    #4 0x1044c8998 in _PyPegen_fill_token pegen.c:201
    #5 0x1044d9a04 in _PyPegen_parse parser.c:41238
    #6 0x1044cc328 in _PyPegen_run_parser pegen.c:825
    #7 0x1044cc94c in _PyPegen_run_parser_from_file_pointer pegen.c:897
    #8 0x104961ef4 in PyRun_InteractiveOneObjectEx pythonrun.c:240
    #9 0x104960cf4 in _PyRun_InteractiveLoopObject pythonrun.c:137
    #10 0x104960a18 in _PyRun_AnyFileObject pythonrun.c:72
    #11 0x104961ad4 in PyRun_AnyFileExFlags pythonrun.c:104
    #12 0x1049b0124 in Py_RunMain main.c:689
    #13 0x1049b10a0 in pymain_main main.c:719
    #14 0x1049b15cc in Py_BytesMain main.c:743
    #15 0x1b161fe4c  (<unknown module>)

SUMMARY: AddressSanitizer: heap-use-after-free unicodeobject.c:4526 in unicode_decode_utf8
Shadow bytes around the buggy address:
  0x007020f42c60: fa fa 02 fa fa fa 05 fa fa fa 03 fa fa fa 02 fa
  0x007020f42c70: fa fa 05 fa fa fa 02 fa fa fa 03 fa fa fa 02 fa
  0x007020f42c80: fa fa 05 fa fa fa 02 fa fa fa 06 fa fa fa 03 fa
  0x007020f42c90: fa fa 02 fa fa fa fd fa fa fa fd fa fa fa 05 fa
  0x007020f42ca0: fa fa fd fd fa fa fd fd fa fa fd fa fa fa 00 fa
=>0x007020f42cb0: fa fa 00 fa fa fa fd fa fa fa[fd]fa fa fa fd fa
  0x007020f42cc0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa 04 fa
  0x007020f42cd0: fa fa fd fa fa fa fd fa fa fa fd fd fa fa fd fa
  0x007020f42ce0: fa fa 06 fa fa fa fd fd fa fa fd fd fa fa fd fd
  0x007020f42cf0: fa fa fd fd fa fa fd fd fa fa fd fa fa fa fd fa
  0x007020f42d00: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==25280==ABORTING
[1]    25280 abort      ./python.exe

lysnikolaou added a commit to lysnikolaou/cpython that referenced this issue Apr 25, 2023
lysnikolaou added a commit that referenced this issue Apr 25, 2023
It turns out we always need to remember/restore the f-string buffers in
every entry of the stack of tokenizer modes, because they might change to
`TOK_REGULAR_MODE` and have newlines inside the braces (which is when we
need to reallocate the buffer and restore the f-string ones).
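
The key point of that commit message is that the pointers have to be saved and restored for every mode on the stack, not only the current one. A minimal sketch of the idea, using hypothetical names (`struct toy_stack_tok`, `stack_remember`, `stack_restore`) rather than the real CPython functions, and assuming each f-string mode stores a single start pointer into the buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of a tokenizer with a stack of modes;
   the names are illustrative and do not match CPython's tokenizer. */
enum toy_mode_kind { TOY_REGULAR_MODE, TOY_FSTRING_MODE };

struct toy_mode {
    enum toy_mode_kind kind;
    const char *fstring_start;   /* points INTO buf for f-string modes */
    ptrdiff_t fstring_start_off; /* scratch slot used across a realloc() */
};

#define TOY_MAX_MODES 16

struct toy_stack_tok {
    char *buf;                            /* growable input buffer */
    size_t len;                           /* bytes in use, excluding the NUL */
    size_t cap;                           /* bytes allocated */
    struct toy_mode modes[TOY_MAX_MODES];
    int top;                              /* index of the innermost mode */
};

/* Save every f-string pointer on the stack as an offset. Looking only at
   modes[top] is not enough: inside `{ ... }` the innermost mode is regular,
   while an enclosing f-string mode lower down still points into buf. */
void stack_remember(struct toy_stack_tok *tok)
{
    assert(tok->top >= 0 && tok->top < TOY_MAX_MODES);
    for (int i = tok->top; i >= 0; i--) {
        struct toy_mode *m = &tok->modes[i];
        if (m->kind == TOY_FSTRING_MODE) {
            m->fstring_start_off = m->fstring_start - tok->buf;
        }
    }
}

/* Rebuild every f-string pointer from its offset after buf has moved. */
void stack_restore(struct toy_stack_tok *tok)
{
    for (int i = tok->top; i >= 0; i--) {
        struct toy_mode *m = &tok->modes[i];
        if (m->kind == TOY_FSTRING_MODE) {
            m->fstring_start = tok->buf + m->fstring_start_off;
        }
    }
}

/* Make room for `extra` more bytes, keeping all mode pointers valid. */
int stack_reserve(struct toy_stack_tok *tok, size_t extra)
{
    if (tok->len + extra <= tok->cap) {
        return 0;
    }
    size_t new_cap = (tok->len + extra) * 2;
    stack_remember(tok);                  /* old block is still valid here */
    char *new_buf = realloc(tok->buf, new_cap);
    if (new_buf == NULL) {
        return -1;                        /* old block and pointers untouched */
    }
    tok->buf = new_buf;
    tok->cap = new_cap;
    stack_restore(tok);
    return 0;
}

/* Append a NUL-terminated chunk of input. */
int stack_append(struct toy_stack_tok *tok, const char *chunk)
{
    size_t n = strlen(chunk);
    if (stack_reserve(tok, n + 1) < 0) {
        return -1;
    }
    memcpy(tok->buf + tok->len, chunk, n + 1);
    tok->len += n;
    return 0;
}
```

With input like `f"{` followed by newlines, the innermost mode is regular while the f-string mode sits below it; a variant that only inspected `modes[top]` would skip the outer pointer and leave it dangling after the `realloc()`, which appears to match the second reproducer in this thread.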