NULL ptr deref in _PyCode_ConstantKey when compiling code #128632

Closed
alex opened this issue Jan 8, 2025 · 15 comments
Labels
3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@alex
Member

alex commented Jan 8, 2025

Crash report

What happened?

Unfortunately, the minimized reproducer is still fairly large. You can use xxd -r to convert the hexdump back into the binary file.

~/p/cpython ❯❯❯ xxd ~/Downloads/clusterfuzz-testcase-minimized-fuzz_pycompile-5092056728403968 
00000000: 5c62 2320 2323 2063 6f64 696e 673a 206c  \b# ## coding: l
00000010: 6174 696e 332f 30ff ffff ffff ffff ff6c  atin3/0........l
00000020: 6174 696e 37ff ffff 6463 6173 6564 6464  atin7...dcaseddd
00000030: 6464 6479 2e62 6b0a 0a0a 0a63 6c61 7373  dddy.bk....class
00000040: 2069 6e32 2829 3a0a 2020 2364 6464 6464   in2():.  #ddddd
00000050: 6762 6b0a 0a20 2064 6464 6464 640a 0a0a  gbk..  dddddd...
00000060: 476c 6174 762e 5f5f 7274 0a63 6c61 7373  Glatv.__rt.class
00000070: 2069 6e32 28ba 293a 0a20 2064 6464 6464   in2(.):.  ddddd
00000080: 6467 626c 0a0a 2020 636c 6173 7320 47ed  dgbl..  class G.
00000090: 5b5f 7072 7765 616e 7065 725d 3a61 7464  [_prweanper]:atd
000000a0: 6464 640a 0a0a 0a0a 0a0a 0a30 6f37 300a  ddd........0o70.
000000b0: 0a0a 0a0a 0a0a 0a0a 7476 6147 6c2e 5f5f  ........tvaGl.__
000000c0: 7274 0a63 6c61 7373 2069 6e32 28cf 293a  rt.class in2(.):
000000d0: 0a20 2064 6464 6464 6467 626c 0a0a 2020  .  ddddddgbl..  
000000e0: 636c 6173 7320 47ed 5b5f 7072 7765 616e  class G.[_prwean
000000f0: 7065 725d 3a61 2564 6462 2320 6762 6c0a  per]:a%ddb# gbl.
00000100: 0a20 2063 6c61 7373 2047 ed5b 5f70 7277  .  class G.[_prw
00000110: 6561 6e70 6572 5d3a 6174 6464 6464 0a0a  eanper]:atdddd..
00000120: 2320 636f 6469 6e67 3d6c 6174 696e 2d31  # coding=latin-1
00000130: 0a0a 0a47 6c61 7476 2e5f 5f72 740a 636c  ...Glatv.__rt.cl
00000140: 6173 7320 696e 3228 ba29 3a0a 2020 6464  ass in2(.):.  dd
00000150: 6464 6464 6762 6c0a 0a20 2063 6c61 7373  ddddgbl..  class
00000160: 2047 ed5b 5f5f 636c 6173 7364 6963 745f   G.[__classdict_
00000170: 5f5d 3a61 7464 6464 640a 0a0a 0a0a 0a0a  _]:atdddd.......
00000180: 0a30 6f37 300a 0a0a 0a0a 0a0a 0a0a 7476  .0o70.........tv
00000190: 6147 6c2e 5f5f 7274 0a63 6c61 7373 2069  aGl.__rt.class i
000001a0: 6e32 28cf 293a 0a20 2064 6464 6464 6467  n2(.):.  ddddddg
000001b0: 626c 0a0a 2020 636c 6173 7320 47ed 5b5f  bl..  class G.[_
000001c0: 7072 7765 616e 7065 725d 3a61 2564 6462  prweanper]:a%ddb
000001d0: 2320 6762 6c0a 0a20 2063 6c61 7373 2047  # gbl..  class G
000001e0: ed5b 5f70 7277 6561 6e70 6572 5d3a 6174  .[_prweanper]:at
000001f0: 6464 6464 0a0a 2320 636f 2600 0000 0000  dddd..# co&.....
00000200: 0000 6469 6c67 3d6c 6174 696e 2d31 0a0a  ..dilg=latin-1..
00000210: 0a47 6c61 7476 2e5f 5f72 740a 636c 6173  .Glatv.__rt.clas
00000220: 7320 696e 3228 ba29 3a0a 2020 6464 6464  s in2(.):.  dddd
00000230: 6464 6762 6c0a 0a20 2063 6c61 7373 2047  ddgbl..  class G
00000240: ed5b 5f70 7277 6561 6e70 6572 5d3a 6174  .[_prweanper]:at
00000250: 6464 6464 0aee 0a0a 0a0a 0a0a 306f 3730  dddd........0o70
00000260: 0a0a 0a0a 0a0a 0a0a 0a47 6c61 7476 2e5f  .........Glatv._
00000270: 5f72 740a 636c 6173 7320 696e 3228 cf29  _rt.class in2(.)
00000280: 3a0a 2020 6464 6464 6464 6762 6c0a 0a20  :.  ddddddgbl.. 
00000290: 2063 6c61 7373 2047 ed5b 5f70 7277 6561   class G.[_prwea
000002a0: 6e70 6572 5d3a 6125 6464 6223 2023 2320  nper]:a%ddb# ## 
000002b0: 636f 64ff ffff ff64 6464 6464 989b 86d1  cod....ddddd....
000002c0: 9d94 f5f5 0a0a 636c 6173 7320 696e 3228  ......class in2(
000002d0: 293a 6f37 300a 0a0a 0a0a 0a0a 0a40 476c  ):o70........@Gl
000002e0: 6174 3a61 7464 6464 640a 0a0a 0a0a 0a0a  at:atdddd.......
000002f0: 0a30 6f37 300a 0a0a 0a0a 0a0a 0a0a 476c  .0o70.........Gl
00000300: 6174 762e 6223 2023 2320 606f 6469 6e67  atv.b# ## `oding
00000310: 3a20 6c61 7469 6e33 2f30 ffff ffff ffff  : latin3/0......
00000320: ffff 6c61 7469 6e37 ffff ff64 6361 7365  ..latin7...dcase
00000330: 6464 6464 6464 792e 2e5f 5f72 740a 636c  ddddddy..__rt.cl
00000340: 6173 7320 696e 3228 ba29 3a0a 2020 6464  ass in2(.):.  dd
00000350: 6464 6464 6762 6c0a 0a20 2063 6c61 7373  ddddgbl..  class
00000360: 2047 ed0a 306f 3730 0a0a 0a0a 0a0a 0a0a   G..0o70........
00000370: 0a74 7661 476c 2e5f 5f72 740a 636c 6173  .tvaGl.__rt.clas
00000380: 7320 696e 3228 cf29 3a0a 2020 6464 6464  s in2(.):.  dddd
00000390: 6464 6762 6c0a 0a20 2043 6c61 7373 2047  ddgbl..  Class G
000003a0: ed5b 5f70 7277 6561 6e70 6572 5d3a 6125  .[_prweanper]:a%
000003b0: 6464 6223 2067 6237 3531 3734 3631 3034  ddb# gb751746104
000003c0: 3530 3935 3634 3039 3431 3731 3531 2320  50956409417151# 
000003d0: 636f 6464 6464 640a 0a23 2063 6f64 696e  coddddd..# codin
000003e0: 673d 6c61 7469 6e2d 310a 0a0a 476c 6174  g=latin-1...Glat
000003f0: 762e 5f5f 7274 0a63 6c61 7373 2069 6e32  v.__rt.class in2
00000400: 28ba 293a 0a20 2064 6464 6464 6467 626c  (.):.  ddddddgbl
00000410: 0a0a 0a0a 0a0a 0a0a 0a47 6c61 7476 2e5f  .........Glatv._
00000420: 5f72 740a 636c 6173 7320 696e 3228 cf29  _rt.class in2(.)
00000430: 3a0a 2020 6464 6464 6464 6762 6c0a 0a20  :.  ddddddgbl.. 
00000440: 2063 6c61 7373 2047 ed5b 5f70 7277 6561   class G.[_prwea
00000450: 6e70 6572 5d3a 6125 6464 6264 6464 6464  nper]:a%ddbddddd
00000460: 6762 6b0a 0a20 2064 6464 6464 640a 0a0a  gbk..  dddddd...
00000470: 476c 6174 762e 5f5f 7274 0a63 6c61 7373  Glatv.__rt.class
00000480: 2069 6e32 28ba 293a 0a20 2064 6464 6464   in2(.):.  ddddd
00000490: 6467 626c 0a0a 2020 636c 6173 7320 47ed  dgbl..  class G.
000004a0: 5b5f 7072 7765 616e 7065 725d 3a61 7464  [_prweanper]:atd
000004b0: 6464 640a 0a0a 0100 000d 0a0a 0a0a 0a30  ddd............0
000004c0: 6f37 300a 0a0a 0a0a 0a0a 0a74 7279 3a20  o70........try: 
000004d0: 0a47 6c61 0a0a 0a0a 0a0a 0a47 6c61 7476  .Gla.......Glatv
000004e0: 2e5f 5f72 740a 636c 6173 7320 696e 3228  .__rt.class in2(
000004f0: cf29 3a0a 2020 6464 6464 6464 6762 6c0a  .):.  ddddddgbl.
00000500: 0a20 2063 6cff ffff ffff ffff ffff ffff  .  cl...........
00000510: ffff ffff ffff ffff ffff ffff ffff ffff  ................
00000520: 6173 7320 47ed 5b5f 7072 7765 616e 7065  ass G.[_prweanpe
00000530: 725d 3a61 2564 6462 6464 6464 6467 626b  r]:a%ddbdddddgbk
00000540: 0a0a 2020 6464 6464 6464 0a0a 0a43 6c61  ..  dddddd...Cla
00000550: 7476 2e5f 5f72 740a 636c 6173 7320 6965  tv.__rt.class ie
00000560: 3228 ba29 3a0a 2020 6464 6464 6464 6762  2(.):.  ddddddgb
00000570: 6c0a 0a20 2063 6c61 7373 2047 ed5b 5f70  l..  class G.[_p
00000580: 7277 6561 6e70 6572 5d3a 6174 640a ee0a  rweanper]:atd...
00000590: 0a0a 0a0a 0a30 6f37 300a 0a0a 0a0a 0a0a  .....0o70.......
000005a0: 0a0a 2f3d 2074 762e 5f5f 7274 0a63 6c61  ../= tv.__rt.cla
000005b0: 7373 2069 6e32 28cf 293a 0a20 2064 6464  ss in2(.):.  ddd
000005c0: 6464 6467 626c 0a0a 5f70 7277 6561 6e70  dddgbl.._prweanp
000005d0: 6572 5d3a 6174 6464 6464 0a0a 2320 636f  er]:atdddd..# co
000005e0: 6469 6e67 3d74 6820 6223 2023 2320 636f  ding=th b# ## co
000005f0: 64ff ffff 7479 7065 ff64 6464 6464 6464  d...type.ddddddd
00000600: 792e 62                                  y.b
~/p/cpython ❯❯❯ ./python.exe -c '
                data = open("/Users/alex_gaynor/Downloads/clusterfuzz-testcase-minimized-fuzz_pycompile-5092056728403968", "rb").read()
                start = ["eval", "single", "exec"][data[0] % 3]
                opt = data[1] % 4
                compile(data[2:].split(b"\x00")[0], "<fuzz>", start, optimize=opt)'
python.exe(19196,0x1f3918240) malloc: nano zone abandoned due to inability to reserve vm space.
<string>:2: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/alex_gaynor/Downloads/clusterfuzz-testcase-minimized-fuzz_pycompile-5092056728403968'>
  data = open("/Users/alex_gaynor/Downloads/clusterfuzz-testcase-minimized-fuzz_pycompile-5092056728403968", "rb").read()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Include/object.h:268:20: runtime error: member access within null pointer of type 'PyObject' (aka 'struct _object')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Include/object.h:268:20 in 
AddressSanitizer:DEADLYSIGNAL
=================================================================
==19196==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008 (pc 0x000104329198 bp 0x00016bddd130 sp 0x00016bddd000 T0)
==19196==The signal is caused by a READ memory access.
==19196==Hint: address points to the zero page.
    #0 0x104329198 in _PyCode_ConstantKey codeobject.c:2417
    #1 0x104329430 in _PyCode_ConstantKey codeobject.c:2479
    #2 0x104859bec in const_cache_insert compile.c:315
    #3 0x104859794 in _PyCompile_ConstCacheMergeOne compile.c:1233
    #4 0x104722d20 in _PyAssemble_MakeCodeObject assemble.c:754
    #5 0x10485ad24 in _PyCompile_OptimizeAndAssemble compile.c:1369
    #6 0x104812d58 in codegen_visit_stmt codegen.c:2897
    #7 0x10480ba24 in _PyCodegen_Body codegen.c:828
    #8 0x104824604 in codegen_class_body codegen.c:1483
    #9 0x104812740 in codegen_visit_stmt codegen.c:2897
    #10 0x10480ba24 in _PyCodegen_Body codegen.c:828
    #11 0x10485d27c in compiler_codegen compile.c
    #12 0x10485b518 in _PyAST_Compile compile.c:1382
    #13 0x1049c48f0 in Py_CompileStringObject pythonrun.c:1497
    #14 0x10475cb34 in builtin_compile bltinmodule.c.h:363
    #15 0x1044669fc in cfunction_vectorcall_FASTCALL_KEYWORDS methodobject.c:452
    #16 0x104304afc in _PyObject_VectorcallTstate pycore_call.h:167
    #17 0x1047994f8 in _PyEval_EvalFrameDefault generated_cases.c.h:2013
    #18 0x1047698b4 in PyEval_EvalCode ceval.c:658
    #19 0x1049c66b8 in run_eval_code_obj pythonrun.c:1338
    #20 0x1049c6204 in run_mod pythonrun.c:1423
    #21 0x1049c21c4 in _PyRun_StringFlagsWithName pythonrun.c:1222
    #22 0x1049c2004 in _PyRun_SimpleStringFlagsWithName pythonrun.c:548
    #23 0x104a57cc4 in Py_RunMain main.c:776
    #24 0x104a591b8 in pymain_main main.c:806
    #25 0x104a59554 in Py_BytesMain main.c:830
    #26 0x189cb8270  (<unknown module>)

==19196==Register values:
 x[0] = 0x000000016bddcf18   x[1] = 0x0000000000000000   x[2] = 0x0000000000000000   x[3] = 0x00000001084007a0  
 x[4] = 0x0000000063000000   x[5] = 0x0000000000000000   x[6] = 0x0000000000000000   x[7] = 0x0000000000000000  
 x[8] = 0x0000000000000000   x[9] = 0x00000001064be5e8  x[10] = 0x0000000000000000  x[11] = 0x0000000000000084  
x[12] = 0x0000000105c50000  x[13] = 0x00000001064c06e8  x[14] = 0x0000000000000000  x[15] = 0x0000000000000000  
x[16] = 0x000000030a47dd90  x[17] = 0x00000001064180a0  x[18] = 0x0000000000000000  x[19] = 0x000000016bddd080  
x[20] = 0x000000016bddd000  x[21] = 0x0000000000000000  x[22] = 0x0000000000000008  x[23] = 0x000000702d7dba00  
x[24] = 0x0000000000000000  x[25] = 0x0000007000020000  x[26] = 0x0000000000000000  x[27] = 0x0000000000000000  
x[28] = 0x0000000000000001     fp = 0x000000016bddd130     lr = 0x0000000104329834     sp = 0x000000016bddd000  
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV codeobject.c:2417 in _PyCode_ConstantKey
==19196==ABORTING
fish: Job 1, './python.exe -c '' terminated by signal data = open("/Users/alex_gaynor… (start = ["eval", "single", "exe…)
fish: Job opt = data[1] % 4, 'compile(data[2:].split(b"\x00")…' terminated by signal SIGABRT (Abort)

Found by OSS-Fuzz.
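The one-liner above can also be written as a small script, which is easier to rerun. This is a sketch: decode_fuzz_input() and reproduce() are hypothetical helper names, and the decoding only mirrors what the command does (byte 0 picks the mode, byte 1 the optimize level, and the source runs up to the first NUL byte). For this testcase, data[0] is 0x5c (92 % 3 == 2, so "exec") and data[1] is 0x62 (98 % 4 == 2, so optimize=2). The first NUL byte sits at offset 0x1fb in the dump, so the rest of the file is never compiled by this harness.

```python
def decode_fuzz_input(data: bytes) -> tuple[bytes, str, int]:
    """Split a fuzz_pycompile testcase the same way the one-liner above does."""
    mode = ["eval", "single", "exec"][data[0] % 3]  # byte 0 selects the compile mode
    optimize = data[1] % 4                          # byte 1 selects the optimize level
    source = data[2:].split(b"\x00")[0]             # source stops at the first NUL byte
    return source, mode, optimize


def reproduce(path: str) -> None:
    """Compile a testcase file. WARNING: on affected builds this crashes the process."""
    with open(path, "rb") as f:  # the with block closes the file
        source, mode, optimize = decode_fuzz_input(f.read())
    compile(source, "<fuzz>", mode, optimize=optimize)
```

Using a with block closes the file, so the ResourceWarning seen in the log above does not appear.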

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Output from running 'python -VV' on the command line:

No response

@alex alex added the type-crash A hard crash of the interpreter, possibly with a core dump label Jan 8, 2025
@tom-pytel
Contributor

tom-pytel commented Jan 8, 2025

Hi there, I'm not familiar with this code, so I can't say why this is happening, but the immediate cause seems to be the offset calculation in compute_localsplus_info() in Python/assemble.c, line 535. The last loop, over the freevars, does not account for the cellvar placed immediately before it, so it overwrites that pointer instead of writing to the next location in the tuple:

https://github.com/python/cpython/blob/main/Python/assemble.c#L535

With that corrected, the following error appears instead (no segfault):

Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    compile(src, '<fuzz>', start, optimize=opt)
    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SystemError: compiler_lookup_arg(name='__classdict__') with reftype=5 failed in in2; freevars of code <generic parameters of >: ('__classdict__',)

EDIT: Segfault reproducible with just this script:

class A:
  class B[__classdict__]: pass
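Because the crash takes down the whole interpreter, it can help to probe a build from a separate process. A minimal sketch, assuming a POSIX system, where a child killed by a signal (such as SIGSEGV) reports a negative return code; classify() is a hypothetical helper, not part of CPython. On Windows a crash shows up as a large positive exit code instead, which this sketch would report as "other error".

```python
import subprocess
import sys

REPRO = "class A:\n  class B[__classdict__]: pass\n"

# Child program: read source from stdin and compile it (nothing is executed).
CHILD = "import sys; compile(sys.stdin.read(), '<fuzz>', 'exec')"


def classify(source: str) -> str:
    """Compile `source` in a fresh interpreter and describe what happened."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=source,
        capture_output=True,
        text=True,
    )
    if proc.returncode < 0:
        # POSIX only: a negative return code means the child was killed by a signal.
        return "crash"
    if proc.returncode == 0:
        return "compiled"
    if "SyntaxError" in proc.stderr:
        return "SyntaxError"
    return "other error"


if __name__ == "__main__":
    print(sys.version.split()[0], classify(REPRO))
```

On an affected build, classify(REPRO) would be expected to return "crash"; a build that rejects the name should return "SyntaxError".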

@ZeroIntensity ZeroIntensity added 3.12 only security fixes 3.13 bugs and security fixes 3.14 bugs and security fixes labels Jan 8, 2025
@ZeroIntensity
Member

> EDIT: Segfault reproducible with just this script:

Thanks for that! Confirmed on current main and back to 3.12.8. This is possibly a security problem, because the interpreter can be crashed with just compile(). I'm speculating, though.

cc @Eclips4 as an ast expert (you might like this issue!)

@tom-pytel
Contributor

tom-pytel commented Jan 8, 2025

I tracked the problem down to the fact that, in this particular case, a cell var and a free var get the same offset in localsplus. This causes the free var entry for the last __classdict__ to overwrite the cell var entry for .type_params, and leaves a NULL in the tuple slot where the free var was meant to be written. At the point of error in compute_localsplus_info(), the var dicts are as follows:

umd->u_varnames = {'__classdict__': 0, '.generic_base': 1}
umd->u_cellvars = {'.type_params': 0}
umd->u_freevars = {'__classdict__': 0}  # assuming the value should be a 1?
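To make the collision concrete, here is a simplified Python model of the slot assignment. It assumes the offset arithmetic of compute_localsplus_info() as shown in the snippet below (locals first, then cells and frees shifted by nlocals minus the dropped cells) and assumes, per the "should the value be a 1?" note, that free-variable indices are meant to start after the cell variables. layout_localsplus() is a hypothetical name, and the model ignores variable kinds and hidden locals. With the reported dicts, the freevar lands on the .type_params slot and the last slot is never written; that unwritten slot appears to correspond to the NULL that _PyCode_ConstantKey later dereferences.

```python
def layout_localsplus(varnames, cellvars, freevars):
    """Hypothetical model of slot assignment in compute_localsplus_info().

    Returns the name written into each localsplus slot; None marks a slot
    that is never written (the NULL left in the names tuple).
    """
    nlocals = len(varnames)
    # Cells that are also locals share the local's slot ("dropped" cells).
    total_dropped = sum(1 for name in cellvars if name in varnames)
    slots = [None] * (nlocals + len(cellvars) - total_dropped + len(freevars))

    for name, offset in varnames.items():
        slots[offset] = name

    numdropped = 0
    for name, offset in cellvars.items():
        if name in varnames:
            numdropped += 1
            continue
        slots[offset + nlocals - numdropped] = name

    for name, offset in freevars.items():
        # Assumes free-variable indices already count past the cells.
        slots[offset + nlocals - numdropped] = name
    return slots


# The dicts reported above at the point of failure.
varnames = {"__classdict__": 0, ".generic_base": 1}
cellvars = {".type_params": 0}
print(layout_localsplus(varnames, cellvars, {"__classdict__": 0}))
# ['__classdict__', '.generic_base', '__classdict__', None]
print(layout_localsplus(varnames, cellvars, {"__classdict__": 1}))
# ['__classdict__', '.generic_base', '.type_params', '__classdict__']
```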

Below is a snippet that detects the condition and raises a SystemError (which would be raised anyway if this were handled correctly, since this is a very degenerate case). I'm including the "fix" here rather than as a PR because it is only a band-aid to avoid the crash. The proper fix would be to figure out why the __classdict__ freevar gets an index of 0 and correct that: the __classdict__ added in codegen_load_classdict_freevar() doesn't seem to take the variables already present in u_metadata.u_cellvars into account when choosing its index.

"Fix": Replace Python/assemble.c lines 506-538 with the following:

    // This counter mirrors the fix done in fix_cell_offsets().
    int numdropped = 0, maxcelloffset = -1;
    pos = 0;
    while (PyDict_Next(umd->u_cellvars, &pos, &k, &v)) {
        int has_name = PyDict_Contains(umd->u_varnames, k);
        RETURN_IF_ERROR(has_name);
        if (has_name) {
            // Skip cells that are already covered by locals.
            numdropped += 1;
            continue;
        }

        int offset = PyLong_AsInt(v);
        if (offset == -1 && PyErr_Occurred()) {
            return ERROR;
        }
        assert(offset >= 0);
        offset += nlocals - numdropped;
        maxcelloffset = Py_MAX(maxcelloffset, offset);
        assert(offset < nlocalsplus);
        _Py_set_localsplus_info(offset, k, CO_FAST_CELL, names, kinds);
    }

    pos = 0;
    while (PyDict_Next(umd->u_freevars, &pos, &k, &v)) {
        int offset = PyLong_AsInt(v);
        if (offset == -1 && PyErr_Occurred()) {
            return ERROR;
        }
        assert(offset >= 0);
        offset += nlocals - numdropped;
        assert(offset < nlocalsplus);

        // TODO: remove once gh-128632 is fixed properly, or keep to prevent future unforeseen segfaults?
        if (offset <= maxcelloffset) {
            PyErr_SetString(PyExc_SystemError,
                            "overlapping cell and free variable offsets detected (see gh-128632)");
            return ERROR;
        }

        _Py_set_localsplus_info(offset, k, CO_FAST_FREE, names, kinds);
    }

P.S. If this niche case isn't worth fixing properly, let me know and I'll send up a PR with this "fix" and a test case, to at least avoid the crash.

@alex
Member Author

alex commented Jan 8, 2025

Thanks for minimizing this!

@iritkatriel
Copy link
Member

Thank you @tom-pytel for the analysis.

CC @JelleZijlstra.

@JelleZijlstra JelleZijlstra self-assigned this Jan 9, 2025
@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jan 10, 2025
@tom-pytel
Contributor

tom-pytel commented Jan 10, 2025

The crash happens because __classdict__ is a special variable, how about just disallowing that completely (can't be used anyway):

>>> class A:
...   class B[__classdict__]: pass
... 
  File "<python-input-0>", line 2
    class B[__classdict__]: pass
            ^^^^^^^^^^^^^
SyntaxError: reserved name '__classdict__' can not be used for type parameter

This is very simple to do in codegen.c, or is the parser a better place? The check could be restricted to class annotations only, or refined in other ways, but this is the idea. Should it be expanded to a list of forbidden names, just in case?

$ git diff main
diff --git a/Python/codegen.c b/Python/codegen.c
index 61707ba677..a84e4fc84e 100644
--- a/Python/codegen.c
+++ b/Python/codegen.c
@@ -1160,6 +1160,12 @@ codegen_type_params(compiler *c, asdl_type_param_seq *type_params)
         location loc = LOC(typeparam);
         switch(typeparam->kind) {
         case TypeVar_kind:
+            if (_PyUnicode_EqualToASCIIString(typeparam->v.TypeVar.name, "__class__") ||
+                _PyUnicode_EqualToASCIIString(typeparam->v.TypeVar.name, "__classdict__")) {
+                return _PyCompile_Error(c, loc, "reserved name '%U' "
+                                        "can not be used for type parameter",
+                                        typeparam->v.TypeVar.name);
+            }
             ADDOP_LOAD_CONST(c, loc, typeparam->v.TypeVar.name);
             if (typeparam->v.TypeVar.bound) {
                 expr_ty bound = typeparam->v.TypeVar.bound;
@@ -1192,6 +1198,12 @@ codegen_type_params(compiler *c, asdl_type_param_seq *type_params)
             RETURN_IF_ERROR(codegen_nameop(c, loc, typeparam->v.TypeVar.name, Store));
             break;
         case TypeVarTuple_kind:
+            if (_PyUnicode_EqualToASCIIString(typeparam->v.TypeVarTuple.name, "__class__") ||
+                _PyUnicode_EqualToASCIIString(typeparam->v.TypeVarTuple.name, "__classdict__")) {
+                return _PyCompile_Error(c, loc, "reserved name '%U' "
+                                        "can not be used for type parameter",
+                                        typeparam->v.TypeVarTuple.name);
+            }
             ADDOP_LOAD_CONST(c, loc, typeparam->v.TypeVarTuple.name);
             ADDOP_I(c, loc, CALL_INTRINSIC_1, INTRINSIC_TYPEVARTUPLE);
             if (typeparam->v.TypeVarTuple.default_value) {
@@ -1211,6 +1223,12 @@ codegen_type_params(compiler *c, asdl_type_param_seq *type_params)
             RETURN_IF_ERROR(codegen_nameop(c, loc, typeparam->v.TypeVarTuple.name, Store));
             break;
         case ParamSpec_kind:
+            if (_PyUnicode_EqualToASCIIString(typeparam->v.ParamSpec.name, "__class__") ||
+                _PyUnicode_EqualToASCIIString(typeparam->v.ParamSpec.name, "__classdict__")) {
+                return _PyCompile_Error(c, loc, "reserved name '%U' "
+                                        "can not be used for type parameter",
+                                        typeparam->v.ParamSpec.name);
+            }
             ADDOP_LOAD_CONST(c, loc, typeparam->v.ParamSpec.name);
             ADDOP_I(c, loc, CALL_INTRINSIC_1, INTRINSIC_PARAMSPEC);
             if (typeparam->v.ParamSpec.default_value) {

@iritkatriel
Member

iritkatriel commented Jan 11, 2025

> The crash happens because __classdict__ is a special variable, how about just disallowing that completely

That may well be the solution. It was added in the implementation of PEP 695, but I don't see it mentioned in the PEP itself.

There's more here from @JelleZijlstra : https://jellezijlstra.github.io/pep695.html

CC @carljm.

@JelleZijlstra
Member

I'm OK with making this a SyntaxError, thanks for the investigation! We should probably do this in symtable.c though (unless Irit has different opinions on where this sort of check should happen?).

@iritkatriel
Member

> I'm OK with making this a SyntaxError, thanks for the investigation! We should probably do this in symtable.c though (unless Irit has different opinions on where this sort of check should happen?).

It's fine in the symtable for now if it's easy to do.

Ideally syntax errors would be identified during ast construction so they show up in linting/static analysis. But we also want the errors detected when a user-constructed ast is compiled. I think we should add a syntax check pass on the ast that can be applied after ast is constructed before it is returned, and also before a user defined ast is processed. But that's out of scope for this issue, so for now we can put the check where it's easiest to plug it in.
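As an illustration of the kind of AST-level check described here, restricted to this one rule, a minimal Python sketch follows. It runs on the ast tree, so it treats parsed source and hand-built trees (such as the ast-module reproducer in this thread) the same way. The function names are hypothetical and the reserved-name set is taken from the codegen.c patch proposed above; this is not how CPython implements the check.

```python
import ast

# Reserved names taken from the codegen.c patch proposed in this thread;
# an illustrative set, not an official CPython list.
RESERVED_TYPE_PARAM_NAMES = frozenset({"__class__", "__classdict__"})


def reserved_type_params(tree: ast.AST) -> list:
    """Return (name, lineno) pairs for type parameters that use a reserved name.

    Nodes without a `type_params` field (every node before Python 3.12)
    are simply skipped.
    """
    hits = []
    for node in ast.walk(tree):
        for param in getattr(node, "type_params", None) or ():
            if param.name in RESERVED_TYPE_PARAM_NAMES:
                hits.append((param.name, getattr(param, "lineno", None)))
    return hits


def check_type_params(tree: ast.AST, filename: str = "<ast>") -> None:
    """Raise SyntaxError for the first offending type parameter, if any."""
    hits = reserved_type_params(tree)
    if hits:
        name, lineno = hits[0]
        raise SyntaxError(
            f"reserved name {name!r} can not be used for type parameter",
            (filename, lineno, None, None),
        )
```

Running such a pass before compile() would turn the crashing input into an ordinary SyntaxError, whichever way the tree was produced.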

@tom-pytel
Contributor

tom-pytel commented Jan 11, 2025

> But we also want the errors detected when a user-constructed ast is compiled.

Users can generate invalid ASTs in many more ways than the parser can produce organically, not just by using this identifier as a type param, so is this worth considering?

A check in symtable.c is nice as it's just one place, symtable_visit_type_param_bound_or_default(). It could be in ast.c in validate_typeparam(), but I'm not sure we can get a nice syntax error location there like in symtable? The parser is probably out of the question, as the check would need to be in the grammar, which seems like extreme overkill for a niche issue.

Whatever you decide, I suggest adding the comment below in the two places where this error could arise again in the future through new features or keywords:

$ git diff main
diff --git a/Python/codegen.c b/Python/codegen.c
index 61707ba6770..9ee4691cdd8 100644
--- a/Python/codegen.c
+++ b/Python/codegen.c
@@ -3065,6 +3065,9 @@ codegen_addop_yield(compiler *c, location loc) {
     return SUCCESS;
 }
 
+/* XXX: Currently if this is used to insert a new name into u_freevars when
+   there are already entries in u_cellvars then the wrong index will be put
+   into u_freevars causing a hard error downstream. */
 static int
 codegen_load_classdict_freevar(compiler *c, location loc)
 {
diff --git a/Python/compile.c b/Python/compile.c
index ef470830336..2c78914e954 100644
--- a/Python/compile.c
+++ b/Python/compile.c
@@ -970,6 +970,9 @@ _PyCompile_ResolveNameop(compiler *c, PyObject *mangled, int scope,
         break;
     }
     if (*optype != COMPILE_OP_FAST) {
+        /* XXX: Currently if this is used to insert a new name into u_freevars when
+           there are already entries in u_cellvars then the wrong index will be put
+           into u_freevars causing a hard error downstream. */
         *arg = _PyCompile_DictAddObj(dict, mangled);
         RETURN_IF_ERROR(*arg);
     }

And maybe an assertion of the form assert(maxcellvaroffset < minfreevaroffset) in compute_localsplus_info() in assemble.c, to catch this issue should it arise again, since the syntax error only masks a potential underlying problem.
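The suggested invariant can be stated in a few lines. This Python sketch is hypothetical (offsets_disjoint() is not a CPython function) and compares the raw indices stored in u_cellvars and u_freevars, assuming free-variable indices are meant to start after the last cell-variable index.

```python
def offsets_disjoint(cellvars: dict, freevars: dict) -> bool:
    """Hypothetical rendering of assert(maxcellvaroffset < minfreevaroffset).

    Works on the raw name -> index dicts, assuming free-variable indices
    should all be greater than every cell-variable index.
    """
    if not cellvars or not freevars:
        return True  # with one side empty, nothing can overlap
    return max(cellvars.values()) < min(freevars.values())
```

With the dicts reported earlier in this thread ({'.type_params': 0} and {'__classdict__': 0}) it returns False, which is the same situation the "fix" snippet posted earlier flags with a SystemError.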

@alex
Member Author

alex commented Jan 11, 2025

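Minimal reproducer using the ast module:

import ast

# Equivalent to:
#   class A:
#     class B[__classdict__]: pass
# On affected builds, compiling this tree crashes the interpreter.
compile(
    ast.Module(
        body=[
            ast.ClassDef(
                name="A",
                bases=[],
                keywords=[],
                decorator_list=[],
                type_params=[],
                lineno=0,
                col_offset=0,
                body=[
                    ast.ClassDef(
                        name="B",
                        bases=[],
                        keywords=[],
                        decorator_list=[],
                        lineno=0,
                        col_offset=0,
                        body=[ast.Pass(lineno=0, col_offset=0)],
                        type_params=[
                            ast.TypeVar(
                                "__classdict__",
                                lineno=0,
                                col_offset=0,
                                end_lineno=0,
                                end_col_offset=0,
                            )
                        ],
                    )
                ],
            )
        ],
        type_ignores=[],
    ),
    filename="<fuzz>",
    mode="exec",
)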

@iritkatriel
Member

> Users can generate invalid ASTs in many more ways than the parser can produce organically, not just by using this identifier as a type param, so is this worth considering?

Yes, an invalid AST shouldn't result in a segfault.

> A check in symtable.c is nice as it's just one place, symtable_visit_type_param_bound_or_default(). It could be in ast.c in validate_typeparam(), but I'm not sure we can get a nice syntax error location there like in symtable? The parser is probably out of the question, as the check would need to be in the grammar, which seems like extreme overkill for a niche issue.

symtable is probably the best option. The validation in ast.c is only performed on user-generated ASTs. Internally generated ASTs are trusted to be valid. So it is not a good place to detect syntax errors.

Would you like to create a PR?

@tom-pytel
Contributor

> Yes, an invalid AST shouldn't result in a segfault.

I didn't mean that the segfault should still be allowed to happen, but that's moot since the check in symtable covers it.

> The validation in ast.c is only performed on user-generated ASTs.

Are you sure? I was getting the call to ast.c:validate_typeparam() when running a script. Maybe you're thinking of the Python ast module?

> Would you like to create a PR?

It's up, have a look.

@iritkatriel
Member

> > The validation in ast.c is only performed on user-generated ASTs.
>
> Are you sure? I was getting the call to ast.c:validate_typeparam() when running a script. Maybe you're thinking of the Python ast module?

Ah, maybe it was that it only runs in debug mode?
(I once tried to add syntax errors there and it didn't always work.)

@tom-pytel
Contributor

> Ah, maybe it was that it only runs in debug mode?

Nope, it's called in optimized builds as well. I'm going to say it's a version thing.


6 participants