-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
NULL ptr deref in _PyCode_ConstantKey when compiling code #128632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi there, I'm not familiar with the code so can't say WHY this is happening but the immediate cause of this seems to be the offset calculation in Python/assemble.c compute_localsplus_info() line 535, the last loop for freevars does not account for a cellvar put immediately above and overwrites that pointer instead of putting at the next location in the tuple: https://github.com/python/cpython/blob/main/Python/assemble.c#L535 When corrected the following error appears instead (no segfault): Traceback (most recent call last):
File "<python-input-1>", line 1, in <module>
compile(src, '<fuzz>', start, optimize=opt)
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SystemError: compiler_lookup_arg(name='__classdict__') with reftype=5 failed in in2; freevars of code <generic parameters of Gí>: ('__classdict__',) EDIT: Segfault reproducible with just this script: class A:
class B[__classdict__]: pass |
Thanks for that! Confirmed on current main back to 3.12.8. This is possibly a security problem, because the interpreter can be crashed with just cc @Eclips4 as an ast expert (you might like this issue!) |
Tracked the problem down to the fact that in this particular case a cell var and a free var get the same offset in locals which causes the free var pointer for the last umd->u_varnames = {'__classdict__': 0, '.generic_base': 1}
umd->u_cellvars = {'.type_params': 0}
umd->u_freevars = {'__classdict__': 0} # assuming the value should be a 1? Included a snippet of code here below which will detect the condition and raise a "Fix": Replace Python/assemble.c lines 506-538 with the following: // This counter mirrors the fix done in fix_cell_offsets().
int numdropped = 0, maxcelloffset = -1;
pos = 0;
while (PyDict_Next(umd->u_cellvars, &pos, &k, &v)) {
int has_name = PyDict_Contains(umd->u_varnames, k);
RETURN_IF_ERROR(has_name);
if (has_name) {
// Skip cells that are already covered by locals.
numdropped += 1;
continue;
}
int offset = PyLong_AsInt(v);
if (offset == -1 && PyErr_Occurred()) {
return ERROR;
}
assert(offset >= 0);
offset += nlocals - numdropped;
maxcelloffset = Py_MAX(maxcelloffset, offset);
assert(offset < nlocalsplus);
_Py_set_localsplus_info(offset, k, CO_FAST_CELL, names, kinds);
}
pos = 0;
while (PyDict_Next(umd->u_freevars, &pos, &k, &v)) {
int offset = PyLong_AsInt(v);
if (offset == -1 && PyErr_Occurred()) {
return ERROR;
}
assert(offset >= 0);
offset += nlocals - numdropped;
assert(offset < nlocalsplus);
// TODO: remove once gh-128632 is fixed properly or leave to prevent future unforseen segfaults?
if (offset <= maxcelloffset) {
PyErr_SetString(PyExc_SystemError,
"overlapping cell and free variable offsets detected (see gh-128632)");
return ERROR;
}
_Py_set_localsplus_info(offset, k, CO_FAST_FREE, names, kinds);
} P.S. If this niche case is not worth fixing properly let me know and I will send up a PR with this "fix" to at least avoid a crash and a test case. |
Thanks for minimizing this! |
Thank you @tom-pytel for the analysis. CC @JelleZijlstra . |
The crash happens because >>> class A:
... class B[__classdict__]: pass
...
File "<python-input-0>", line 2
class B[__classdict__]: pass
^^^^^^^^^^^^^
SyntaxError: reserved name '__classdict__' can not be used for type parameter Very simple in $ git diff main
diff --git a/Python/codegen.c b/Python/codegen.c
index 61707ba677..a84e4fc84e 100644
--- a/Python/codegen.c
+++ b/Python/codegen.c
@@ -1160,6 +1160,12 @@ codegen_type_params(compiler *c, asdl_type_param_seq *type_params)
location loc = LOC(typeparam);
switch(typeparam->kind) {
case TypeVar_kind:
+ if (_PyUnicode_EqualToASCIIString(typeparam->v.TypeVar.name, "__class__") ||
+ _PyUnicode_EqualToASCIIString(typeparam->v.TypeVar.name, "__classdict__")) {
+ return _PyCompile_Error(c, loc, "reserved name '%U' "
+ "can not be used for type parameter",
+ typeparam->v.TypeVar.name);
+ }
ADDOP_LOAD_CONST(c, loc, typeparam->v.TypeVar.name);
if (typeparam->v.TypeVar.bound) {
expr_ty bound = typeparam->v.TypeVar.bound;
@@ -1192,6 +1198,12 @@ codegen_type_params(compiler *c, asdl_type_param_seq *type_params)
RETURN_IF_ERROR(codegen_nameop(c, loc, typeparam->v.TypeVar.name, Store));
break;
case TypeVarTuple_kind:
+ if (_PyUnicode_EqualToASCIIString(typeparam->v.TypeVarTuple.name, "__class__") ||
+ _PyUnicode_EqualToASCIIString(typeparam->v.TypeVarTuple.name, "__classdict__")) {
+ return _PyCompile_Error(c, loc, "reserved name '%U' "
+ "can not be used for type parameter",
+ typeparam->v.TypeVarTuple.name);
+ }
ADDOP_LOAD_CONST(c, loc, typeparam->v.TypeVarTuple.name);
ADDOP_I(c, loc, CALL_INTRINSIC_1, INTRINSIC_TYPEVARTUPLE);
if (typeparam->v.TypeVarTuple.default_value) {
@@ -1211,6 +1223,12 @@ codegen_type_params(compiler *c, asdl_type_param_seq *type_params)
RETURN_IF_ERROR(codegen_nameop(c, loc, typeparam->v.TypeVarTuple.name, Store));
break;
case ParamSpec_kind:
+ if (_PyUnicode_EqualToASCIIString(typeparam->v.ParamSpec.name, "__class__") ||
+ _PyUnicode_EqualToASCIIString(typeparam->v.ParamSpec.name, "__classdict__")) {
+ return _PyCompile_Error(c, loc, "reserved name '%U' "
+ "can not be used for type parameter",
+ typeparam->v.ParamSpec.name);
+ }
ADDOP_LOAD_CONST(c, loc, typeparam->v.ParamSpec.name);
ADDOP_I(c, loc, CALL_INTRINSIC_1, INTRINSIC_PARAMSPEC);
if (typeparam->v.ParamSpec.default_value) { |
That may well be the solution. It was added in the implementation of PEP 695, but I don't see it mentioned in the PEP itself. There's more here from @JelleZijlstra : https://jellezijlstra.github.io/pep695.html CC @carljm. |
I'm OK with making this a SyntaxError, thanks for the investigation! We should probably do this in symtable.c though (unless Irit has different opinions on where this sort of check should happen?). |
It's fine in the symtable for now if it's easy to do. Ideally syntax errors would be identified during ast construction so they show up in linting/static analysis. But we also want the errors detected when a user-constructed ast is compiled. I think we should add a syntax check pass on the ast that can be applied after ast is constructed before it is returned, and also before a user defined ast is processed. But that's out of scope for this issue, so for now we can put the check where it's easiest to plug it in. |
Users can generate invalid ASTs in many more ways than can be generated organically by the parser besides just using this identifier as a type param, so is this worth consideration? Check in symtable.c is nice as its just one place But whatever you decide I suggest adding comment below in two places where possibility for this error to arise in future due to new features/keywords exists: $ git diff main
diff --git a/Python/codegen.c b/Python/codegen.c
index 61707ba6770..9ee4691cdd8 100644
--- a/Python/codegen.c
+++ b/Python/codegen.c
@@ -3065,6 +3065,9 @@ codegen_addop_yield(compiler *c, location loc) {
return SUCCESS;
}
+/* XXX: Currently if this is used to insert a new name into u_freevars when
+ there are already entries in u_cellvars then the wrong index will be put
+ into u_freevars causing a hard error downstream. */
static int
codegen_load_classdict_freevar(compiler *c, location loc)
{
diff --git a/Python/compile.c b/Python/compile.c
index ef470830336..2c78914e954 100644
--- a/Python/compile.c
+++ b/Python/compile.c
@@ -970,6 +970,9 @@ _PyCompile_ResolveNameop(compiler *c, PyObject *mangled, int scope,
break;
}
if (*optype != COMPILE_OP_FAST) {
+ /* XXX: Currently if this is used to insert a new name into u_freevars when
+ there are already entries in u_cellvars then the wrong index will be put
+ into u_freevars causing a hard error downstream. */
*arg = _PyCompile_DictAddObj(dict, mangled);
RETURN_IF_ERROR(*arg);
} And maybe an assertion of the form |
Minimal reproducer with ast module: compile(
ast.Module(
body=[
ast.ClassDef(
name="A",
lineno=0,
col_offset=0,
body=[
ast.ClassDef(
name="B",
lineno=0,
col_off_set=0,
body=[ast.Pass(lineno=0, col_offset=0)],
type_params=[
ast.TypeVar(
"__classdict__",
lineno=0,
col_offset=0,
end_lineno=0,
end_col_offset=0,
)
],
)
],
)
]
),
filename="<fuzz>",
mode="exec",
) |
Yes, an invalid AST shouldn't result in a segfault.
symtable is probably the best option. The validation in ast.c is only performed on user-generated ASTs. Internally generated ASTs are trusted to be valid. So it is not a good place to detect syntax errors. Would you like to create a PR? |
I didn't mean the segfault would still be allowed to happen, but moot as the check in symtable covers it.
Your sure? I was getting the call to
Its up, have a look. |
Ah, maybe it was that it only runs in debug mode? |
Nope, its called optimized as well, gonna say its a version thing. |
…hon#128744) (cherry picked from commit 891c61c)
…am (pythonGH-128744) (cherry picked from commit 891c61c) Co-authored-by: Tomasz Pytel <[email protected]>
…-128744) (#132085) (cherry picked from commit 891c61c) Co-authored-by: Tomasz Pytel <[email protected]>
#132090) (cherry picked from commit 891c61c) Co-authored-by: Tomasz Pytel <[email protected]>
Uh oh!
There was an error while loading. Please reload this page.
Crash report
What happened?
Unfortunately it's a slightly large minimal reproducer. You can use
xxd -r
to go from the hexdump to the actual binary.Found by OSS-Fuzz.
CPython versions tested on:
CPython main branch
Operating systems tested on:
No response
Output from running 'python -VV' on the command line:
No response
Linked PRs
The text was updated successfully, but these errors were encountered: