Skip to content

Code objects have incorrect hash/equality when code is instrumented for sys.monitoring #111984

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nedbat opened this issue Nov 11, 2023 · 4 comments
Labels
3.12 only security fixes type-bug An unexpected behavior, bug, or error

Comments

@nedbat
Copy link
Member

nedbat commented Nov 11, 2023

Bug report

Bug description:

A thread stress test in the coverage.py test suite was failing with seemingly impossible behavior: a code object is used as a key in a dict, and then cannot be found. I eventually tracked it down to a mismatch between how bytecodes are examined in code_hash and in code_richcompare.

I have instructions for demonstrating the problem if you like, though it's a thread race condition, so it's a bit involved: https://gist.github.com/nedbat/2fb43527d102b8bb6e7fb7f3400fa3a9

Hash computation

When computing the hash, opcodes are checked if they are instrumented, and if so, use the original opcodes:

static Py_hash_t
code_hash(PyCodeObject *co)
{
    //...
    for (int i = 0; i < Py_SIZE(co); i++) {
        int deop = _Py_GetBaseOpcode(co, i);
        SCRAMBLE_IN(deop);
        SCRAMBLE_IN(_PyCode_CODE(co)[i].op.arg);
        i += _PyOpcode_Caches[deop];
    }
/* Get the underlying opcode, stripping instrumentation */
int _Py_GetBaseOpcode(PyCodeObject *code, int i)
{
    int opcode = _PyCode_CODE(code)[i].op.code;
    if (opcode == INSTRUMENTED_LINE) {
        opcode = code->_co_monitoring->lines[i].original_opcode;
    }
    if (opcode == INSTRUMENTED_INSTRUCTION) {
        opcode = code->_co_monitoring->per_instruction_opcodes[i];
    }
    CHECK(opcode != INSTRUMENTED_INSTRUCTION);
    CHECK(opcode != INSTRUMENTED_LINE);
    int deinstrumented = DE_INSTRUMENT[opcode];
    if (deinstrumented) {
        return deinstrumented;
    }
    return _PyOpcode_Deopt[opcode];
}
#define _PyCode_CODE(CO) _Py_RVALUE((_Py_CODEUNIT *)(CO)->co_code_adaptive)

Equality

When checking equality, opcodes are not checked if they are instrumented:

static PyObject *
code_richcompare(PyObject *self, PyObject *other, int op)
{
    // ...
    for (int i = 0; i < Py_SIZE(co); i++) {
        _Py_CODEUNIT co_instr = _PyCode_CODE(co)[i];
        _Py_CODEUNIT cp_instr = _PyCode_CODE(cp)[i];
        co_instr.op.code = _PyOpcode_Deopt[co_instr.op.code];
        cp_instr.op.code = _PyOpcode_Deopt[cp_instr.op.code];
        eq = co_instr.cache == cp_instr.cache;
        if (!eq) {
            goto unequal;
        }
        i += _PyOpcode_Caches[co_instr.op.code];
    }

It seems like the hash and the equality will be well matched if the code hasn't been instrumented. But if it has been, then they don't match.

CPython versions tested on:

3.12

Operating systems tested on:

No response

@nedbat nedbat added the type-bug An unexpected behavior, bug, or error label Nov 11, 2023
@gpshead gpshead added the 3.12 only security fixes label Nov 11, 2023
@gaogaotiantian
Copy link
Member

This is fixed in main in #109107. Do we want to backport it @brandtbucher? There might be conflicts as the executor part is not there in 3.12.

@nedbat
Copy link
Member Author

nedbat commented Nov 11, 2023

I'm adding support in coverage.py for sys.monitoring. If we don't backport this, then any code that hashes code objects could behave strangely under coverage measurement.

@brandtbucher
Copy link
Member

Yeah, the code_richcompare stuff and test should probably be backported manually to 3.12.

@gaogaotiantian, interested in opening a PR?

@gaogaotiantian
Copy link
Member

Done in #112329

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 only security fixes type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants