
Speed up frame handling in Python-to-Python calls. #111

@markshannon


For Python-to-Python calls we avoid consuming the C stack by executing the callee within the same _PyEval_EvalFrameDefault invocation, rather than calling it recursively.
However, the handling of frames is not as efficient as it could be.

Tightening this up would have a few benefits:

  1. Speed up Python-to-Python calls (probably by only a small amount).
  2. Allow Python cleanup frames to be inserted cheaply enough to make it worthwhile to specialize calls to Python special methods that require cleanup after the call (__init__, __setitem__, etc.).
  3. Allow artificial frames to be inserted cheaply for compiled code that wants nice tracebacks and debuggability (e.g. Cython code).

To speed up frame handling, we need to reduce the work done when pushing a frame and when clearing it.

The frame consists of three parts:

  1. The "specials": code object, globals, builtins and (slow) locals, link pointers and saved offsets for calls.
  2. The local variables area.
  3. The (evaluation) stack.

The stack is empty on both entry and exit, so it has no cost apart from setting the stacktop on entry. This is about as efficient as it can be.
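To make the three-part layout concrete, here is a minimal sketch of a frame holding the specials followed by one array for the locals and the evaluation stack. `SketchFrame` and its helpers are hypothetical and much simpler than CPython's real interpreter frame; object pointers are kept opaque. The point it illustrates is that pushing the frame initializes the specials and locals, while the empty stack costs only the assignment to `stacktop`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified frame; NOT CPython's actual struct. */
typedef struct SketchFrame {
    /* 1. The "specials" (object pointers kept opaque here). */
    void *f_code;
    void *f_globals;
    void *f_builtins;
    void *f_locals;
    struct SketchFrame *previous;   /* link pointer to the caller's frame */
    int nlocals;
    int stacksize;
    int stacktop;                   /* index of the first free slot in localsplus */
    /* 2. Local variables, immediately followed by 3. the evaluation stack. */
    void *localsplus[];
} SketchFrame;

static SketchFrame *sketch_frame_push(SketchFrame *caller, int nlocals,
                                      int stacksize, void **args, int nargs)
{
    assert(nargs <= nlocals);
    SketchFrame *f = malloc(sizeof(SketchFrame) +
                            (size_t)(nlocals + stacksize) * sizeof(void *));
    if (f == NULL)
        return NULL;
    /* Specials: a real interpreter would copy these from the function. */
    f->f_code = f->f_globals = f->f_builtins = f->f_locals = NULL;
    f->previous = caller;
    f->nlocals = nlocals;
    f->stacksize = stacksize;
    /* Locals: arguments first; the remaining slots start out unbound (NULL). */
    for (int i = 0; i < nlocals; i++)
        f->localsplus[i] = (i < nargs) ? args[i] : NULL;
    /* Stack: empty on entry, so its slots are left untouched; the only
     * work is pointing stacktop just past the locals. */
    f->stacktop = nlocals;
    return f;
}

static void sketch_stack_push(SketchFrame *f, void *v)
{
    assert(f->stacktop < f->nlocals + f->stacksize);
    f->localsplus[f->stacktop++] = v;
}

static void *sketch_stack_pop(SketchFrame *f)
{
    assert(f->stacktop > f->nlocals);
    return f->localsplus[--f->stacktop];
}

static int sketch_stack_depth(const SketchFrame *f)
{
    return f->stacktop - f->nlocals;
}
```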

The use of local variables could be tracked in the compiler to create a bitmap describing which locals need to be cleared on exit. However, without a lot of additional work in the compiler, the bitmap will not be precise, so we would gain little from it.
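A minimal sketch of why an imprecise bitmap gains little, assuming a hypothetical refcount stand-in `Obj` and a hypothetical compiler-provided 64-bit mask (neither is CPython API). Because the compiler can only say that a local *may* be bound, each flagged slot still needs the NULL check; the only saving is skipping slots proven never to be bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PyObject: only a reference count. */
typedef struct { long refcnt; } Obj;

static void obj_xdecref(Obj *o)
{
    if (o != NULL) {
        assert(o->refcnt > 0);
        o->refcnt--;            /* a real DECREF may also deallocate */
    }
}

/* Without a bitmap: every local slot is checked and cleared on exit. */
static void clear_all_locals(Obj **locals, int nlocals)
{
    for (int i = 0; i < nlocals; i++) {
        obj_xdecref(locals[i]);
        locals[i] = NULL;
    }
}

/* With a hypothetical bitmap: bit i is set if local i *may* be bound at
 * exit. The analysis is conservative, so a set bit does not prove the slot
 * is non-NULL and the NULL check stays. Returns the number of slots visited. */
static int clear_locals_with_bitmap(Obj **locals, int nlocals,
                                    uint64_t may_be_bound)
{
    int visited = 0;
    for (int i = 0; may_be_bound != 0; i++, may_be_bound >>= 1) {
        if (may_be_bound & 1u) {
            assert(i < nlocals);
            obj_xdecref(locals[i]);
            locals[i] = NULL;
            visited++;
        }
    }
    return visited;
}
```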

That leaves the specials. Most of the cost is in initializing and clearing the four fields:

    PyObject *f_globals;
    PyObject *f_builtins;
    PyObject *f_locals;
    PyCodeObject *f_code;

Not only do these need to be copied from the function on entry, but each one also needs an INCREF on entry and (more expensively) a DECREF on exit. Combining them into a single object would save this work on call and return.
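The per-call cost of the current layout can be modelled as follows. This is a sketch only: `Obj`, `Func` and `Specials` are hypothetical simplified types, and the `obj_*` helpers merely adjust a counter where CPython's Py_INCREF/Py_DECREF would also handle deallocation. Every call performs four copies and four (X)INCREFs, and every return four (X)DECREFs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyObject: only a reference count. */
typedef struct { long refcnt; } Obj;

static void obj_incref(Obj *o) { o->refcnt++; }
static void obj_decref(Obj *o) { assert(o->refcnt > 0); o->refcnt--; /* real DECREF may deallocate */ }
static void obj_xincref(Obj *o) { if (o != NULL) obj_incref(o); }
static void obj_xdecref(Obj *o) { if (o != NULL) obj_decref(o); }

/* Hypothetical, simplified function object. */
typedef struct {
    Obj *globals;
    Obj *builtins;
    Obj *code;
} Func;

/* Current layout: four separately owned "specials" in every frame. */
typedef struct {
    Obj *f_globals;
    Obj *f_builtins;
    Obj *f_locals;   /* NULL for function frames */
    Obj *f_code;
} Specials;

static void specials_init(Specials *s, const Func *fn, Obj *locals)
{
    /* Four copies and four (X)INCREFs on every call... */
    s->f_globals = fn->globals;   obj_incref(s->f_globals);
    s->f_builtins = fn->builtins; obj_incref(s->f_builtins);
    s->f_locals = locals;         obj_xincref(s->f_locals);
    s->f_code = fn->code;         obj_incref(s->f_code);
}

static void specials_clear(Specials *s)
{
    /* ...and four (X)DECREFs on every return. */
    obj_decref(s->f_globals);
    obj_decref(s->f_builtins);
    obj_xdecref(s->f_locals);
    obj_decref(s->f_code);
    s->f_globals = s->f_builtins = s->f_locals = s->f_code = NULL;
}
```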

    typedef struct _frame_scopes {
        PyObject_HEAD
        PyObject *f_globals;
        PyObject *f_builtins;
        PyObject *f_locals;
        PyCodeObject *f_code;
    } PyFrameScopes;

The interpreter frame, which currently starts with

    typedef struct _interpreter_frame {
        PyObject *f_globals;
        PyObject *f_builtins;
        PyObject *f_locals;
        PyCodeObject *f_code;
        ...

would become

    typedef struct _interpreter_frame {
        PyFrameScopes *scopes;
        ...

and initializing the "specials" part of the frame would become considerably cheaper and use less space.
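A sketch of the same push/clear under the proposal, assuming the function creates one scopes object up front and shares it with every frame it runs. `FrameScopes` mirrors PyFrameScopes, with the hypothetical `Obj ob` member standing in for PyObject_HEAD; `Func`, `Frame` and the helpers are illustrative only. Push and clear each touch a single reference count, at the price of one extra load to reach the individual fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyObject: only a reference count. */
typedef struct { long refcnt; } Obj;

static void obj_incref(Obj *o) { o->refcnt++; }
static void obj_decref(Obj *o) { assert(o->refcnt > 0); o->refcnt--; }

/* Simplified model of PyFrameScopes: `ob` plays the role of PyObject_HEAD. */
typedef struct {
    Obj ob;
    Obj *f_globals;
    Obj *f_builtins;
    Obj *f_locals;
    Obj *f_code;
} FrameScopes;

/* Hypothetical function object owning a scopes object created once. */
typedef struct {
    FrameScopes *scopes;
} Func;

typedef struct {
    FrameScopes *scopes;
    /* ... locals and evaluation stack ... */
} Frame;

static void frame_push_specials(Frame *f, const Func *fn)
{
    f->scopes = fn->scopes;
    obj_incref(&f->scopes->ob);   /* one INCREF instead of four */
}

static void frame_clear_specials(Frame *f)
{
    obj_decref(&f->scopes->ob);   /* one DECREF instead of four */
    f->scopes = NULL;
}

/* Individual specials are now reached through one extra pointer load. */
static Obj *frame_globals(const Frame *f)
{
    return f->scopes->f_globals;
}
```

The individual fields keep their own references through the scopes object, so their counts do not change per call.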

There are some downsides to creating this object, however:

  1. Extra complexity and overhead when creating a function (possibly slowing down the creation of closures).
  2. Changes to the unstable API. We would need to move PyFunctionObject to the internal headers to make this explicit.
  3. Additional overhead for LOAD_GLOBAL due to the extra indirection. Hopefully the cost of the extra memory load in LOAD_GLOBAL will be outweighed by saving many indirections and branches in each call.
  4. Although f_locals is always NULL for functions, it is non-NULL and cannot be shared when executing module or class level code. Each call to module or class level code would need a new PyFrameScopes to be created.
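Downsides 3 and 4 can be sketched in the same simplified model. Everything here is hypothetical: `scopes_for_frame` and its `base` parameter (a scopes object supplying globals, builtins and code) are illustrative names, not proposed API. A function frame can share an existing scopes object, but a frame with its own `f_locals` mapping needs a freshly allocated one, and reading the globals costs an extra dependent load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyObject: only a reference count. */
typedef struct { long refcnt; } Obj;

static void obj_incref(Obj *o) { o->refcnt++; }

/* Simplified model of PyFrameScopes (`ob` ~ PyObject_HEAD). */
typedef struct {
    Obj ob;
    Obj *f_globals;
    Obj *f_builtins;
    Obj *f_locals;
    Obj *f_code;
} FrameScopes;

typedef struct {
    FrameScopes *scopes;
} Frame;

/* Downside 3: LOAD_GLOBAL must load f->scopes before it can reach the
 * globals, one more dependent memory load than reading f->f_globals. */
static Obj *frame_globals(const Frame *f)
{
    return f->scopes->f_globals;
}

/* Downside 4: a function frame (locals == NULL) can share `base`, but
 * module or class code has its own f_locals mapping, so each execution
 * needs a newly allocated FrameScopes (and four INCREFs to fill it). */
static FrameScopes *scopes_for_frame(FrameScopes *base, Obj *locals)
{
    if (locals == NULL) {
        obj_incref(&base->ob);
        return base;
    }
    FrameScopes *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->ob.refcnt = 1;
    s->f_globals = base->f_globals;   obj_incref(s->f_globals);
    s->f_builtins = base->f_builtins; obj_incref(s->f_builtins);
    s->f_locals = locals;             obj_incref(s->f_locals);
    s->f_code = base->f_code;         obj_incref(s->f_code);
    return s;
}
```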
