Freeze initialized runtime state for use in subsequent executions. #78

@ericsnowcurrently

This is based on a discussion @markshannon and I had the other day, but it also relates to discussions I've had with other core devs periodically for several years.

The idea is to start up the runtime, finish initialization, and then take a snapshot of the process memory (or a subset). That snapshot is then rendered as a header file (a la frozen modules) which the runtime can use on subsequent executions to get to that initialized state instead of executing all the usual runtime code. (This is reminiscent of a technique xemacs uses.)
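
As a rough illustration of the "rendered as a header file" step, here is a minimal sketch that turns a blob of snapshot bytes into a C byte-array definition, in the spirit of how frozen modules are stored. The function name `render_snapshot_header`, the symbol name `runtime_snapshot`, and the layout are assumptions made for the example; how the bytes are captured in the first place is not covered.

```python
def render_snapshot_header(data: bytes,
                           symbol: str = "runtime_snapshot",
                           per_line: int = 12) -> str:
    """Render raw snapshot bytes as C source defining a byte array.

    `symbol` is a hypothetical name; nothing in the runtime defines it.
    """
    if not data:
        # A zero-length array is not valid standard C, so refuse it here.
        raise ValueError("snapshot is empty")
    lines = [
        "/* Auto-generated from a runtime snapshot; do not edit. */",
        f"static const unsigned char {symbol}[{len(data)}] = {{",
    ]
    # Emit the bytes as decimal literals, a fixed number per line.
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("    " + ",".join(str(b) for b in chunk) + ",")
    lines.append("};")
    return "\n".join(lines) + "\n"
```

A build step could write the returned text to a header that the runtime includes, so the snapshot ends up as static data in the binary.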

Benefits

  • possibly skip most of runtime init, getting us to running user code much faster
  • allow us to do one allocation (for the whole snapshot) instead of the many we normally do
  • (we may be able to get that snapshot into the DATA section to avoid allocation altogether, though likely not worth the trouble)
  • the snapshot could be re-used to speed up creating subinterpreters
  • if we make the snapshot dump human-readable, it could be a useful diagnostic tool

Caveats and Challenges

  • (?) most Python processes, other than relatively short-lived ones, won't benefit all that much
  • must be part of the build process (probably not realistic to do at runtime)
  • taking the snapshot might not be so easy
  • turning the snapshot back into a fully initialized runtime might not be so easy
  • there are lots of things to fix up (e.g. offsets, pointers, object hashes, maybe refcounts), which may make it too complex or otherwise neutralize any performance gains
  • command-line options and env vars can invalidate the snapshot
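
To make the pointer fix-up concern concrete, here is a minimal sketch of one common approach: record the offsets of every pointer slot in the snapshot (a relocation table) and rebase each one when the snapshot is loaded at a different address. It assumes 64-bit little-endian pointers and that every non-NULL pointer points back into the snapshot itself; `relocate` and its parameters are hypothetical names for this example, not anything in the runtime.

```python
import struct

# Assumption: pointers are 64-bit and little-endian.
PTR = struct.Struct("<Q")


def relocate(snapshot: bytes, pointer_offsets: list[int],
             old_base: int, new_base: int) -> bytearray:
    """Return a copy of `snapshot` with each listed pointer slot rebased.

    `pointer_offsets` is the relocation table: byte offsets within the
    snapshot where an absolute address was stored at snapshot time.
    """
    image = bytearray(snapshot)
    delta = new_base - old_base
    for offset in pointer_offsets:
        (addr,) = PTR.unpack_from(image, offset)
        if addr == 0:
            continue  # leave NULL pointers untouched
        if not old_base <= addr < old_base + len(snapshot):
            # Pointers that escape the snapshot cannot be fixed up this way.
            raise ValueError(f"pointer at offset {offset} leaves the snapshot")
        PTR.pack_into(image, offset, addr + delta)
    return image
```

Even in this toy form, the cost is one read and one write per pointer slot, which is the kind of work that could eat into the startup savings if the snapshot holds many objects.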

Open Questions

  • is it worth it?
  • is it worth the time to figure out if it's worth it?
  • would it make sense to do this with a subset of runtime initialization?
  • what should be in the snapshot?
  • what should the format be for the snapshot dump?
  • make it human-readable?
  • how to turn the initialized runtime into a snapshot?
    • in-proc vs. external
    • stdout vs. outfile
  • what should the format be for the data we will use to initialize the runtime? (e.g. in a header file)
  • how to render the snapshot dump as that data?
  • how to go from that data to a fully initialized runtime?
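
One possible shape for a human-readable dump, sketched under heavy assumptions: a JSON document that carries a format version and a fingerprint of the options and environment the snapshot was taken under, so that a mismatch can be detected and normal initialization used instead. The names `config_fingerprint`, `dump_snapshot`, and `load_snapshot`, the `state` dictionary, and the rule of only considering `PYTHON*` environment variables are all hypothetical choices for this example.

```python
import hashlib
import json

FORMAT_VERSION = 1  # hypothetical version number for the dump format


def config_fingerprint(options, environ) -> str:
    """Hash the inputs that could invalidate a snapshot."""
    # Assumption: only PYTHON* environment variables affect runtime init.
    relevant_env = {k: v for k, v in environ.items() if k.startswith("PYTHON")}
    payload = json.dumps({"options": list(options), "env": relevant_env},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dump_snapshot(state: dict, options, environ) -> str:
    """Serialize a (toy) runtime state as indented, diff-friendly JSON."""
    doc = {
        "version": FORMAT_VERSION,
        "config": config_fingerprint(options, environ),
        "state": state,
    }
    return json.dumps(doc, indent=2, sort_keys=True)


def load_snapshot(text: str, options, environ):
    """Return the stored state, or None if the snapshot does not apply."""
    doc = json.loads(text)
    if doc.get("version") != FORMAT_VERSION:
        return None
    if doc.get("config") != config_fingerprint(options, environ):
        return None  # caller would fall back to normal initialization
    return doc["state"]
```

A real dump would hold far more than a small dictionary, and a text format like this would probably only serve as the diagnostic view, with a separate compact form used for the data the runtime actually loads.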

Metadata

  • Status: Done