Closed
This is based on a discussion @markshannon and I had the other day, but it also relates to discussions I've had with other core devs periodically for several years.
The idea is to start up the runtime, finish initialization, and then take a snapshot of the process memory (or a subset of it). That snapshot is then rendered as a header file (a la frozen modules), which the runtime can use on subsequent executions to get to that initialized state instead of executing all the usual runtime init code. (This is reminiscent of a technique XEmacs uses.)
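To make the "rendered as a header file" step concrete, here is a rough sketch of a build-time script that turns raw snapshot bytes into a C byte array. The function name `render_header`, the array name `runtime_snapshot`, and the layout of the generated header are all made up for illustration; nothing here reflects an existing CPython tool, and how the snapshot bytes get captured is left open.

```python
def render_header(snapshot: bytes, name: str = "runtime_snapshot",
                  per_line: int = 12) -> str:
    """Render raw snapshot bytes as a C header defining a byte array.

    `name` is a hypothetical identifier, not an existing CPython symbol.
    """
    if not snapshot:
        # A zero-length array is not valid standard C.
        raise ValueError("snapshot must not be empty")
    lines = [
        "/* Auto-generated; do not edit. */",
        f"static const unsigned char {name}[{len(snapshot)}] = {{",
    ]
    # Emit the bytes as hex literals, a fixed number per line.
    for i in range(0, len(snapshot), per_line):
        chunk = snapshot[i:i + per_line]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Stand-in bytes; a real snapshot would come from the initialized process.
    print(render_header(b"\x00\x01\xfe\xff", name="example_snapshot"))
```

A `static const` array like this would normally be placed in read-only data by the compiler, so the runtime would still have to copy or fix it up before treating it as live state.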
Benefits
- possibly skip most of runtime init, getting us to running user code much faster
- allow us to do one allocation (for the whole snapshot) instead of the many we normally do
  - (we may be able to get that snapshot into the DATA section to avoid allocation altogether, though that is likely not worth the trouble)
- the snapshot could be re-used to speed up creating subinterpreters
- if we make the snapshot dump human-readable, it could be a useful diagnostic tool
Caveats and Challenges
- (not certain) only relatively short-lived Python processes would benefit much; most others won't see a noticeable difference
- must be part of the build process (probably not realistic to do at runtime)
- taking the snapshot might not be so easy
- turning the snapshot back into a fully initialized runtime might not be so easy
- there are lots of things to fix up (e.g. offsets, pointers, object hashes, maybe refcounts), which may make it too complex or otherwise neutralize any performance gains
- command-line options and env vars can invalidate the snapshot
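The pointer fix-up problem mentioned above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: memory is modeled as a list of 64-bit words, we are told up front which slots hold pointers into the snapshot, and every pointer targets the snapshot itself. The names `take_snapshot` and `load_snapshot` are hypothetical. Pointers are stored as base-relative offsets along with a relocation table, and loading adds the new base address back in.

```python
import struct

WORD = struct.Struct("<Q")  # one 64-bit little-endian slot


def take_snapshot(words, pointer_slots, old_base):
    """Flatten `words` into bytes, storing pointer slots as offsets.

    Returns the blob plus a relocation table (byte offsets of pointer slots).
    """
    buf = bytearray()
    for index, value in enumerate(words):
        if index in pointer_slots:
            value -= old_base  # absolute address -> base-relative offset
        buf += WORD.pack(value)
    relocs = sorted(i * WORD.size for i in pointer_slots)
    return bytes(buf), relocs


def load_snapshot(blob, relocs, new_base):
    """Copy the blob and rebase every relocated slot onto `new_base`."""
    buf = bytearray(blob)
    for off in relocs:
        (rel,) = WORD.unpack_from(buf, off)
        WORD.pack_into(buf, off, new_base + rel)
    return [WORD.unpack_from(buf, i)[0] for i in range(0, len(buf), WORD.size)]


if __name__ == "__main__":
    # Slots 1 and 3 hold pointers into a region that started at 0x1000.
    blob, relocs = take_snapshot([42, 0x1010, 7, 0x1000], {1, 3}, 0x1000)
    print([hex(w) for w in load_snapshot(blob, relocs, 0x7000)])
```

The load step is one pass over the relocation table, so its cost grows with the number of pointers in the snapshot. That is the part that could eat into the startup gains, and it does not cover the other fix-ups listed (object hashes, refcounts).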
Open Questions
- is it worth it?
- is it worth the time to figure out if it's worth it?
- would it make sense to do this with a subset of runtime initialization?
- what should be in the snapshot?
- what should the format be for the snapshot dump?
  - make it human-readable?
- how to turn the initialized runtime into a snapshot?
  - in-proc vs. external
  - stdout vs. outfile
- what should the format be for the data we will use to initialize the runtime? (e.g. in a header file)
- how to render the snapshot dump as that data?
- how to go from that data to a fully initialized runtime?
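As one possible answer to the dump-format questions, here is a sketch of a line-oriented, human-readable dump that can be parsed back into bytes. The format (byte offset, `ptr`/`data` tag, value, one 64-bit word per line) and the names `dump_text` and `parse_text` are invented for illustration. It assumes the snapshot is a whole number of 8-byte words and that pointer slots are known from a relocation table of byte offsets.

```python
import struct

WORD = struct.Struct("<Q")  # one 64-bit little-endian slot


def dump_text(blob, relocs):
    """Render a snapshot as text: one '<offset> <kind> <value>' line per word."""
    if len(blob) % WORD.size:
        raise ValueError("snapshot length must be a multiple of 8 bytes")
    pointer_offsets = set(relocs)
    lines = []
    for off in range(0, len(blob), WORD.size):
        (value,) = WORD.unpack_from(blob, off)
        kind = "ptr" if off in pointer_offsets else "data"
        lines.append(f"{off:#06x} {kind} {value:#018x}")
    return "\n".join(lines)


def parse_text(text):
    """Inverse of dump_text: rebuild the blob and the relocation table."""
    buf = bytearray()
    relocs = []
    for line in text.splitlines():
        off, kind, value = line.split()
        if int(off, 16) != len(buf):
            raise ValueError(f"unexpected offset {off}")
        if kind == "ptr":
            relocs.append(len(buf))
        buf += WORD.pack(int(value, 16))
    return bytes(buf), relocs


if __name__ == "__main__":
    blob = WORD.pack(42) + WORD.pack(0x10)
    print(dump_text(blob, [8]))
```

Because the text round-trips, the same dump could serve both as a diagnostic and as the intermediate input that the build renders into the header data.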