Description
This is a CPython startup improvement approach proposed by the Alibaba Compiler Team.
We are working on ways to speed up Python application startup. The main idea is to share code objects from a memory-mapped file, which gives startup benefits similar to Experiment E with a simpler implementation.
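As a rough illustration of why mapping saves work, here is a toy model in Python, offered only as a sketch: the dump step writes fixed-layout records to a file once, and the use step maps the file and reads fields in place through a `memoryview`, without the per-object unmarshalling that CPython normally performs on `.pyc` data. The three-int record layout and the names `dump_image`, `read_field`, and `FIELDS_PER_OBJ` are hypothetical; the proposal itself maps real CPython heap objects, which a Python-level sketch cannot do.

```python
import mmap
import struct

# Hypothetical toy "object": three native C ints stored back to back.
FIELDS_PER_OBJ = 3


def dump_image(path, objects):
    """Dump step (cf. PYCDSMODE=1): write every object's fields as native ints."""
    flat = [value for obj in objects for value in obj]
    with open(path, "wb") as f:
        f.write(struct.pack(f"{len(flat)}i", *flat))


def read_field(path, obj_index, field):
    """Use step (cf. PYCDSMODE=2): map the image and read one field in place."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Casting the mapping to ints copies nothing; indexing reads the mapped page.
        with memoryview(mm) as raw, raw.cast("i") as ints:
            return ints[obj_index * FIELDS_PER_OBJ + field]
```

In the proposal the mapped bytes are already CPython objects, so even the `cast` step has no counterpart: the interpreter uses the objects directly.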
Our design is inspired by the Application Class-Data Sharing (AppCDS) feature, introduced in OpenJDK. AppCDS allows a set of application classes to be pre-processed into a shared archive file, which can then be memory-mapped at runtime to reduce startup time and memory footprint.
Based on this principle, we propose the Code-Data Sharing (CDS) approach, which deep-copies a set of code objects into a memory-mapped heap image file. At runtime, CDS will:
- use `MAP_FIXED` to map the heap image at its predetermined address, so that the pointers inside it are correct
  - One concern is that ASLR randomizes the address of the data section, so `ob_type` may point to the wrong address in memory. The solution is to patch the correct address into `ob_type` by traversing each object in the heap image.
- rehash the `frozenset`s
- get code objects directly from the heap image while importing packages
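The `ob_type` patching step can be sketched as a linear walk over the image. Pointers between objects inside the image stay valid because `MAP_FIXED` puts the image back at its dump-time address; only pointers leaving the image, such as `ob_type`, need rewriting. This toy model assumes a hypothetical layout in which each object starts with a 16-byte header holding its total size and its type pointer, and assumes the dump recorded the address of every type object; the names `HEADER`, `build_image`, and `patch_ob_types` are illustrative, not part of the proposal.

```python
import struct

# Hypothetical object header for this toy image: total object size in bytes,
# then the ob_type pointer. (A real PyObject header has no size field.)
HEADER = struct.Struct("<QQ")


def build_image(objects):
    """Lay out (type_addr, payload) pairs back to back, as a dump step might."""
    image = bytearray()
    for type_addr, payload in objects:
        image += HEADER.pack(HEADER.size + len(payload), type_addr)
        image += payload
    return image


def patch_ob_types(image, dump_type_addrs, runtime_type_addrs):
    """Walk every object in the image and rewrite its ob_type in place.

    Both dicts map a type name to the address of its type object:
    dump_type_addrs as recorded at dump time, runtime_type_addrs in the
    current (ASLR-shifted) process. Returns the number of objects patched.
    """
    relocation = {old: runtime_type_addrs[name]
                  for name, old in dump_type_addrs.items()}
    offset = patched = 0
    while offset < len(image):
        size, old_type = HEADER.unpack_from(image, offset)
        HEADER.pack_into(image, offset, size, relocation[old_type])
        offset += size  # the size field lets the walk step to the next object
        patched += 1
    return patched
```

If every type object lives in the interpreter's own data section, ASLR shifts them all by the same amount and a single offset would suffice; the lookup table is used here only because it also covers type objects that move independently, such as those in separately loaded shared libraries.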
Experiments
Env: Linux & Intel Skylake
`PYCDSMODE=0` runs without CDS (baseline), `PYCDSMODE=1` dumps the heap image, and `PYCDSMODE=2` starts up from the dumped image.
Running an empty application
$ time for i in `seq 100`; do PYCDSMODE=0 python3 -c ''; done
real 0m1.486s
user 0m1.186s
sys 0m0.307s
$ PYCDSMODE=1 python3 -c ''  # dump
$ time for i in `seq 100`; do PYCDSMODE=2 python3 -c ''; done
real 0m1.201s
user 0m0.934s
sys 0m0.273s
Startup time reduction: 19.18%
Web server (flask + requests + pymongo)
$ time PYCDSMODE=0 python3 -c 'import flask, requests, pymongo'
real 0m0.303s
user 0m0.278s
sys 0m0.025s
$ PYCDSMODE=1 python3 -c 'import flask, requests, pymongo'  # dump
$ time PYCDSMODE=2 python3 -c 'import flask, requests, pymongo'
real 0m0.257s
user 0m0.232s
sys 0m0.024s
Startup time reduction: 15.18%
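Both percentages are relative reductions of the `real` (wall-clock) time against the `PYCDSMODE=0` baseline; for the empty application, the 100-run totals correspond to about 14.9 ms versus 12.0 ms per start:

```math
\text{reduction} = \frac{t_{\text{base}} - t_{\text{CDS}}}{t_{\text{base}}}, \qquad
\frac{1.486 - 1.201}{1.486} = \frac{0.285}{1.486} \approx 19.18\%, \qquad
\frac{0.303 - 0.257}{0.303} = \frac{0.046}{0.303} \approx 15.18\%
```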
Summary
Compared to existing approaches, the main contributions of our CDS approach are:
- CDS uses the heap objects directly, while the memory-mapped implementation in PyICE needs some deserialization.
- CDS doesn't need to generate C source code, so no C toolchain is needed for compiling. This is essential for production environments in the cloud.
Given that AppCDS has proved successful in OpenJDK 10, we believe our proposal can become a practical feature for improving CPython startup performance, even though our overall design is still evolving.