Faster startup -- Share code objects from memory-mapped file #86

@yuleil

This is a CPython startup improvement approach proposed by the Alibaba Compiler Team.

We are working on ways to reduce Python application startup time. The main idea is to share code objects from a memory-mapped file, which yields startup benefits similar to those of Experiment E with a simpler implementation.

Our design is inspired by the Application Class-Data Sharing (AppCDS) feature, introduced in OpenJDK. AppCDS allows a set of application classes to be pre-processed into a shared archive file, which can then be memory-mapped at runtime to reduce startup time and memory footprint.

Based on the same principle, we propose a Code-Data Sharing (CDS) approach, which allows a set of code objects to be deep-copied into a memory-mapped heap image file. At runtime, we:

  • Use MAP_FIXED to map the heap image at its predetermined address, so that the pointers stored inside the image remain correct.
  • Patch ob_type. One concern is that ASLR randomizes the address of the data section, so ob_type may point to the wrong address in memory. The solution is to traverse every object in the heap image and patch ob_type with the correct address.
  • Rehash the frozenset objects.
  • Get code objects directly from the heap image when importing packages.
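
The ob_type patching step can be illustrated with a toy model. This is only a sketch under stated assumptions, not the CDS implementation: the "image" is a plain Python list, the record layout and the helper names (dump_image, simulate_aslr, patch_ob_type) are hypothetical, and it relies on the CPython detail that id() returns an object's address.

```python
import builtins

def dump_image(objects):
    """Hypothetical dump step: record each object's type name and the
    address its type object had at dump time (id() is the address in CPython)."""
    return [
        {"type_name": type(obj).__name__, "ob_type": id(type(obj)), "value": obj}
        for obj in objects
    ]

def simulate_aslr(image, slide):
    """Make the recorded ob_type addresses stale, as if the interpreter's
    data section had been loaded `slide` bytes away from where it was at dump time."""
    for record in image:
        record["ob_type"] += slide

def patch_ob_type(image):
    """Traverse every record and point ob_type at the type object's
    current address. Returns how many records needed patching."""
    patched = 0
    for record in image:
        current = id(getattr(builtins, record["type_name"]))
        if record["ob_type"] != current:
            record["ob_type"] = current
            patched += 1
    return patched

image = dump_image([1, "name", (1, 2), frozenset({"a"}), b"\x00"])
simulate_aslr(image, slide=0x1000)

# Count stale pointers before and after the patching pass.
stale = sum(r["ob_type"] != id(type(r["value"])) for r in image)
fixed = patch_ob_type(image)
print(stale, fixed)
```

In this model the type is looked up by name; a real heap image would need some other way to identify each object's type, which the description above does not specify.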

Experiments

Environment: Linux on Intel Skylake

Running an empty application

$ time for i in `seq 100`; do PYCDSMODE=0 python3 -c ''; done
real  0m1.486s
user  0m1.186s
sys   0m0.307s

$ PYCDSMODE=1 python3 -c '' # dump
$ time for i in `seq 100`; do PYCDSMODE=2 python3 -c ''; done
real  0m1.201s
user  0m0.934s
sys   0m0.273s

Startup time benefits: 19.18% reduction

Web server (flask + requests + pymongo)

$ time PYCDSMODE=0 python3 -c 'import flask, requests, pymongo'
real  0m0.303s
user  0m0.278s
sys   0m0.025s

$ PYCDSMODE=1 python3 -c 'import flask, requests, pymongo' # dump
$ time PYCDSMODE=2 python3 -c 'import flask, requests, pymongo'
real  0m0.257s
user  0m0.232s
sys   0m0.024s

Startup time benefits: 15.18% reduction
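
The reported reductions follow from the `real` times above as (baseline - CDS) / baseline. A short check of that arithmetic (the function name reduction_percent is only illustrative):

```python
def reduction_percent(baseline_s, cds_s):
    """Relative reduction in wall-clock time, in percent."""
    return (baseline_s - cds_s) / baseline_s * 100

# "real" times from the two experiments (PYCDSMODE=0 vs. PYCDSMODE=2).
empty_app = reduction_percent(1.486, 1.201)
web_server = reduction_percent(0.303, 0.257)

print(f"empty application: {empty_app:.2f}%")   # 19.18%
print(f"web server:        {web_server:.2f}%")  # 15.18%
```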

Summary

Compared to the existing approaches, the main contributions of our CDS approach are:

  • CDS uses the heap objects directly, while the memory-mapped implementation in PyICE needs some deserialization.

  • CDS doesn't need to generate C source code, which avoids the need for a C toolchain to compile it. This is essential for a production environment in the cloud.

Considering that AppCDS has proved successful in OpenJDK 10, we believe our proposal can be a practical feature for improving CPython startup performance, even though our overall design is still evolving.
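
For contrast, here is a minimal sketch of a deserialization-based path using only the standard library. It is a generic marshal example, not PyICE's format or the CDS implementation: even though the file is memory-mapped, marshal.loads still has to build fresh objects from the bytes, which is the cost that using heap objects directly is meant to avoid.

```python
import marshal
import mmap
import os
import tempfile

source = "def greet():\n    return 'hello'\n"
code = compile(source, "<demo>", "exec")

# Dump: serialize the code object to a file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(marshal.dumps(code))
    path = f.name

try:
    # Load: map the file, then deserialize. The mapping only provides the
    # bytes; marshal.loads allocates a new code object from them.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        loaded = marshal.loads(mm[:])
finally:
    os.remove(path)

namespace = {}
exec(loaded, namespace)
result = namespace["greet"]()
print(result)  # hello
```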
