
Parallelizing the install process + PoC! #8187

@bmartinn


First of all, kudos guys on the v20+ release 👍. Wheel caching is much better, and the addition of cached built wheels is an awesome time saver.

TL;DR
Reference implementation for `pip install --parallel`, which downloads, inspects, and installs packages in parallel, significantly speeding up environment setup for an entire project (or any requirements.txt file).
A few speedup numbers (Ubuntu w/ Python 3.6):

Requirements variant | Parallel + D/L | pip v20.1 + D/L | Speedup incl. download | Parallel (cached) | pip v20.1 (cached) | Speedup cached wheels
-------------------- | -------------- | --------------- | ---------------------- | ----------------- | ------------------ | ---------------------
A                    | 0:58.14        | 1:51.08         | x1.90                  | 0:16.16           | 0:33.57            | x2.08
B                    | 5:47.36        | 6:55.49         | x1.19                  | 0:52.19           | 1:12.07            | x1.37
C                    | 0:56.08        | 1:44.34         | x1.86                  | 0:14.36           | 0:29.21            | x2.01
D                    | 0:36.45        | 1:39.55         | x2.71                  | 0:14.59           | 0:33.20            | x2.22
Average              |                |                 | x1.91                  |                   |                    | x1.92

Details:
We rely heavily on pip to set up environments on our ever-changing production systems, whether creating a new virtual environment or installing inside a Docker container. The full implementation of our ML/DL agent is open source and can be found here: https://github.com/allegroai/trains

With the introduction of pip 20.1, we decided to try to parallelize the pip install process in order to save precious spin-up time.

  1. Package resolution & download were moved into a thread pool (a global lock was added to RequirementSet). The only caveat is the download progress bar (more on that below).

The reason for using threads is that most of the work here is network and disk I/O, which Python threads parallelize well. The second reason is that RequirementSet is constantly growing as each package and its requirements are discovered; sharing the set among threads is trivial, as opposed to sharing it across processes. A sketch of this pattern follows below.
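A minimal sketch of that pattern, assuming a toy DEPS graph and an illustrative RequirementSet stand-in (none of these names are pip internals):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Toy dependency graph standing in for metadata fetched from the index.
DEPS = {"a": ["b", "c"], "b": ["c"], "c": []}

class RequirementSet:
    """Toy stand-in for pip's RequirementSet, guarded by one global lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self.seen = set()

    def add_if_new(self, name):
        # Serialize mutation: many worker threads grow the set concurrently.
        with self._lock:
            if name in self.seen:
                return False
            self.seen.add(name)
            return True

def resolve_all(roots, workers=8):
    req_set = RequirementSet()
    futures, flock = [], threading.Lock()

    def fetch(name):
        # Stands in for "download + inspect" (network/disk I/O, which threads
        # parallelize well despite the GIL); newly discovered requirements
        # are submitted back to the same pool.
        for dep in DEPS.get(name, []):
            if req_set.add_if_new(dep):
                with flock:
                    futures.append(pool.submit(fetch, dep))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for root in roots:
            if req_set.add_if_new(root):
                futures.append(pool.submit(fetch, root))
        # Drain: each task enqueues its children before completing, so the
        # list is only ever empty once every task has finished.
        while True:
            with flock:
                if not futures:
                    break
                fut = futures.pop()
            fut.result()  # also propagates worker exceptions
    return req_set.seen

print(resolve_all(["a"]))  # -> {'a', 'b', 'c'}
```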

  2. The installation of wheel files (only) was moved to a process pool, since installing a wheel does not execute any script and can therefore be parallelized without risk.

The reason for choosing processes over threads is that the limiting factor here is CPU cycles. Wheel unpacking and copying are entirely independent of one another and of the rest of the process, and order has no meaning at this stage, since requirement order only matters when building packages. A sketch of this stage follows below.
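A minimal sketch of that stage, assuming a hypothetical install_wheel helper (a real install also writes the RECORD file, compiles bytecode, etc., all omitted here):

```python
import tempfile
import zipfile
from multiprocessing import Pool

def install_wheel(wheel_path):
    # Stands in for "unpack the wheel and copy its files into site-packages";
    # this is CPU/disk-bound and runs no package code, so it is safe to do
    # in a separate process.
    dest = tempfile.mkdtemp(prefix="demo-install-")
    with zipfile.ZipFile(wheel_path) as zf:
        zf.extractall(dest)
    return wheel_path, dest

def parallel_install(wheel_paths, workers=4):
    # Order is irrelevant when installing wheels, so imap_unordered is fine.
    with Pool(processes=workers) as pool:
        return list(pool.imap_unordered(install_wheel, wheel_paths))

if __name__ == "__main__":  # guard required for multiprocessing on spawn
    # parallel_install(["pkg_a.whl", "pkg_b.whl"])  # hypothetical paths
    pass
```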

  3. Wheel unpacking was optimized to reuse the unpacked folders between the resolver's inspection of a cached wheel and the installation of that same wheel (previously, the wheel was effectively unpacked twice). See the sketch below.
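A minimal sketch of that reuse, with a hypothetical unpack_once helper rather than pip's actual cache:

```python
import os
import tempfile
import zipfile

_unpack_cache = {}  # wheel path -> already-unpacked folder

def unpack_once(wheel_path):
    # Memoize the unpack destination by wheel path, so the same archive is
    # only ever extracted once per run.
    key = os.path.abspath(wheel_path)
    if key not in _unpack_cache:
        dest = tempfile.mkdtemp(prefix="wheel-unpack-")
        with zipfile.ZipFile(wheel_path) as zf:
            zf.extractall(dest)
        _unpack_cache[key] = dest
    return _unpack_cache[key]

# The resolver's inspection and the later install both call unpack_once(),
# so the second caller gets the already-unpacked folder for free.
```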

  4. Solving the parallel progress bar: the progress bar now has a global lock, and the first progress bar to acquire it is the one that outputs progress to the tty. Once that progress bar is done, another instance can acquire the lock and continue reporting from its current progress stage (a sketch appears after the example output below).
    This looks something like:

|████████████████████████████████| 101 kB 165 kB/s 
|█                           | 71 kB 881 kB/s eta 0:00:03
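A minimal sketch of the locking scheme, using an illustrative LockedBar class rather than pip's vendored progress bars:

```python
import sys
import threading

class LockedBar:
    """One class-level lock; only the bar holding it writes to the tty."""
    _tty_lock = threading.Lock()

    def __init__(self, total, width=32):
        self.total, self.width = total, width
        self.done = 0
        self._owns_tty = False

    def update(self, n):
        self.done += n
        if not self._owns_tty:
            # Non-blocking attempt: a bar that loses stays silent and simply
            # keeps counting in the background.
            self._owns_tty = LockedBar._tty_lock.acquire(blocking=False)
        if self._owns_tty:
            filled = min(self.width, self.width * self.done // self.total)
            sys.stdout.write("\r|%s%s| %d/%d" % (
                "█" * filled, " " * (self.width - filled),
                self.done, self.total))
            sys.stdout.flush()
            if self.done >= self.total:
                sys.stdout.write("\n")
                self._owns_tty = False
                LockedBar._tty_lock.release()  # hand the tty to the next bar
```

When the owning bar finishes and releases the lock, whichever bar acquires it next starts printing from wherever its download already is, which produces output like the example above.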

The full reference code can be found here.
Usage example:

pip install --parallel -r requirements.txt
