First of all, kudos guys on the v20+ release 👍 , wheel caching is much better and the addition of cached built wheels is an awesome time saver.
TL;DR

Reference implementation for `pip install --parallel`, which downloads, inspects, and installs packages in parallel, greatly accelerating the installation of an entire project's environment (or, really, any `requirements.txt` file).
A few speedup numbers (Ubuntu w/ Python 3.6):

| Requirements variant | Parallel + D/L | PyPI v20.1 + D/L | Speedup incl. Download | Parallel | PyPI v20.1 | Speedup Cached Wheels |
| -------------------- | -------------- | ---------------- | ---------------------- | -------- | ---------- | --------------------- |
| A | 0:58.14 | 1:51.08 | x1.90 | 0:16.16 | 0:33.57 | x2.08 |
| B | 5:47.36 | 6:55.49 | x1.19 | 0:52.19 | 1:12.07 | x1.37 |
| C | 0:56.08 | 1:44.34 | x1.86 | 0:14.36 | 0:29.21 | x2.01 |
| D | 0:36.45 | 1:39.55 | x2.71 | 0:14.59 | 0:33.20 | x2.22 |
| Average | | | x1.91 | | | x1.92 |
Details:
We heavily rely on pip to set up environments on our ever-changing production systems, specifically when setting up a new virtual environment or working inside a Docker container. The full implementation of our ML/DL agent is open source and can be found here: https://github.com/allegroai/trains
With the introduction of pip 20.1 we decided to try to parallelize the pip install process in order to save precious spin-up time.
- The package resolving & downloading was moved into a `ThreadPool` (a global lock was added to `RequirementSet`). The only caveat is the download progress bar (more on that below).
The reason for using threads is that most of the speedup here comes from network and I/O, which parallelize well on Python threads. The second reason is the way `RequirementSet` constantly grows with every package discovered and its requirements; sharing the set among all threads is trivial, as opposed to sharing it across processes. See the first sketch after this list.
- The installation of wheel files (only) was moved to a process pool, as installing a wheel does not execute any script and can therefore be parallelized without risk (see the second sketch below).
The reason for choosing processes over threads is that the limiting factor here is actually CPU cycles spent in the installation process. Also, the wheel unpacking and copying are totally independent of one another and of the rest of the process, and lastly, order has no meaning at this stage, as requirement order only matters when building packages.
- Unpacking the wheels was optimized to reuse the unpacked folders; this happens between the inspection part (`resolver`) on a cached wheel and the installation of the same wheel (essentially, the wheel used to be unpacked twice). See the third sketch below.
- Solving the parallel progress bar: the progress bar now has a global lock; the first progress bar to acquire it is the one outputting progress to the tty. Once that progress bar is done, another instance can acquire the lock and continue reporting from its current progress stage (see the last sketch below).
This looks something like:

```
|████████████████████████████████| 101 kB 165 kB/s
|█                               | 71 kB 881 kB/s eta 0:00:03
```
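First sketch, the threaded resolve/download stage. This is a minimal illustration of the pattern, not pip's actual code: `fetch_and_inspect` is a hypothetical stand-in for the download-and-inspect step, and the lock-guarded `seen` set plays the role of the globally locked `RequirementSet`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def fetch_and_inspect(name):
    """Hypothetical: download one distribution and return its dependencies."""
    return []


class ParallelResolver:
    def __init__(self, max_workers=8):
        self.seen = set()             # shared requirement set
        self.lock = threading.Lock()  # plays the global lock added to RequirementSet
        self.pool = ThreadPoolExecutor(max_workers=max_workers)
        self.futures = []

    def add(self, name):
        with self.lock:               # serialize mutation of the shared set
            if name in self.seen:
                return                # already discovered by another thread
            self.seen.add(name)
        self.futures.append(self.pool.submit(self._worker, name))

    def _worker(self, name):
        for dep in fetch_and_inspect(name):
            self.add(dep)             # worker threads grow the set concurrently

    def resolve(self, roots):
        for name in roots:
            self.add(name)
        while self.futures:
            # result() blocks; any children a worker spawned are already appended
            # by the time it returns, so an empty list means all work is done.
            self.futures.pop().result()
        self.pool.shutdown()
        return self.seen
```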
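Second sketch, the process-pool install stage, assuming a hypothetical `install_wheel(path)` helper that unpacks and copies a single wheel. Since installing a wheel runs no package code and order is irrelevant, each wheel can go to any worker process.

```python
from concurrent.futures import ProcessPoolExecutor


def install_wheel(wheel_path):
    """Hypothetical: unpack one wheel and copy its files into site-packages."""


def install_wheels_parallel(wheel_paths, max_workers=None):
    # Processes rather than threads: unpacking/copying is CPU-bound,
    # and install order does not matter for pre-built wheels.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # Consume the iterator so any installer exception is re-raised here.
        list(pool.map(install_wheel, wheel_paths))
```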
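Third sketch, the double-unpack fix in isolation: unpack a cached wheel once and hand the same directory to both the resolver's inspection and the install step. The cache dict and `unpack_wheel` helper are illustrative only; the real change lives inside pip's wheel handling.

```python
import tempfile
import zipfile

_unpacked_dirs = {}  # wheel path -> directory it was already extracted into


def unpack_wheel(wheel_path):
    """Unpack a wheel once; subsequent callers reuse the cached directory."""
    cached = _unpacked_dirs.get(wheel_path)
    if cached is not None:
        return cached                      # the second unpack is now a no-op
    target = tempfile.mkdtemp(prefix="wheel-unpack-")
    with zipfile.ZipFile(wheel_path) as zf:
        zf.extractall(target)
    _unpacked_dirs[wheel_path] = target
    return target
```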
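Last sketch, the "one bar owns the tty" scheme described above. `ParallelBar` is an illustrative class, not pip's bar implementation: bars that fail to grab the shared lock keep counting silently, and whichever bar acquires it next takes over the display from its current position.

```python
import sys
import threading

_tty_lock = threading.Lock()  # global lock shared by every progress bar


class ParallelBar:
    def __init__(self, total):
        self.total = total
        self.done = 0
        self.owns_tty = False

    def update(self, n):
        self.done += n
        if not self.owns_tty:
            # Non-blocking attempt: only one bar may write to the tty at a time.
            self.owns_tty = _tty_lock.acquire(blocking=False)
        if self.owns_tty:
            filled = int(32 * self.done / self.total)
            sys.stdout.write("\r|%-32s| %d/%d" % ("█" * filled, self.done, self.total))
            sys.stdout.flush()
            if self.done >= self.total:
                sys.stdout.write("\n")
                _tty_lock.release()  # next bar resumes from its own progress
```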
The full reference code can be found here
Usage example:

```
pip install --parallel -r requirements.txt
```