Description
This is opened as the continuation of GH-7962, GH-8161, GH-8162, GH-3981 and GH-4654 and is one of the approaches to solve GH-825.
Why multithreading?
One common inconvenience with using `pip` is the delay for networking, since most package indices are not really fast[citation needed] and during package management `pip` needs to fetch many things (the package list, the packages themselves, etc.). Parallelization is one obvious solution to tackle this, and I hope it will be the cheaper one, hence this issue is opened to ensure that the implementation process will not be labor-expensive work.
Until next year when Python 2 support is dropped, there are two options: multithreading and multiprocessing. While the latter is safer, (1) not every platform has multiple CPU cores and (2) the modified code would need to undergo a huge refactoring to give each process the data it needs. So we are left with multithreading. The Python 3 `asyncio` is not an immediate solution either (plus it would also require making many existing routines awaitable).
What is the problem with multithreading?
Putting thread-safety aside (not because it's not a problem, but rather because I think everyone knows how problematic it is), the most obvious solution Python provides, `multiprocessing.dummy.Pool`, requires `sem_open` (bpo-3770), which seems to raise `ImportError` during initialization of the pool's attributes when it is missing. Since `sem_open` is to be provided by the operating system, this raises two questions: whether `multiprocessing.dummy` is supported on all platforms that `pip` cares to support, and whether (the more generic?) `threading` suffers the same issue if we implement the `Pool` ourselves. How about `concurrent.futures` (GH-3981)? Would it be worth it, from the developers' perspective as well as that of our users, if things go wrong on their platform?
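For reference, a minimal sketch of what the `concurrent.futures` route could look like (this is my own illustration, not pip code, and the name `thread_map` is made up). To my knowledge `ThreadPoolExecutor` builds on `threading` primitives and does not go through `multiprocessing`'s `sem_open`-backed semaphores, which is part of its appeal here:

```python
from concurrent.futures import ThreadPoolExecutor

def thread_map(func, iterable, max_workers=8):
    # Run func over iterable in a thread pool; executor.map preserves
    # input order, so results line up with the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(func, iterable))
```

Whether this degrades gracefully on every platform pip supports is exactly the open question.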
If we decide to do it anyway, how?
From GH-8162, IMHO it is safe to assume (this is a really dangerous thing to say 😞) that we can fall back to `map` if `multiprocessing.dummy.Pool` can't have `sem_open`. If this works, I personally suggest declaring a higher-order function to reuse in other places, namely for parallel downloading of packages (GH-825). Still under the assumption that this is correct, we can easily mock the failing behavior for testing. However, with my modest experience in threading and the overwhelming responsibility of not breaking thousands[citation needed, could be millions] of people's workflows, please do not take my words for granted and kindly share your thoughts on this particular matter.