ProcessPoolExecutor hangs when 1<max_tasks_per_child<num_submitted//max_workers #115634


Open
dalleyg opened this issue Feb 18, 2024 · 15 comments
Labels: topic-multiprocessing, type-bug (An unexpected behavior, bug, or error)

@dalleyg

dalleyg commented Feb 18, 2024

Bug report

Bug description:

Starting in Python 3.11, where the max_tasks_per_child parameter was introduced, ProcessPoolExecutor hangs whenever max_tasks_per_child > 1 and enough tasks have been submitted to trigger a worker restart.

The following reproducer hangs on a fresh installation of Python 3.11 on Linux.

import os
from concurrent.futures import ProcessPoolExecutor
with ProcessPoolExecutor(1, max_tasks_per_child=2) as exe:
    futs = [exe.submit(os.getpid) for _ in range(10)]
    for fut in futs:
        print(fut.result())

Issuing a keyboard interrupt results in the following stack trace:

Traceback (most recent call last):
  File "<string>", line 7, in <module>
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 451, in result
    self._condition.wait(timeout)
  File "/usr/lib/python3.11/threading.py", line 320, in wait
    waiter.acquire()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "/usr/lib/python3.11/concurrent/futures/_base.py", line 647, in __exit__
    self.shutdown(wait=True)
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 825, in shutdown
    self._executor_manager_thread.join()
  File "/usr/lib/python3.11/threading.py", line 1112, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.11/threading.py", line 1132, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt

Notes:

  • Interleaving submissions and result checking does not help.
  • Adding a timeout to the result() calls does not help.
  • Increasing the pool size does not help, unless it's made large enough to not require any worker restarts.
  • Interestingly, setting max_tasks_per_child=1 works great. It never hangs, and a new process is correctly used for each task.
  • It does not hang if max_tasks_per_child is set high enough so that no worker restarts happen.
  • I have reproduced this problem in the following test environments:
    • On GitHub's default Linux CI environment using Python 3.11.
    • On GitHub's default Linux CI environment using Python 3.12.
    • On Ubuntu 22.04.3 LTS running under WSL2 using a fresh installation of Python 3.11.
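The boundary condition in the issue title can be written as a small predicate. A sketch of that arithmetic (the helper name may_hang is mine, not part of any CPython API):

```python
def may_hang(num_submitted, max_workers, max_tasks_per_child):
    """Encode the regime from the issue title:

        1 < max_tasks_per_child < num_submitted // max_workers

    Per the notes above, max_tasks_per_child=None (never retire workers)
    and max_tasks_per_child=1 (fresh worker per task) both avoid the hang,
    as does any quota large enough that no worker restart is needed.
    """
    if max_tasks_per_child is None or max_tasks_per_child <= 1:
        return False
    return max_tasks_per_child < num_submitted // max_workers

# The reproducer's configuration (10 tasks, 1 worker, quota of 2)
# falls squarely in the hanging regime:
print(may_hang(10, 1, 2))  # True
```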

CPython versions tested on:

3.11, 3.12

Operating systems tested on:

Linux

@dalleyg dalleyg added the type-bug An unexpected behavior, bug, or error label Feb 18, 2024
@dalleyg dalleyg changed the title ProcessPoolExecutor hangs when 1<max_tasks_per_child<num_submitted ProcessPoolExecutor hangs when 1<max_tasks_per_child<num_submitted//max_workers Feb 18, 2024
@gaogaotiantian
Member

Can't repro this on WSL2 + 3.12/3.11/main. Result:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.12/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/spawn.py", line 131, in _main
    prepare(preparation_data)
  File "/usr/lib/python3.12/multiprocessing/spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/usr/lib/python3.12/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 286, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/home/gaogaotiantian/programs/mycpython/example.py", line 4, in <module>
    futs = [exe.submit(os.getpid) for _ in range(10)]
            ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/process.py", line 835, in submit
    self._adjust_process_count()
  File "/usr/lib/python3.12/concurrent/futures/process.py", line 794, in _adjust_process_count
    self._spawn_process()
  File "/usr/lib/python3.12/concurrent/futures/process.py", line 812, in _spawn_process
    p.start()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/context.py", line 289, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
  File "/usr/lib/python3.12/multiprocessing/spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html
        
Traceback (most recent call last):
  File "/home/gaogaotiantian/programs/mycpython/example.py", line 6, in <module>
    print(fut.result())
          ^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

@brianschubert
Contributor

brianschubert commented Feb 18, 2024

Can't reproduce on Linux + 3.12.1/3.11.7/main. I get the same BrokenProcessPool as @gaogaotiantian, including when max_tasks_per_child=1 or max_tasks_per_child=<very large number>.

@dalleyg
Author

dalleyg commented Feb 18, 2024

Presumably getting BrokenProcessPool is also a bug; just a different one. Or am I misunderstanding?

Here is the specific Python version I tested on WSL2 + Ubuntu; it was installed with apt:

% python3.11 --version
Python 3.11.0rc1

@brianschubert
Contributor

brianschubert commented Feb 18, 2024

@dalleyg 3.11.0rc1 is a pre-release version of Python 3.11. Are you able to reproduce this with a "full" release version of Python? Preferably one of the latest minor releases, currently 3.12.2 and 3.11.8.

The BrokenProcessPool looks like a side effect of your demo script trying to spawn subprocesses at import time while using the "spawn" start method. See "Safe importing of main module" from the multiprocessing docs as described in the error message.

@dalleyg
Author

dalleyg commented Feb 18, 2024

I just downloaded the 3.11.8 tarball from python.org, ran ./configure and make, moved my ~/.local directory out of the way, and I still get the same hanging issue.

~/Python-3.11.8% ./python -c '
import os
from concurrent.futures import ProcessPoolExecutor
with ProcessPoolExecutor(1, max_tasks_per_child=2) as exe:
    futs = [exe.submit(os.getpid) for _ in range(10)]
    for fut in futs:
        print(fut.result())
'
139562
139562
^C^CTraceback (most recent call last):
  File "<string>", line 7, in <module>
  File "/home/dalleyg/projects/Python-3.11.8/Lib/concurrent/futures/_base.py", line 451, in result
    self._condition.wait(timeout)
  File "/home/dalleyg/projects/Python-3.11.8/Lib/threading.py", line 327, in wait
    waiter.acquire()
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "/home/dalleyg/projects/Python-3.11.8/Lib/concurrent/futures/_base.py", line 647, in __exit__
    self.shutdown(wait=True)
  File "/home/dalleyg/projects/Python-3.11.8/Lib/concurrent/futures/process.py", line 851, in shutdown
    self._executor_manager_thread.join()
  File "/home/dalleyg/projects/Python-3.11.8/Lib/threading.py", line 1119, in join
    self._wait_for_tstate_lock()
  File "/home/dalleyg/projects/Python-3.11.8/Lib/threading.py", line 1139, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt

I get the same hanging result on WSL2 + Ubuntu with:

  • The official Ubuntu Python 3.11 and 3.12 packages (detailed in previous posts).
  • Building Python 3.11.8 from scratch (immediately above).
  • Building Python 3.12.2 from scratch (similar to above).
  • Installing Python 3.11.7 from ppa:deadsnakes (one subminor back version: the latest available).
  • Installing Python 3.12.1 from ppa:deadsnakes (one subminor back version: the latest available).

On the other hand, a colleague is seeing the BrokenProcessPool exception that both of you are getting on:

  • Bare Metal Linux with Python 3.11.7
  • Bare Metal Linux with Python 3.11.8
  • Bare Metal Linux with Python 3.12.2
  • The Docker Python 3.12.2 image from https://hub.docker.com/_/python/

This makes me wonder if the hanging bug is WSL2 and/or Ubuntu-specific, and the BrokenProcessPool bug instead occurs on native Linux systems.

Do you recommend that I fork this issue into two?

  • Edit this issue to be WSL2-specific.
  • Create a new issue for the BrokenProcessPool problem.

@dalleyg
Author

dalleyg commented Feb 18, 2024

If I use Docker, I get the BrokenProcessPool exception as well:

% cat issue115634.py
import os
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor(1, max_tasks_per_child=2) as exe:
    futs = [exe.submit(os.getpid) for _ in range(10)]
    for fut in futs:
        print(fut.result())

% docker run -it --rm --name 115634 -v "$PWD":/usr/src/myapp -w /usr/src/myapp python:3.11 python3 --version
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
Python 3.11.8

% docker run -it --rm --name 115634 -v "$PWD":/usr/src/myapp -w /usr/src/myapp python:3.11 python issue115634.py
...SNIP...
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

@gaogaotiantian
Member

I failed to repro this on WSL2 + Ubuntu 20.04, so I don't believe WSL by itself is sufficient to trigger it.

@dalleyg
Author

dalleyg commented Feb 18, 2024

Since others can't reproduce my hanging problem, I'll close this issue for now and open a new one for the BrokenProcessPool error.

Thanks for the help!

@dalleyg dalleyg closed this as completed Feb 18, 2024
@dalleyg dalleyg reopened this Feb 18, 2024
@dalleyg dalleyg closed this as not planned Feb 18, 2024
@dalleyg
Author

dalleyg commented Feb 18, 2024

Okay, I have a reproducer now, so I'm reopening this.

When running the code as a script, BrokenProcessPool is the result. When running it with python -c, hanging is the result. Here's a bash reproducer for both using Docker images (note: some of the syntax below is not zsh-compatible, so be sure to use bash):

echo '
import os
from concurrent.futures import ProcessPoolExecutor
with ProcessPoolExecutor(1, max_tasks_per_child=2) as exe:
    futs = [exe.submit(os.getpid) for _ in range(10)]
    for fut in futs:
        print(fut.result())
' > broken.py

(
    echo "TESTING WITH A SCRIPT"
    for v in 3.10 3.11.8 3.12.2 3.13.0a4; do
        DOCKER="docker run -it --rm --name 115634 -q -v "$PWD":/usr/src/myapp -w /usr/src/myapp python:$v"
        echo "--------------------------------------"
        (
            $DOCKER python3 --version &&
                $DOCKER timeout --signal=SIGINT --kill-after 5s -v 2s python3 broken.py
        ) 2>&1
    done
) | tee broken-script.log

(
    echo "TESTING WITH THE SCRIPT INLINED INTO THE COMMAND"
    for v in 3.10 3.11.8 3.12.2 3.13.0a4; do
        DOCKER="docker run -it --rm --name 115634 -q -v "$PWD":/usr/src/myapp -w /usr/src/myapp python:$v"
        echo "--------------------------------------"
        (
            $DOCKER python3 --version &&
                $DOCKER timeout --signal=SIGINT --kill-after 5s -v 2s python3 -c "$(cat broken.py)"
        ) 2>&1
    done
) | tee broken-inline.log

Here's a snipped version of broken-script.log that shows the exceptions raised right away. The full tracebacks look like the one shown in #115634 (comment).

% cat broken-script.log
TESTING WITH A SCRIPT
--------------------------------------
Python 3.10.13
Traceback (most recent call last):
  File "/usr/src/myapp/broken.py", line 4, in <module>
    with ProcessPoolExecutor(1, max_tasks_per_child=2) as exe:
TypeError: ProcessPoolExecutor.__init__() got an unexpected keyword argument 'max_tasks_per_child'
--------------------------------------
Python 3.11.8
...SNIP...
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
--------------------------------------
Python 3.12.2
...SNIP...
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
--------------------------------------
Python 3.13.0a4
...SNIP...
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

And here's what I see in broken-inline.log (complete output):

% cat broken-inline.log
TESTING WITH THE SCRIPT INLINED INTO THE COMMAND
--------------------------------------
Python 3.10.13
Traceback (most recent call last):
  File "<string>", line 4, in <module>
TypeError: ProcessPoolExecutor.__init__() got an unexpected keyword argument 'max_tasks_per_child'
--------------------------------------
Python 3.11.8
4
4
timeout: sending signal INT to command ‘python3’
timeout: sending signal KILL to command ‘python3’
--------------------------------------
Python 3.12.2
4
4
timeout: sending signal INT to command ‘python3’
timeout: sending signal KILL to command ‘python3’
--------------------------------------
Python 3.13.0a4
4
4
timeout: sending signal INT to command ‘python3’
timeout: sending signal KILL to command ‘python3’

@gaogaotiantian
Member

gaogaotiantian commented Feb 18, 2024

I'm able to repro with

import os
from concurrent.futures import ProcessPoolExecutor
if __name__ == '__main__':
    with ProcessPoolExecutor(1, max_tasks_per_child=2) as exe:
        futs = [exe.submit(os.getpid) for _ in range(10)]
        for fut in futs:
            print(fut.result())

or even simpler:

from concurrent.futures import ProcessPoolExecutor
if __name__ == '__main__':
    with ProcessPoolExecutor(1, max_tasks_per_child=2) as exe:
        futs = [exe.submit(print, i) for i in range(10)]

@dalleyg
Author

dalleyg commented Feb 18, 2024

Confirmed. With the __name__ guard, I get the hang for both the script and the inlined script.

@gaogaotiantian gaogaotiantian self-assigned this Feb 18, 2024
@gaogaotiantian
Member

There's something fundamentally wrong with the process counts here; I'll work on it.

@gaogaotiantian
Member

See #115642 for the detailed explanation and the potential fix.

@dalleyg
Author

dalleyg commented Feb 19, 2024

That was quick! I'll watch that PR for a resolution.

In the meantime, I've put a safety check in my code. I'll adjust it once there's a final resolution.
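One possible shape for such a safety check, until the fix lands: a thin wrapper that refuses the affected configurations up front. This is a hypothetical sketch (the make_executor name and its policy are mine, not from this issue):

```python
from concurrent.futures import ProcessPoolExecutor


def make_executor(max_workers, max_tasks_per_child=None):
    """Refuse the configurations reported to hang in gh-115634.

    Hypothetical guard: max_tasks_per_child=None and =1 are the
    known-good settings per this thread, so only larger quotas are
    rejected. A finer-grained check could compare sys.version_info
    against releases that contain the eventual fix.
    """
    if max_tasks_per_child is not None and max_tasks_per_child > 1:
        raise RuntimeError(
            "max_tasks_per_child > 1 may hang ProcessPoolExecutor; "
            "see https://github.com/python/cpython/issues/115634"
        )
    if max_tasks_per_child is None:
        # Omit the kwarg entirely so this also works on Python < 3.11,
        # where max_tasks_per_child does not exist.
        return ProcessPoolExecutor(max_workers)
    return ProcessPoolExecutor(max_workers, max_tasks_per_child=max_tasks_per_child)
```

Constructing the executor spawns no workers until the first submit, so the allowed paths are cheap to create and shut down.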

lovetox added a commit to gajim/gajim that referenced this issue Feb 23, 2025
@karolinepauls

Would it be possible to add a warning not to use max_tasks_per_child to the documentation until the fix is introduced and backported?
