Skip to content

Deadlock on using ProcessPoolExecutor in fork mp context #105464

Open
@kekekekule

Description

@kekekekule

Hi!
I got an issue using Python 3.9.16 and 3.9.17

Launching this code (inspired by #90622)

from concurrent import futures
import faulthandler
import asyncio

def work(iteration, item):
    #anything here, even a pass
    pass


async def do_one(i):
    faulthandler.dump_traceback_later(20, repeat=True)
    with futures.ProcessPoolExecutor() as executor:
        loop = asyncio.get_event_loop()
        loop.run_in_executor(executor, work, i, 1)
        print(f"executor: max_workers={executor._max_workers}, processes={executor._processes}, method={executor._mp_context.get_start_method(allow_none=False)}, safe={executor._safe_to_dynamically_spawn_children}")
    faulthandler.cancel_dump_traceback_later()

async def run():
    for i in range(10000):
        await do_one(i)


async def main():
    task = asyncio.create_task(run())
    await asyncio.gather(task)

if __name__ == "__main__":
    asyncio.run(main())

Causes a deadlock sometimes.

The faulthandler dumps the traceback below on timeout:

Thread 0x0000004018db2700 (most recent call first):
  File "/usr/local/lib/python3.9/multiprocessing/popen_fork.py", line 27 in poll
  File "/usr/local/lib/python3.9/multiprocessing/popen_fork.py", line 43 in wait
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 149 in join
  File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 521 in join_executor_internals
  File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 343 in run
  File "/usr/local/lib/python3.9/threading.py", line 980 in _bootstrap_inner
  File "/usr/local/lib/python3.9/threading.py", line 937 in _bootstrap

Thread 0x0000004002fcf3c0 (most recent call first):
  File "/usr/local/lib/python3.9/threading.py", line 1080 in _wait_for_tstate_lock
  File "/usr/local/lib/python3.9/threading.py", line 1060 in join
  File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 767 in shutdown
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 637 in __exit__
  File "//t.py", line 17 in do_one
  File "//t.py", line 22 in run
  File "/usr/local/lib/python3.9/asyncio/events.py", line 80 in _run
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1905 in _run_once
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 601 in run_forever
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 634 in run_until_complete
  File "/usr/local/lib/python3.9/asyncio/runners.py", line 44 in run
  File "//t.py", line 31 in <module>

The pstree shows the following:

root@6b8cab06d7d7:/# pstree -pst
bash(1)─┬─python(4650)─┬─python(11873)───{python}(11875)
        │              ├─{python}(4652)
        │              └─{python}(11874)
        └─{bash}(8)

Running under docker launched in QEMU on Mac OS M1:

root@6b8cab06d7d7:/# uname -a
Linux 6b8cab06d7d7 5.15.68-0-virt #1-Alpine SMP Fri, 16 Sep 2022 06:29:31 +0000 x86_64 x86_64 x86_64 GNU/Linux

Tried to analyze under strace and suspect that the problem, here's the last output, after that it simply hangs:

37 close(7) = 0
37 close(6) = 0
37 getpid() = 37
304 write(11,0x630d440,104) = 104
304 read(8,0x5e2c410,4) = 4
304 read(8,0x5e2c410,4) = 4
304 getpid() = 304
304 write(11,0x5e427d0,19) = 19
304 exit_group(0)
37 wait4(304,275152568292,0,0,0,39) = 304
37 getpid() = 37

Having child processes with pids 304 and 305, on interrupting my script I got from strace that it hung on wait4 of pid = 305.

^C37 wait4(305,275152568292,0,0,0,39) = -1 errno=4 (Interrupted system call)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_pid=0, si_uid=0} ---

304 exits normally after SIGINT, 305 not, still hangs.

Did not check versions upper than 3.9.x. However, I do not exclude that the problem can be reproducible on higher versions of Python.

UPD. Reproduced the same on Python 3.10.10:

Thread 0x00000040106b5700 (most recent call first):
  File "/usr/local/lib/python3.10/multiprocessing/popen_fork.py", line 27 in poll
  File "/usr/local/lib/python3.10/multiprocessing/popen_fork.py", line 43 in wait
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 149 in join
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 521 in join_executor_internals
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 343 in run
  File "/usr/local/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/local/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x0000004002fcd3c0 (most recent call first):
  File "/usr/local/lib/python3.10/threading.py", line 1116 in _wait_for_tstate_lock
  File "/usr/local/lib/python3.10/threading.py", line 1096 in join
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 775 in shutdown
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 649 in __exit__
  File "//t.py", line 14 in do_one
  File "//t.py", line 22 in run
  File "/usr/local/lib/python3.10/asyncio/events.py", line 80 in _run
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1909 in _run_once
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 603 in run_forever
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 636 in run_until_complete
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44 in run
  File "//t.py", line 31 in <module>

UPD2. Just investigated, that when issue happens, one of the children isn't even introduced in strace except wait4.
UPD3. Some logging from multiprocessing:

[DEBUG/MainProcess] created semlock with handle 275333459968
[DEBUG/MainProcess] created semlock with handle 275333464064
[DEBUG/MainProcess] created semlock with handle 275333468160
[DEBUG/MainProcess] created semlock with handle 275333472256
[DEBUG/MainProcess] created semlock with handle 275333476352
[DEBUG/ForkProcess-3332] Queue._after_fork()
[INFO/ForkProcess-3332] child process calling self.run()
executor: max_workers=2, processes={32178: <ForkProcess name='ForkProcess-3331' pid=32178 parent=22185 started>, 32179: <ForkProcess name='ForkProcess-3332' pid=32179 parent=22185 started>}, method=fork, safe=False
[DEBUG/MainProcess] Queue._start_thread()
[DEBUG/MainProcess] doing self._thread.start()
[DEBUG/ForkProcess-3331] Queue._after_fork()
[DEBUG/MainProcess] starting thread to feed data to pipe
[DEBUG/MainProcess] ... done self._thread.start()
[INFO/ForkProcess-3331] child process calling self.run()
[DEBUG/MainProcess] telling queue thread to quit
[INFO/ForkProcess-3332] process shutting down
[DEBUG/ForkProcess-3332] running all "atexit" finalizers with priority >= 0
[DEBUG/ForkProcess-3332] running the remaining "atexit" finalizers
[DEBUG/MainProcess] Queue.join_thread()
[INFO/ForkProcess-3332] process exiting with exitcode 0
[DEBUG/MainProcess] joining queue thread
[INFO/ForkProcess-3331] process shutting down
[DEBUG/MainProcess] feeder thread got sentinel -- exiting
[DEBUG/MainProcess] ... queue thread joined
[DEBUG/ForkProcess-3331] running all "atexit" finalizers with priority >= 0
[DEBUG/ForkProcess-3331] running the remaining "atexit" finalizers
[INFO/ForkProcess-3331] process exiting with exitcode 0
[DEBUG/MainProcess] created semlock with handle 275333480448
[DEBUG/MainProcess] created semlock with handle 275333484544
[DEBUG/MainProcess] created semlock with handle 275333488640
[DEBUG/MainProcess] created semlock with handle 275333492736
[DEBUG/MainProcess] created semlock with handle 275333496832
executor: max_workers=2, processes={32184: <ForkProcess name='ForkProcess-3333' pid=32184 parent=22185 started>, 32186: <ForkProcess name='ForkProcess-3334' pid=32186 parent=22185 started>}, method=fork, safe=False
[DEBUG/MainProcess] Queue._start_thread()
[DEBUG/MainProcess] doing self._thread.start()
[DEBUG/MainProcess] starting thread to feed data to pipe
[DEBUG/MainProcess] ... done self._thread.start()
[DEBUG/ForkProcess-3334] Queue._after_fork()
[INFO/ForkProcess-3334] child process calling self.run()
[DEBUG/MainProcess] telling queue thread to quit
[DEBUG/MainProcess] Queue.join_thread()
[DEBUG/MainProcess] joining queue thread
[DEBUG/MainProcess] feeder thread got sentinel -- exiting
[INFO/ForkProcess-3334] process shutting down
[DEBUG/MainProcess] ... queue thread joined
[DEBUG/ForkProcess-3334] running all "atexit" finalizers with priority >= 0
[DEBUG/ForkProcess-3334] running the remaining "atexit" finalizers
[INFO/ForkProcess-3334] process exiting with exitcode 0

Metadata

Metadata

Assignees

No one assigned

    Labels

    stdlibPython modules in the Lib dirtype-bugAn unexpected behavior, bug, or error

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions