Description
Hi!
I got an issue using Python 3.9.16 and 3.9.17
Launching this code (inspired by #90622)
from concurrent import futures
import faulthandler
import asyncio
def work(iteration, item):
#anything here, even a pass
pass
async def do_one(i):
faulthandler.dump_traceback_later(20, repeat=True)
with futures.ProcessPoolExecutor() as executor:
loop = asyncio.get_event_loop()
loop.run_in_executor(executor, work, i, 1)
print(f"executor: max_workers={executor._max_workers}, processes={executor._processes}, method={executor._mp_context.get_start_method(allow_none=False)}, safe={executor._safe_to_dynamically_spawn_children}")
faulthandler.cancel_dump_traceback_later()
async def run():
for i in range(10000):
await do_one(i)
async def main():
task = asyncio.create_task(run())
await asyncio.gather(task)
if __name__ == "__main__":
asyncio.run(main())
Causes a deadlock sometimes.
The faulthandler dumps the traceback below on timeout:
Thread 0x0000004018db2700 (most recent call first):
File "/usr/local/lib/python3.9/multiprocessing/popen_fork.py", line 27 in poll
File "/usr/local/lib/python3.9/multiprocessing/popen_fork.py", line 43 in wait
File "/usr/local/lib/python3.9/multiprocessing/process.py", line 149 in join
File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 521 in join_executor_internals
File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 343 in run
File "/usr/local/lib/python3.9/threading.py", line 980 in _bootstrap_inner
File "/usr/local/lib/python3.9/threading.py", line 937 in _bootstrap
Thread 0x0000004002fcf3c0 (most recent call first):
File "/usr/local/lib/python3.9/threading.py", line 1080 in _wait_for_tstate_lock
File "/usr/local/lib/python3.9/threading.py", line 1060 in join
File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 767 in shutdown
File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 637 in __exit__
File "//t.py", line 17 in do_one
File "//t.py", line 22 in run
File "/usr/local/lib/python3.9/asyncio/events.py", line 80 in _run
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 1905 in _run_once
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 601 in run_forever
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 634 in run_until_complete
File "/usr/local/lib/python3.9/asyncio/runners.py", line 44 in run
File "//t.py", line 31 in <module>
The pstree shows the following:
root@6b8cab06d7d7:/# pstree -pst
bash(1)─┬─python(4650)─┬─python(11873)───{python}(11875)
│ ├─{python}(4652)
│ └─{python}(11874)
└─{bash}(8)
Running under docker launched in QEMU on Mac OS M1:
root@6b8cab06d7d7:/# uname -a
Linux 6b8cab06d7d7 5.15.68-0-virt #1-Alpine SMP Fri, 16 Sep 2022 06:29:31 +0000 x86_64 x86_64 x86_64 GNU/Linux
Tried to analyze under strace and suspect that the problem, here's the last output, after that it simply hangs:
37 close(7) = 0
37 close(6) = 0
37 getpid() = 37
304 write(11,0x630d440,104) = 104
304 read(8,0x5e2c410,4) = 4
304 read(8,0x5e2c410,4) = 4
304 getpid() = 304
304 write(11,0x5e427d0,19) = 19
304 exit_group(0)
37 wait4(304,275152568292,0,0,0,39) = 304
37 getpid() = 37
Having child processes with pids 304 and 305, on interrupting my script I got from strace that it hung on wait4 of pid = 305.
^C37 wait4(305,275152568292,0,0,0,39) = -1 errno=4 (Interrupted system call)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL, si_pid=0, si_uid=0} ---
304 exits normally after SIGINT, 305 not, still hangs.
Did not check versions upper than 3.9.x. However, I do not exclude that the problem can be reproducible on higher versions of Python.
UPD. Reproduced the same on Python 3.10.10:
Thread 0x00000040106b5700 (most recent call first):
File "/usr/local/lib/python3.10/multiprocessing/popen_fork.py", line 27 in poll
File "/usr/local/lib/python3.10/multiprocessing/popen_fork.py", line 43 in wait
File "/usr/local/lib/python3.10/multiprocessing/process.py", line 149 in join
File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 521 in join_executor_internals
File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 343 in run
File "/usr/local/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/local/lib/python3.10/threading.py", line 973 in _bootstrap
Thread 0x0000004002fcd3c0 (most recent call first):
File "/usr/local/lib/python3.10/threading.py", line 1116 in _wait_for_tstate_lock
File "/usr/local/lib/python3.10/threading.py", line 1096 in join
File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 775 in shutdown
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 649 in __exit__
File "//t.py", line 14 in do_one
File "//t.py", line 22 in run
File "/usr/local/lib/python3.10/asyncio/events.py", line 80 in _run
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 1909 in _run_once
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 603 in run_forever
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 636 in run_until_complete
File "/usr/local/lib/python3.10/asyncio/runners.py", line 44 in run
File "//t.py", line 31 in <module>
UPD2. Just investigated, that when issue happens, one of the children isn't even introduced in strace except wait4.
UPD3. Some logging from multiprocessing:
[DEBUG/MainProcess] created semlock with handle 275333459968
[DEBUG/MainProcess] created semlock with handle 275333464064
[DEBUG/MainProcess] created semlock with handle 275333468160
[DEBUG/MainProcess] created semlock with handle 275333472256
[DEBUG/MainProcess] created semlock with handle 275333476352
[DEBUG/ForkProcess-3332] Queue._after_fork()
[INFO/ForkProcess-3332] child process calling self.run()
executor: max_workers=2, processes={32178: <ForkProcess name='ForkProcess-3331' pid=32178 parent=22185 started>, 32179: <ForkProcess name='ForkProcess-3332' pid=32179 parent=22185 started>}, method=fork, safe=False
[DEBUG/MainProcess] Queue._start_thread()
[DEBUG/MainProcess] doing self._thread.start()
[DEBUG/ForkProcess-3331] Queue._after_fork()
[DEBUG/MainProcess] starting thread to feed data to pipe
[DEBUG/MainProcess] ... done self._thread.start()
[INFO/ForkProcess-3331] child process calling self.run()
[DEBUG/MainProcess] telling queue thread to quit
[INFO/ForkProcess-3332] process shutting down
[DEBUG/ForkProcess-3332] running all "atexit" finalizers with priority >= 0
[DEBUG/ForkProcess-3332] running the remaining "atexit" finalizers
[DEBUG/MainProcess] Queue.join_thread()
[INFO/ForkProcess-3332] process exiting with exitcode 0
[DEBUG/MainProcess] joining queue thread
[INFO/ForkProcess-3331] process shutting down
[DEBUG/MainProcess] feeder thread got sentinel -- exiting
[DEBUG/MainProcess] ... queue thread joined
[DEBUG/ForkProcess-3331] running all "atexit" finalizers with priority >= 0
[DEBUG/ForkProcess-3331] running the remaining "atexit" finalizers
[INFO/ForkProcess-3331] process exiting with exitcode 0
[DEBUG/MainProcess] created semlock with handle 275333480448
[DEBUG/MainProcess] created semlock with handle 275333484544
[DEBUG/MainProcess] created semlock with handle 275333488640
[DEBUG/MainProcess] created semlock with handle 275333492736
[DEBUG/MainProcess] created semlock with handle 275333496832
executor: max_workers=2, processes={32184: <ForkProcess name='ForkProcess-3333' pid=32184 parent=22185 started>, 32186: <ForkProcess name='ForkProcess-3334' pid=32186 parent=22185 started>}, method=fork, safe=False
[DEBUG/MainProcess] Queue._start_thread()
[DEBUG/MainProcess] doing self._thread.start()
[DEBUG/MainProcess] starting thread to feed data to pipe
[DEBUG/MainProcess] ... done self._thread.start()
[DEBUG/ForkProcess-3334] Queue._after_fork()
[INFO/ForkProcess-3334] child process calling self.run()
[DEBUG/MainProcess] telling queue thread to quit
[DEBUG/MainProcess] Queue.join_thread()
[DEBUG/MainProcess] joining queue thread
[DEBUG/MainProcess] feeder thread got sentinel -- exiting
[INFO/ForkProcess-3334] process shutting down
[DEBUG/MainProcess] ... queue thread joined
[DEBUG/ForkProcess-3334] running all "atexit" finalizers with priority >= 0
[DEBUG/ForkProcess-3334] running the remaining "atexit" finalizers
[INFO/ForkProcess-3334] process exiting with exitcode 0
Metadata
Metadata
Assignees
Projects
Status