Concurrent futures MultiProc backend failing to constrain resources #2700


Open

effigies opened this issue Sep 11, 2018 · 1 comment

@effigies
Member

Summary

We've recently had three separate reports that fMRIPrep is running into memory allocation errors since moving to the new MultiProc backend, which is based on concurrent.futures. In at least two of those cases, setting the plugin to LegacyMultiProc resolved the issue.

This suggests that the measures put into place to reduce the memory footprint of subprocesses no longer work for concurrent.futures.

Related: nipreps/fmriprep#1259
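
For reference, this is roughly how the resource limits in question get handed to the scheduler; the workflow name and the concrete limits below are placeholders, not taken from the reports above:

```python
# Sketch only: a hypothetical workflow with placeholder limits. The plugin
# names and plugin_args keys are the ones discussed in this issue.
from nipype.pipeline import engine as pe

wf = pe.Workflow(name="example_wf")  # nodes omitted for brevity

# New backend based on concurrent.futures; the reports above suggest these
# limits are no longer being enforced on the worker processes:
wf.run(plugin="MultiProc", plugin_args={"n_procs": 8, "memory_gb": 16})

# Reported workaround: fall back to the older multiprocessing-based plugin.
wf.run(plugin="LegacyMultiProc", plugin_args={"n_procs": 8, "memory_gb": 16})
```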

@dPys

dPys commented Sep 19, 2018

Hi @effigies, I can confirm that these errors do not appear to be specific to fMRIPrep either:

When running pynets on an HPC system using a forkserver, restricted to the resources available on a single compute node:

exception calling callback for <Future at 0x2adf42c4d7f0 state=finished raised BrokenProcessPool>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/concurrent/futures/_base.py", line 324, in _invoke_callbacks
    callback(self)
  File "/opt/conda/lib/python3.6/site-packages/nipype/pipeline/plugins/multiproc.py", line 143, in _async_callback
    result = args.result()
  File "/opt/conda/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/opt/conda/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

This tends to occur at varying points during the workflow depending on the resources allocated to MultiProc at runtime.

As you noted, using LegacyMultiProc appears to resolve the issue.

-Derek
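
For anyone trying to reproduce this outside nipype: the exception above is what concurrent.futures raises whenever a worker process is killed abruptly (for example by the kernel OOM killer once memory is over-committed). A minimal sketch, with `os._exit` standing in for the kill:

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def _crash():
    # Stand-in for a worker terminated abruptly (e.g. SIGKILL from the OOM killer).
    os._exit(1)


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=1) as pool:
        future = pool.submit(_crash)
        try:
            future.result()
        except BrokenProcessPool as exc:
            # Same "terminated abruptly" message as in the traceback above.
            print(exc)
```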

oesteban added a commit to oesteban/nipype that referenced this issue Nov 9, 2018
This PR relates to nipy#2700 and should fix the problem
underlying nipy#2548.

I first considered adding a control thread that monitors
the `Pool` of workers, but that would incur a large overhead
keeping track of PIDs and polling very often.

Just adding the core file from [bpo-22393](python/cpython#10441)
should fix nipy#2548.
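
For context, bpo-22393 addresses multiprocessing.Pool waiting forever on a job whose worker process has died. A minimal sketch of that failure mode, assuming an interpreter without the patch (the explicit timeout is only there to make the hang observable):

```python
# Sketch of the failure mode that bpo-22393 targets: on an unpatched
# interpreter, the job whose worker died is simply lost, so .get() would
# block forever without the timeout below.
import os
import multiprocessing as mp


def _die(_):
    # Simulate a worker terminated abruptly (e.g. by the kernel OOM killer).
    os._exit(1)


if __name__ == "__main__":
    with mp.Pool(processes=1) as pool:
        result = pool.apply_async(_die, (None,))
        result.get(timeout=10)  # raises multiprocessing.TimeoutError rather than hanging
```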