Concurrent futures MultiProc backend failing to constrain resources #2700


Open

effigies opened this issue Sep 11, 2018 · 1 comment

@effigies
Member

Summary

We've recently had three separate reports that fMRIPrep is running into memory allocation errors since moving to the new MultiProc backend, which is based on concurrent.futures. In at least two of those cases, setting the plugin to LegacyMultiProc resolved the issue.

This suggests that the measures put into place to reduce the memory footprint of subprocesses no longer work for concurrent.futures.

Related: nipreps/fmriprep#1259
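
For reference, this is roughly how the resource limits in question get handed to the scheduler; the workflow name and the concrete limits below are placeholders, not taken from the reports above:

```python
# Sketch only: a hypothetical workflow with placeholder limits. The plugin
# names and plugin_args keys are the ones discussed in this issue.
from nipype.pipeline import engine as pe

wf = pe.Workflow(name="example_wf")  # nodes omitted for brevity

# New backend based on concurrent.futures; the reports above suggest these
# limits are no longer being enforced on the worker processes:
wf.run(plugin="MultiProc", plugin_args={"n_procs": 8, "memory_gb": 16})

# Reported workaround: fall back to the older multiprocessing-based plugin.
wf.run(plugin="LegacyMultiProc", plugin_args={"n_procs": 8, "memory_gb": 16})
```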

@dPys

dPys commented Sep 19, 2018

Hi @effigies, I can confirm that these errors do not appear to be specific to fMRIPrep either:

When running pynets on an HPC system using a forkserver, restricted to the resources available on a single compute node:

exception calling callback for <Future at 0x2adf42c4d7f0 state=finished raised BrokenProcessPool>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/concurrent/futures/_base.py", line 324, in _invoke_callbacks
    callback(self)
  File "/opt/conda/lib/python3.6/site-packages/nipype/pipeline/plugins/multiproc.py", line 143, in _async_callback
    result = args.result()
  File "/opt/conda/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/opt/conda/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

This tends to occur at varying points during the workflow depending on the resources allocated to MultiProc at runtime.

As you noted, using LegacyMultiProc appears to resolve the issue.

-Derek
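
For anyone trying to reproduce this outside nipype: the exception above is what concurrent.futures raises whenever a worker process is killed abruptly (for example by the kernel OOM killer once memory is over-committed). A minimal sketch, with `os._exit` standing in for the kill:

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def _crash():
    # Stand-in for a worker terminated abruptly (e.g. SIGKILL from the OOM killer).
    os._exit(1)


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=1) as pool:
        future = pool.submit(_crash)
        try:
            future.result()
        except BrokenProcessPool as exc:
            # Same "terminated abruptly" message as in the traceback above.
            print(exc)
```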

oesteban added a commit to oesteban/nipype that referenced this issue Nov 9, 2018
This PR relates to nipy#2700 and should fix the problem
underlying nipy#2548.

I first considered adding a control thread that monitors
the `Pool` of workers, but that would incur a large overhead
keeping track of PIDs and polling very often.

Just adding the core file from [bpo-22393](python/cpython#10441)
should fix nipy#2548.
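
For context, bpo-22393 addresses multiprocessing.Pool waiting forever on a job whose worker process has died. A minimal sketch of that failure mode, assuming an interpreter without the patch (the explicit timeout is only there to make the hang observable):

```python
# Sketch of the failure mode that bpo-22393 targets: on an unpatched
# interpreter, the job whose worker died is simply lost, so .get() would
# block forever without the timeout below.
import os
import multiprocessing as mp


def _die(_):
    # Simulate a worker terminated abruptly (e.g. by the kernel OOM killer).
    os._exit(1)


if __name__ == "__main__":
    with mp.Pool(processes=1) as pool:
        result = pool.apply_async(_die, (None,))
        result.get(timeout=10)  # raises multiprocessing.TimeoutError rather than hanging
```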