Skip to content

nbdev_prepare fails with concurrent.futures.process.BrokenProcessPool exception #673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
MichaelJFishmanBA opened this issue Jul 27, 2022 · 24 comments

Comments

@MichaelJFishmanBA
Copy link

nbdev_prepare is giving me a concurrent.futures.process.BrokenProcessPool on a new project.

Environment

Python: 3.8.13
nbdev: 2.0.1
OS: macOS Monterey 12.4
Processor: Apple M1

Steps to produce this error

gh repo create # then followed prompts
cd nbdev_test # the name of the repo I made
nbev_new
nbdev_install_hooks
nbdev_prepare

Error message

objc[25641]: +[NSMutableString initialize] may have been in progress in another thread when fork() was called.
objc[25641]: +[NSMutableString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
Traceback (most recent call last):
  File "/Users/michael.fishman/miniconda3/envs/ml8/bin/nbdev_prepare", line 8, in <module>
    sys.exit(prepare())
  File "/Users/michael.fishman/miniconda3/envs/ml8/lib/python3.8/site-packages/nbdev/shortcuts.py", line 98, in prepare
    _c(nbdev_test)
  File "/Users/michael.fishman/miniconda3/envs/ml8/lib/python3.8/site-packages/nbdev/shortcuts.py", line 20, in _c
    def _c(f, *args, **kwargs): return f.__wrapped__(*args, **kwargs)
  File "/Users/michael.fishman/miniconda3/envs/ml8/lib/python3.8/site-packages/nbdev/test.py", line 86, in nbdev_test
    results = parallel(test_nb, files, skip_flags=skip_flags, force_flags=force_flags, n_workers=n_workers, pause=pause, do_print=do_print)
  File "/Users/michael.fishman/miniconda3/envs/ml8/lib/python3.8/site-packages/fastcore/parallel.py", line 117, in parallel
    return L(r)
  File "/Users/michael.fishman/miniconda3/envs/ml8/lib/python3.8/site-packages/fastcore/foundation.py", line 98, in __call__
    return super().__call__(x, *args, **kwargs)
  File "/Users/michael.fishman/miniconda3/envs/ml8/lib/python3.8/site-packages/fastcore/foundation.py", line 106, in __init__
    items = listify(items, *rest, use_list=use_list, match=match)
  File "/Users/michael.fishman/miniconda3/envs/ml8/lib/python3.8/site-packages/fastcore/basics.py", line 63, in listify
    elif is_iter(o): res = list(o)
  File "/Users/michael.fishman/miniconda3/envs/ml8/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/Users/michael.fishman/miniconda3/envs/ml8/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/Users/michael.fishman/miniconda3/envs/ml8/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/Users/michael.fishman/miniconda3/envs/ml8/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
@warner-benjamin
Copy link

This might be due to #648 (the second comment)

nbdev_new doesn't convert tst_flags from nbdev1's pipe format slow|cuda to nbdev2 format slow cuda. The nbdev1 format can cause broken pool errors in addition to test errors.

Can you try that and see if it works?

@dleen
Copy link
Contributor

dleen commented Jul 28, 2022

I'm getting this too but while developing on the nbdev repo itself, which I assume doesn't have the tst_flags issue you mentioned above. What's interesting is that it only happens with nbdev_prepare and even though prepare is simply:

#|export
def prepare():
    "Export, test, and clean notebooks"
    _c(nbdev_export)
    _c(nbdev_test)
    _c(nbdev_clean)

running each of nbdev_export, nbdev_test, nbdev_clean works fine from the CLI

@dleen
Copy link
Contributor

dleen commented Jul 28, 2022

If you set threadpool=True here: https://github.com/fastai/nbdev/blob/master/nbdev/test.py#L86 to tell fastcore to use ThreadPoolExecutor vs ProcessPoolExecutor the error goes away.

ProcessPoolExecutor comes with a warning: https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor

The main module must be importable by worker subprocesses. This means that ProcessPoolExecutor will not work in the interactive interpreter.

which could be related to the issue here.

The final question is why the same issue isn't triggered when running nbdev_test on its own. It's only triggered when running nbdev_test from within nbdev_prepare (which kind of sounds like the warning from ProcessPoolExecutor).

@dleen
Copy link
Contributor

dleen commented Jul 28, 2022

In a Python file e.g. foo.py add:

from nbdev.test import nbdev_test
nbdev_test()

and run it via python foo.py. This will succeed.

Open an IPython repl and run the exact same code and it fails the same way as nbdev_prepare.

@dleen
Copy link
Contributor

dleen commented Jul 28, 2022

Issue comes from this line: https://github.com/fastai/nbdev/blob/master/nbdev/test.py#L38

@dleen
Copy link
Contributor

dleen commented Jul 28, 2022

In 16_shortcuts.ipynb if you comment out all the imports except the ones actually needed by prepare e.g:

# import sys, shutil
# from pkg_resources import iter_entry_points as ep
# from os import system
# from fastcore.utils import *
# from fastcore.script import *

# from nbdev.read import get_config
from nbdev.test import nbdev_test
from nbdev.clean import nbdev_clean
from nbdev.doclinks import nbdev_export
# from nbdev.cli import *

(and comment out a few other things that fail at import time e.g. @call_parse (unrelated function)).

Then nbdev_prepare works fine. So there's some import that is having a side effect causing issues with process pool

@dleen
Copy link
Contributor

dleen commented Jul 28, 2022

Just commenting out from nbdev.cli import * resolves the issue, although cli has a ton of imports

@dleen
Copy link
Contributor

dleen commented Jul 28, 2022

In nbdev.cli you just have to comment out from fastcore.net import *. This makes sense as fastcore.net imports from .parallel import *. All the other imports in .net look fine and parallel is exactly the issue we're having.

So I think if we only do from fastcore.net import * in the function that needs it would fix the problem. And the problem seems to be multiple imports of the parallel stuff by the forked processes?

@dleen
Copy link
Contributor

dleen commented Jul 28, 2022

Unfortunately with wildcard imports it's very difficult to tell what imports are used (unless I haven't figured out the right tools?). I think it's just these two functions: https://github.com/fastai/nbdev/blob/master/nbdev/cli.py#L167 and https://github.com/fastai/nbdev/blob/master/nbdev/cli.py#L167

@jph00
Copy link
Contributor

jph00 commented Jul 28, 2022

Unfortunately with wildcard imports it's very difficult to tell what imports are used (unless I haven't figured out the right tools?)

Heh - no it's not you, it's me. ;) I'm planning to write a tool that optionally converts them to explicit imports. Might be nice to also have something which simply reports what's being used.

@MichaelJFishmanBA
Copy link
Author

MichaelJFishmanBA commented Jul 28, 2022

Thanks for looking into this!

@warner-benjamin The settings.ini file for my project has

tst_flags = notest

I'm not sure if/how that should be modified.

@dleen

Based on the following snippet, I also think cli.py is just using urlopen and urljson from fastcore.net:

fastcore_net_exports = ['url_default_headers', 'ExceptionsHTTP', 'urlquote', 'urlwrap', 'HTTP4xxClientError', 'HTTP5xxServerError', 'urlopen',
           'urlread', 'urljson', 'urlcheck', 'urlclean', 'urlretrieve', 'urldest', 'urlsave', 'urlvalid', 'urlrequest',
           'urlsend', 'do_request', 'start_server', 'start_client', 'HTTP400BadRequestError',
           'HTTP401UnauthorizedError', 'HTTP402PaymentRequiredError', 'HTTP403ForbiddenError', 'HTTP404NotFoundError',
           'HTTP405MethodNotAllowedError', 'HTTP406NotAcceptableError', 'HTTP407ProxyAuthRequiredError',
           'HTTP408RequestTimeoutError', 'HTTP409ConflictError', 'HTTP410GoneError', 'HTTP411LengthRequiredError',
           'HTTP412PreconditionFailedError', 'HTTP413PayloadTooLargeError', 'HTTP414URITooLongError',
           'HTTP415UnsupportedMediaTypeError', 'HTTP416RangeNotSatisfiableError', 'HTTP417ExpectationFailedError',
           'HTTP418AmAteapotError', 'HTTP421MisdirectedRequestError', 'HTTP422UnprocessableEntityError',
           'HTTP423LockedError', 'HTTP424FailedDependencyError', 'HTTP425TooEarlyError', 'HTTP426UpgradeRequiredError',
           'HTTP428PreconditionRequiredError', 'HTTP429TooManyRequestsError', 'HTTP431HeaderFieldsTooLargeError',
           'HTTP451LegalReasonsError']

with open("nbdev/cli.py") as f:
    s_code = f.read()

for x in fastcore_net_exports:
    if x in s_code:
        print(x)

I agree it would be nice if an IDE could help with this import tracking.

After commenting out the fastcore.net import and copying code from fastcore.net into cli.py as needed, I still get the same error.

Code I copied in:

# Copied from fastcore.net
import urllib
from urllib.parse import urlencode,urlparse,urlunparse
from urllib.request import Request
import json
_opener = urllib.request.build_opener()

def urlquote(url):
    "Update url's path with `urllib.parse.quote`"
    subdelims = "!$&'()*+,;="
    gendelims = ":?#[]@"
    safe = subdelims+gendelims+"%/"
    p = list(urlparse(url))
    p[2] = urllib.parse.quote(p[2], safe=safe)
    for i in range(3,6): p[i] = urllib.parse.quote(p[i], safe=safe)
    return urlunparse(p)


def urlwrap(url, data=None, headers=None):
    "Wrap `url` in a urllib `Request` with `urlquote`"
    return url if isinstance(url,Request) else Request(urlquote(url), data=data, headers=headers or {})


def urlopen(url, data=None, headers=None, timeout=None, **kwargs):
    "Like `urllib.request.urlopen`, but first `urlwrap` the `url`, and encode `data`"
    if kwargs and not data: data=kwargs
    if data is not None:
        if not isinstance(data, (str,bytes)): data = urlencode(data)
        if not isinstance(data, bytes): data = data.encode('ascii')
    return _opener.open(urlwrap(url, data=data, headers=headers), timeout=timeout)

class HTTP4xxClientError(HTTPError):
    "Base class for client exceptions (code 4xx) from `url*` functions"
    pass

ExceptionsHTTP = {}

_httperrors = (
    (400,'Bad Request'),(401,'Unauthorized'),(402,'Payment Required'),(403,'Forbidden'),(404,'Not Found'),
    (405,'Method Not Allowed'),(406,'Not Acceptable'),(407,'Proxy Auth Required'),(408,'Request Timeout'),
    (409,'Conflict'),(410,'Gone'),(411,'Length Required'),(412,'Precondition Failed'),(413,'Payload Too Large'),
    (414,'URI Too Long'),(415,'Unsupported Media Type'),(416,'Range Not Satisfiable'),(417,'Expectation Failed'),
    (418,'Am A teapot'),(421,'Misdirected Request'),(422,'Unprocessable Entity'),(423,'Locked'),(424,'Failed Dependency'),
    (425,'Too Early'),(426,'Upgrade Required'),(428,'Precondition Required'),(429,'Too Many Requests'),
    (431,'Header Fields Too Large'),(451,'Legal Reasons')
)

for code,msg in _httperrors:
    nm = f'HTTP{code}{msg.replace(" ","")}Error'
    cls = get_class(nm, 'url', 'hdrs', 'fp', sup=HTTP4xxClientError, msg=msg, code=code)
    globals()[nm] = ExceptionsHTTP[code] = cls

def urlread(url, data=None, headers=None, decode=True, return_json=False, return_headers=False, timeout=None, **kwargs):
    "Retrieve `url`, using `data` dict or `kwargs` to `POST` if present"
    try:
        with urlopen(url, data=data, headers=headers, timeout=timeout, **kwargs) as u: res,hdrs = u.read(),u.headers
    except HTTPError as e:
        if 400 <= e.code < 500: raise ExceptionsHTTP[e.code](e.url, e.hdrs, e.fp) from None
        else: raise

    if decode: res = res.decode()
    if return_json: res = loads(res)
    return (res,dict(hdrs)) if return_headers else res

def urljson(url, data=None, timeout=None):
    "Retrieve `url` and decode json"
    res = urlread(url, data=data, timeout=timeout)
    return json.loads(res) if res else {}


# End of fastcore.net stuff

@jph00
Copy link
Contributor

jph00 commented Jul 28, 2022

This is fixed now. Thanks for all your helpful investigations

@jph00 jph00 closed this as completed Jul 28, 2022
@jph00
Copy link
Contributor

jph00 commented Jul 28, 2022

(Having said that - the fact that importing fastcore.parallel multiple times can break things is not good! I will look into it. If anyone can figure out what's going on, do let me know...)

@MichaelJFishmanBA
Copy link
Author

MichaelJFishmanBA commented Jul 28, 2022

@jph00
It looks like specifically calling urllib.request.build_opener() is what causes the problem.

I commented out the fastcore.net import and just added in

import urllib
_opener = urllib.request.build_opener()

and the same error occurs. Not sure why though.

Thanks for fixing it! And nbdev, fastai, etc.

@dleen
Copy link
Contributor

dleen commented Jul 28, 2022

Confirming that the commit Jeremy pushed fixes it for me

@MichaelJFishmanBA
Copy link
Author

It fixed it for me as well.

@dleen
Copy link
Contributor

dleen commented Jul 28, 2022

@MichaelJFishmanBA depending on whether you were editing the notebook or cli.py directly running prepare might have overwritten your changes leading to false positives when you were commenting out stuff and testing. That's what happened to me a few times during debugging

It really was the urllib stuff...

@dleen
Copy link
Contributor

dleen commented Jul 28, 2022

If anyone can figure out what's going on, do let me know...

@jph00

I haven't figured it out but I think there are a few key pieces of evidence:

here's what I think is happening:

  1. the main process runs, and at some point imports the parallel module which triggers
    try:
        if sys.platform == 'darwin' and IN_NOTEBOOK: set_start_method("fork")
    except: pass
    
    from: https://github.com/fastai/fastcore/blob/master/fastcore/parallel.py#L19
  2. the forked process when parallel is run also imports a module that imports parallel. The above code is run again because it isn't protected by __main__ (or any equivalent)
  3. it causes some sort of fork bomb and crashes?

Now something about this reasoning isn't quite correct as I tested out moving the above code inside the parallel method as that should only be called once by the parent process and not the child ones (unless they are also running their particular assigned job in parallel also, but this doesn't seem to be the case) which should achieve the equivalent of using a __main__ guard.

An alternative is to use "spawn" instead of "fork" but I don't know the history behind the decision to specify fork.

Update

After writing all the above I came across this post. And guess what, this works:

OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES nbdev_prepare

I have no idea if setting this environment variable before running the process is a bad idea, or a terrible idea however.

@seeM
Copy link
Contributor

seeM commented Jul 28, 2022

@dleen would you mind sharing the latest minimal reproducible example of this issue?

I've been battling with this (and related multiprocessing) issues on mac as well, and can share a few thoughts:

  • We set the start method to fork on macOS in IPython which includes execnb - nbdev's internal notebook runner (it's the default method in Linux), to be able to use functions defined in a repl/notebook with process parallelism. With spawn or forkserver the function can't be pickled thus can't be used with multiprocessing (IIUC).
  • You might want to upgrade to >= 3.9.11/3.10.3/3.11 for concurrent.futures.ProcessPoolExecutor can deadlock when tcmalloc is used python/cpython#90622 - though this won't fix every instance of the problem. IIUC it reverted a prior performance enhancement which dynamically spawned workers instead of spawning all of them upfront. Doing so with fork in an already running thread could cause deadlocks, and on macOS exits with objc[25641]: +[NSMutableString initialize] may have been in progress in another thread when fork() was called. (unless you set the environment variable you mentioned).

@dleen
Copy link
Contributor

dleen commented Jul 28, 2022

After the recent fix you can add back:

#|export
from __future__ import annotations
import warnings

from fastcore.net import * <-------- this import

from nbdev.read import *
from nbdev.sync import *
from nbdev.process import *
from nbdev.processors import *
from nbdev.doclinks import *

from execnb.nbio import *
from fastcore.utils import *
from fastcore.script import call_parse
from fastcore import shutil

from urllib.error import HTTPError
from contextlib import redirect_stdout
import os, tarfile, subprocess, sys

to 10_cli.ipynb, then export it, and run prepare, you should get the error.

(Strangely adding from fastcore.parallel import * doesn't trigger the issue... which was my original hypothesis that .net was importing .parallel which was the root cause)

Btw my Python version is 3.9.13...

@seeM
Copy link
Contributor

seeM commented Jul 28, 2022

Minimal reproducible example:

from execnb.shell import CaptureShell
from fastcore.parallel import parallel
from urllib.request import _get_proxies

_get_proxies() # <-- breaks

def f(x): CaptureShell()
if __name__ == '__main__': parallel(f, [1])

@dleen
Copy link
Contributor

dleen commented Jul 28, 2022

Wow what a strange situation... can confirm that this fails for me too.

Wonder if it has to do with the threading here? https://github.com/python/cpython/blob/main/Modules/_scproxy.c#L178

@seeM
Copy link
Contributor

seeM commented Jul 29, 2022

@dleen yes that seems like the issue

These articles are super helpful:

Current fix in mind is to change fastcore.net to not run build_opener on import. But this is still not as robust as we'd like

@jph00
Copy link
Contributor

jph00 commented Jul 29, 2022

Current fix in mind is to change fastcore.net to not run build_opener on import

I've made this fix now. Thanks gang!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants