numcodecs.blosc mutex leaked semaphore warning #230


Open

rc-conway opened this issue May 4, 2020 · 11 comments

rc-conway commented May 4, 2020

Minimal, reproducible code sample (copy-pastable if possible)

import zarr

Problem description

When using multiprocessing in a Cython context, I get a leaked semaphore warning if zarr is anywhere in the import space.

I suspect the global mutex variable (the lock) in numcodecs.blosc is being garbage collected before the multiprocessing finalizer is called, so the weakref is dropped and the lock is never unregistered. This results in a leaked semaphore warning when running in a multiprocessing environment. I don't think there is a functional issue, but I would like the warning to go away.

https://github.com/zarr-developers/numcodecs/blob/master/numcodecs/blosc.pyx#L78
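
For context, the linked line creates the lock at module import time. Paraphrased (not the verbatim blosc.pyx source), the pattern is roughly:

import multiprocessing

# Global lock used to serialize access to the global blosc context;
# created as a side effect of importing numcodecs.blosc.
mutex = multiprocessing.Lock()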

Version and installation information

Please provide the following:

  • numcodecs 0.6.4
  • Python 3.8.0
  • Linux, MacOS
  • conda
@alimanfoo (Member)

Thanks @rc-conway. Do you have any suggestions for how to avoid this?

@jakirkham (Member)

Could we move it to init? Would that help?

@rc-conway (Author)

@jakirkham That would prevent the issues caused by having zarr on the import path.

Just in case, blosc.destroy might also need to set mutex = None to clean it up?

@alimanfoo (Member)

For reference, we call init() here, which runs during zarr (numcodecs) import. We also register a call to destroy at exit (here).
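
In other words, the lifecycle is roughly the following (paraphrased; this mirrors what importing numcodecs already does, it is not something users need to write):

import atexit
from numcodecs import blosc

blosc.init()                    # numcodecs calls this at import time
atexit.register(blosc.destroy)  # destroy() is registered to run at interpreter exit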

@jakirkham (Member)

I'm guessing there is also something weird going on when Cython cleans up this object (or doesn't...).

Let's try moving it to init 🙂

@jakirkham (Member)

Added PR ( #234 ) to move initialization of the mutex to init. Also, mutex is overwritten in destroy, which should trigger reference counting to clean up the Lock (there doesn't appear to be a method to clean up the Lock directly). Please take a look and give it a try 🙂
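
The shape of the change is roughly this sketch (paraphrased, not the actual PR diff):

import multiprocessing

mutex = None

def init():
    global mutex
    # Create the lock in init() instead of at module import time.
    mutex = multiprocessing.Lock()

def destroy():
    global mutex
    # Overwrite the only reference so reference counting releases the Lock.
    mutex = None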

@jstriebel (Member) commented Oct 21, 2022

Since PR #234 was closed, I had another look at this. The following is a minimal reproduction of the problem, together with two potential fixes (each coming with caveats, discussed below):

import multiprocessing as mp
from contextlib import nullcontext

if __name__ == '__main__':
    # This must happen before importing blosc,
    # and must be guarded by the if.
    mp.set_start_method("spawn")

from numcodecs import blosc

### With the following line, the leaked-semaphore warning no longer appears:
# blosc.mutex = mp.get_context("fork").Lock()

### Or this, which removes the mutex completely and relies only on
### the _get_use_threads() check (nullcontext is imported above):
# blosc.mutex = nullcontext()

def f():
    print("inner")

if __name__ == '__main__':
    print("outer")
    p = mp.Process(target=f)
    p.start()
    p.join()

This outputs "There appear to be 1 leaked semaphores to clean up at shutdown"; uncommenting either of the two fixes removes the warning. Using the first fix, with mp.get_context("fork").Lock(), has two caveats:

  1. When using fork, leaked resources are simply not shown, so they might still leak; only the warning is gone. From the Python docs:

    On Unix using the spawn or forkserver start methods will also start a resource tracker process which tracks the unlinked named system resources (such as named semaphores or SharedMemory objects) created by processes of the program. When all processes have exited the resource tracker unlinks any remaining tracked object. Usually there should be none, but if a process was killed by a signal there may be some “leaked” resources. (Neither leaked semaphores nor shared memory segments will be automatically unlinked until the next reboot. This is problematic for both objects because the system allows only a limited number of named semaphores, and shared memory segments occupy some space in the main memory.)

  2. Using a lock from a fork context with spawned processes does not work. Also from the Python docs:

    Alternatively, you can use get_context() to obtain a context object. Context objects have the same API as the multiprocessing module, and allow one to use multiple start methods in the same program.

    Note that objects related to one context may not be compatible with processes for a different context. In particular, locks created using the fork context cannot be passed to processes started using the spawn or forkserver start methods.

    However, it seems that this might not be a problem, since the blosc calls that use the global context are guarded by _get_use_threads, which ensures that the global context is only accessed

    from within a single-threaded, single-process program.

    (from this code comment), and never from subprocesses or subthreads. Those only use the non-threaded ctx-versions of the calls, so the fork-only lock should never actually be used in spawned subprocesses.

Because of the latter point, I'm wondering whether the mutex is needed at all. It would be great if someone with more knowledge of the blosc internals could comment on that. If it is not needed, simply removing the mutex would be fine, which can be simulated by setting it to contextlib.nullcontext() as shown in the second commented fix; that leaves only the rest of the _get_use_threads logic to guard access to the global context.
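
To make that guard concrete, the dispatch looks roughly like the following sketch (paraphrased; _compress_global and _compress_ctx are placeholder names, not the actual numcodecs internals):

from numcodecs import blosc

def _compress_global(source):
    ...  # placeholder: compress using the global blosc context

def _compress_ctx(source):
    ...  # placeholder: compress using a per-call blosc context

def compress(source):
    if blosc._get_use_threads():
        # Single-threaded, single-process program: the global blosc
        # context may be used, serialized by the module-level mutex.
        with blosc.mutex:
            return _compress_global(source)
    # Subprocess or subthread: the per-call ctx-version is used instead,
    # which never touches blosc.mutex.
    return _compress_ctx(source)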

Here is some more code I used for further testing:
import os
import multiprocessing as mp


if __name__ == '__main__':
    # This must happen before importing blosc,
    # and must be guarded by the if.
    mp.set_start_method("spawn")

from numcodecs import blosc

### With the following line, the leaked-semaphore warning no longer appears:
# blosc.mutex = mp.get_context(method="fork").Lock()

### Or this, which removes the mutex completely and relies only on
### the _get_use_threads() check:
from contextlib import nullcontext
blosc.mutex = nullcontext()


def get_random_bytes():
    LENGTH = 10**9
    return b"\x00" + os.urandom(LENGTH) + b"\x00"


def f():
    print("inner")
    codec = blosc.Blosc("zstd")
    msg = get_random_bytes()
    print("use threads (inner)", blosc._get_use_threads(), flush=True)
    print("before inner encode", flush=True)
    encoded = codec.encode(msg)
    print("after inner encode", flush=True)
    decoded = codec.decode(encoded)
    assert decoded == msg


if __name__ == '__main__':
    print("outer")
    # When using fork in the global scope,
    # one can still use spawn locally,
    # without leaked semaphore warnings:
    # mp = mp.get_context("spawn")
    p = mp.Process(target=f)
    p.start()
    blosc.set_nthreads(1)  # limit threads to allow subprocess to start
    codec = blosc.Blosc("lz4")
    msg = get_random_bytes()
    print("use threads (outer)", blosc._get_use_threads(), flush=True)
    print("before outer encode", flush=True)
    encoded = codec.encode(msg)
    print("after outer encode", flush=True)
    decoded = codec.decode(encoded)
    assert decoded == msg
    p.join()

@jakirkham (Member)

FWIW, the PR was closed mainly because it was trying to address issue zarr-developers/zarr-python#777. A new PR with those changes, or other changes, could be opened.

@joshmoore (Member)

Do you have a preference as to which change, @jakirkham?

@danielsf

I apologize if I appear to be piling on, but running Python 3.9 with zarr version 2.13.3, I cannot even get the following to run without it emitting the leaked-semaphore warning and hanging (commenting out the zarr import resolves the problem).

import zarr
import multiprocessing

def fn(x):
    return x**2

def main():
    p_list = []
    for ii in range(4):
        p = multiprocessing.Process(target=fn, args=(ii,))
        p.start()
        p_list.append(p)
    for p in p_list:
        p.join()

if __name__ == "__main__":
    main()

While I agree with the original poster that results do not actually seem to be affected by this problem, a fix would be greatly appreciated.
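
Until a fix lands, a possible workaround based on jstriebel's findings above (with the caveats noted there; leaks are hidden, not fixed) would be:

import multiprocessing as mp
from numcodecs import blosc

# Replace the module-level lock with one from a fork context so the
# resource tracker no longer reports it at shutdown.
blosc.mutex = mp.get_context("fork").Lock()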

@joshmoore (Member)

No piling on perceived, @danielsf. Extra data points welcome. Ping, @jakirkham.

@dstansby added the bug label Oct 28, 2024