A deadlock is encountered when invoking the 'subscribe' function. #2883

Open
suxb201 opened this issue Aug 9, 2023 · 5 comments

Comments

@suxb201
Contributor

suxb201 commented Aug 9, 2023

Version:
redis-py 4.5.5 used in a multi-threads program

Platform:
Python 3.9.7 on Linux 5.10 aarch64

Description:
At times, invoking the 'subscribe' function hangs. Based on the stack trace below, it appears that a deadlock is occurring.

...
Thread 0x0000ffffaca0f150 (most recent call first):
  File "/root/miniconda3/lib/python3.9/site-packages/redis/connection.py", line 1497 in release # try lock self._lock again
  File "/root/miniconda3/lib/python3.9/site-packages/redis/client.py", line 1426 in reset
  File "/root/miniconda3/lib/python3.9/site-packages/redis/client.py", line 1418 in __del__
  File "/root/miniconda3/lib/python3.9/site-packages/redis/connection.py", line 949 in __init__
  File "/root/miniconda3/lib/python3.9/site-packages/redis/connection.py", line 1492 in make_connection
  File "/root/miniconda3/lib/python3.9/site-packages/redis/connection.py", line 1452 in get_connection # with self._lock
  File "/root/miniconda3/lib/python3.9/site-packages/redis/client.py", line 1469 in execute_command
  File "/root/miniconda3/lib/python3.9/site-packages/redis/client.py", line 1634 in subscribe
...

Client is created with:

c = redis.Redis(
    host="localhost",
    port=self._port,
    password="*****",
    single_connection_client=True,  # might be helpful
    socket_timeout=30,
    socket_connect_timeout=10,
    ssl=self._is_tls,
    ssl_ca_certs="xxxxxx/tls/ca.crt",
)

My program:

...
pubsub = c.pubsub()
pubsub.subscribe(channel) # hang
...
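The stack above re-enters the pool lock on the same thread: `get_connection` holds `_lock`, `make_connection` allocates a new connection object, that allocation triggers garbage collection, GC runs `PubSub.__del__`, and `release` then tries to take `_lock` again. A minimal sketch of why this hangs with a plain `threading.Lock` (using an acquire timeout so the demo itself does not block forever; the function names are illustrative stand-ins, not redis-py APIs):

```python
import threading

pool_lock = threading.Lock()  # plain Lock, as in the affected redis-py versions

def simulated_del():
    # Stands in for PubSub.__del__ -> reset() -> ConnectionPool.release(),
    # which re-acquires the pool lock on the very same thread.
    return pool_lock.acquire(timeout=0.2)

with pool_lock:                   # stands in for get_connection() holding _lock
    reacquired = simulated_del()  # stands in for GC firing inside make_connection()

print(reacquired)  # False: a plain Lock can never be re-acquired by its holder
```

Without the timeout, the inner acquire would block forever, which matches the hang reported above.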
@jaroslawporada

@andymccurdy
I believe this commit caused it: 32a5840

I have the same deadlock problem with Redis as a Celery broker:

  File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 2114, in execute
    conn = self.connection_pool.get_connection("MULTI", self.shard_hint)
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 1456, in get_connection
    connection = self.make_connection()
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 1496, in make_connection
    return self.connection_class(**self.connection_kwargs)
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 960, in __init__
    super().__init__(**kwargs)
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 647, in __init__
    self.retry = Retry(NoBackoff(), 1)
  File "/usr/local/lib/python3.11/site-packages/redis/backoff.py", line 42, in __init__
    super().__init__(0)
  File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 1418, in __del__
    self.reset()
  File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 1426, in reset
    self.connection_pool.release(self.connection)
  File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 1501, in release
    with self._lock:

Can we bring back RLock?
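For reference, the reentrancy difference this question is about: the same re-acquire that deadlocks a plain `Lock` succeeds immediately with an `RLock`, because an `RLock` tracks its owning thread:

```python
import threading

rlock = threading.RLock()

with rlock:
    # The same thread may take an RLock again; a plain Lock would
    # block here (or time out), as in the traceback above.
    reacquired = rlock.acquire(timeout=0.2)
    if reacquired:
        rlock.release()  # balance the extra acquire

print(reacquired)  # True
```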

@jaroslawporada

I think that while a new connection is being created, the garbage collector deletes a PubSub object, causing this situation.
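A sketch of that theory: an object caught in a reference cycle is only freed by the cyclic garbage collector, which can kick in at an arbitrary allocation point. Here `gc.collect()` is called explicitly for determinism, and `FakePubSub` and `ran` are illustrative names, not redis-py internals:

```python
import gc

ran = []

class FakePubSub:
    def __del__(self):
        # In the real report, this is where reset() -> pool.release()
        # runs and tries to take the pool lock again.
        ran.append("finalizer")

p = FakePubSub()
p.self_ref = p  # reference cycle: refcounting alone cannot free it
del p

gc.collect()   # in the bug, an allocation inside make_connection() triggers GC
print(ran)     # ['finalizer']
```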

@lan17

lan17 commented Jan 11, 2025

Running into the same issue when using Redis as a backend for Celery.

@jaroslawporada

def patch_redis_connectionpool_lock():
    from hashlib import sha256
    from inspect import getsource
    import os
    import threading

    from redis.connection import ConnectionPool

    OLD_METHOD_SHA = (  # pylint: disable=invalid-name
        b"some hash"
    )

    def reset(self):
        self._lock = threading.RLock()  # pylint: disable=protected-access
        self._created_connections = 0  # pylint: disable=protected-access
        self._available_connections = []  # pylint: disable=protected-access
        self._in_use_connections = set()  # pylint: disable=protected-access

        # this must be the last operation in this method. while reset() is
        # called when holding _fork_lock, other threads in this process
        # can call _checkpid() which compares self.pid and os.getpid() without
        # holding any lock (for performance reasons). keeping this assignment
        # as the last operation ensures that those other threads will also
        # notice a pid difference and block waiting for the first thread to
        # release _fork_lock. when each of these threads eventually acquire
        # _fork_lock, they will notice that another thread already called
        # reset() and they will immediately release _fork_lock and continue on.
        self.pid = os.getpid()

    assert OLD_METHOD_SHA == sha256(getsource(ConnectionPool.reset).encode()).digest()
    ConnectionPool.reset = reset

For now, I have patched this method back to the previous working implementation until someone merges my MR.
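The hash guard in the snippet above is a useful pattern in its own right: refuse to monkeypatch if the upstream implementation changed since the patch was written. A generic sketch of the same idea, hashing the function's bytecode instead of `inspect.getsource` so it also works where source files are unavailable (`Demo`, `patched`, and the other names are illustrative; in real use, `expected` would be a hard-coded constant recorded against a known library version):

```python
from hashlib import sha256

class Demo:
    def reset(self):
        return "old"

# Fingerprint of the implementation we know how to replace.
# (Computed here for the demo; in practice this is a pinned constant.)
expected = sha256(Demo.reset.__code__.co_code).hexdigest()

def patched(self):
    return "new"

# Only patch if the original still matches the recorded fingerprint;
# otherwise a library upgrade may have already fixed (or changed) it.
if sha256(Demo.reset.__code__.co_code).hexdigest() == expected:
    Demo.reset = patched

print(Demo().reset())  # new
```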

@lan17

lan17 commented Jan 11, 2025

BTW, I still ran into deadlocking issues even with RLock, so I am not sure it's the culprit.

My setup uses Redis as the Celery backend, with lots of coroutines awaiting AsyncResult.ready and AsyncResult.get.
