Version: 4.6.0 connecting to a cluster version 6.2.7
Platform: Python 3.8 on Ubuntu 22.10 / CentOS 7
Description: If one of the nodes in the cluster becomes unreachable in a way that raises TimeoutError, the client spirals down into an unrecoverable state.
Consider this small snippet, which generates pipelines and executes them against random keys, simulating a busy client.
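The snippet itself did not survive in this copy of the report. The following is only a rough sketch of what such a load script might look like, not the original code. The host, port, batch size, `socket_timeout` value, and the helper names `random_key`, `build_batch`, and `run` are all assumptions chosen for illustration.

```python
import random
import string


def random_key(rng, length=8):
    """Return a random lowercase key so commands spread across hash slots."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def build_batch(rng, size=10):
    """Return a list of (key, value) pairs to queue in one pipeline."""
    return [(random_key(rng), rng.randint(0, 1000)) for _ in range(size)]


def run(host="localhost", port=7000, iterations=1000):
    """Send pipelines of SET commands to the cluster and print any failure."""
    # Imported here so the helpers above can be used without redis-py installed.
    from redis.cluster import RedisCluster

    rng = random.Random()
    # Assumed settings: a short socket timeout makes an unreachable node
    # surface as a timeout quickly.
    client = RedisCluster(host=host, port=port, socket_timeout=1)
    for i in range(iterations):
        pipe = client.pipeline()
        for key, value in build_batch(rng):
            pipe.set(key, value)
        try:
            pipe.execute()
        except Exception as exc:  # broad on purpose: this script only observes
            print(f"iteration {i}: {type(exc).__name__}: {exc}")
```

Calling `run()` against a test cluster and then taking one node down while the loop is active should be enough to observe the behaviour described below, assuming the node fails in a way that produces timeouts rather than refused connections.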
While this script is running, if I tear down one of the nodes in the cluster in such a way that connection attempts from the client raise TimeoutError, two things happen:
The client never recovers from the TimeoutError. Even if I replace the server with one on a different IP, the unreachable node is kept in the nodes cache and is tried again on every further iteration of the pipelines.
The TimeoutError is raised in the middle of the pipeline, and the connections already taken from the connection pools of all the nodes involved are not released. Additional pipeline commands (which still hit the TimeoutError) eventually fill the connection pools to their maximum capacity, blocking any further connection.
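The second point can be illustrated with a toy model. This is not redis-py code: `ToyPool`, `send`, and `attempts_until_exhausted` are hypothetical names, and the pool is a deliberately simplified stand-in that only counts connections. It shows how a bounded pool fills up when an error path skips the release step, and how releasing on error avoids that.

```python
class ToyPool:
    """Minimal stand-in for a bounded connection pool (not redis-py's class)."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.in_use = 0

    def get(self):
        if self.in_use >= self.max_connections:
            raise RuntimeError("Too many connections")
        self.in_use += 1
        return object()

    def release(self, conn):
        self.in_use -= 1


def send(pool, release_on_error):
    """Take a connection, simulate a timeout, optionally release it."""
    conn = pool.get()
    try:
        raise TimeoutError("node unreachable")  # simulated failure
    except TimeoutError:
        if release_on_error:
            pool.release(conn)
        raise


def attempts_until_exhausted(pool, release_on_error, limit=100):
    """Return the attempt index at which the pool refuses a connection,
    or None if it never does within `limit` attempts."""
    for attempt in range(limit):
        try:
            send(pool, release_on_error)
        except TimeoutError:
            continue  # the caller keeps retrying, like the busy client
        except RuntimeError:
            return attempt
    return None
```

With a pool of four connections, the leaking variant is refused on its fifth attempt, while the releasing variant can keep failing indefinitely without exhausting the pool.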
I tried calling pipeline.reset() when an exception is raised, but from reading the method's code, it doesn't actually release any connections (although there are a few TODOs for the WATCH case).
In fact, if I alter the except clause in ClusterPipeline._send_cluster_command in my virtual environment to include TimeoutError, the client recovers correctly and connections don't pile up, but I don't know whether this could lead to other side effects.
During my tests, I noticed that the way errors are handled in ClusterPipeline._send_cluster_command (https://github.com/redis/redis-py/blob/v4.6.0/redis/cluster.py#L2001) is slightly different from RedisCluster._execute_command (https://github.com/redis/redis-py/blob/v4.6.0/redis/cluster.py#L1135). The RedisCluster method also reinitializes the nodes cache in case of TimeoutError.
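Until the pipeline handles TimeoutError itself, a similar effect might be approximated from application code. The sketch below is an assumption-laden illustration, not the library's method and not the patch described above. `execute_with_refresh` and `build_pipeline` are hypothetical names, and it assumes that calling `client.nodes_manager.initialize()` from outside the library is acceptable for refreshing the nodes cache. It does not address the connection leak, since `pipeline.reset()` reportedly releases nothing.

```python
def execute_with_refresh(client, build_pipeline, retry_on, max_attempts=3):
    """Execute a freshly built pipeline, refreshing the nodes cache on failure.

    client         -- cluster client exposing `nodes_manager.initialize()`
    build_pipeline -- callable taking the client and returning a pipeline
                      with its commands already queued (hypothetical helper)
    retry_on       -- tuple of exception classes that trigger a refresh
    """
    last_error = None
    for _ in range(max_attempts):
        pipe = build_pipeline(client)
        try:
            return pipe.execute()
        except retry_on as exc:
            last_error = exc
            pipe.reset()
            # Assumption: rebuilding the topology here mirrors what
            # RedisCluster._execute_command is described as doing on timeout.
            client.nodes_manager.initialize()
    raise last_error
```

With redis-py, `retry_on` would presumably be `(redis.exceptions.ConnectionError, redis.exceptions.TimeoutError)`. Note that the refresh itself can fail if no reachable node remains, in which case that error propagates to the caller.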