Connection does not re-establish for 15 minutes when running on Linux #1848

@bcage29

To simulate a network failure, we reboot both the primary and replica nodes of an Azure Cache for Redis instance. We have found that the library reacts differently depending on the host the application is deployed to.

Application

  • .NET 5.0 app (uses a factory and lazy implementation for Redis Connection)
  • StackExchange.Redis 2.2.62

Expected Result

  1. Both nodes go down at the same time (or within a small time window).
  2. The application will report StackExchange.Redis.RedisConnectionException exceptions.
  3. The nodes will restart and be available approximately 1 minute after they go down.
  4. The library will reconnect approximately 1 minute after the nodes went down.

Windows & Docker on Windows Result

The application reconnects approximately 1 minute after the nodes went down as expected.

Error:

StackExchange.Redis.RedisConnectionException: No connection is active/available to service this operation: SET N4BDN; It was not possible to connect to the redis server(s). There was an authentication failure; check that passwords (or client certificates) are configured correctly. ConnectTimeout, mc: 1/1/0, mgr: 10 of 10 available, clientName: 02cbef6fa5b6, IOCP: (Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=1,Free=32766,Min=200,Max=32767), v: 2.2.62.27853
       ---> StackExchange.Redis.RedisConnectionException: It was not possible to connect to the redis server(s). There was an authentication failure; check that passwords (or client certificates) are configured correctly. ConnectTimeout
         --- End of inner exception stack trace ---
         at StackExchange.Redis.ConnectionMultiplexer.ThrowFailed[T](TaskCompletionSource`1 source, Exception unthrownException) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2802
      --- End of stack trace from previous location ---

Load Test Result

[Image: dockerWin10]

Linux Result

The application throws StackExchange.Redis.RedisTimeoutException errors instead of RedisConnectionException errors and does not reconnect for approximately 15 minutes.

Error:

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 5570ms elapsed, timeout is 5000ms), command=SET, next: SET FAO1X, inst: 0, qu: 0, qs: 12, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: <instancename>.redis.cache.windows.net:6380, mc: 1/1/0, mgr: 10 of 10 available, clientName: SandboxHost-637654330433879470, IOCP: (Busy=0,Free=1000,Min=200,Max=1000), WORKER: (Busy=2,Free=32765,Min=200,Max=32767), v: 2.2.62.27853 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

Load Test Result

[Image: linuxContainer]

Observations

  • When running on a Linux server you can update the sysctl setting net.ipv4.tcp_retries2. This setting controls how many times unacknowledged data is retransmitted on an established connection, and therefore how long it takes before a connection failure is declared. After lowering this setting to 5, I found that the application threw the expected error type, StackExchange.Redis.RedisConnectionException, and reconnected approximately 1 minute after the nodes went down. The downside is that this is a host-wide TCP setting, so if you have multiple applications running on that server, they are all affected.
  • Installing Docker on the Linux server, setting net.ipv4.tcp_retries2 to 5 on the server, and running the application as a container did not result in a quick reconnect. Changing the setting had no effect on when the application reconnected; it still reconnected after 15 minutes.
  • Following the Best Practices guide, you should be implementing a ForceReconnect method to handle these types of scenarios. The documentation also says,

Don't call ForceReconnect for Timeouts, just for RedisConnectionExceptions or SocketExceptions

  • In this situation, when the application is running on Linux it throws RedisTimeoutExceptions, and the documentation says not to call ForceReconnect for those.
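
As a rough sanity check on the 15-minute figure, the retransmission arithmetic behind net.ipv4.tcp_retries2 can be sketched in a few lines of Python. This is a simplified model, not the kernel's actual logic. It assumes the retransmission timeout starts at 200 ms, doubles on every retry, and is capped at 120 s, which are the usual Linux defaults as I understand them. The real starting value depends on the measured round-trip time, so treat the results as lower bounds. The function name and constants are made up for this illustration.

```python
# Simplified model of TCP retransmission backoff on Linux.
# Assumptions (not taken from the issue): RTO starts at 200 ms,
# doubles after each retransmission, and is capped at 120 s.
ASSUMED_RTO_MIN_SECONDS = 0.2
ASSUMED_RTO_MAX_SECONDS = 120.0


def hypothetical_timeout(retries2,
                         rto_min=ASSUMED_RTO_MIN_SECONDS,
                         rto_max=ASSUMED_RTO_MAX_SECONDS):
    """Approximate seconds until the kernel gives up on an established
    connection for a given net.ipv4.tcp_retries2 value."""
    total = 0.0
    rto = rto_min
    # One wait after the original send plus one per retransmission.
    for _ in range(retries2 + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    for retries in (5, 8, 15):
        seconds = hypothetical_timeout(retries)
        print(f"tcp_retries2={retries:>2}: ~{seconds:.1f} s "
              f"(~{seconds / 60:.1f} min)")
```

Under these assumptions the default value of 15 works out to about 924.6 seconds, roughly 15.4 minutes, which lines up with the delay seen on Linux. A value of 5 gives about 12.6 seconds, which is consistent with the dead connection being detected quickly enough for the library to reconnect once the nodes are back.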

Questions

  • Is this something that can be handled or improved in the StackExchange.Redis library?
  • Are there Best Practice TCP Settings that should be used when running on Linux?
  • Should the Best Practice be to call ForceReconnect on TimeoutExceptions when running on Linux and also when you encounter RedisConnectionExceptions?

Referenced Issues

#1782
#1822
