Thread safe sentinel #1


Merged
@lalinsky merged 1 commit into exponea:2.10.6-exponea from thread-safe-sentinel on Oct 4, 2017

Conversation

@lalinsky commented Oct 1, 2017

@lalinsky requested review from capkovic, vetyy and ms7s on October 1, 2017 08:27
In the current version, the sentinel code tries to close all
connections immediately after discovering there is a new master.
This is a problem in a multi-threaded environment, because
neither `ConnectionPool.disconnect` nor `Connection.disconnect` is
thread-safe. If you call `SentinelConnectionPool.disconnect` after a
master failover, it will close connections that may still be in use
by other threads, causing all kinds of errors.

This change avoids that behavior by adding acquire/release checks, so
connections that don't belong to the current master are never returned
to the pool; they are closed instead.
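
For illustration, a minimal sketch of the release-time check described above. Only the `to_be_disconnected` flag and the general shape of `SentinelConnectionPool` come from this PR; the class and method names below are otherwise made up and do not reproduce the actual diff.

class SketchConnection(object):
    """Stand-in for a redis connection; only the fields the check needs."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.to_be_disconnected = False  # set when the pool learns of a new master

    def disconnect(self):
        pass  # would close the underlying socket


class SketchSentinelPool(object):
    """Stand-in for SentinelConnectionPool: instead of disconnecting every
    connection when the master changes (not thread-safe), stale connections
    are closed lazily by the thread that releases them."""

    def __init__(self):
        self.is_master = True
        self.master_address = None
        self._available = []

    def on_new_master(self, address):
        # Just record the new address; never touch in-use connections here.
        self.master_address = address

    def release(self, connection):
        stale = (connection.to_be_disconnected or
                 (self.is_master and
                  self.master_address != (connection.host, connection.port)))
        if stale:
            # Closed by the owning thread, never returned to the pool.
            connection.disconnect()
            return
        self._available.append(connection)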
@lalinsky force-pushed the thread-safe-sentinel branch from 38a0157 to bbf553c on October 2, 2017 15:26
        self.get_master_address()
        return False
    if self.is_master:
        if self.master_address != (connection.host, connection.port):

Shouldn't a similar check be made for the case when self.is_master is False? (i.e. you called slave_for before and the slave you are connected to changed to master)

@lalinsky (Author)

slave_for can actually also give you a master connection (only as a fallback, but it still can), so I consider slave_for as "give me a connection to any redis server, preferably a slave" and don't think we need to disconnect such connections.


Ah, okay then.
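
For context on the slave_for fallback mentioned above, a short usage sketch; the sentinel address, service name and key below are made up.

from redis.sentinel import Sentinel

# Made-up sentinel address and service name, for illustration only.
sentinel = Sentinel([('localhost', 26379)], socket_timeout=0.5)

# slave_for prefers a replica, but can fall back to the master if no
# replica is available -- "any redis server, preferably slave".
conn = sentinel.slave_for('mymaster', socket_timeout=0.5)
print(conn.get('some-key'))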

def _check_connection(self, connection):
    if connection.to_be_disconnected:
        connection.disconnect()
        self.get_master_address()

This resolves the master every time a connection is returned to the pool, which might be several times (once for each connection in the pool) after a master failover. This could probably be avoided if we kept track of the "generation" (count how many times the failover has happened) and skipped the refresh if the connection is not from the last generation.
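
A rough sketch of that generation idea; everything here is illustrative (the PR itself only has the to_be_disconnected hint, and none of the names below exist in it).

class GenerationSketch(object):
    """Count failovers and re-resolve the master once per failover,
    instead of once per connection returned to the pool."""

    def __init__(self):
        self.generation = 0            # bumped on every detected failover
        self._resolved_generation = 0  # last generation the master was resolved for

    def note_failover(self):
        self.generation += 1

    def release(self, connection):
        if connection.generation != self.generation:
            # Stale connection from before the failover: close it, and
            # refresh the master address only if we haven't done so yet
            # for the current generation.
            connection.disconnect()
            if self._resolved_generation != self.generation:
                self.get_master_address()
                self._resolved_generation = self.generation
            return
        self.return_to_pool(connection)

    def get_master_address(self):
        pass  # would ask the sentinels for the current master

    def return_to_pool(self, connection):
        pass  # would put the connection back on the free list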

@lalinsky (Author)

The question is, is it a problem? During a failover we do 2*n resolves instead of n.

There is currently no good place to watch for the master change. This to_be_disconnected flag is basically just a hint.

The correct solution would be to subscribe to the sentinel pub/sub, watch for master changes, and bump the generation immediately when that happens.
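
For reference, a minimal sketch of that pub/sub approach using the standard Sentinel +switch-master notification. The sentinel address is made up, and in a real pool this loop would run in a background thread and bump the generation instead of printing.

import redis

# Made-up sentinel address, for illustration only.
sentinel_client = redis.StrictRedis(host='localhost', port=26379,
                                    decode_responses=True)

pubsub = sentinel_client.pubsub()
pubsub.subscribe('+switch-master')

for message in pubsub.listen():
    if message['type'] != 'message':
        continue
    # Payload format: "<master-name> <old-ip> <old-port> <new-ip> <new-port>"
    name, old_ip, old_port, new_ip, new_port = message['data'].split()
    print('master %s moved from %s:%s to %s:%s'
          % (name, old_ip, old_port, new_ip, new_port))
    # Here the pool would mark existing connections stale / bump the generation.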


Shouldn't cause any problems.

@lalinsky merged commit bbf553c into exponea:2.10.6-exponea on Oct 4, 2017