Changed the PubSub's health check command to be performed only on the… #1733
… first command execution.
Codecov Report
@@           Coverage Diff           @@
##           master    #1733   +/-   ##
=======================================
  Coverage   89.04%   89.04%
=======================================
  Files          53       53
  Lines       11051    11054    +3
=======================================
+ Hits         9840     9843    +3
  Misses       1211     1211
@bmerry @chayim
Arguably there is a race condition where a thread might [...]. It potentially makes the health check less useful, though: if the pubsub is used for a bit, then all the subscriptions are removed, and then it is used again 10 minutes later, there will be no health check at that point. Maybe that's a reasonable trade-off for correctness.
See #1737 for the closing PR.
Pull Request check-list
Please make sure to review and check all of these items:
Does $ tox pass with this change (including linting)?
NOTE: these things are not required to open a PR and can be done afterwards / while the PR is open.
Description of change
Closing #1720:
I reproduced the PubSub bug described in #1720 and was able to find the root cause.
When the PubSub's 'execute_command' method is called, it passes a 'health_check' bool that determines whether a health check should run. The 'health_check' value is set to not self.subscribed, where self.subscribed checks whether the pubsub instance has any items in its channels/patterns lists. This means we perform a health check within execute_command only while we are not yet subscribed. All subsequent commands after the first subscription are executed without a health check, since the channels/patterns lists are no longer empty.
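To illustrate the behavior described above, here is a minimal sketch that uses a toy model rather than the real redis-py class. 'PubSubModel', its 'health_checks' counter, and the simplified subscribe/unsubscribe bookkeeping are assumptions made for illustration; only the 'health_check = not self.subscribed' rule is taken from the description.

```python
class PubSubModel:
    """Toy stand-in for the PubSub object; not the real redis-py class."""

    def __init__(self):
        self.channels = {}
        self.patterns = {}
        self.health_checks = 0  # how many times a health check was requested

    @property
    def subscribed(self):
        # True once at least one channel or pattern is registered
        return bool(self.channels or self.patterns)

    def execute_command(self, *args):
        # Rule from the description: health check only while not subscribed
        health_check = not self.subscribed
        if health_check:
            self.health_checks += 1
        return health_check

    def subscribe(self, channel):
        ran_check = self.execute_command("SUBSCRIBE", channel)
        self.channels[channel] = None
        return ran_check

    def unsubscribe(self, channel):
        # Simplification (assumption): the channel is dropped immediately
        ran_check = self.execute_command("UNSUBSCRIBE", channel)
        self.channels.pop(channel, None)
        return ran_check


p = PubSubModel()
first = p.subscribe("foo")   # nothing subscribed yet: health check requested
second = p.subscribe("bar")  # already subscribed: no health check
p.unsubscribe("foo")
p.unsubscribe("bar")         # channel list is empty again
third = p.subscribe("baz")   # health check is requested once more
print(first, second, third)
```

In this model the third subscribe requests a health check again because the channel list became empty, regardless of whether something else is already reading from the connection.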
The pubsub's 'get_message()' method can be used to poll published messages after a pubsub instance has been created. If a poller thread is created (a thread that waits on get_message()), it reads from the same socket that the pubsub's execute_command reads from when it performs a health check. Hence, we should not send a health check from the pubsub's execute_command function after the poller thread has started, since it would then race the poller thread to read the response from the socket.
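The race can be pictured with a small simulation. This is only a sketch: 'FakeSocket' and its PING/PONG behavior are hypothetical stand-ins for the shared connection, and the two readers run sequentially so that the unlucky interleaving is explicit instead of depending on thread timing.

```python
import queue


class FakeSocket:
    """Single inbound stream shared by every reader, like one connection."""

    def __init__(self):
        self._inbound = queue.Queue()

    def send(self, command):
        # Hypothetical server behavior: answer PING with PONG
        if command == "PING":
            self._inbound.put("PONG")

    def read(self, timeout=0.1):
        # Return the next reply, or None if nothing arrives in time
        try:
            return self._inbound.get(timeout=timeout)
        except queue.Empty:
            return None


sock = FakeSocket()

# Command path: sends the health-check PING and expects to read the reply
sock.send("PING")

# Poller path (what a thread waiting in get_message() would do): it happens
# to read first and consumes the reply meant for the health check
poller_got = sock.read()

# Command path now finds nothing left to read
command_got = sock.read()

print(poller_got, command_got)
```

Whichever reader gets to the socket first takes the reply, so the command path can be left waiting for a response that the poller already consumed.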
In the example in #1720 we see the following flow:
Therefore, we shouldn't use self.channels and self.patterns to determine whether a health check needs to be executed. Instead, we should have a separate variable that indicates whether this is the first command execution and, if so, run a health check.
However, a poller thread may be started before subscribing to a channel, e.g.:
In this case, the health check will be performed and we will still get a race reading from the socket with the poller thread.
So, my suggestion is to add a new 'cmd_execution_health_check' variable, initialized to 'True', to the pubsub class and to set it to False on:
Health checks are performed by the get_message() method, so there is no need to also run them from the main command execution.
This change fixes the reported bug.
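A rough sketch of the idea, using a toy class rather than the actual patch. The flag name comes from the description above, but the two places where it is cleared (after the first command execution, and when get_message() is called) are my assumptions based on the surrounding text; 'PubSubWithFlag' and 'command_health_checks' are hypothetical names.

```python
class PubSubWithFlag:
    """Toy model of the proposed flag; not the actual redis-py patch."""

    def __init__(self):
        self.channels = {}
        # Flag name from the description; it starts as True
        self.cmd_execution_health_check = True
        self.command_health_checks = 0  # health checks run by the command path

    def execute_command(self, *args):
        if self.cmd_execution_health_check:
            self.command_health_checks += 1
            # Assumption: cleared after the first command execution
            self.cmd_execution_health_check = False

    def subscribe(self, channel):
        self.execute_command("SUBSCRIBE", channel)
        self.channels[channel] = None

    def unsubscribe(self, channel):
        self.execute_command("UNSUBSCRIBE", channel)
        self.channels.pop(channel, None)

    def get_message(self):
        # Assumption: once polling starts, polling owns the health checks,
        # so the command path stops running them
        self.cmd_execution_health_check = False
        return None


# Case 1: subscriptions drop to zero and come back; only one health check
p = PubSubWithFlag()
p.subscribe("foo")
p.unsubscribe("foo")
p.subscribe("bar")
checks_case1 = p.command_health_checks

# Case 2: polling starts before the first subscribe; no command health check
q = PubSubWithFlag()
q.get_message()
q.subscribe("foo")
checks_case2 = q.command_health_checks

print(checks_case1, checks_case2)
```

Because the flag no longer depends on the channels/patterns lists, emptying and refilling them does not bring the command-path health check back.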