Tiebreakers: move into the handshake #1931
Currently the way we handshake is to get everything configured, wait for a tracer to complete, and then issue the tiebreakers to all servers if they are in play. This complicates a few things with respect to timings, duplication, and write paths being a one-off for tie breakers, which I tripped on hard in #1912.

In this, we instead move the tie breaker fetch into AutoConfigure as a fire-and-forget, process-the-result-later setup with a dedicated processor (sketched below). This all happens before the tracer fires, moving us to the next connection phase (comments added), so we should be safe. It should reduce both complexity and overall connection times proportional to endpoint latency (since we currently wait for the tiebreaker to complete).

What still needs adding here is tests that disable commands like INFO, GET, etc. and ensure things still behave as we want. Overall, the tie breaker is slightly less isolated, but it _should_ happen in the same order and with the same exception, if any - no net change is intended in how we do or don't error along the way. But we never want a connection to fail _because of a tiebreaker_, and I think that warrants a few tests:

- [ ] Disable `INFO` and see if we can connect
- [ ] Disable `GET` and see if we can connect
- [ ] Store some invalid tiebreaker value and see if we can connect (e.g. make it a hash instead of a string)

...and maybe others?
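To make the shape concrete, here is a minimal, hypothetical sketch of the fire-and-forget pattern described above. This is not the library's actual AutoConfigure code; all names here (`TieBreakerProcessor`, `HandshakeSketch`, `sendGetAsync`) are illustrative stand-ins:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch only: records whatever the tiebreaker GET returns.
// A failed or missing tiebreaker must never surface as a handshake error.
public sealed class TieBreakerProcessor
{
    public string? Result { get; private set; }
    public void OnResult(string? value) => Result = value;
    public void OnError(Exception _) => Result = null; // swallow: never fail the connect
}

public static class HandshakeSketch
{
    // Issue the tiebreaker GET during auto-configure, but do NOT await it:
    // the handshake proceeds to the tracer immediately, and the processor
    // consumes the reply whenever it arrives.
    public static Task AutoConfigureAsync(
        Func<string, Task<string?>> sendGetAsync, // stand-in for the GET write path
        TieBreakerProcessor processor,
        string tieBreakerKey)
    {
        _ = sendGetAsync(tieBreakerKey).ContinueWith(t =>
        {
            if (t.IsCompletedSuccessfully) processor.OnResult(t.Result);
            else processor.OnError(t.Exception ?? new TaskCanceledException(t));
        }, TaskScheduler.Default);

        return Task.CompletedTask; // the rest of auto-configure continues here
    }
}
```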
mgravell left a comment:
some comments, but: 👍
(diff context from the review)

```csharp
var status = tieBreakers[i].Status;
switch (status)
var server = servers[i];
string serverResult = server.TieBreakerResult;
```
should we nuke this here? or not worth the additional race conditions that would introduce for overlapped/staggered connects; meh, probably not worth the complexity risk - just wanted to share a thought
Hmmm, I like the thinking - seems like we'd want to change where it's stored so it's transient, in a connection state object or some such: not stored on the server, but as a lookup that has the lifetime of the connect handshake, perhaps?
Follow-up: going to look at moving this into a "handshake state" object we pass around to AutoConfigure and into the result processor, but that's a bigger refactor, so merging here and revisiting after.
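For illustration, here is one hypothetical shape that "handshake state" object could take - nothing like this exists in the PR itself; the name comes from the comment above and the fields are assumed:

```csharp
// Hypothetical only: transient state scoped to a single connect attempt,
// passed through AutoConfigure and into the result processor, so nothing
// handshake-specific needs to live on the server object itself.
public sealed class HandshakeState
{
    // Result of the tiebreaker GET for this connect attempt, if any.
    public string? TieBreakerResult { get; set; }

    // Other per-handshake transient data (auto-configure replies, timings)
    // would live here and be discarded once the connection is established.
}
```

Scoping the result to the connect attempt would sidestep the race conditions mentioned above for overlapped/staggered connects, since no two attempts would share storage.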
Annnnnnd this is why we add tests: we would have tried to issue the GET and never connected in the previous code. Bad Craver, bad!
@mgravell Tweaked the code here a bit (to not issue tie breakers when GET is disabled) and added tests - ready for eyes when you're back in action!
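As a rough illustration of the "disable `GET`" checklist item, a test along these lines could use StackExchange.Redis's `CommandMap` to turn the command off. This is a sketch, not the PR's actual test code; the endpoint is a placeholder for whatever fixture the real suite uses:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using StackExchange.Redis;
using Xunit;

public class TieBreakerConnectTests
{
    [Fact]
    public async Task CanConnectWithGetDisabled()
    {
        // Assumed local test server.
        var options = ConfigurationOptions.Parse("127.0.0.1:6379");

        // Disable GET so the tiebreaker fetch cannot be issued at all;
        // the connection should still succeed regardless.
        options.CommandMap = CommandMap.Create(
            new HashSet<string> { "GET" }, available: false);

        using var muxer = await ConnectionMultiplexer.ConnectAsync(options);
        Assert.True(muxer.IsConnected);
    }
}
```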