Skip to content

Conversation

@NickCraver
Copy link
Collaborator

@NickCraver NickCraver commented Dec 14, 2021

Currently the way we handshake is to get everything configured, wait for a tracer to complete, and then issue the tiebreakers to all servers if they are in play. This complicates a few things with respect to timings, duplication, and write paths being a one-off for tie breakers, which I tripped on hard in #1912.

In this, we instead move the tie breaker fetch as part of AutoConfigure as a fire-and-forget-process-the-result-later setup with a dedicated processor. This all happens before the tracer fires moving us to the next connection phase (added comments) so we should be safe. It should reduce both complexity and overall connection times proportional to endpoint latency (since we wait for completion right now).

What needs adding here is tests with us disabling commands like INFO, GET, etc. and ensuring things still behave as we want. In the overall, the tie breaker is slightly less isolated but should be happening in the same order and with the same exception if any - no net result change is intended there with respect to how we do or don't error along the way. But we never want a connection to fail because of a tiebreaker and I think that warrants a few tests:

  • Disable INFO and see if we can connect
  • Disable GET and see if we can connect
  • Store some invalid TieBreaker and see if we can connect (e.g. make it a hash instead of a string)
    ...and maybe others?

Currently the way we handshake is to get everything configured, wait for a tracer to complete, and then issue the tiebreakers to all servers if they are in play. This complicates a few things with respect to timings, duplication, and write paths being a one-off for tie breakers, which I tripped on hard in #1912.

In this, we instead move the tie breaker fetch as part of AutoConfigure as a fire-and-forget-process-the-result-later setup with a dedicated processor. This all happens before the tracer fires moving us to the next connection phase (added comments) so we should be safe. It should reduce both complexity and overall connection times proportional to endpoint latency (since we wait for completion right now).

What needs adding here is tests with us disabling commands like INFO, GET, etc. and ensuring things still behave as we want. In the overall, the tie breaker is slightly less isolated but _should_ be happening in the same order and with the same exception if any - no net result change is intended there with respect to how we do or don't error along the way. But we never want a connection to fail _because of a tiebreaker_ and I think that warrants a few tests:

- [ ] Disable `INFO` and see if we can connect
- [ ] Disable `GET` and see if we can connect
- [ ] Store some invalid TieBreaker and see if we can connect (e.g. make it a hash instead of a string)
...and maybe others?
Copy link
Collaborator

@mgravell mgravell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments, but: 👍

var status = tieBreakers[i].Status;
switch (status)
var server = servers[i];
string serverResult = server.TieBreakerResult;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we nuke this here? or not worth the additional race conditions that would introduce for overlapped/staggered connects; meh, probably not worth the complexity risk - just wanted to share a thought

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, I like the thinking - seems like we'd want to change where it's stored so it's transient in a connection state object or some such and not stored on the server but as a lookup that has the lifetime of the connect handshake perhaps?

@NickCraver
Copy link
Collaborator Author

Follow-up: going to look at moving this into a "handshake state" object we pass around to AutoConfigure and into the result processor, but that's a bigger refactor so merging here then revisiting.

Annnnnnd this is why we add tests, we would have tried to issue the GET and never connected in previous code, bad Craver, bad!
@NickCraver
Copy link
Collaborator Author

@mgravell Tweaked code here a bit (to not issue tie breakers when GET is disabled) and added tests, ready for eyes when back in action!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants