Ignore nodes with all connections broken during schema agreement. #1355
Conversation
LGTM - found a typo
```rust
return Err(SchemaAgreementError::RequestError(
    RequestAttemptError::BrokenConnectionError(err.clone()),
));
```
commit: Schema agreement: Ignore BrokenConnectionError
Do we need this clone?
Ah, we iterate over `&SchemaNodeResult`, so we probably need it.
That's correct. Maybe it would be possible to avoid it with some iterator magic, but that would most likely complicate the code even more, which I'd really like to avoid.
This clone only happens in an error condition that should be exceedingly rare (all connections to all nodes are broken), so I don't see the need to optimize this case.
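For context, a minimal sketch of the pattern under discussion, using stand-in types (the `SchemaNodeResult` and error definitions below are simplified placeholders, not the driver's actual ones): iterating by shared reference only yields `&BrokenConnectionError`, so building an owned error to return forces a clone.

```rust
// Hypothetical, simplified stand-ins for the driver's types.
#[derive(Clone, Debug)]
struct BrokenConnectionError(String);

#[derive(Debug)]
enum SchemaNodeResult {
    Version(String),
    Broken(BrokenConnectionError),
}

#[derive(Debug)]
enum SchemaAgreementError {
    Broken(BrokenConnectionError),
}

// Iterating over `&SchemaNodeResult` only yields `&BrokenConnectionError`,
// so constructing an owned error to return requires cloning it.
fn first_broken(results: &[SchemaNodeResult]) -> Result<(), SchemaAgreementError> {
    for result in results {
        if let SchemaNodeResult::Broken(err) = result {
            // `err` is a shared reference; we cannot move out of it.
            return Err(SchemaAgreementError::Broken(err.clone()));
        }
    }
    Ok(())
}
```

Since this path is only taken when connections are broken, the clone stays off any hot path.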
```rust
self.handle_auto_await_schema_agreement(&response, coordinator.node().host_id)
    .await?;
```
Exposing the coordinator turned out to be useful for driver internals as well 🎉
Yes, without it I would probably need to add some plumbing to pass through the id.
Force-pushed 81dc369 to 25bb876
Schema agreement logic will become more complicated in the next commits. Splitting this function should aid readability.
Force-pushed 25bb876 to 5be085e
Force-pushed 5be085e to 978edf6
This should fix scylladb#1240. Such a fix also makes sense from another perspective: the `await_schema_agreement` doc comment says "Awaits schema agreement among all reachable nodes." If all connections to a given node are broken, we can definitely conclude that the node is not reachable.

Previously I thought that doing this would introduce a bug: what if the coordinator of a DDL becomes unreachable after returning a response, but before agreement is reached? We could reach agreement on the old schema version! Now I see that this issue is pre-existing: `await_schema_agreement` reads ClusterState itself, so the following race is possible:
- Driver sends DDL.
- Coordinator responds and dies.
- Driver reads the response.
- Driver detects that the coordinator is dead and notes that in ClusterState.
- Driver tries to perform schema agreement, and does so without using the coordinator.

This issue will be fixed in the next commits.
This fixes the issue described in the parent commit. Internal schema agreement APIs now accept an `Option<Uuid>` that is the host id of a node that must be a part of schema agreement. For user-requested agreements it will be `None`, and for agreements after DDL it will be the coordinator.
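A rough sketch of what such a check could look like with that parameter. The function and type names below are illustrative; only the `Option<Uuid>` parameter and its meaning come from the commit message.

```rust
use std::collections::HashMap;
use uuid::Uuid;

// Hypothetical outcome type for a single agreement check.
#[derive(Debug, PartialEq)]
enum SchemaCheck {
    Agreed,
    StillDiverged,
    CoordinatorAbsent,
}

/// Sketch of one agreement check, assuming `versions` maps each reachable
/// node's host id to the schema version it reported.
/// `required_node` is `Some(coordinator_host_id)` for agreement after DDL,
/// and `None` for user-requested agreement.
fn check_schema_agreement(
    versions: &HashMap<Uuid, Uuid>,
    required_node: Option<Uuid>,
) -> SchemaCheck {
    // The DDL coordinator must have reported a version; otherwise we could
    // converge on the schema as it was before the DDL was applied.
    if let Some(host_id) = required_node {
        if !versions.contains_key(&host_id) {
            return SchemaCheck::CoordinatorAbsent;
        }
    }

    let mut reported = versions.values();
    match reported.next() {
        None => SchemaCheck::StillDiverged,
        Some(first) => {
            if reported.all(|v| v == first) {
                SchemaCheck::Agreed
            } else {
                SchemaCheck::StillDiverged
            }
        }
    }
}
```

With `required_node = None` this degenerates to the existing "all reachable nodes agree" semantics, so the user-facing behaviour of manual agreement requests stays unchanged.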
Force-pushed 978edf6 to 9caec57
```rust
{
    // Case 1: Paused node is a coordinator for DDL.
    // DDL needs to fail.
    let result = run_some_ddl_with_paused_node(
        NodeIdentifier::HostId(host_ids[1]),
        1,
        &session,
        &mut running_proxy,
    )
    .await;
    assert_matches!(
        result,
        Err(ExecutionError::SchemaAgreementError(
            SchemaAgreementError::RequestError(
                RequestAttemptError::BrokenConnectionError(_)
            )
        ))
    )
}

{
    // Case 2: Paused node is NOT a coordinator for DDL.
    // DDL should succeed, because auto schema agreement only needs available nodes to agree.
    let result = run_some_ddl_with_paused_node(
        NodeIdentifier::HostId(host_ids[2]),
        1,
        &session,
        &mut running_proxy,
    )
    .await;
    assert_matches!(result, Ok(_))
}

{
    // Case 3: Paused node is a coordinator for DDL, and is used by control connection.
    // It is the same as case 1, but paused node is also control connection.
    // DDL needs to fail.
    let result = run_some_ddl_with_paused_node(
        NodeIdentifier::HostId(host_ids[0]),
        0,
        &session,
        &mut running_proxy,
    )
    .await;
    assert_matches!(
        result,
        Err(ExecutionError::SchemaAgreementError(
            SchemaAgreementError::RequestError(
                RequestAttemptError::BrokenConnectionError(_)
            )
        ))
    )
}

{
    // Case 4: Paused node is NOT a coordinator for DDL, but is used by control connection.
    // It is the same as case 2, but paused node is also control connection.
    // DDL should succeed, because auto schema agreement only needs available nodes to agree,
    // and control connection is not used for that at all.
    let result = run_some_ddl_with_paused_node(
        NodeIdentifier::HostId(host_ids[1]),
        0,
        &session,
        &mut running_proxy,
    )
    .await;
    assert_matches!(result, Ok(_))
}

running_proxy
```
🔧 I'd also like to see a test case for the new `CoordinatorAbsent` (I may have changed the name) variant.
I also would like to see such a test case, but unfortunately I have no idea how to write it.
What would need to happen for this error to appear:
- We send DDL to some node X.
- Node X responds.
- We go to `handle_auto_await_schema_agreement` and start awaiting.
- Some schema check attempts may happen that end in the schema still being diverged.
- Node X becomes unreachable and the driver notices it, making its pool `Broken` <- this is the important part.
- X's pool is not present in the next schema check, and the new error is returned.

Do you have any idea how to recreate such a race in a test?
From what I can see, the pool becomes `Broken` if the last connection is removed. A connection is removed once an error is encountered on that connection. If you make the proxy break the connection, the driver should immediately recognize it, correct?
If we mock (with the proxy) the first schema read as diverged, then drop the connection, won't this be enough?
We can use proxy rules to synchronize with the driver by putting some requests on hold while we break the connection on another node: use the delay setting of the proxy rules.
I'm against any timing-based synchronization; it is a straight path towards flaky tests. Tomorrow I'll look into writing this test without using delays, but I'm not sure it will be possible with the current capabilities of the proxy.
Force-pushed 36f7245 to e9a6a53
Addressed @wprzytula's comments, apart from one, because I don't know how to write the requested test.
Force-pushed e9a6a53 to 0156169
LGTM
One question: why do we distinguish the test cases where the paused node is used for the control connection? Does the control connection somehow affect schema agreement after DDL?
Schema agreement awaits agreement on all reachable nodes; even the doc comment of `await_schema_agreement` says so: `/// Awaits schema agreement among all reachable nodes.`

When is a node not reachable? Until now, only nodes with `MaybePoolConnections::Broken` were treated as such, but there is another, equally important case: a node where all the requests return `BrokenConnectionError`. Not handling this case means that it is possible for DDL requests to fail simply because some unrelated node stopped responding to requests at the wrong time.

This PR fixes that. Now, if `BrokenConnectionError` is returned on all connections to a node when fetching the schema version, we treat the node as unreachable and ignore its result for the purpose of the schema agreement check.

I uncovered another issue: if the DDL coordinator is unreachable (by either the old or the new definition), it may be possible to converge on the old schema version. I also fixed that by introducing an optional parameter to the schema awaiting functions that allows specifying a node that needs to succeed in schema fetching (otherwise the check is considered failed).
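A minimal sketch of the filtering rule described above, using assumed, simplified types (the real driver's per-node result and error types differ):

```rust
// Hypothetical, simplified per-node outcome of fetching the schema version.
#[derive(Debug)]
enum NodeFetchOutcome {
    Version(String),
    // Every connection to this node returned BrokenConnectionError.
    AllConnectionsBroken,
}

/// Nodes whose connections are all broken are treated as unreachable and
/// skipped when checking whether the reachable nodes agree on a version.
fn reachable_versions(outcomes: &[NodeFetchOutcome]) -> Vec<&str> {
    outcomes
        .iter()
        .filter_map(|outcome| match outcome {
            NodeFetchOutcome::Version(v) => Some(v.as_str()),
            NodeFetchOutcome::AllConnectionsBroken => None,
        })
        .collect()
}

fn main() {
    let outcomes = vec![
        NodeFetchOutcome::Version("5e8fcdb2".into()),
        NodeFetchOutcome::AllConnectionsBroken,
        NodeFetchOutcome::Version("5e8fcdb2".into()),
    ];
    let versions = reachable_versions(&outcomes);
    // The broken node is ignored, so the two reachable nodes agree.
    assert!(versions.windows(2).all(|w| w[0] == w[1]));
}
```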
Fixes: #1240
Pre-review checklist
- I have provided docstrings for the public items that I want to introduce.
- I have adjusted the documentation in `./docs/source/`.
- I added appropriate `Fixes:` annotations to the PR description.