Conversation

@Lorak-mmk Lorak-mmk commented May 13, 2025

Schema agreement is awaited among all reachable nodes - even the doc comment of `await_schema_agreement` says so:

    /// Awaits schema agreement among all reachable nodes.

When is a node not reachable? Until now, only nodes with `MaybePoolConnections::Broken` were treated as such, but there is another, equally important case: a node where all requests return `BrokenConnectionError`.
Not handling this case means that DDL requests can fail simply because some unrelated node stopped responding at the wrong time.

This PR fixes that. Now, if `BrokenConnectionError` is returned on all connections to a node when fetching the schema version, we treat the node as unreachable and ignore its result for the purpose of the schema agreement check.
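
To illustrate the idea, here is a minimal sketch of the per-node decision. The `FetchResult` enum and `node_schema_version` function are hypothetical illustrations, not the driver's actual internals:

```rust
use uuid::Uuid;

// Hypothetical outcome of fetching the schema version on one connection.
enum FetchResult {
    Version(Uuid),    // the connection reported a schema version
    BrokenConnection, // the request failed with BrokenConnectionError
}

// If every connection to a node failed with BrokenConnectionError, no
// version is recovered: the node is treated as unreachable and its
// missing result is ignored by the agreement check.
fn node_schema_version(per_connection: &[FetchResult]) -> Option<Uuid> {
    per_connection.iter().find_map(|r| match r {
        FetchResult::Version(v) => Some(*v),
        FetchResult::BrokenConnection => None,
    })
}
```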

I uncovered another issue: if the DDL coordinator is unreachable (by either the old or the new definition), it may be possible to converge on an old schema version. I also fixed that by introducing an optional parameter to the schema-awaiting functions
that allows specifying a node that must succeed in fetching the schema version (otherwise the check is considered failed).
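
Conceptually, the resulting agreement check looks something like the sketch below (again with hypothetical names; `versions` holds the versions reported by reachable nodes):

```rust
use std::collections::HashMap;
use uuid::Uuid;

// `versions`: host id -> schema version, for reachable nodes only.
// `required_node`: host id of a node that must have reported a version
// (the DDL coordinator), or None for user-requested agreement checks.
fn agreement_reached(versions: &HashMap<Uuid, Uuid>, required_node: Option<Uuid>) -> bool {
    // If the required node's fetch did not succeed, the check fails;
    // otherwise we could converge on an old schema version.
    if let Some(host) = required_node {
        if !versions.contains_key(&host) {
            return false;
        }
    }
    // All reachable nodes must report the same version.
    let mut reported = versions.values();
    match reported.next() {
        Some(first) => reported.all(|v| v == first),
        None => false,
    }
}
```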

Fixes: #1240

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to the PR description.

@Lorak-mmk Lorak-mmk marked this pull request as draft May 13, 2025 15:24
@Lorak-mmk Lorak-mmk requested review from muzarski and wprzytula May 13, 2025 15:24

github-actions bot commented May 13, 2025

cargo semver-checks found no API-breaking changes in this PR.
Checked commit: 0156169

@Lorak-mmk Lorak-mmk marked this pull request as ready for review May 14, 2025 06:32
Contributor

@muzarski muzarski left a comment

LGTM - found a typo

Comment on lines +2211 to +2288
return Err(SchemaAgreementError::RequestError(
    RequestAttemptError::BrokenConnectionError(err.clone()),
));
Contributor

commit: Schema agreement: Ignore BrokenConnectionError

Do we need this clone?

Contributor

Ah, we iterate over &SchemaNodeResult, so we probably need it.

Collaborator Author

That's correct. Maybe it would be possible to avoid it with some iterator magic, but that would most likely complicate the code even more, which I'd really like to avoid.
This clone only happens in an error condition that should be exceedingly rare (all connections to all nodes are broken), so I don't see the need to optimize this case.
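
As a standalone illustration of why the clone is needed (generic types, not the driver's actual ones): iterating a slice by reference yields `&E`, so returning an owned error forces a clone.

```rust
#[derive(Clone, Debug)]
struct BrokenConnectionError(String);

// `results.iter()` yields references, so `.err()` produces only an
// `Option<&BrokenConnectionError>`; `.cloned()` is required to return
// an owned error from the function.
fn first_error(results: &[Result<u32, BrokenConnectionError>]) -> Option<BrokenConnectionError> {
    results.iter().find_map(|r| r.as_ref().err().cloned())
}
```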

Comment on lines +1119 to +1120
self.handle_auto_await_schema_agreement(&response, coordinator.node().host_id)
    .await?;
Contributor

Exposing the coordinator turned out to be useful for driver internals as well 🎉

Collaborator Author

Yes, without it I would probably need to add some plumbing to pass the id through.

@Lorak-mmk Lorak-mmk force-pushed the fix-schema-broken-connection branch from 81dc369 to 25bb876 on May 14, 2025 20:43
Schema agreement logic will become more complicated in the next commits.
Splitting this function should aid readability.
@Lorak-mmk Lorak-mmk force-pushed the fix-schema-broken-connection branch from 25bb876 to 5be085e on May 26, 2025 15:07
@Lorak-mmk Lorak-mmk force-pushed the fix-schema-broken-connection branch from 5be085e to 978edf6 on May 26, 2025 15:39
Lorak-mmk added 2 commits May 26, 2025 17:44
This should fix scylladb#1240
Such a fix also makes sense from another perspective: the doc comment of
`await_schema_agreement` says "Awaits schema agreement among all
reachable nodes.". If all connections to a given node are broken, we can
definitely conclude that the node is not reachable.

Previously I thought that doing this would introduce a bug: what if the
coordinator of a DDL becomes unreachable after returning a response, but
before agreement is reached? We could reach agreement on an old schema
version!
Now I see that this issue is pre-existing: `await_schema_agreement`
reads ClusterState itself, so the following race is possible:
- Driver sends DDL
- Coordinator responds and dies
- Driver reads the response
- Driver detects that the coordinator is dead, notes that in ClusterState
- Driver tries to perform schema agreement, and does so without using
  the coordinator.

This issue will be fixed in the next commits.

This fixes the issue described in the parent commit.

Internal schema agreement APIs now accept an `Option<Uuid>` that is the
host id of a node that must be a part of schema agreement. For
user-requested agreements it will be `None`, and for agreements after
DDL it will be the coordinator.
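
A sketch of the intended call sites; the function name, signature, and error type here are illustrative stand-ins, not the driver's real internal API:

```rust
use uuid::Uuid;

// Illustrative stand-in for the internal awaiting function.
async fn await_schema_agreement(required_node: Option<Uuid>) -> Result<Uuid, ()> {
    let _ = required_node; // poll schema versions until agreement (elided)
    todo!()
}

// After DDL: the coordinator must take part in the agreement check.
async fn after_ddl(coordinator_host_id: Uuid) -> Result<Uuid, ()> {
    await_schema_agreement(Some(coordinator_host_id)).await
}

// User-requested agreement: no particular node is required.
async fn user_requested() -> Result<Uuid, ()> {
    await_schema_agreement(None).await
}
```
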
@Lorak-mmk Lorak-mmk force-pushed the fix-schema-broken-connection branch from 978edf6 to 9caec57 on May 26, 2025 15:45
Comment on lines 123 to 148
{
    // Case 1: Paused node is a coordinator for DDL.
    // DDL needs to fail.
    let result = run_some_ddl_with_paused_node(
        NodeIdentifier::HostId(host_ids[1]),
        1,
        &session,
        &mut running_proxy,
    )
    .await;
    assert_matches!(
        result,
        Err(ExecutionError::SchemaAgreementError(
            SchemaAgreementError::RequestError(
                RequestAttemptError::BrokenConnectionError(_)
            )
        ))
    )
}

{
    // Case 2: Paused node is NOT a coordinator for DDL.
    // DDL should succeed, because auto schema agreement only needs available nodes to agree.
    let result = run_some_ddl_with_paused_node(
        NodeIdentifier::HostId(host_ids[2]),
        1,
        &session,
        &mut running_proxy,
    )
    .await;
    assert_matches!(result, Ok(_))
}

{
    // Case 3: Paused node is a coordinator for DDL, and is used by control connection.
    // It is the same as case 1, but paused node is also control connection.
    // DDL needs to fail.
    let result = run_some_ddl_with_paused_node(
        NodeIdentifier::HostId(host_ids[0]),
        0,
        &session,
        &mut running_proxy,
    )
    .await;
    assert_matches!(
        result,
        Err(ExecutionError::SchemaAgreementError(
            SchemaAgreementError::RequestError(
                RequestAttemptError::BrokenConnectionError(_)
            )
        ))
    )
}

{
    // Case 4: Paused node is NOT a coordinator for DDL, but is used by control connection.
    // It is the same as case 2, but paused node is also control connection.
    // DDL should succeed, because auto schema agreement only needs available nodes to agree,
    // and control connection is not used for that at all.
    let result = run_some_ddl_with_paused_node(
        NodeIdentifier::HostId(host_ids[1]),
        0,
        &session,
        &mut running_proxy,
    )
    .await;
    assert_matches!(result, Ok(_))
}

running_proxy
Collaborator

🔧 I'd also like to see a test case for the new CoordinatorAbsent (I may have changed the name) variant.

Collaborator Author

I would also like to see such a test case, but unfortunately I have no idea how to write it.
What would need to happen for this error to appear:

  • We send DDL to some node X.
  • Node X responds.
  • We go to handle_auto_await_schema_agreement and start awaiting.
  • Some schema check attempts may happen that end with the schema still being diverged.
  • Node X becomes unreachable and the driver notices it, making its pool Broken <- this is the important part.
  • X's pool is not present in the next schema check, so the new error is returned.

Do you have any idea how to recreate such a race in a test?

Collaborator

From what I can see, the pool becomes Broken if the last connection is removed. A connection is removed once an error is encountered on it. If you make the proxy break the connection, the driver should immediately recognize it, correct?
If we mock (with the proxy) the first schema read as diverged, then drop the connection, won't this be enough?

Collaborator

@wprzytula wprzytula May 26, 2025

We can use proxy rules to synchronize with the driver by putting some requests on hold while we break the connection on another node - use the delay setting of the proxy rules.

Collaborator Author

I'm against any timing-based synchronization; it is a straight path towards flaky tests. Tomorrow I'll look into writing this test without using delays, but I'm not sure it will be possible with the current capabilities of the proxy.

@wprzytula wprzytula added the bug Something isn't working label May 26, 2025
@Lorak-mmk Lorak-mmk force-pushed the fix-schema-broken-connection branch 2 times, most recently from 36f7245 to e9a6a53 on May 26, 2025 16:10
@Lorak-mmk
Collaborator Author

Addressed @wprzytula's comments, apart from one, because I don't know how to write the requested test.

@Lorak-mmk Lorak-mmk requested review from muzarski and wprzytula May 26, 2025 16:15
@Lorak-mmk Lorak-mmk force-pushed the fix-schema-broken-connection branch from e9a6a53 to 0156169 on May 27, 2025 09:12
Contributor

@muzarski muzarski left a comment

LGTM

One question: why do we distinguish the test cases where the paused node is used for the control connection? Does the control connection somehow affect the schema agreement after DDL?

@Lorak-mmk Lorak-mmk added this to the 1.2.0 milestone May 27, 2025
@Lorak-mmk Lorak-mmk merged commit 28ed6c4 into scylladb:main May 27, 2025
12 checks passed
@wprzytula wprzytula mentioned this pull request May 27, 2025
@Lorak-mmk Lorak-mmk deleted the fix-schema-broken-connection branch May 28, 2025 10:50

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Schema change statement fails if one node is not responding
