chore: force shutdown connections on takeover #6142
base: kpr33
Conversation
Signed-off-by: Kostas Kyrimis <[email protected]>
src/server/dflycmd.cc
Outdated
```cpp
#include <utility>

#include "absl/cleanup/cleanup.h"
#include "absl/debugging/stacktrace.h"
```
What the hell 😮💨
Signed-off-by: Kostas Kyrimis <[email protected]>
| << ", is_sending=" << dfly_conn->IsSending() << ", idle_read=" << idle_read | ||
| << ", stuck_sending=" << stuck_sending; | ||
| } | ||
| conn_refs.push_back(dfly_conn->Borrow()); |
I shut the connections down anyway, regardless of whether they are stuck on send or not. The rationale is that a connection can become active at any point if the client starts reading from the socket. If that's the case, we will fail again because the connection won't show up as stuck. To mitigate this, I force shut down all connections that were not closed previously.
```cpp
  ForceShutdownStuckConnections(uint64_t(timeout));

  // Safety net.
  facade::DispatchTracker tracker{sf_->GetNonPriviligedListeners(), cntx->conn(), false, false};
```
Not really needed to wait again, tbh. We forcefully kill connections, and the ones that were in a good state were already shut down (because of the previous dispatch tracker).
Connections stuck on socket writes are not the only kind of bad connection.
+1, but I kill them all anyway. Even for connections stuck on send, it could be that the client reads, so the send gets unstuck. We should shut down the connection regardless, as we can't know when the client will read.
```cpp
  for (auto& ref : conn_refs) {
    facade::Connection* conn = ref.Get();
    if (conn) {
      conn->ShutdownSelfBlocking();
```
I think you should do it from the connection thread.
Yes, it's unsafe to do so from foreign threads; WeakRef has it in the comments.
+1. I will also send a separate PR, as we have this bug in another place as well.
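For illustration, a minimal sketch of the safer pattern being discussed, assuming hypothetical helpers (`ref.Thread()` and `RunOnThread()` are stand-ins here, not confirmed Dragonfly APIs): hop to the thread that owns the connection and shut it down there, instead of calling it from a foreign thread.

```cpp
// Sketch only: ref.Thread() and RunOnThread() are hypothetical stand-ins for
// "find the thread that owns this connection" and "schedule a callback there".
for (auto& ref : conn_refs) {
  facade::Connection* conn = ref.Get();
  if (!conn)
    continue;
  unsigned owner = ref.Thread();  // hypothetical: the connection's owning thread
  RunOnThread(owner, [conn]() {
    // ShutdownSelfBlocking() now runs on the connection's own thread,
    // avoiding the foreign-thread issue mentioned above.
    conn->ShutdownSelfBlocking();
  });
}
```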
```cpp
  vector<facade::Connection::WeakRef> conn_refs;
  auto cb = [&](unsigned thread_index, util::Connection* conn) {
    facade::Connection* dcon = static_cast<facade::Connection*>(conn);
    LOG(INFO) << dcon->DebugInfo();
```
Maybe VLOG? And a more meaningful message.
dranikpg left a comment
I thought about it today. I haven't edited this code in years, but it seems that we don't really have clearly defined boundaries anymore in what the process should be in theory.
The rationale a long time ago was really simple: once we set TAKEN_OVER state, not all connections might have finished running transactions. So to keep data integrity/operation atomicity, we have to wait for all of them to finish. Nothing more.
It was easiest to base this mechanism just on a new message type, because we already had the queue set up with different types.
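As a rough illustration of that message-based mechanism (names and types here are made up, not the real Dragonfly code): the per-connection dispatch queue already multiplexes several message kinds, so the takeover can append a checkpoint marker and wait until every connection dequeues it, which implies all commands queued before TAKEN_OVER have finished.

```cpp
// Illustration only, not the actual Dragonfly implementation.
#include <atomic>
#include <deque>
#include <variant>

struct PipelineMsg {};    // a regular pipelined command
struct CheckpointMsg {};  // barrier marker appended by the takeover flow

using MessageHandle = std::variant<PipelineMsg, CheckpointMsg>;

// Runs on the connection's dispatch fiber: commands queued before the checkpoint
// are handled first, so acknowledging the checkpoint implies they all finished.
void DrainQueue(std::deque<MessageHandle>& queue, std::atomic<int>& pending_checkpoints) {
  while (!queue.empty()) {
    MessageHandle msg = std::move(queue.front());
    queue.pop_front();
    if (std::holds_alternative<CheckpointMsg>(msg)) {
      pending_checkpoints.fetch_sub(1);  // the takeover waits for this to reach zero
    } else {
      // execute the command; under TAKEN_OVER it would answer "busy loading"
    }
  }
}
```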
```cpp
    return;
  }

  bool idle_read = phase == Phase::READ_SOCKET && dfly_conn->idle_time() > timeout;
```
is idle_read really an interesting state?
I see, being blocked on pipeline overflow
Yes, but that's just for the logs; we kill all the connections.
I do not understand, actually. How can an idle read happen for stuck connections?
@romange because in DispatchSingle we block when the pipeline is over its limits, but we set last_interaction_ = time(nullptr); after we block. This shall reflect that.
Either way, I kill all the connections unconditionally. I added the logs because I wanted us to be aware of those stuck cases.
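A simplified sketch of the ordering being described, with made-up names (PipelineOverLimit, WaitUntilPipelineDrains) rather than the real Connection internals: because the timestamp is refreshed only after the blocking wait returns, a connection parked on pipeline overflow keeps accumulating idle_time().

```cpp
#include <cstdint>
#include <ctime>

// Simplified sketch, not the real facade::Connection.
class ConnSketch {
 public:
  uint64_t idle_time() const {
    return static_cast<uint64_t>(time(nullptr) - last_interaction_);
  }

  void DispatchSingle() {
    if (PipelineOverLimit()) {
      WaitUntilPipelineDrains();        // may block for a long time
    }
    last_interaction_ = time(nullptr);  // refreshed only after the wait returns,
                                        // so idle_time() grows while we are blocked
  }

 private:
  bool PipelineOverLimit() const { return true; }  // placeholder
  void WaitUntilPipelineDrains() {}                // placeholder
  time_t last_interaction_ = time(nullptr);
};
```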
```cpp
  shard_set->pool()->AwaitFiberOnAll([&](unsigned index, auto* pb) {
    sf_->CancelBlockingOnThread();
    tracker.TrackOnThread();
  });
```
You can shut them down from this callback. The most important detail is that we care only about connections that were running before takeover was set. We can't track them anymore here, but, for example, if a connection connected in between ForceShutdownStuckConnections and this callback and caused you trouble, we don't care about it: it can't do any mutable operations.
> We can't track them anymore here, but for example, if a connection connected in-between ForceShutdownStuckConnections and this callback and caused you trouble - we don't care about it

That's true, but is there a point to "filter/select" these? Just kill them all. We are on the force path anyway.
```cpp
  // Helper for takeover flow. Sometimes connections get stuck on send (because of pipelines)
  // and this causes the takeover flow to fail because checkpoint messages are not processed.
  // This function force shuts down those connections and allows the node to complete the takeover.
  void ForceShutdownStuckConnections(uint64_t timeout);
```
It isn't necessarily a pipeline; it can also be a connection stuck on a large send.
It will work and it will be more reliable than what we currently have; I just have this feeling that we overcomplicated it by not changing the message approach. Please see the small comments.
Maybe this will be a good moment to transition to a better approach.
```cpp
ABSL_DECLARE_FLAG(bool, info_replication_valkey_compatible);
ABSL_DECLARE_FLAG(uint32_t, replication_timeout);
ABSL_DECLARE_FLAG(uint32_t, shard_repl_backlog_len);
ABSL_FLAG(bool, experimental_force_takeover, false,
```
I prefer we have the flag and improve TAKEOVER rather than adding those micro behaviors.
Are you suggesting this, or was it informational? I mean, we got the Checkpoint which we don't process because we are stuck. Am I missing something?
@kostasrim can you please write in the PR description the existing algorithm behind takeover? Specifically, is it a stateful flow operation where we tell the listener to stop accepting new connections, mark all connections as shutting down, and try to break the existing flows, or do we just try to break the existing connections while new ones may be added during this time? Maybe it's time to revisit the algorithm and implement something sound?
I read it after I wrote my comment and I approve this message :)
I updated the description with the algorithm + some small explanations.
Yes, but plz read what I wrote in "thinking out loud" #6114 (comment). I would like to revisit this and I am happy to write a small doc internally on what to do, but I would first wait for the changes in the connection fiber/pipelining we are implementing. Once that's in place, we will adjust/redesign accordingly. When we become fully asynchronous, for example, a
My rationale for this PR is to have a temporary solution for now and allow Dragonfly to complete the takeover in an intrusive/forceful way until we slowly transition to a new approach.
I added additional logs which I hope will shed more light on why takeover fails BESIDES the stuck send explanation.
Signed-off-by: Kostas Kyrimis <[email protected]>
@dranikpg hope I didn't forget anything
TakeOver algorithm as is
Imagine the simple setup: node 1 is the master, node 2 is the replica, and node 2 attempts a takeover on node 1.
For node 1: Steps that affect TakeOver result/state:
Optional steps that improve other flows:
Notes
Dragonfly still accepts new connections during the takeover. However, once Dragonfly switches to the TAKEN_OVER state, connections won't be able to execute commands and will get a "busy loading" error.
Checkpoints are needed for data integrity. Once all the connections process the checkpoints, any other command in their dispatch queue will fail with "busy loading". Hence, after step (2) there won't be any change in the state (storage), and after step (3) the taking-over node is at data parity.
This PR:
Adds a step between (2) and (3) to handle non-responding connections and force the takeover by forcefully shutting them down.
Follows up #6135. Adds a flag experimental_force_takeover. If the takeover times out and this flag is set, Dragonfly will force shut down open connections so that the takeover doesn't fail. Resolves: #6114
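As a rough sketch of how the flag gates the force path (only the flag name, the timeout, and ForceShutdownStuckConnections come from this PR's diff; the surrounding control flow is illustrative):

```cpp
// Illustrative only; the exact placement and shape in dflycmd.cc may differ.
if (absl::GetFlag(FLAGS_experimental_force_takeover)) {
  // Takeover timed out waiting on checkpoints: force-close the connections that
  // never acknowledged, then the safety-net DispatchTracker pass re-verifies.
  ForceShutdownStuckConnections(uint64_t(timeout));
}
```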