Commit af7d703

cyrillos authored and kyukhin committed
txn_limbo: filter incoming synchro requests
When we receive synchro requests we can't just apply them blindly, because in the worst case they may come from a split-brain configuration (where a cluster has split into several clusters, each one has elected its own leader, and the clusters are then trying to merge back into the original one). We need to do our best to detect such disunity and force these nodes to rejoin from scratch for the sake of data consistency.

Thus, when processing requests, we pass them to the packet filter first, which validates their contents and refuses to apply them if they violate consistency.

Depending on the request type, each packet traverses an appropriate chain.

filter_generic(): a common chain for any synchro packet.
 1) request:replica_id = 0 is allowed for PROMOTE requests only.
 2) request:replica_id should match limbo:owner_id, IOW the limbo migration should be noticed by all instances in the cluster.

filter_confirm_rollback(): a chain for CONFIRM | ROLLBACK packets.
 1) Zero LSN is disallowed for such requests.

filter_promote_demote(): a chain for PROMOTE | DEMOTE packets.
 1) The requests should come in with a nonzero term, otherwise the packet is corrupted.
 2) The request's term should not be less than the maximal known one, IOW it should not come in from nodes which didn't notice raft epoch changes and are living in the past.

filter_queue_boundaries(): a common finalization chain.
 1) If the LSN of the request matches the current confirmed LSN, the packet is obviously correct to process.
 2) If the LSN is less than the confirmed LSN, then the request is wrong: we have processed the requested LSN already.
 3) If the LSN is greater than the confirmed LSN, then
    a) If the limbo is empty we can't do anything, since the data is already processed, and we should issue an error;
    b) If there is some data in the limbo, then the requested LSN should be in the range of the limbo's [first; last] LSNs, thus the request will be able to commit and rollback the limbo queue.

Note that the filtration is disabled during initial configuration, where we apply requests from the only source of truth (either the remote master or our own journal), so no split brain is possible.

In order to make split-brain checks work, the applier nopify filter now passes synchro requests from an obsolete term without nopifying them.

Also, now ANY asynchronous request coming from an instance with an obsolete term is treated as a split-brain. Think of it as a synchronous request committed with a malformed quorum.

Closes #5295

NO_DOC=it's literally below

Co-authored-by: Serge Petrenko <[email protected]>
Signed-off-by: Cyrill Gorcunov <[email protected]>

@TarantoolBot document
Title: new error type: ER_SPLIT_BRAIN

If for some reason the cluster had 2 leaders working independently (for example, the user has mistakenly lowered the quorum below N / 2 + 1), then once such leaders and their followers try connecting to each other, they will receive the ER_SPLIT_BRAIN error, and the connection will be aborted. This is done to preserve data integrity. Once the user notices such an error, he or she has to manually inspect the data on both of the split halves, choose a way to restore the data, and rebootstrap one of the halves from the other.
1 parent 9eab286 commit af7d703

9 files changed, +583 -63 lines changed

changelogs/unreleased/gh-5295-split-brain-detection.md

Lines changed: 5 additions & 0 deletions
@@ -2,3 +2,8 @@
 
 * Fixed a possible split-brain when old synchro queue owner might finalize the
   transactions in presence of a new synchro queue owner (gh-5295).
+
+* Fixed servers not noticing possible split-brain situations, for example when
+  multiple leaders were working independently due to manually lowered quorum.
+  Once a node discovers that it received some foreign data, it immediately
+  stops replication from such a node with ER_SPLIT_BRAIN error (gh-5295).

src/box/applier.cc

Lines changed: 18 additions & 14 deletions
@@ -1200,23 +1200,27 @@ applier_synchro_filter_tx(struct stailq *rows)
 	if (!txn_limbo_is_replica_outdated(&txn_limbo, row->replica_id))
 		return;
 
-	if (stailq_last_entry(rows, struct applier_tx_row, next)->row.wait_sync)
-		goto nopify;
-
 	/*
-	 * Not waiting for sync and not a synchro request - this make it already
-	 * NOP or an asynchronous transaction not depending on any synchronous
-	 * ones - let it go as is.
-	 */
-	if (!iproto_type_is_synchro_request(row->type))
-		return;
-	/*
-	 * Do not NOPify promotion, otherwise won't even know who is the limbo
-	 * owner now.
+	 * We do not nopify promotion/demotion and confirm/rollback.
+	 * Such syncrhonous requests should be filtered by txn_limbo to detect
+	 * possible split brain situations.
+	 *
+	 * This means the only filtered out transactions are synchronous ones or
+	 * the ones depending on them.
+	 *
+	 * Any asynchronous transaction from an obsolete term is a marker of
+	 * split-brain by itself: consider it a synchronous transaction, which
+	 * is committed with quorum 1.
 	 */
-	if (iproto_type_is_promote_request(row->type))
+	struct xrow_header *last_row =
+		&stailq_last_entry(rows, struct applier_tx_row, next)->row;
+	if (!last_row->wait_sync) {
+		if (iproto_type_is_dml(last_row->type)) {
+			tnt_raise(ClientError, ER_SPLIT_BRAIN,
+				  "got an async transaction from an old term");
+		}
 		return;
-nopify:;
+	}
 	struct applier_tx_row *item;
 	stailq_foreach_entry(item, rows, next) {
 		row = &item->row;

src/box/box.cc

Lines changed: 7 additions & 0 deletions
@@ -3996,6 +3996,13 @@ box_cfg_xc(void)
 	if (box_set_election_mode() != 0)
 		diag_raise();
 
+	/*
+	 * Enable split brain detection once node is fully recovered or
+	 * bootstrapped. No split brain could happen during bootstrap or local
+	 * recovery.
+	 */
+	txn_limbo_filter_enable(&txn_limbo);
+
 	title("running");
 	say_info("ready to accept requests");
 

src/box/errcode.h

Lines changed: 1 addition & 0 deletions
@@ -296,6 +296,7 @@ struct errcode_record {
 	/*241 */_(ER_WRONG_SPACE_UPGRADE_OPTIONS, "Wrong space upgrade options: %s") \
 	/*242 */_(ER_NO_ELECTION_QUORUM, "Not enough peers connected to start elections: %d out of minimal required %d")\
 	/*243 */_(ER_SSL, "%s") \
+	/*244 */_(ER_SPLIT_BRAIN, "Split-Brain discovered: %s") \
 
 /*
  * !IMPORTANT! Please follow instructions at start of the file
