Commit 4cb101c

incorporate Simon's feedback
1 parent cf52c67 commit 4cb101c

4 files changed: 36 additions & 41 deletions
modules/manage/pages/disaster-recovery/shadowing/failover-runbook.adoc

Lines changed: 2 additions & 2 deletions
@@ -93,7 +93,7 @@ Verify that the following conditions exist before proceeding with failover:
 * Topics should be in `ACTIVE` state (not `FAULTED`).
 * Replication lag should be reasonable for your RPO requirements.

-**Understanding replication lag**
+==== Understanding replication lag

 Use xref:reference:rpk/rpk-shadow/rpk-shadow-status.adoc[`rpk shadow status`] to check lag, which shows the message count difference between source and shadow partitions:

@@ -135,7 +135,7 @@ Name: <topic-name>, State: ACTIVE
 1 2345 2579 2568 11
 ----

-The partition information shows:
+The partition information shows the following:

 * **SRC_LSO**: Source partition last stable offset
 * **SRC_HWM**: Source partition high watermark
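
The sample row in the hunk above is consistent with lag being the source high watermark minus the shadow high watermark (2579 - 2568 = 11). The following is a minimal Python sketch of that arithmetic, not the tool's implementation. The column order (partition, SRC_LSO, SRC_HWM, shadow HWM, lag) and the names `PartitionLag` and `parse_row` are assumptions for illustration, because the hunk shows only the first two column definitions.

```python
from typing import NamedTuple


class PartitionLag(NamedTuple):
    """One partition row from the status output (column order is assumed)."""
    partition: int
    src_lso: int      # source last stable offset
    src_hwm: int      # source high watermark
    shadow_hwm: int   # assumed: shadow partition high watermark
    lag: int          # assumed: reported message-count lag


def parse_row(row: str) -> PartitionLag:
    """Parse a whitespace-separated row of five integers."""
    fields = [int(f) for f in row.split()]
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}")
    return PartitionLag(*fields)


row = parse_row("1 2345 2579 2568 11")
# Recompute the lag from the two high watermarks and compare with the reported value.
computed = row.src_hwm - row.shadow_hwm
print(computed, computed == row.lag)
```

A check like this could feed an RPO threshold comparison before failover, under the same column assumptions.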

modules/manage/pages/disaster-recovery/shadowing/failover.adoc

Lines changed: 10 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -37,13 +37,15 @@ When you initiate failover, Redpanda performs the following operations:
3737

3838
Topic failover is irreversible. Once failed over, topics cannot return to shadow mode, and automatic fallback to the original source cluster is not supported.
3939

40+
NOTE: To avoid a split-brain scenario after failover, ensure that all clients are reconfigured to point to the shadow cluster before resuming write activity.
41+
4042
== Failover commands
4143

4244
You can perform failover at different levels of granularity to match your disaster recovery needs:
4345

4446
=== Individual topic failover
4547

46-
To fail over a specific shadow topic while leaving other topics in the shadow link still replicating:
48+
To fail over a specific shadow topic while leaving other topics in the shadow link still replicating, run:
4749

4850
[,bash]
4951
----
@@ -54,7 +56,7 @@ Use this approach when you need to selectively failover specific workloads or wh
5456

5557
=== Complete shadow link failover (cluster failover)
5658

57-
To fail over all shadow topics associated with the shadow link simultaneously:
59+
To fail over all shadow topics associated with the shadow link simultaneously, run:
5860

5961
[,bash]
6062
----
@@ -82,6 +84,7 @@ Force deleting a shadow link is irreversible and immediately fails over all topi
8284
The shadow link itself has a simple state model:
8385

8486
* **`ACTIVE`**: Shadow link is operating normally, replicating data
87+
* **`PAUSED`**: Shadow link replication is temporarily halted by user action
8588

8689
Shadow links do not have dedicated failover states. Instead, the link's operational status is determined by the collective state of its shadow topics.
8790

@@ -93,10 +96,11 @@ Individual shadow topics progress through specific states during failover:
9396
* **`FAULTED`**: Shadow topic has encountered an error and is not replicating
9497
* **`FAILING_OVER`**: Failover initiated, replication stopping
9598
* **`FAILED_OVER`**: Failover completed successfully, topic fully writable
99+
* **`PAUSED`**: Replication temporarily halted by user action
96100

97101
== Monitor failover progress
98102

99-
Monitor failover progress using the status command:
103+
To monitor failover progress using the status command, run:
100104

101105
[,bash]
102106
----
@@ -105,7 +109,7 @@ rpk shadow status <shadow-link-name>
105109

106110
The output shows individual topic states and any issues encountered during the failover process. For detailed command options, see xref:reference:rpk/rpk-shadow/rpk-shadow-status.adoc[`rpk shadow status`].
107111

108-
**Task states during monitoring:**
112+
Task states during monitoring:
109113

110114
* **`ACTIVE`**: Task is operating normally and replicating data
111115
* **`FAULTED`**: Task encountered an error and requires attention
@@ -140,6 +144,8 @@ After successful failover, your shadow cluster exhibits the following characteri
140144

141145
== Failover considerations and limitations
142146

147+
Before implementing failover procedures, understand these key considerations that affect your disaster recovery strategy and operational planning.
148+
143149
**Data consistency:**
144150

145151
* Some data loss may occur due to replication lag at the time of failover.
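
The shadow topic states listed in the failover.adoc hunks above suggest a simple completion check: a failover is finished once every topic reports `FAILED_OVER`. The following is a minimal Python sketch under that reading. The `TopicState` enum mirrors the state names in the diff, while `failover_summary` and its input mapping are hypothetical helpers and not part of `rpk` or any Redpanda API.

```python
from enum import Enum
from typing import Dict, List, Tuple


class TopicState(Enum):
    """Shadow topic states named in the documentation diff."""
    ACTIVE = "ACTIVE"
    FAULTED = "FAULTED"
    FAILING_OVER = "FAILING_OVER"
    FAILED_OVER = "FAILED_OVER"
    PAUSED = "PAUSED"


def failover_summary(states: Dict[str, TopicState]) -> Tuple[bool, List[str], List[str]]:
    """Return (complete, pending_topics, faulted_topics).

    Assumption: failover counts as complete only when at least one topic
    exists and all topics are FAILED_OVER.
    """
    pending = sorted(t for t, s in states.items() if s is not TopicState.FAILED_OVER)
    faulted = sorted(t for t, s in states.items() if s is TopicState.FAULTED)
    complete = bool(states) and not pending
    return complete, pending, faulted


# Hypothetical topic names, for illustration only.
observed = {
    "orders": TopicState.FAILED_OVER,
    "payments": TopicState.FAILING_OVER,
    "audit": TopicState.FAULTED,
}
print(failover_summary(observed))
```

The state values would come from whatever reads the status output; how that output is parsed is left out here because the diff does not show its full layout.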

modules/manage/pages/disaster-recovery/shadowing/monitor.adoc

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -42,8 +42,8 @@ rpk shadow status <shadow-link-name>
4242

4343
For troubleshooting specific issues, you can use command options to show individual status sections. See xref:reference:rpk/rpk-shadow/rpk-shadow-status.adoc[`rpk shadow status`] for available status options. The status output includes:
4444

45-
* **Shadow link state**: Overall operational state (`ACTIVE`).
46-
* **Individual topic states**: Current state of each replicated topic (`ACTIVE`, `FAULTED`, `FAILING_OVER`, `FAILED_OVER`).
45+
* **Shadow link state**: Overall operational state (`ACTIVE`, `PAUSED`).
46+
* **Individual topic states**: Current state of each replicated topic (`ACTIVE`, `FAULTED`, `FAILING_OVER`, `FAILED_OVER`, `PAUSED`).
4747
* **Task status**: Health of replication tasks across brokers (`ACTIVE`, `FAULTED`, `NOT_RUNNING`, `LINK_UNAVAILABLE`). For details about shadow link tasks, see xref:manage:disaster-recovery/shadowing/setup.adoc#shadow-link-tasks[Shadow link tasks].
4848
* **Lag information**: Replication lag per partition showing source vs shadow high watermarks (HWM).
4949

modules/manage/pages/disaster-recovery/shadowing/overview.adoc

Lines changed: 22 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -34,10 +34,9 @@ Shadowing replicates:
3434

3535
Shadowing addresses enterprise disaster recovery requirements driven by regulatory compliance and business continuity needs. Organizations typically want to minimize both recovery time objective (RTO) and recovery point objective (RPO), and Shadowing asynchronous replication helps you achieve both goals by reducing data loss during regional outages and enabling rapid application recovery.
3636

37-
The architecture follows an active-passive pattern. The source cluster processes all production traffic while the shadow cluster remains in read-only mode, continuously receiving updates. If a disaster occurs, you can failover the shadow topics using the
38-
ifndef::env-cloud[Admin API]
39-
ifdef::env-cloud[Data Plane API]
40-
or `rpk`, making them fully writable. At that point, you can redirect your applications to the shadow cluster, which becomes the new production cluster.
37+
The architecture follows an active-passive pattern. The source cluster processes all production traffic while the shadow cluster remains in read-only mode, continuously receiving updates. If a disaster occurs, you can failover the shadow topics, making them fully writable. At that point, you can redirect your applications to the shadow cluster, which becomes the new production cluster.
38+
39+
NOTE: To avoid a split-brain scenario after failover, ensure that all clients are reconfigured to point to the shadow cluster before resuming write activity.
4140

4241

4342
ifdef::env-cloud[]
@@ -138,40 +137,30 @@ Partition count is always replicated to ensure the shadow topic matches the sour
138137

139138
The <<shadow-link-tasks,Source Topic Sync task>> handles topic property replication. For topic properties, Redpanda follows these replication rules:
140139

141-
[cols="1,1,1"]
142-
143-
|===
144-
|Never replicated |Always replicated |Always replicated +
145-
(unless `exclude_default` is `true`)
146-
147-
|`redpanda.remote.readreplica`
148-
|`max.message.bytes`
149-
|`compression.type`
150-
151-
|`redpanda.remote.recovery`
152-
|`cleanup.policy`
153-
|`retention.bytes`
140+
**Never replicated**
154141

155-
|`redpanda.remote.allowgaps`
156-
|`message.timestamp.type`
157-
|`retention.ms`
142+
* `redpanda.remote.readreplica`
143+
* `redpanda.remote.recovery`
144+
* `redpanda.remote.allowgaps`
145+
* `redpanda.virtual.cluster.id`
146+
* `redpanda.leaders.preference`
147+
* `redpanda.cloud_topic.enabled`
158148

159-
|`redpanda.virtual.cluster.id`
160-
|
161-
|`delete.retention.ms`
149+
**Always replicated**
162150

163-
|`redpanda.leaders.preference`
164-
|
165-
|`replication.factor`
151+
* `max.message.bytes`
152+
* `cleanup.policy`
153+
* `message.timestamp.type`
166154

167-
|`redpanda.cloud_topic.enabled`
168-
|
169-
|`min.compaction.lag.ms`
155+
**Always replicated (unless `exclude_default` is `true`)**
170156

171-
|
172-
|
173-
|`max.compaction.lag.ms`
174-
|===
157+
* `compression.type`
158+
* `retention.bytes`
159+
* `retention.ms`
160+
* `delete.retention.ms`
161+
* `replication.factor`
162+
* `min.compaction.lag.ms`
163+
* `max.compaction.lag.ms`
175164

176165
To replicate additional topic properties, explicitly list them in `synced_shadow_topic_properties`.
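
The three property groups in the overview.adoc hunk above can be read as a small decision rule. The following is a minimal Python sketch of that rule, not Redpanda's implementation. The function `is_replicated` and its parameters are hypothetical; `synced_extra` stands in for `synced_shadow_topic_properties`. Two behaviors are assumptions the diff does not confirm: that a never-replicated property stays excluded even when listed explicitly, and that explicit listing overrides `exclude_default`.

```python
from typing import Iterable

NEVER_REPLICATED = frozenset({
    "redpanda.remote.readreplica",
    "redpanda.remote.recovery",
    "redpanda.remote.allowgaps",
    "redpanda.virtual.cluster.id",
    "redpanda.leaders.preference",
    "redpanda.cloud_topic.enabled",
})
ALWAYS_REPLICATED = frozenset({
    "max.message.bytes",
    "cleanup.policy",
    "message.timestamp.type",
})
REPLICATED_BY_DEFAULT = frozenset({
    "compression.type",
    "retention.bytes",
    "retention.ms",
    "delete.retention.ms",
    "replication.factor",
    "min.compaction.lag.ms",
    "max.compaction.lag.ms",
})


def is_replicated(prop: str, exclude_default: bool = False,
                  synced_extra: Iterable[str] = ()) -> bool:
    """Decide whether a topic property is replicated under the listed rules."""
    if prop in NEVER_REPLICATED:
        return False  # assumption: explicit listing cannot override this group
    if prop in ALWAYS_REPLICATED:
        return True
    if prop in set(synced_extra):
        return True   # assumption: explicit listing overrides exclude_default
    if prop in REPLICATED_BY_DEFAULT:
        return not exclude_default
    return False      # unlisted properties are not replicated unless listed


print(is_replicated("retention.ms"))
print(is_replicated("retention.ms", exclude_default=True))
```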