modules/manage/pages/disaster-recovery/shadowing/failover-runbook.adoc (2 additions, 2 deletions)
@@ -93,7 +93,7 @@ Verify that the following conditions exist before proceeding with failover:
 * Topics should be in `ACTIVE` state (not `FAULTED`).
 * Replication lag should be reasonable for your RPO requirements.
 
-**Understanding replication lag**
+==== Understanding replication lag
 
 Use xref:reference:rpk/rpk-shadow/rpk-shadow-status.adoc[`rpk shadow status`] to check lag, which shows the message count difference between source and shadow partitions:
 
@@ -135,7 +135,7 @@ Name: <topic-name>, State: ACTIVE
 1 2345 2579 2568 11
 ----
 
-The partition information shows:
+The partition information shows the following:
 
 * **SRC_LSO**: Source partition last stable offset
modules/manage/pages/disaster-recovery/shadowing/failover.adoc (10 additions, 4 deletions)
@@ -37,13 +37,15 @@ When you initiate failover, Redpanda performs the following operations:
 
 Topic failover is irreversible. Once failed over, topics cannot return to shadow mode, and automatic fallback to the original source cluster is not supported.
 
+NOTE: To avoid a split-brain scenario after failover, ensure that all clients are reconfigured to point to the shadow cluster before resuming write activity.
+
 == Failover commands
 
 You can perform failover at different levels of granularity to match your disaster recovery needs:
 
 === Individual topic failover
 
-To fail over a specific shadow topic while leaving other topics in the shadow link still replicating:
+To fail over a specific shadow topic while leaving other topics in the shadow link still replicating, run:
 
 [,bash]
 ----
@@ -54,7 +56,7 @@ Use this approach when you need to selectively failover specific workloads or wh
 
 === Complete shadow link failover (cluster failover)
 
-To fail over all shadow topics associated with the shadow link simultaneously:
+To fail over all shadow topics associated with the shadow link simultaneously, run:
 
 [,bash]
 ----
@@ -82,6 +84,7 @@ Force deleting a shadow link is irreversible and immediately fails over all topi
 The shadow link itself has a simple state model:
 
 * **`ACTIVE`**: Shadow link is operating normally, replicating data
+* **`PAUSED`**: Shadow link replication is temporarily halted by user action
 
 Shadow links do not have dedicated failover states. Instead, the link's operational status is determined by the collective state of its shadow topics.
 
@@ -93,10 +96,11 @@ Individual shadow topics progress through specific states during failover:
 * **`FAULTED`**: Shadow topic has encountered an error and is not replicating
+* **`PAUSED`**: Replication temporarily halted by user action
 
 == Monitor failover progress
 
-Monitor failover progress using the status command:
+To monitor failover progress using the status command, run:
 
 [,bash]
 ----
@@ -105,7 +109,7 @@ rpk shadow status <shadow-link-name>
 
 The output shows individual topic states and any issues encountered during the failover process. For detailed command options, see xref:reference:rpk/rpk-shadow/rpk-shadow-status.adoc[`rpk shadow status`].
 
-**Task states during monitoring:**
+Task states during monitoring:
 
 * **`ACTIVE`**: Task is operating normally and replicating data
 * **`FAULTED`**: Task encountered an error and requires attention
@@ -140,6 +144,8 @@ After successful failover, your shadow cluster exhibits the following characteri
 
 == Failover considerations and limitations
 
+Before implementing failover procedures, understand these key considerations that affect your disaster recovery strategy and operational planning.
+
 **Data consistency:**
 
 * Some data loss may occur due to replication lag at the time of failover.
modules/manage/pages/disaster-recovery/shadowing/monitor.adoc (2 additions, 2 deletions)
@@ -42,8 +42,8 @@ rpk shadow status <shadow-link-name>
 
 For troubleshooting specific issues, you can use command options to show individual status sections. See xref:reference:rpk/rpk-shadow/rpk-shadow-status.adoc[`rpk shadow status`] for available status options. The status output includes:
 
-* **Shadow link state**: Overall operational state (`ACTIVE`).
-* **Individual topic states**: Current state of each replicated topic (`ACTIVE`, `FAULTED`, `FAILING_OVER`, `FAILED_OVER`).
+* **Shadow link state**: Overall operational state (`ACTIVE`, `PAUSED`).
+* **Individual topic states**: Current state of each replicated topic (`ACTIVE`, `FAULTED`, `FAILING_OVER`, `FAILED_OVER`, `PAUSED`).
 * **Task status**: Health of replication tasks across brokers (`ACTIVE`, `FAULTED`, `NOT_RUNNING`, `LINK_UNAVAILABLE`). For details about shadow link tasks, see xref:manage:disaster-recovery/shadowing/setup.adoc#shadow-link-tasks[Shadow link tasks].
 * **Lag information**: Replication lag per partition showing source vs shadow high watermarks (HWM).
modules/manage/pages/disaster-recovery/shadowing/overview.adoc (22 additions, 33 deletions)
@@ -34,10 +34,9 @@ Shadowing replicates:
 
 Shadowing addresses enterprise disaster recovery requirements driven by regulatory compliance and business continuity needs. Organizations typically want to minimize both recovery time objective (RTO) and recovery point objective (RPO), and Shadowing asynchronous replication helps you achieve both goals by reducing data loss during regional outages and enabling rapid application recovery.
 
-The architecture follows an active-passive pattern. The source cluster processes all production traffic while the shadow cluster remains in read-only mode, continuously receiving updates. If a disaster occurs, you can failover the shadow topics using the
-ifndef::env-cloud[Admin API]
-ifdef::env-cloud[Data Plane API]
-or `rpk`, making them fully writable. At that point, you can redirect your applications to the shadow cluster, which becomes the new production cluster.
+The architecture follows an active-passive pattern. The source cluster processes all production traffic while the shadow cluster remains in read-only mode, continuously receiving updates. If a disaster occurs, you can failover the shadow topics, making them fully writable. At that point, you can redirect your applications to the shadow cluster, which becomes the new production cluster.
+
+NOTE: To avoid a split-brain scenario after failover, ensure that all clients are reconfigured to point to the shadow cluster before resuming write activity.
 
 
 ifdef::env-cloud[]
@@ -138,40 +137,30 @@ Partition count is always replicated to ensure the shadow topic matches the sour
 
 The <<shadow-link-tasks,Source Topic Sync task>> handles topic property replication. For topic properties, Redpanda follows these replication rules: