
Commit 1905b94

add shadowing metrics
1 parent aea6e82 commit 1905b94


2 files changed, +95 -17 lines changed


modules/manage/pages/disaster-recovery/shadowing/monitor.adoc

Lines changed: 14 additions & 17 deletions
@@ -56,36 +56,34 @@ Shadowing provides comprehensive metrics to track replication performance and he
 |===
 |Metric |Type |Description
 
-|`redpanda_shadow_link_shadow_lag`
+|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_lag[`redpanda_shadow_link_shadow_lag`]
 |Gauge
 |The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor by `shadow_link_name`, `topic`, and `partition` to understand replication lag for each partition.
 
-|`redpanda_shadow_link_total_bytes_fetched`
-|Count
+|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_bytes_fetched[`redpanda_shadow_link_total_bytes_fetched`]
+|Counter
 |The total number of bytes fetched by a sharded replicator (bytes received by the client). Labeled by `shadow_link_name` and `shard` to track data transfer volume from the source cluster.
 
-|`redpanda_shadow_link_total_bytes_written`
-|Count
+|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_bytes_written[`redpanda_shadow_link_total_bytes_written`]
+|Counter
 |The total number of bytes written by a sharded replicator (bytes written to the write_at_offset_stm). Uses `shadow_link_name` and `shard` labels to monitor data written to the shadow cluster.
 
-|`redpanda_shadow_link_client_errors`
-|Count
-|The number of errors seen by the client. Track by `shadow_link_name` and `shard` to identify connection or protocol issues between clusters.
-
-|`redpanda_shadow_link_shadow_topic_state`
+|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_topic_state[`redpanda_shadow_link_shadow_topic_state`]
 |Gauge
 |Number of shadow topics in the respective states. Labeled by `shadow_link_name` and `state` to monitor topic state distribution across your shadow links.
 
-|`redpanda_shadow_link_total_records_fetched`
-|Count
+|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_records_fetched[`redpanda_shadow_link_total_records_fetched`]
+|Counter
 |The total number of records fetched by the sharded replicator (records received by the client). Monitor by `shadow_link_name` and `shard` to track message throughput from the source.
 
-|`redpanda_shadow_link_total_records_written`
-|Count
+|xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_records_written[`redpanda_shadow_link_total_records_written`]
+|Counter
 |The total number of records written by a sharded replicator (records written to the write_at_offset_stm). Uses `shadow_link_name` and `shard` labels to monitor message throughput to the shadow cluster.
 |===
 
-See also: xref:reference:public-metrics-reference.adoc[]
+For detailed descriptions of each metric, including usage examples and label definitions, see xref:reference:public-metrics-reference.adoc#shadow-link-metrics[Shadow Link metrics reference].
+
+See also: xref:reference:public-metrics-reference.adoc#shadow-link-metrics[Shadow Link metrics reference]
 
 == Monitoring best practices
 
@@ -106,8 +104,7 @@ rpk shadow status <shadow-link-name> | grep -E "LAG|Lag"
 
 Configure monitoring alerts for following conditions, which indicate problems with Shadowing:
 
-* **High replication lag**: When `redpanda_shadow_link_shadow_lag` exceeds your RPO requirements
-* **Connection errors**: When `redpanda_shadow_link_client_errors` increases rapidly
+* **High replication lag**: When xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_lag[`redpanda_shadow_link_shadow_lag`] exceeds your RPO requirements
 * **Topic state changes**: When topics move to `FAULTED` state
 * **Task failures**: When replication tasks enter `FAULTED` or `NOT_RUNNING` states
 * **Throughput drops**: When bytes/records fetched drops significantly
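
As an illustration of the "High replication lag" condition in the hunk above, here is a minimal Python sketch that scans Prometheus-style text samples for `redpanda_shadow_link_shadow_lag` and reports partitions whose lag exceeds a threshold. It is not part of this commit. The sample text, the link name `dr-link`, the topic `orders`, and the threshold are invented for the example. The label names follow the table in the diff, but the exact label names emitted by a real cluster are an assumption, and the simple regex does not handle label values that contain `}` or escaped quotes.

```python
import re

# Matches lines such as: metric_name{label="value",...} 123
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

LAG_METRIC = "redpanda_shadow_link_shadow_lag"


def parse_samples(text, metric):
    """Return (labels, value) pairs for every sample of `metric` in `text`."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((labels, float(match.group("value"))))
    return samples


def lagging_partitions(text, max_lag):
    """Return the samples whose lag (in offsets) is above `max_lag`."""
    return [
        (labels, lag)
        for labels, lag in parse_samples(text, LAG_METRIC)
        if lag > max_lag
    ]


# Invented scrape output, for illustration only.
EXAMPLE = """
# TYPE redpanda_shadow_link_shadow_lag gauge
redpanda_shadow_link_shadow_lag{shadow_link_name="dr-link",topic="orders",partition="0"} 12
redpanda_shadow_link_shadow_lag{shadow_link_name="dr-link",topic="orders",partition="1"} 4500
"""

if __name__ == "__main__":
    for labels, lag in lagging_partitions(EXAMPLE, max_lag=1000):
        print(labels["topic"], labels["partition"], lag)
```

Because the metric is defined as source LSO minus shadow HWM, the threshold here is a number of offsets. Turning an RPO stated in time into an offset budget depends on the produce rate of each topic, which this sketch leaves to the operator.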

modules/reference/pages/public-metrics-reference.adoc

Lines changed: 81 additions & 0 deletions
@@ -2343,6 +2343,87 @@ Total number of bytes uploaded for the topic to object storage.
 - `redpanda_namespace`
 - `redpanda_topic`
 
+---
+
+== Shadow Link metrics
+
+=== redpanda_shadow_link_shadow_lag
+
+The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor this metric to understand replication lag for each partition and ensure your RPO requirements are being met.
+
+*Type*: gauge
+
+*Labels*:
+
+- `shadow_link_name` - Name of the shadow link
+- `topic` - Topic name
+- `partition` - Partition identifier
+
+---
+
+=== redpanda_shadow_link_shadow_topic_state
+
+Number of shadow topics in the respective states. Monitor this metric to track the health and status distribution of shadow topics across your shadow links.
+
+*Type*: gauge
+
+*Labels*:
+
+- `shadow_link_name` - Name of the shadow link
+- `state` - Topic state (active, failed, paused, failing_over, failed_over, promoting, promoted)
+
+---
+
+=== redpanda_shadow_link_total_bytes_fetched
+
+Total number of bytes fetched by a sharded replicator (bytes received by the client). Use this metric to track data transfer volume from the source cluster.
+
+*Type*: counter
+
+*Labels*:
+
+- `shadow_link_name` - Name of the shadow link
+- `shard` - Shard identifier
+
+---
+
+=== redpanda_shadow_link_total_bytes_written
+
+Total number of bytes written by a sharded replicator (bytes written to the write_at_offset_stm). Use this metric to monitor data written to the shadow cluster.
+
+*Type*: counter
+
+*Labels*:
+
+- `shadow_link_name` - Name of the shadow link
+- `shard` - Shard identifier
+
+---
+
+=== redpanda_shadow_link_total_records_fetched
+
+Total number of records fetched by the sharded replicator (records received by the client). Monitor this metric to track message throughput from the source cluster.
+
+*Type*: counter
+
+*Labels*:
+
+- `shadow_link_name` - Name of the shadow link
+- `shard` - Shard identifier
+
+---
+
+=== redpanda_shadow_link_total_records_written
+
+Total number of records written by a sharded replicator (records written to the write_at_offset_stm). Use this metric to monitor message throughput to the shadow cluster.
+
+*Type*: counter
+
+*Labels*:
+
+- `shadow_link_name` - Name of the shadow link
+- `shard` - Shard identifier
+
 == Related topics
 
 * xref:manage:monitoring.adoc[Learn how to monitor Redpanda]
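
The four `total_*` metrics documented in this diff are counters, so a throughput figure has to be derived from the difference between two scrapes rather than read directly. The following Python sketch, which is not part of this commit, shows one way to do that and to flag a significant drop. The function names, the 50% drop ratio, and the sample numbers are hypothetical, and treating a decreasing value as a counter reset is the usual Prometheus convention rather than anything stated in the diff.

```python
def counter_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Per-second rate between two counter samples.

    A decrease is treated as a counter reset (for example after a
    restart), in which case the current value is taken as the delta.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        delta = curr_value  # counter restarted from zero
    return delta / elapsed


def throughput_dropped(rates, drop_ratio=0.5):
    """True if the latest rate is below `drop_ratio` times the mean of earlier rates."""
    if len(rates) < 2:
        return False  # not enough history to form a baseline
    history = rates[:-1]
    baseline = sum(history) / len(history)
    return baseline > 0 and rates[-1] < drop_ratio * baseline


if __name__ == "__main__":
    # Hypothetical scrapes of redpanda_shadow_link_total_bytes_fetched, 30 s apart.
    print(counter_rate(1000, 0, 4000, 30))
    print(throughput_dropped([100.0, 110.0, 90.0, 20.0]))
```

The same calculation applies to both the bytes and the records counters. Comparing the fetched rate with the written rate for the same `shadow_link_name` and `shard` may also help show whether a slowdown sits on the source side or on the shadow cluster's write path.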
