add shadowing metrics #1496
base: main
Review skipped: auto incremental reviews are disabled on this repository.
📝 Walkthrough
This PR updates documentation across three files. The Antora playbook configuration is modified to source cloud-docs from a specific feature branch instead of main. The disaster recovery shadowing monitor documentation is expanded with cross-references to metric documentation and additional best practices guidance. The public metrics reference documentation receives a new Shadow Link metrics section describing eight metrics with their types and labels.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Pre-merge checks: ✅ 3 passed
Co-authored-by: Paulo Borges <[email protected]>
Force-pushed from 3118ad6 to aea6e82.
Force-pushed from 473f4e0 to a90f4de.
# Conflicts:
#   modules/manage/pages/disaster-recovery/shadowing/monitor.adoc
✅ Deploy Preview for redpanda-docs-preview ready!
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
📒 Files selected for processing (3)
- local-antora-playbook.yml (1 hunk)
- modules/manage/pages/disaster-recovery/shadowing/monitor.adoc (2 hunks)
- modules/reference/pages/public-metrics-reference.adoc (1 hunk)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-16T19:33:20.420Z
Learnt from: Feediver1
Repo: redpanda-data/docs PR: 1153
File: modules/reference/pages/properties/topic-properties.adoc:45-50
Timestamp: 2025-07-16T19:33:20.420Z
Learning: In the Redpanda documentation, topic property cross-references like <<max.compaction.lag.ms>> and <<min.compaction.lag.ms>> require corresponding property definition sections with anchors like [[maxcompactionlagms]] and [[mincompactionlagms]] to prevent broken links.
Applied to files:
- modules/reference/pages/public-metrics-reference.adoc
- modules/manage/pages/disaster-recovery/shadowing/monitor.adoc
⏰ Context from checks skipped due to timeout of 90000ms (3):
- GitHub Check: Redirect rules - redpanda-docs-preview
- GitHub Check: Header rules - redpanda-docs-preview
- GitHub Check: Pages changed - redpanda-docs-preview
🔇 Additional comments (3)
local-antora-playbook.yml (1)
19-20: Verify this is a temporary feature branch reference.
The cloud-docs source is pinned to a specific feature branch instead of `main`. Confirm this is intentional for development and that it will be updated back to `main` before final merge.
modules/manage/pages/disaster-recovery/shadowing/monitor.adoc (2)
59-86: Cross-references depend on anchor definitions in public-metrics-reference.adoc.
This table uses xref links to target specific metric anchors (e.g., `#redpanda_shadow_link_client_errors`). These links will fail unless the corresponding anchor definitions are added to public-metrics-reference.adoc. Ensure anchors are added to that file before this documentation is published.
94-118: Comprehensive monitoring guidance.
The health check procedures and alert conditions are well-structured and provide clear guidance for operators. The new "Link unavailability" alert and cross-reference to Shadow link tasks improve documentation completeness.
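A minimal sketch of the anchor check described above (and in the learning from PR 1153), assuming the xrefs follow the `xref:reference:public-metrics-reference.adoc#<anchor>[...]` form used in this PR and that each anchor is defined on its own `[[anchor]]` line. The function name `missing_anchors` and both regular expressions are illustrative assumptions, not part of Antora or any existing docs tooling.

```python
import re

# Assumed xref form: xref:reference:public-metrics-reference.adoc#<anchor>[link text]
XREF_RE = re.compile(r"xref:reference:public-metrics-reference\.adoc#([A-Za-z0-9_.-]+)\[")
# Assumed anchor form: a standalone [[anchor]] line in the reference page
ANCHOR_RE = re.compile(r"^\[\[([A-Za-z0-9_.-]+)\]\]\s*$", re.MULTILINE)


def missing_anchors(monitor_text: str, reference_text: str) -> list[str]:
    """Return xref targets in monitor_text that have no [[anchor]] in reference_text."""
    defined = set(ANCHOR_RE.findall(reference_text))
    missing = []
    for anchor in XREF_RE.findall(monitor_text):
        # Keep first-seen order and report each missing anchor once.
        if anchor not in defined and anchor not in missing:
            missing.append(anchor)
    return missing


if __name__ == "__main__":
    monitor = (
        "* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_lag"
        "[`redpanda_shadow_link_shadow_lag`]\n"
        "* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_client_errors"
        "[`redpanda_shadow_link_client_errors`]\n"
    )
    reference = "[[redpanda_shadow_link_shadow_lag]]\n=== redpanda_shadow_link_shadow_lag\n"
    print(missing_anchors(monitor, reference))  # ['redpanda_shadow_link_client_errors']
```

Run against the contents of the two files before publishing, a check like this would flag any metric named in monitor.adoc that still lacks a matching anchor in the reference page.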
[[redpanda_shadow_link_shadow_lag]]
=== redpanda_shadow_link_shadow_lag

The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor this metric to understand replication lag for each partition and ensure your RPO requirements are being met.
Suggested change:
- The lag of the shadow partition against the source partition, calculated as source partition LSO (Last Stable Offset) minus shadow partition HWM (High Watermark). Monitor this metric to understand replication lag for each partition and ensure your RPO requirements are being met.
+ The lag of the shadow partition against the source partition, calculated as source partition last stable offset (LSO) minus shadow partition high watermark (HWM). Monitor this metric to understand replication lag for each partition and ensure your recovery point objective (RPO) requirements are being met.
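The lag definition in this section reduces to a per-partition subtraction. A minimal sketch, assuming you already have the source LSO and shadow HWM for each partition and that your RPO has been translated into a maximum lag in offsets (the metric counts offsets, not time). `shadow_lag` and `partitions_over_rpo` are hypothetical helper names.

```python
def shadow_lag(source_lso: int, shadow_hwm: int) -> int:
    """Lag for one partition: source last stable offset minus shadow high watermark."""
    return source_lso - shadow_hwm


def partitions_over_rpo(
    offsets: dict[tuple[str, int], tuple[int, int]], max_lag: int
) -> list[tuple[str, int]]:
    """Return (topic, partition) pairs whose lag exceeds max_lag.

    offsets maps (topic, partition) -> (source_lso, shadow_hwm).
    max_lag is an assumed RPO budget expressed in offsets.
    """
    return sorted(
        tp for tp, (lso, hwm) in offsets.items() if shadow_lag(lso, hwm) > max_lag
    )


if __name__ == "__main__":
    sample = {("orders", 0): (1000, 990), ("orders", 1): (500, 100)}
    print(partitions_over_rpo(sample, max_lag=50))  # [('orders', 1)]
```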
- * **High replication lag**: When `redpanda_shadow_link_shadow_lag` exceeds your RPO requirements
- * **Connection errors**: When `redpanda_shadow_link_client_errors` increases rapidly
+ * **High replication lag**: When xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_lag[`redpanda_shadow_link_shadow_lag`] exceeds your RPO requirements
Suggested change:
- * **High replication lag**: When xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_lag[`redpanda_shadow_link_shadow_lag`] exceeds your RPO requirements
+ * **High replication lag**: When xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_shadow_lag[`redpanda_shadow_link_shadow_lag`] exceeds your recovery point objective (RPO) requirements
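The two alert conditions in this hunk can be expressed as a check over two consecutive scrapes. This is a sketch under assumptions: `redpanda_shadow_link_client_errors` is treated as a cumulative counter, the thresholds `max_lag` and `max_errors_per_min` are placeholders you would derive from your RPO and error tolerance, and the `Sample` type and `evaluate_alerts` function are hypothetical. In practice, rules like these would typically live in your monitoring system's alerting configuration rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One scrape of the values used by the alert conditions (hypothetical shape)."""
    timestamp_s: float
    shadow_lag: int  # redpanda_shadow_link_shadow_lag for one partition
    client_errors: int  # redpanda_shadow_link_client_errors, assumed cumulative


def evaluate_alerts(
    prev: Sample, curr: Sample, max_lag: int, max_errors_per_min: float
) -> list[str]:
    """Return the names of the alert conditions that fire for this pair of scrapes."""
    alerts = []
    # High replication lag: current lag is above the assumed RPO budget.
    if curr.shadow_lag > max_lag:
        alerts.append("High replication lag")
    # Connection errors: the error counter is rising faster than the threshold.
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed > 0:
        # A decrease (for example, a counter reset after a restart) counts as no increase.
        increase = max(curr.client_errors - prev.client_errors, 0)
        if increase * 60.0 / elapsed > max_errors_per_min:
            alerts.append("Connection errors")
    return alerts


if __name__ == "__main__":
    before = Sample(timestamp_s=0.0, shadow_lag=10, client_errors=5)
    after = Sample(timestamp_s=60.0, shadow_lag=200, client_errors=50)
    # ['High replication lag', 'Connection errors']
    print(evaluate_alerts(before, after, max_lag=100, max_errors_per_min=10.0))
```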
micheleRP left a comment
I left a little suggestion with acronyms, but lgtm!
IoannisRP left a comment
Apart from a comment on total_records, everything else looks good 👍
* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_records_fetched[`redpanda_shadow_link_total_records_fetched`] - Monitor message throughput from source cluster
* xref:reference:public-metrics-reference.adoc#redpanda_shadow_link_total_records_written[`redpanda_shadow_link_total_records_written`] - Track message throughput to shadow cluster
I am not sure throughput is a good word here. Throughput means rate, while this is the total records fetched/written.
Total number of records fetched by the sharded replicator (records received by the client). Monitor this metric to track message throughput from the source cluster.
When it is next to the explanation, it's good. When it's alone, it might give the wrong impression of what this metric is.
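To illustrate the distinction: a total such as `redpanda_shadow_link_total_records_fetched` only becomes a throughput figure once you take the difference between two samples and divide by the elapsed time. A minimal sketch, assuming the metric is a monotonically increasing counter that can reset to zero on restart; `records_per_second` is a hypothetical helper.

```python
def records_per_second(
    prev_total: int, prev_ts: float, curr_total: int, curr_ts: float
) -> float:
    """Convert two samples of a cumulative record counter into a rate."""
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in strictly increasing time order")
    delta = curr_total - prev_total
    if delta < 0:
        # The counter went down, so assume it was reset and count from zero.
        delta = curr_total
    return delta / elapsed


if __name__ == "__main__":
    # 3000 records fetched over 30 seconds
    print(records_per_second(1000, 0.0, 4000, 30.0))  # 100.0
```

Prometheus's `rate()` function performs a similar calculation, including counter-reset handling, over a range of samples.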
Description
Resolves https://redpandadata.atlassian.net/browse/
Review deadline:
Page previews
What's new
Metrics reference
Shadowing monitor
Checks