DOC-11497 Docs for obs: Enabling troubleshooting hot spots externally (e.g., logs or metrics) #19577

Open
wants to merge 12 commits into
base: main
Binary file added src/current/images/v25.2/detect-hotspots-1.png
Binary file added src/current/images/v25.2/detect-hotspots-2.png
Binary file added src/current/images/v25.2/detect-hotspots-3.png
Binary file added src/current/images/v25.2/detect-hotspots-4.png
Binary file added src/current/images/v25.2/detect-hotspots-5.png
Binary file added src/current/images/v25.2/detect-hotspots-6.png
82 changes: 82 additions & 0 deletions src/current/v25.2/detect_hotspots.md
@@ -0,0 +1,82 @@
---
title: Detect Hotspots
summary: Learn how to detect hotspots using real-time monitoring and historical logs in CockroachDB.
toc: true
---

This page provides practical guidance on identifying common [hotspots]({% link {{ page.version.version }}/understand-hotspots.md %}) in CockroachDB clusters, using real-time monitoring and historical logs.

```
[Start]
   |
[Is there a KV Latch Contention Alert?]
   |
   |-- Yes --> [Does popular key log exist?]
   |              |
   |              |-- Yes (write hotspot) --> [(A) Find hot ranges log, find table index] --> [Mitigate hot key (find queries and refactor app)]
   |              |
   |              |-- No --> [Some other reason for latch contention]
   |
   |-- No --> [Is there a CPU metrics Alert?]
                  |
                  |-- Yes --> [Does popular key log exist?]
                                 |
                                 |-- Yes (read hotspot) --> [Go to (A) Find hot ranges log]
                                 |
                                 |-- No --> [Does clear access log exist?]
                                               |
                                               |-- Yes --> [(B) Find hot ranges log, find table index] --> [Mitigate hot index (change schema)]
                                               |
                                               |-- No --> [Some other reason for CPU skew]
```

This guide helps diagnose and mitigate issues related to KV latch contention and CPU usage alerts in a CockroachDB cluster. Use this workflow to identify potential hotspots and optimize query and schema performance.
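
The workflow can also be read as a small decision function. The following is a minimal sketch in Python, not part of CockroachDB: the `Signals` class, its field names, and the returned strings are hypothetical and only mirror the boxes in the flowchart. It assumes you have already determined, by whatever means your monitoring provides, whether each alert fired and whether each log entry exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Observed signals. All field names are hypothetical, chosen for this sketch."""
    latch_contention_alert: bool
    cpu_alert: bool
    popular_key_log: bool
    clear_access_log: bool


def diagnose(s: Signals) -> str:
    """Walk the flowchart top to bottom and return the suggested next step."""
    if s.latch_contention_alert:
        if s.popular_key_log:
            return "write hotspot: find hot ranges log and table index, then mitigate hot key"
        return "other cause of latch contention"
    if s.cpu_alert:
        if s.popular_key_log:
            return "read hotspot: find hot ranges log and table index, then mitigate hot key"
        if s.clear_access_log:
            return "hot index: find hot ranges log and table index, then change schema"
        return "other cause of CPU skew"
    return "no further action"
```

For example, `diagnose(Signals(False, True, False, True))` follows the CPU branch and returns the hot index recommendation. The latch contention check comes first because the flowchart only consults the CPU alert when no latch contention alert is present.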

## Before you begin

- Ensure you have access to the DB Console and relevant logs.
- Confirm that you have the necessary permissions to view metrics and modify the application or schema.

## Troubleshooting Steps

### 1. Check for KV Latch Contention Alert

If a KV latch contention alert is triggered:

- **Check if a popular key log exists:**
    - **Yes (write hotspot):**
        1. Locate the hot ranges log.
        2. Identify the associated table and index.
        3. Mitigate the hot key:
            - Locate queries that target the hotspot.
            - Refactor the application logic to distribute the load more evenly.
    - **No:**
        - Investigate other potential causes of latch contention.
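
One generic way to "distribute the load more evenly" for a single hot key is to split it into several sub-keys on the application side. The sketch below is an illustration under assumptions, not a CockroachDB API: `NUM_SHARDS`, `shard_for`, `sharded_key`, and `all_shard_keys` are hypothetical names, and the `#` separator is arbitrary. It assumes the hot value can be recombined on read, for example a counter whose sub-values are summed.

```python
import hashlib
from typing import List

NUM_SHARDS = 8  # hypothetical shard count; choose based on the workload


def shard_for(writer_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a writer to a stable shard number using a deterministic hash."""
    digest = hashlib.sha256(writer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def sharded_key(hot_key: str, writer_id: str, num_shards: int = NUM_SHARDS) -> str:
    """Return the sub-key this writer should update instead of hot_key itself."""
    return f"{hot_key}#{shard_for(writer_id, num_shards)}"


def all_shard_keys(hot_key: str, num_shards: int = NUM_SHARDS) -> List[str]:
    """Return every sub-key; a reader combines the values stored under them."""
    return [f"{hot_key}#{i}" for i in range(num_shards)]
```

Each writer always lands on the same sub-key, so concurrent writers contend only with others that hash to the same shard. The trade-off is that reads must touch every sub-key returned by `all_shard_keys`.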

If no KV latch contention alert is present, proceed to the next step.

### 2. Check for CPU Metrics Alert

If a CPU metrics alert is triggered:

- **Check if a popular key log exists:**
    - **Yes (read hotspot):** Follow the same steps as for a write hotspot:
        1. Locate the hot ranges log.
        2. Identify the associated table and index.
        3. Mitigate the hot key:
            - Locate queries that target the hotspot.
            - Refactor the application logic.

- **If no popular key log exists, check for a clear access log:**
    - **Yes (hot index):**
        1. Locate the hot ranges log.
        2. Identify the associated table and index.
        3. Mitigate the hot index:
            - Modify the schema to balance index usage, such as splitting or reorganizing indexes.
    - **No:**
        - Investigate other potential causes of CPU skew.
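
To see why reorganizing an index can relieve a hot index, consider the toy model below. It is a simplified simulation, not CockroachDB's implementation: ranges are modeled as intervals between sorted split points, and `NUM_BUCKETS`, `bucket_of`, `range_index`, and `writes_per_range` are hypothetical names. It compares an index keyed on a monotonically increasing value with one whose keys are prefixed by a hash bucket, which is the general idea behind hash-sharded indexes.

```python
import bisect
import hashlib
from collections import Counter

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_of(value: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Assign a value to a bucket with a deterministic hash."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def range_index(key, split_points) -> int:
    """Return the position of the range holding key, given sorted split points."""
    return bisect.bisect_right(split_points, key)


def writes_per_range(keys, split_points) -> Counter:
    """Count how many writes land in each range."""
    return Counter(range_index(k, split_points) for k in keys)


# Monotonically increasing values, such as a sequence or timestamp.
new_ids = range(1000, 1100)

# Plain index: keys are the raw values, and all existing splits lie below them,
# so every new write lands in the last range.
plain = writes_per_range(new_ids, split_points=[250, 500, 750])

# Bucketed index: keys are (bucket, value) pairs with one split per bucket
# boundary, so writes are spread over several ranges.
bucketed = writes_per_range(
    [(bucket_of(v), v) for v in new_ids],
    split_points=[(b, 0) for b in range(1, NUM_BUCKETS)],
)
```

In this model, `plain` contains a single range that receives all 100 writes, while `bucketed` divides the same writes among the bucket ranges. The cost is that a scan over a span of values must now visit every bucket.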

If no CPU metrics alert is present, no further action is needed.
378 changes: 378 additions & 0 deletions src/current/v25.2/detect_hotspots_all.md

Large diffs are not rendered by default.
