@amotl amotl commented Sep 14, 2025

About

Continue adding integration guides from the community forum.

Preview

References

coderabbitai bot commented Sep 14, 2025

Walkthrough

Adds a Kafka URL pattern to docs linkcheck ignore anchors, converts an external MSSQL grid-item link to an internal reference and hidden toctree on the Debezium index page, and introduces a new Debezium tutorial documenting end-to-end MSSQL → Kafka → CrateDB replication using Debezium and JDBC sink.

Changes

| Cohort / File(s) | Summary of Changes |
| --- | --- |
| Docs config: linkcheck anchors ignore (docs/conf.py) | Added a Kafka documentation URL pattern and accompanying comment to linkcheck_anchors_ignore_for_url; updated PyPI pattern formatting (trailing comma). No other logic changes. |
| Debezium docs: index update (docs/integrate/debezium/index.md) | Replaced an external MSSQL grid-item link with an internal reference (debezium-tutorial) and added a hidden toctree entry pointing to the tutorial. |
| Debezium docs: new tutorial (docs/integrate/debezium/tutorial.md) | Added a new tutorial with step-by-step setup for MSSQL, CrateDB, Zookeeper, Kafka, a custom Debezium image, Debezium source connector config, CrateDB JDBC sink config, Docker/REST commands, and verification steps demonstrating insert/update replication. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant U as User
    participant KC as Kafka Connect (REST)
    participant DZ as Debezium SQL Server Connector
    participant K as Kafka
    participant SK as JDBC Sink (CrateDB)
    participant MS as MSSQL
    participant CR as CrateDB

    Note over U,KC: Deploy connectors via REST
    U->>KC: POST /connectors (Debezium source config)
    KC-->>U: 201 Created
    U->>KC: POST /connectors (CrateDB sink config)
    KC-->>U: 201 Created

    rect rgb(236,248,255)
    Note over MS,DZ: CDC enabled on MSSQL
    MS-->>DZ: CDC events
    DZ->>K: Produce events to topics
    end

    rect rgb(240,255,240)
    SK->>K: Consume topics
    SK->>CR: Upsert rows (INSERT/UPDATE)
    CR-->>SK: ACK
    end

    Note over U,CR: Verify replication
    U->>CR: SELECT ... (validate replicated data)
    CR-->>U: Result set
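
The two deployment steps at the top of the diagram can be sketched in Python as follows. This is a minimal illustration, not the tutorial's own commands: it assumes the Kafka Connect REST API is reachable on localhost:8083 (its default port), and the connector name and config dictionary passed in are hypothetical placeholders.

```python
import json
import urllib.request

# Assumption: Kafka Connect's REST API listens on localhost:8083 (default port).
CONNECT_URL = "http://localhost:8083/connectors"


def build_connector_request(name, config, url=CONNECT_URL):
    """Build (but do not send) the POST /connectors request for one connector."""
    payload = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def deploy(request):
    """Send the request and return the HTTP status (201 Created on success)."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Keeping request construction separate from sending makes the payload easy to inspect before anything is posted; `deploy()` would be called once for the source connector and once for the sink connector.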

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

new content, cross linking

Suggested reviewers

  • hammerhead
  • karynzv

Poem

I hop through docs with eager cheer,
Debezium streams and Kafka near.
MSSQL whispers change,
CrateDB greets each range.
A crunchy carrot for guides new 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title clearly describes the introduction of a Debezium tutorial focused on replicating data from MSSQL, which matches the main addition of a tutorial document and related documentation changes. |
| Description Check | ✅ Passed | The pull request description references adding integration guides from the community forum, provides a preview link to the new Debezium guide, and cites relevant issue GH-102, which aligns with the changes to documentation and tutorial content. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

@amotl amotl force-pushed the debezium branch 2 times, most recently from acbdf10 to 5251470 Compare September 14, 2025 16:55
@amotl amotl added the refactoring Changing shape or layout, or moving content around. label Sep 23, 2025
@amotl amotl requested review from hammerhead and karynzv September 30, 2025 08:49
@amotl amotl marked this pull request as ready for review September 30, 2025 08:49

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (2)
docs/integrate/debezium/tutorial.md (2)

138-138: Consider adding a note about obtaining the download URL.

The placeholder ************* for the kafka-connect-jdbc download URL is appropriate for a tutorial, but consider adding a brief note that users should obtain the actual download link from the Confluent Hub page linked on line 133.


273-273: Consider hosting the image within the repository.

The tutorial references an external image hosted on a Discourse CDN. This creates a dependency on the external service and the image may break if the Discourse post is modified or removed.

Consider:

  1. Downloading and hosting the image in the repository (e.g., in docs/_static/images/)
  2. Using a relative path reference
  3. Alternatively, replacing the image with a text description of the expected result

This ensures the tutorial remains functional even if external resources become unavailable.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 88820d8 and 9846d14.

📒 Files selected for processing (3)
  • docs/conf.py (1 hunks)
  • docs/integrate/debezium/index.md (2 hunks)
  • docs/integrate/debezium/tutorial.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/integrate/debezium/tutorial.md

23-23: Spaces inside link text

(MD039, no-space-in-links)


Hard tabs (MD010, no-hard-tabs)

  • Column 1: lines 42-45, 63-65, 67-71, 101-104, 160-162, 182-187, 191, 193, 196, 220-222, 224-225, 229-233
  • Other columns: line 144 (column 23), line 196 (column 89), line 218 (column 26), line 221 (column 33), line 222 (column 52), line 231 (column 29)

🔇 Additional comments (6)
docs/conf.py (1)

86-91: LGTM! Clear and helpful linkcheck configuration.

The addition of the Kafka URL pattern to linkcheck_anchors_ignore_for_url is appropriate for the Debezium tutorial. The inline comment clearly explains the issue with Kafka documentation anchors.
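
For readers unfamiliar with this Sphinx option, here is a minimal sketch of what such an entry could look like. `linkcheck_anchors_ignore_for_url` is a real Sphinx setting that takes regular expressions for URLs whose anchors should not be validated. The two patterns below are assumptions for illustration, not copied from the PR, and the `anchors_ignored` helper is hypothetical and exists only to demonstrate the matching.

```python
# docs/conf.py (illustrative excerpt; the patterns are assumed, not taken from the PR)
import re

linkcheck_anchors_ignore_for_url = [
    # Assumed pre-existing entry; the review only states that a PyPI pattern
    # gained a trailing comma.
    r"https://pypi\.org/.*",
    # Assumed Kafka entry: skip anchor validation for Kafka documentation links.
    r"https://kafka\.apache\.org/.*",
]


def anchors_ignored(url):
    """Hypothetical helper: True if anchor checks would be skipped for this URL."""
    return any(re.match(pattern, url) for pattern in linkcheck_anchors_ignore_for_url)
```

With these patterns, a link such as `https://kafka.apache.org/documentation/#topicconfigs` still has its page checked for reachability, but the `#topicconfigs` anchor is not looked up in the response body.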

docs/integrate/debezium/index.md (2)

30-32: LGTM! Proper use of internal reference.

The change from an external URL to an internal reference (debezium-tutorial) correctly links to the new tutorial document. This ensures consistent navigation within the documentation.


44-48: LGTM! Proper Sphinx toctree configuration.

The hidden toctree with maxdepth: 1 is the appropriate way to include the tutorial in the documentation structure while keeping it hidden from the main navigation sidebar. The reference to tutorial correctly points to the tutorial.md file.

docs/integrate/debezium/tutorial.md (3)

107-128: LGTM! Proper Kafka and Zookeeper setup for demo purposes.

The Docker commands and Kafka topic creation are appropriate for a tutorial environment. The reference to production settings documentation is helpful.


175-237: LGTM! Solid connector configurations.

Both the source (MSSQL) and destination (CrateDB) connector configurations are well-structured:

  • Source connector properly configured for CDC with appropriate Kafka topics
  • Destination connector correctly uses upsert mode with primary key
  • The ExtractNewRecordState transform properly unwraps CDC event structure
  • PostgreSQL JDBC protocol usage for CrateDB is correct

247-277: LGTM! Clear testing instructions and conclusion.

The testing section provides practical, step-by-step verification of both INSERT and UPDATE operations, demonstrating the end-to-end CDC replication flow. The conclusion effectively summarizes the benefits of the approach.

@amotl amotl force-pushed the debezium branch 3 times, most recently from a99556d to 8594878 Compare September 30, 2025 13:21

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
docs/integrate/debezium/tutorial.md (1)

23-23: Fix punctuation for improved readability.

The sentence structure can be improved by replacing the comma with a period before introducing Debezium.

Apply this diff:

-Enter Debezium, [Debezium](https://debezium.io/) is a standard open-source system, built on top of Kafka, which allows to capture changes on a source database system and replicate them on another system without having to write custom scripts.
+Enter Debezium. [Debezium](https://debezium.io/) is a standard open-source system, built on top of Kafka, which allows to capture changes on a source database system and replicate them on another system without having to write custom scripts.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9846d14 and 8594878.

📒 Files selected for processing (3)
  • docs/conf.py (1 hunks)
  • docs/integrate/debezium/index.md (2 hunks)
  • docs/integrate/debezium/tutorial.md (1 hunks)
🧰 Additional context used
🪛 Gitleaks (8.28.0)
docs/integrate/debezium/tutorial.md

[high] 88-88: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 222-222: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
🔇 Additional comments (17)
docs/conf.py (1)

88-93: LGTM! Configuration correctly extended for Kafka documentation links.

The addition of the Kafka domain pattern to linkcheck_anchors_ignore_for_url is appropriate given the new Debezium tutorial references Kafka documentation. The explanatory comment and trailing comma follow existing conventions.

docs/integrate/debezium/index.md (2)

30-32: LGTM! Link correctly changed to internal reference.

The MSSQL grid item link has been properly updated from an external URL to an internal reference (debezium-tutorial), which correctly points to the new tutorial document.


44-48: LGTM! Toctree structure properly configured.

The hidden toctree with maxdepth: 1 correctly integrates the new tutorial into the documentation structure while keeping it out of the main navigation display.

docs/integrate/debezium/tutorial.md (14)

37-46: LGTM! MSSQL database and table setup is correct.

The SQL commands properly create the database, table with identity primary key, and default values. The structure is appropriate for CDC demonstration.


49-55: LGTM! Security setup follows best practices.

The SQL commands create a dedicated login and user for Debezium with appropriate role-based permissions (db_datareader and custom debeziumrole), following the principle of least privilege.


59-72: LGTM! CDC enablement is properly configured.

The SQL commands correctly enable CDC at the database level, create a dedicated filegroup for CDC data, and enable CDC on the specific table with appropriate role permissions.


78-81: LGTM! CrateDB setup is appropriate for tutorial.

The Docker command correctly configures a single-node CrateDB instance with proper port mappings and heap size suitable for local testing.


87-89: LGTM! User creation with appropriate credentials.

The example password is suitable for tutorial purposes. In production, users should replace it with a strong password, as indicated by the context.

Note: Static analysis flagged this as a potential secret, but it's an example password for tutorial purposes and is expected here.


99-105: LGTM! CrateDB table definition correctly mirrors source structure.

The table definition properly includes:

  • Primary key matching the source for upsert operations
  • TIMESTAMP type for datetime conversion
  • Helpful inline comments explaining design decisions

113-116: LGTM! Kafka and Zookeeper setup is correct.

The Docker commands properly configure container networking with appropriate port mappings and host resolution for inter-container communication.


120-125: LGTM! Kafka topics correctly configured for Connect.

The topic creation commands properly set cleanup.policy=compact for the configuration and offset topics, which is required for Kafka Connect's internal state management.


136-143: LGTM! Setup instructions are clear.

The bash commands properly prepare the directory structure and extract the Kafka Connect JDBC connector. The placeholder URL at line 138 correctly indicates users need to obtain the actual download link from Confluent Hub.


144-152: LGTM! Dockerfile correctly extends Debezium image.

The Dockerfile properly:

  • Uses the official Debezium Connect base image
  • Switches to root for installation operations
  • Copies the JDBC connector files
  • Downloads the PostgreSQL JDBC driver (needed for CrateDB connectivity)
  • Switches back to non-root user for security

156-167: LGTM! Kafka Connect configuration is correct.

The docker run command properly configures:

  • Required Kafka Connect properties (group.id, storage topics)
  • Network host resolution for container communication
  • Bootstrap servers pointing to local Kafka
  • JSON converters for key/value serialization

213-237: LGTM! Destination connector properly configured for CrateDB.

The JdbcSinkConnector configuration correctly:

  • Uses PostgreSQL JDBC URL (CrateDB is wire-protocol compatible)
  • Configures upsert mode with primary key for handling updates
  • Applies Debezium's ExtractNewRecordState transform to unwrap CDC envelopes
  • Disables auto-creation/evolution since the table is pre-defined

Note: Static analysis flagged line 222 as a potential secret, but this is the same example password used consistently in the tutorial and is expected.
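
The properties listed above can be sketched as a small Python builder. This is an illustration under stated assumptions, not the tutorial's actual configuration: the property keys are standard Confluent JDBC sink and Debezium transform settings, while the function name, the `doc` path segment in the JDBC URL, the choice of `pk.mode=record_key`, and all argument values are hypothetical. The password is passed in as an argument rather than hard-coded, which also avoids the secret-scanner finding.

```python
def cratedb_sink_config(topic, table, pk_field, user, password,
                        host="localhost", port=5432):
    """Build a JDBC sink connector config for CrateDB (illustrative values)."""
    return {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": topic,
        "table.name.format": table,
        # CrateDB speaks the PostgreSQL wire protocol, so a PostgreSQL JDBC URL is used.
        "connection.url": f"jdbc:postgresql://{host}:{port}/doc",
        "connection.user": user,
        "connection.password": password,
        # Upsert keyed on the primary key, so updates overwrite existing rows.
        "insert.mode": "upsert",
        "pk.mode": "record_key",  # assumption; the tutorial may use another mode
        "pk.fields": pk_field,
        # The target table is created by hand, so schema management is disabled.
        "auto.create": "false",
        "auto.evolve": "false",
        # Unwrap the Debezium change-event envelope to the plain row state.
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    }
```

The returned dictionary is the `config` part of the JSON body that Kafka Connect expects on `POST /connectors`.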


247-272: LGTM! Testing sequence demonstrates end-to-end functionality.

The testing section effectively demonstrates:

  • INSERT operation replication from MSSQL to CrateDB
  • UPDATE operation handling via CDC and upsert
  • Verification queries on the CrateDB side

This provides clear validation that the setup is working correctly.
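
Why an UPDATE on the source ends up as a changed row rather than a second row on the target can be shown with a small in-memory simulation. This is a conceptual sketch only: it models the Debezium change-event envelope (`op`, `before`, `after`) and an upsert keyed on the primary key, and the column names and values are hypothetical. Delete events are deliberately out of scope, since the tests above cover only inserts and updates.

```python
def unwrap(event):
    """Return the row state after the change, as the unwrap transform does
    for insert ("c") and update ("u") events."""
    return event["after"]


def upsert(table, row, pk="id"):
    """Insert the row, or merge it into the existing row with the same key."""
    key = row[pk]
    table[key] = {**table.get(key, {}), **row}


# Hypothetical change events: one INSERT followed by one UPDATE of the same row.
events = [
    {"op": "c", "before": None,
     "after": {"id": 1, "name": "widget", "price": 10}},
    {"op": "u", "before": {"id": 1, "name": "widget", "price": 10},
     "after": {"id": 1, "name": "widget", "price": 12}},
]

target = {}  # stands in for the CrateDB table, keyed by primary key
for event in events:
    upsert(target, unwrap(event))

print(target)  # {1: {'id': 1, 'name': 'widget', 'price': 12}}
```

Because both events carry the same key, the second one replaces the values written by the first, which is the behaviour the UPDATE verification step checks on the CrateDB side.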


275-277: LGTM! Conclusion effectively summarizes the benefits.

The conclusion clearly states the value proposition: using Debezium enables change data capture from various database systems to CrateDB without custom development, allowing users to leverage CrateDB's analytical capabilities.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
docs/integrate/debezium/tutorial.md (1)

273-273: External image reference noted in previous review.

As flagged in a previous review comment, line 273 references an external Discourse CDN image. Consider downloading the image into the repository (e.g., docs/integrate/debezium/images/update-result.png) or replacing it with a code block showing the expected output, so that the documentation works offline and the link doesn't break.

🧹 Nitpick comments (1)
docs/integrate/debezium/tutorial.md (1)

148-148: Update PostgreSQL JDBC driver to latest stable (42.7.8)
docs/integrate/debezium/tutorial.md:148 – replace “postgresql-42.5.4.jar” with “postgresql-42.7.8.jar” in the curl command.
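
The change amounts to swapping the version inside the jar file name. A minimal sketch, assuming the curl line contains the file name exactly as quoted above; the helper name and the example URL are hypothetical and not taken from the tutorial:

```python
import re


def bump_pgjdbc(line, new_version="42.7.8"):
    """Replace the version in any postgresql-X.Y.Z.jar file name on the line."""
    return re.sub(
        r"postgresql-\d+\.\d+\.\d+\.jar",
        f"postgresql-{new_version}.jar",
        line,
    )


# Hypothetical command line; only the jar file name is known from the review.
old_line = "curl -sO https://example.invalid/download/postgresql-42.5.4.jar"
new_line = bump_pgjdbc(old_line)
```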

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8594878 and cad0906.

📒 Files selected for processing (3)
  • docs/conf.py (1 hunks)
  • docs/integrate/debezium/index.md (2 hunks)
  • docs/integrate/debezium/tutorial.md (1 hunks)
🧰 Additional context used
🪛 Gitleaks (8.28.0)
docs/integrate/debezium/tutorial.md

[high] 88-88: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 222-222: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
🔇 Additional comments (10)
docs/conf.py (1)

88-93: LGTM! Linkcheck configuration properly extended.

The addition of the Kafka domain pattern to linkcheck_anchors_ignore_for_url is appropriate given the anchor-checking issues with Kafka documentation. The explanatory comment is helpful, and the trailing comma addition on line 90 improves maintainability.

docs/integrate/debezium/tutorial.md (7)

1-26: Excellent introduction and problem framing.

The introduction effectively establishes the business case for CDC replication from OLTP to OLAP systems and clearly positions Debezium and CrateDB as solutions. The narrative flow is logical and accessible.


27-72: MSSQL setup steps are clear and comprehensive.

The SQL commands for database setup, user provisioning, and CDC enablement follow MSSQL best practices. The placeholder password on line 50 is appropriate for tutorial documentation.


73-105: CrateDB setup is well-documented with helpful inline comments.

The Docker setup command, user creation, permissions, and table DDL are all appropriate. The inline comments on lines 101-102 explaining the PK requirement and default value handling add valuable context.


106-128: Kafka and Zookeeper setup is straightforward and appropriate.

The Docker commands and Kafka topic creation steps are correct for a Debezium setup. The reference to Kafka topic configuration documentation on line 127 aligns well with the linkcheck anchor ignore pattern added in docs/conf.py.


175-199: Source connector configuration is functional.

The Debezium SQL Server source connector configuration is complete and will work as documented. Note that deprecated properties (table.whitelist and legacy database.history.kafka.* settings) are tracked in issue #350 for future updates.


209-246: Destination connector configuration is properly structured.

The JDBC sink connector configuration correctly uses the PostgreSQL JDBC driver (CrateDB's wire protocol), sets up upsert mode for CDC updates, and includes the necessary Debezium transform to extract the record state. The configuration is complete and functional.


275-278: Conclusion effectively summarizes the tutorial benefits.

The conclusion concisely reinforces the value proposition of using Debezium for CDC replication to CrateDB without custom development.

docs/integrate/debezium/index.md (2)

29-33: Grid item correctly updated to reference internal tutorial.

The MSSQL grid item now properly references the new tutorial using link-type: ref and target debezium-tutorial, which matches the label defined on line 1 of docs/integrate/debezium/tutorial.md.


44-49: Toctree configuration is appropriate.

The hidden toctree with maxdepth: 1 provides navigation to the tutorial without cluttering the table of contents. The reference Tutorial <tutorial> correctly points to the new tutorial file.

amotl commented Sep 30, 2025

@coderabbitai: Please create a dedicated ticket about your suggestion to update PostgreSQL JDBC driver to latest stable (42.7.8). Thanks!

-- docs/integrate/debezium/tutorial.md (1)

148-148: Update PostgreSQL JDBC driver to latest stable (42.7.8)
docs/integrate/debezium/tutorial.md:148 – replace “postgresql-42.5.4.jar” with “postgresql-42.7.8.jar” in the curl command.

coderabbitai bot commented Sep 30, 2025

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create a GitHub issue for updating PostgreSQL JDBC driver version in the Debezium tutorial

gh issue create \
  --title "Update PostgreSQL JDBC driver to version 42.7.8 in Debezium tutorial" \
  --body "## Description

Update the PostgreSQL JDBC driver version in the Debezium tutorial from 42.5.4 to 42.7.8 (latest stable).

## Location

\`docs/integrate/debezium/tutorial.md:148\`

Replace:
\`\`\`
postgresql-42.5.4.jar
\`\`\`

With:
\`\`\`
postgresql-42.7.8.jar
\`\`\`

## Context

This issue was created based on a review suggestion in PR #289.

**References:**
- PR: https://github.com/crate/cratedb-guide/pull/289
- Requested by: @amotl" \
  --assignee amotl \
  --repo crate/cratedb-guide

Length of output: 49


@amotl, I've created issue #351 to track the PostgreSQL JDBC driver update to version 42.7.8. You can view it here: #351

The issue includes the specific location in the tutorial and details about the version change needed. 🐰

@amotl amotl merged commit fe1ab3a into main Sep 30, 2025
3 checks passed
@amotl amotl deleted the debezium branch September 30, 2025 14:14