Conversation

@amotl
Member

@amotl amotl commented Aug 25, 2025

@coderabbitai
coderabbitai bot commented Aug 25, 2025

📥 Commits

Reviewing files that changed from the base of the PR and between 96fc965 and 8c0a2ae.

📒 Files selected for processing (3)
  • docs/ingest/etl/index.md (2 hunks)
  • docs/integrate/dlt/index.md (1 hunks)
  • docs/integrate/ingestr/index.md (1 hunks)

Walkthrough

Adds documentation pages for dlt and ingestr, inserts them into the integrations index and toctree, and updates the ETL/tools page with a new grouping header and entries; documentation-only changes, no code or API modifications.

Changes

| Cohort / File(s) | Summary |
|---|---|
| ETL docs update: docs/ingest/etl/index.md | Added a new grouping header "Use data pipeline programming frameworks and platforms." and inserted dlt and ingestr entries into the Dataflow / Pipeline / Code-first section and the alphabetical list. |
| New integration pages: docs/integrate/dlt/index.md, docs/integrate/ingestr/index.md | Added new integration pages: dlt (Python ELT) and ingestr (CLI) with descriptions, synopses, concrete usage examples, coverage/compatibility notes, learn/resources links, and visuals/badges. |
| Integrations index update: docs/integrate/index.md | Inserted dlt and ingestr into the integrations index/toctree alongside existing entries. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested labels

new content, sanding-1200, guidance, cross linking

Suggested reviewers

  • karynzv
  • surister
  • hammerhead

Poem

I hop through docs with carrot cheer,
Dlt and ingestr now appear.
Pages planted, indices neat,
Pipelines hum with tiny feet.
A rabbit nod — docs fresh and clear. 🐇🥕

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (12)
docs/ingest/etl/index.md (2)

41-45: Add dlt to the alphabetical section for consistency

You introduced a {ref}`dlt` entry in the Code-first card, but it’s missing from the “Alphabetically sorted” list below.

Apply this diff in the alphabetical block:

- - {ref}`dbt`
+ - {ref}`dbt`
+ - {ref}`dlt`

51-55: Add ingestr to the alphabetical section and ensure proper sort order

You added {ref}`ingestr` in the Code-first card, but it’s not listed under “Alphabetically sorted”. It should appear directly after influxdb (i-n-f… sorts before i-n-g…).

Apply this diff in the alphabetical block:

- - {ref}`influxdb`
+ - {ref}`influxdb`
+ - {ref}`ingestr`
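
As a quick sanity check on the ordering, plain string comparison settles where a new entry belongs. The sketch below uses hypothetical helpers (not part of the repository); the entry names are a subset of the alphabetical list, and it assumes that list uses simple case-insensitive ordering.

```python
def sorted_refs(names):
    """Return reference names in case-insensitive alphabetical order."""
    return sorted(names, key=str.lower)


def insert_position(existing, new_name):
    """Return the index at which `new_name` belongs among the sorted entries."""
    ordered = sorted_refs(existing + [new_name])
    return ordered.index(new_name)


# A subset of the entries in the alphabetical block.
entries = ["dbt", "dms", "iceberg", "influxdb", "kafka"]

# Print the full ordering once both new entries are included.
print(sorted_refs(entries + ["dlt", "ingestr"]))
```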
docs/integrate/ingestr/index.md (5)

11-14: Tighten wording and fix pluralization

Improve flow and grammar in the intro sentence; also make it explicit that dlt is a dependency.

-[ingestr] is a command-line application that allows copying data from any
-source into any destination database. It supports CrateDB on the source
-and the destination side. ingestr uses {ref}`dlt`.
+[ingestr] is a command-line application for copying data from any source
+to any destination database. It supports CrateDB on both the source and
+destination sides. ingestr builds on {ref}`dlt`.

55-62: Grammar fix and clarify “source vs. target URL” note

Small grammar fix and a minor rephrase for clarity.

-Please note there a subtle differences in the CrateDB source vs. target URL.
+Please note there are subtle differences between the CrateDB source and target URLs.
 While `--source-uri=crate://...` addresses CrateDB's SQLAlchemy dialect,
 `--dest-uri=cratedb://...` is effectively a PostgreSQL connection URL
 with a protocol schema designating CrateDB. The source adapter uses
 CrateDB's HTTP protocol, while the destination adapter uses CrateDB's
 PostgreSQL interface.
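
To make the difference between the two URL forms concrete, the following sketch takes both apart with Python's standard `urllib.parse`. The destination URL is the one from the page's example. The source URL, including the host and port 4200 (CrateDB's default HTTP port), is an assumed value for illustration, since the note elides it.

```python
from urllib.parse import parse_qs, urlsplit


def describe(uri):
    """Split a connection URI into the parts that matter for this comparison."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "options": parse_qs(parts.query),
    }


# Assumed source URL: SQLAlchemy dialect, talking to CrateDB over HTTP.
source = describe("crate://crate@localhost:4200/")
# Destination URL from the example: PostgreSQL wire protocol.
dest = describe("cratedb://crate:@localhost:5432/?sslmode=disable")

print(source)
print(dest)
```

The two schemes differ by more than spelling: they select different adapters and, with these values, different ports.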

67-69: Minor grammar: add a missing conjunction

Add “and” for readability.

-ingestr supports migration from 20-plus databases, data platforms, analytics
-engines, including all [databases supported by SQLAlchemy].
+ingestr supports migration from 20-plus databases, data platforms, and analytics
+engines, including all [databases supported by SQLAlchemy].

72-80: Avoid category drift and duplication in “Databases” list

Entries like “Amazon S3”, “Google Sheets”, “Elasticsearch”, and “Apache Solr” are not databases (and S3 is duplicated under Object stores). Consider keeping “Databases” strictly DB engines and moving others to their respective sections to avoid duplication and reduce maintenance burden.

Would you like me to propose a cleaned, de-duplicated set for each rubric based on the upstream ingestr docs?


126-129: Use the defined link or remove it to appease MD053

You defined “[sources supported by ingestr]” but never reference it. Either use it once in the Coverage section or remove it.

Option A (use it):

-ingestr supports migration from 20-plus databases, data platforms, and analytics
-engines, including all [databases supported by SQLAlchemy].
+ingestr supports migration from 20-plus databases, data platforms, and analytics
+engines, including all [databases supported by SQLAlchemy]. See the full list of
+[sources supported by ingestr].

Option B (remove it):

-[sources supported by ingestr]: https://bruin-data.github.io/ingestr/supported-sources/
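
For readers unfamiliar with MD053, the rule flags reference definitions that nothing in the page points at. The following is a deliberately simplified sketch of that idea using regular expressions. It is not how markdownlint implements the rule, and the function name is made up for illustration.

```python
import re

# A definition line such as "[label]: https://example.org/".
DEFINITION = re.compile(r"^\[([^\]]+)\]:\s+\S+", re.MULTILINE)
# A bracketed label that is neither a definition nor an inline link "[text](url)".
REFERENCE = re.compile(r"\[([^\]]+)\](?![:(])")


def unused_link_definitions(markdown):
    """Return labels that are defined but never referenced in the text."""
    defined = DEFINITION.findall(markdown)
    # Drop the definitions themselves so they do not count as usages.
    body = DEFINITION.sub("", markdown)
    used = {label.lower() for label in REFERENCE.findall(body)}
    return [label for label in defined if label.lower() not in used]


sample = """\
ingestr supports all [databases supported by SQLAlchemy].

[databases supported by SQLAlchemy]: https://docs.sqlalchemy.org/en/20/dialects/
[sources supported by ingestr]: https://bruin-data.github.io/ingestr/supported-sources/
"""

print(unused_link_definitions(sample))
```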
docs/integrate/dlt/index.md (4)

13-17: Tone down superlative and keep phrasing consistent with ETL index

Avoid absolute popularity claims and keep wording consistent with the ETL page (“popular production-ready…”).

-[dlt] (data load tool)--think ELT as Python code--is the most popular
-production-ready Python library for moving data. It loads data from
+[dlt] (data load tool)—think ELT as Python code—is a popular,
+production-ready Python library for moving data. It loads data from
 various and often messy data sources into well-structured, live datasets.
 dlt is used by {ref}`ingestr`.

23-26: Avoid “AI code editor” phrasing

“AI code editor” is oddly specific and out of scope for docs. Keep it neutral.

-  models. Simply import dlt in your favorite AI code editor, or add it to your Jupyter
+  models. Simply import dlt in your favorite code editor, or add it to your Jupyter
   Notebook.

76-87: Helpful “Learn” resources; consider a quick install note

Optional: add a short note on installing the CrateDB destination (e.g., pip install dlt-cratedb) to reduce friction.

For example, after the Synopsis header:

 ## Synopsis
+
+Prerequisites:
+- Install dlt and the CrateDB destination adapter:
+  `pip install dlt dlt-cratedb`

4-9: Make the logo clickable

Minor UX: wrap the logo in a link instead of having the image followed by a separate “[dlt]” link token.

-![dlt logo](https://cdn.sanity.io/images/nsq559ov/production/7f85e56e715b847c5519848b7198db73f793448d-82x25.svg?w=2000&auto=format){loading=lazy}[dlt]
+<a href="https://dlthub.com/" target="_blank" rel="noopener noreferrer">
+  <img src="https://cdn.sanity.io/images/nsq559ov/production/7f85e56e715b847c5519848b7198db73f793448d-82x25.svg?w=2000&auto=format" alt="dlt logo" loading="lazy">
+</a>
docs/integrate/index.md (1)

16-23: Include a Sphinx linkcheck step in your CI

I verified that there’s a docs/Makefile (but no top-level Makefile), so running make linkcheck at the repo root won’t work. To catch broken links or typos on your new pages, add a CI job that invokes the linkcheck target in the docs folder.

• File to update: your CI workflow (e.g. .github/workflows/…yml, tox.ini, or noxfile.py)
• Docs Makefile location: docs/Makefile

Suggested snippet:

#!/bin/bash
# From your project root. Do not append `|| true`: a non-zero exit code
# is what makes the CI job fail when links are broken.
make -C docs linkcheck
# or:
cd docs
make linkcheck

Add this into your existing CI pipeline so that external URLs and internal refs in docs/integrate/index.md (and other new pages) are validated on every commit.
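
If the CI entry point is Python-based (for example a helper script invoked from a workflow), the same step might look like the sketch below. It assumes `make` is available on the runner and that `docs/Makefile` provides a `linkcheck` target, as stated above; the helper name `run_step` is hypothetical. The point it illustrates is that the command's exit code has to reach the CI system for broken links to fail the build.

```python
import subprocess
import sys


def run_step(command):
    """Run a command and return its exit code without masking failures."""
    completed = subprocess.run(command, check=False)
    return completed.returncode


# Intended use in a CI helper script (not executed here):
#     sys.exit(run_step(["make", "-C", "docs", "linkcheck"]))
```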

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ece2d47 and 318b4af.

📒 Files selected for processing (4)
  • docs/ingest/etl/index.md (1 hunks)
  • docs/integrate/dlt/index.md (1 hunks)
  • docs/integrate/index.md (2 hunks)
  • docs/integrate/ingestr/index.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-09T16:19:43.966Z
Learnt from: amotl
PR: crate/cratedb-guide#238
File: docs/integrate/azure-functions/learn.rst:1-1
Timestamp: 2025-08-09T16:19:43.966Z
Learning: In the CrateDB Guide documentation, main integration anchors (e.g., `azure-functions`) are intentionally placed in the `index.md` files of their respective integration folders, while detailed tutorials use the `-learn` suffix (e.g., `azure-functions-learn`) in their `learn.rst` or `learn.md` files. This is a deliberate architectural pattern for the documentation restructuring.

Applied to files:

  • docs/integrate/ingestr/index.md
🪛 LanguageTool
docs/integrate/ingestr/index.md

[grammar] ~13-~13: Ensure spelling is correct
Context: ...on the source and the destination side. ingestr uses {ref}dlt. ::::{grid} :::{grid-...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~17-~17: There might be a mistake here.
Context: ... {ref}dlt. ::::{grid} :::{grid-item} - Single command: ingestr allows copying...

(QB_NEW_EN)


[grammar] ~25-~25: There might be a mistake here.
Context: ...refresh and incremental loading modes. ::: :::{grid-item} ![ingestr in a nutsh...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...ental loading modes. ::: :::{grid-item} ![ingestr in a nutshell](https://github....

(QB_NEW_EN)


[grammar] ~29-~29: There might be a mistake here.
Context: ...sources/demo.gif?raw=true){loading=lazy} ::: :::: ## Synopsis Invoke ingestr ...

(QB_NEW_EN)


[grammar] ~55-~55: There might be a mistake here.
Context: ...--dest-table 'doc.sample' ``` :::{note} Please note there a subtle differences i...

(QB_NEW_EN)


[grammar] ~56-~56: There might be a mistake here.
Context: ...es in the CrateDB source vs. target URL. While --source-uri=crate://... address...

(QB_NEW_EN)


[grammar] ~67-~67: There might be a mistake here.
Context: ...lus databases, data platforms, analytics engines, including all [databases suppor...

(QB_NEW_EN)


[grammar] ~70-~70: There might be a mistake here.
Context: ...d by SQLAlchemy]. :::{rubric} Databases ::: Actian Data Platform, Vector, Actian...

(QB_NEW_EN)


[grammar] ~71-~71: There might be a mistake here.
Context: ... SQLAlchemy]. :::{rubric} Databases ::: Actian Data Platform, Vector, Actian X, ...

(QB_NEW_EN)


[grammar] ~72-~72: There might be a mistake here.
Context: ... Ingres, Amazon Athena, Amazon Redshift, Amazon S3, Apache Drill, Apache Druid, A...

(QB_NEW_EN)


[grammar] ~74-~74: There might be a mistake here.
Context: ..., Databricks, Denodo, DuckDB, EXASOL DB, Elasticsearch, Firebird, Firebolt, Googl...

(QB_NEW_EN)


[grammar] ~78-~78: There might be a mistake here.
Context: ... PostgreSQL, Rockset, SAP ASE, SAP HANA, SAP Sybase SQL Anywhere, Snowflake, SQLi...

(QB_NEW_EN)


[grammar] ~81-~81: There might be a mistake here.
Context: ...B, YDB, YugabyteDB. :::{rubric} Brokers ::: Amazon Kinesis, Apache Kafka (Amazon...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...DB, YugabyteDB. :::{rubric} Brokers ::: Amazon Kinesis, Apache Kafka (Amazon MSK...

(QB_NEW_EN)


[grammar] ~85-~85: There might be a mistake here.
Context: ...nda, RobustMQ) :::{rubric} File formats ::: CSV, JSONL/NDJSON, Parquet :::{rubr...

(QB_NEW_EN)


[grammar] ~86-~86: There might be a mistake here.
Context: ... RobustMQ) :::{rubric} File formats ::: CSV, JSONL/NDJSON, Parquet :::{rubric} ...

(QB_NEW_EN)


[grammar] ~89-~89: There might be a mistake here.
Context: ...JSON, Parquet :::{rubric} Object stores ::: Amazon S3, Google Cloud Storage :::...

(QB_NEW_EN)


[grammar] ~90-~90: There might be a mistake here.
Context: ..., Parquet :::{rubric} Object stores ::: Amazon S3, Google Cloud Storage :::{rub...

(QB_NEW_EN)


[grammar] ~93-~93: There might be a mistake here.
Context: ...ogle Cloud Storage :::{rubric} Services ::: Airtable, Asana, GitHub, Google Ads,...

(QB_NEW_EN)


[grammar] ~94-~94: There might be a mistake here.
Context: ... Cloud Storage :::{rubric} Services ::: Airtable, Asana, GitHub, Google Ads, Goo...

(QB_NEW_EN)


[grammar] ~95-~95: There might be a mistake here.
Context: ...oogle Analytics, Google Sheets, HubSpot, Notion, Personio, Salesforce, Slack, Str...

(QB_NEW_EN)


[grammar] ~115-~115: There might be a mistake here.
Context: ...card} Examples: Use ingestr with CrateDB 🔗 https://github.com/crate/cratedb-...

(QB_NEW_EN)


[grammar] ~116-~116: There might be a mistake here.
Context: ...b-examples/tree/main/application/ingestr :link-type: url Executable code examples...

(QB_NEW_EN)


[grammar] ~117-~117: There might be a mistake here.
Context: ...main/application/ingestr :link-type: url Executable code examples / rig that demo...

(QB_NEW_EN)


[grammar] ~118-~118: There might be a mistake here.
Context: ... that demonstrates how to use ingestr to load data from Kafka to CrateDB. ::: ::...

(QB_NEW_EN)


[grammar] ~119-~119: There might be a mistake here.
Context: ...estr to load data from Kafka to CrateDB. ::: :::: [databases supported by SQL...

(QB_NEW_EN)

docs/integrate/dlt/index.md

[grammar] ~1-~1: There might be a mistake here.
Context: (dlt)= # dlt ```{div} .float-right .text-right !...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ... ELT as Python code--is the most popular production-ready Python library for movi...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...f}ingestr. ::::{grid} :::{grid-item} - Just code: no need to use any backends...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...luding APIs, files, databases, and more. ::: :::: ## Synopsis Load data from ...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/integrate/ingestr/index.md

104-104: Bare URL used

(MD034, no-bare-urls)


110-110: Bare URL used

(MD034, no-bare-urls)


116-116: Bare URL used

(MD034, no-bare-urls)


128-128: Link and image reference definitions should be needed
Unused link or image reference definition: "sources supported by ingestr"

(MD053, link-image-reference-definitions)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
🔇 Additional comments (6)
docs/ingest/etl/index.md (1)

75-77: Nice touch with the card footers

Using +++ to add short category guidance improves scanability without cluttering the cards.

docs/integrate/ingestr/index.md (2)

115-121: Great cross-link to runnable examples

The examples card pointing to cratedb-examples is very helpful for users to get started quickly.


51-53: Remove the postgresql:// fallback—cratedb:// is correct for ingestr

According to the official ingestr documentation, when using the ingestr ingest CLI to target CrateDB over the PostgreSQL wire protocol, you must use the cratedb:// URL scheme rather than postgresql:// (supported in ingestr v0.13.61 and later). (cratedb-toolkit.readthedocs.io)

No changes needed in docs/integrate/ingestr/index.md—the existing

--dest-uri 'cratedb://crate:@localhost:5432/?sslmode=disable' \

is valid and should remain as-is.

Likely an incorrect or invalid review comment.

docs/integrate/dlt/index.md (2)

36-53: Good, minimal working example

The filesystem-to-CrateDB snippet is clear and uses the cratedb destination consistently with the toctree page. Nice.


63-70: LGTM: SQL database → CrateDB example

Concise and mirrors the first example; consistent DSN and dataset usage.

docs/integrate/index.md (1)

29-41: LGTM: Added dlt and ingestr to the toctree

Entries are alphabetically placed and align with the new integration pages and anchors ((dlt)=, (ingestr)=).

@amotl amotl marked this pull request as ready for review August 25, 2025 00:14
- {ref}`dlt`

dlt is a popular production-ready Python library for moving data:
Think ELT as Python code.
Member

Software-defined ELT 😄

[dlt] (data load tool)--think ELT as Python code--is the most popular
production-ready Python library for moving data. It loads data from
various and often messy data sources into well-structured, live datasets.
dlt is used by {ref}`ingestr`.
Member

Is it relevant?

Member Author

Happy to change the wording any time. For the first version, I usually copy the upstream slogan verbatim. Sometimes it is good, sometimes not. Feel free to submit suggestions and patches on how to do it differently.

Or are you specifically referring to how we cross-link to the page about ingestr?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (10)
docs/integrate/ingestr/index.md (6)

46-53: CSV → CrateDB example is accurate; consider noting the required ingestr version.

The csv:// source and cratedb:// destination forms are correct and mirror upstream docs. To reduce “works on my machine” issues, add a short note that this requires ingestr ≥ 0.13.61.

Apply this diff to append a version hint right after the example:

  ingestr ingest \
    --source-uri 'csv://input.csv' \
    --source-table 'sample' \
    --dest-uri 'cratedb://crate:@localhost:5432/?sslmode=disable' \
    --dest-table 'doc.sample'

+:::{note}
+The CrateDB destination URL scheme (cratedb://) is supported in ingestr 0.13.61 and newer.
+:::


Reference: ingestr’s CrateDB page and toolkit notes. ([bruin-data.github.io](https://bruin-data.github.io/ingestr/supported-sources/cratedb.html), [cratedb-toolkit.readthedocs.io](https://cratedb-toolkit.readthedocs.io/io/ingestr/index.html))
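
A version guard along these lines could back up such a note in scripts that wrap ingestr. This is a minimal sketch: the minimum version comes from the comment above, the helper names are made up, and it assumes plain numeric `X.Y.Z` version strings (pre-release suffixes are not handled).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum ingestr version named in the review comment.
MINIMUM = (0, 13, 61)


def parse_version(text):
    """Turn '0.13.61' into (0, 13, 61) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def supports_cratedb_destination(installed):
    """Check whether the given ingestr version string meets the minimum."""
    return parse_version(installed) >= MINIMUM


def installed_ingestr_version():
    """Return the installed ingestr version, or None if it is not installed."""
    try:
        return version("ingestr")
    except PackageNotFoundError:
        return None
```

Comparing tuples of integers rather than raw strings matters here: as strings, `"0.13.9"` would sort after `"0.13.61"`.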


55-62: Terminology: say “URI scheme” (not “protocol schema”) and link to the official doc.

Minor wording tweak for precision; also add a link to the upstream CrateDB source/destination page for readers to drill into options.

-While `--source-uri=crate://...` addresses CrateDB's SQLAlchemy dialect,
-`--dest-uri=cratedb://...` is effectively a PostgreSQL connection URL
-with a protocol schema designating CrateDB. The source adapter uses
+While `--source-uri=crate://...` addresses CrateDB's SQLAlchemy dialect,
+`--dest-uri=cratedb://...` is effectively a PostgreSQL connection URL
+with a distinct URI scheme designating CrateDB. The source adapter uses
 CrateDB's HTTP protocol, while the destination adapter uses CrateDB's
 PostgreSQL interface.
+
+See: the ingestr CrateDB source/destination reference.

You can add a footnote or inline link target to the reference. (bruin-data.github.io)


70-76: Brand/style nits in Coverage: fix a few proper names.

These are minor, but polishing brand capitalization avoids distractions.

-CockroachDB, CrateDB, Firebird, HyperSQL (hsqldb), IBM DB2 and Informix, 
+CockroachDB, CrateDB, Firebird, HyperSQL (HSQLDB), IBM Db2 and Informix,
 Microsoft Access, Microsoft SQL Server, MonetDB, MySQL and MariaDB, 
-OpenGauss, Oracle, PostgreSQL, SAP ASE, SAP HANA, SAP Sybase SQL Anywhere, 
+openGauss, Oracle, PostgreSQL, SAP ASE, SAP HANA, SAP Sybase SQL Anywhere, 
 SQLite, TiDB, YDB, YugabyteDB
@@
-Amazon Athena, Amazon Redshift, Databend, Databricks, Denodo, DuckDB, 
-EXASOL DB, Firebolt, Google BigQuery, Greenplum, IBM Netezza Performance Server, 
+Amazon Athena, Amazon Redshift, Databend, Databricks, Denodo, DuckDB, 
+Exasol, Firebolt, Google BigQuery, Greenplum, IBM Netezza Performance Server,
 Impala, Kinetica, Rockset, Snowflake, Teradata Vantage
@@
-Apache Drill, Apache Druid, Apache Hive and Presto, Clickhouse, Elasticsearch, 
+Apache Drill, Apache Druid, Apache Hive and Presto, ClickHouse, Elasticsearch,
 InfluxDB, MongoDB, OpenSearch

Also applies to: 79-86


88-91: Don’t overclaim Kafka distributions unless explicitly supported.

Upstream lists Kafka generically; it doesn’t enumerate MSK/Confluent/Redpanda in the ingestr docs. To avoid implying vendor-specific support nuances, keep this category to “Apache Kafka” only.

-:::{rubric} Message Brokers
-:::
-Amazon Kinesis, Apache Kafka (Amazon MSK, Confluent Kafka, Redpanda, RobustMQ)
+:::{rubric} Message Brokers
+:::
+Amazon Kinesis, Apache Kafka

Reference: Kafka support page and overall supported sources index. (bruin-data.github.io)


100-104: Avoid “etc.” in lists; point to the canonical sources list instead.

“etc.” ages poorly and weakens clarity. Suggest ending with “and more” plus a link to the Supported Sources list.

-Airtable, Asana, GitHub, Google Ads, Google Analytics, Google Sheets, HubSpot,
-Notion, Personio, Salesforce, Slack, Stripe, Zendesk, etc.
+Airtable, Asana, GitHub, Google Ads, Google Analytics, Google Sheets, HubSpot,
+Notion, Personio, Salesforce, Slack, Stripe, Zendesk, and more
+([see the full list]).

Add a link definition at the bottom:

+[see the full list]: https://bruin-data.github.io/ingestr/

Upstream landing page lists all categories. (bruin-data.github.io)


110-120: markdownlint MD034: bare URLs inside grid-item-card.

If the CI enforces MD034, two options: (1) convert to reference-style links in normal body copy, or (2) suppress MD034 for these directive blocks. MyST’s “:link:” fields often trigger MD034 even though they’re not markdown paragraphs.

If you opt for (1), replace the grid cards with a short bulleted list and use reference links:

-:::{grid-item-card} Documentation: ingestr CrateDB source
-:link: https://bruin-data.github.io/ingestr/supported-sources/cratedb.html#source
-:link-type: url
-Documentation about the CrateDB source adapter for ingestr.
-:::
+* Documentation: ingestr CrateDB source — see [ingestr CrateDB source].

Add definitions near the bottom:

+[ingestr CrateDB source]: https://bruin-data.github.io/ingestr/supported-sources/cratedb.html#source
+[ingestr CrateDB destination]: https://bruin-data.github.io/ingestr/supported-sources/cratedb.html#destination
+[Examples: Use ingestr with CrateDB]: https://github.com/crate/cratedb-examples/tree/main/application/ingestr

If you prefer to keep the cards, consider disabling MD034 for these lines in the linter config.

Also applies to: 122-129

docs/integrate/dlt/index.md (4)

52-56: Avoid assuming a local CrateDB password in DSNs.

The built-in “crate” user on localhost typically has no password by default. Using an explicit “crate:crate@” can mislead users. Recommend removing the password (or adding a note that credentials depend on cluster auth settings).

-    destination=dlt.destinations.cratedb("postgresql://crate:crate@localhost:5432/"),
+    destination=dlt.destinations.cratedb("postgresql://crate@localhost:5432/"),
@@
-    destination=dlt.destinations.cratedb("postgresql://crate:crate@localhost:5432/"),
+    destination=dlt.destinations.cratedb("postgresql://crate@localhost:5432/"),

Reference: CrateDB connection-string examples. (cratedb.com)

Also applies to: 69-73
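
To see what the suggested change does to the DSN, the sketch below drops the password component using only the standard library. The helper name `without_password` is hypothetical; whether a password is needed at all depends on the cluster's authentication settings, as noted above.

```python
from urllib.parse import urlsplit, urlunsplit


def without_password(dsn):
    """Return the DSN with any password removed from its user info."""
    parts = urlsplit(dsn)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Keep the username, if any, but never the password.
    netloc = f"{parts.username}@{host}" if parts.username else host
    return urlunsplit(parts._replace(netloc=netloc))


print(without_password("postgresql://crate:crate@localhost:5432/"))
```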


36-41: Install step is fine; consider adding a one-liner about the adapter’s status.

Optional: add “The dlt-cratedb adapter is currently shipped as a separate package until it’s upstreamed into dlt” to set expectations for readers. Link to PyPI. (pypi.org)


78-87: markdownlint MD034 on bare URLs inside grid-item-card.

Same consideration as the ingestr page: either rework these into reference links in plain lists or suppress MD034 for directive fields if your pipeline flags them.

Also applies to: 88-94, 95-100


105-107: Remove unused link definition to satisfy MD053.

The link reference “[databases supported by SQLAlchemy]” is defined but not used in this page.

-[databases supported by SQLAlchemy]: https://docs.sqlalchemy.org/en/20/dialects/
 [dlt]: https://dlthub.com/
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 3466c5d and 96fc965.

📒 Files selected for processing (4)
  • docs/ingest/etl/index.md (2 hunks)
  • docs/integrate/dlt/index.md (1 hunks)
  • docs/integrate/index.md (2 hunks)
  • docs/integrate/ingestr/index.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/integrate/index.md
🧰 Additional context used
🪛 LanguageTool
docs/integrate/dlt/index.md

[grammar] ~1-~1: There might be a mistake here.
Context: (dlt)= # dlt ```{div} .float-right .text-right !...

(QB_NEW_EN)


[grammar] ~13-~13: There might be a mistake here.
Context: ...)—think ELT as Python code—is a popular, production-ready Python library for movi...

(QB_NEW_EN)


[grammar] ~20-~20: There might be a mistake here.
Context: ...f}ingestr. ::::{grid} :::{grid-item} - Just code: no need to use any backends...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...luding APIs, files, databases, and more. ::: :::: ## Synopsis Prerequisites: ...

(QB_NEW_EN)


[grammar] ~36-~36: There might be a mistake here.
Context: ... ::: :::: ## Synopsis Prerequisites: Install dlt and the CrateDB destination ...

(QB_NEW_EN)

docs/ingest/etl/index.md

[grammar] ~242-~242: There might be a mistake here.
Context: ...f}azure-functions - {ref}dbt - {ref}dlt - {ref}dms - {ref}dynamodb - {ref}`est...

(QB_NEW_EN)


[grammar] ~243-~243: There might be a mistake here.
Context: ...tions - {ref}dbt - {ref}dlt - {ref}dms - {ref}dynamodb - {ref}estuary` - {ref}...

(QB_NEW_EN)


[grammar] ~244-~244: There might be a mistake here.
Context: ...}dbt - {ref}dlt - {ref}dms - {ref}dynamodb - {ref}estuary - {ref}flink - {ref}`ho...

(QB_NEW_EN)


[grammar] ~245-~245: There might be a mistake here.
Context: ... - {ref}dms - {ref}dynamodb - {ref}estuary - {ref}flink - {ref}hop - {ref}iceber...

(QB_NEW_EN)


[grammar] ~246-~246: There might be a mistake here.
Context: ...{ref}dynamodb - {ref}estuary - {ref}flink - {ref}hop - {ref}iceberg - {ref}`infl...

(QB_NEW_EN)


[grammar] ~247-~247: There might be a mistake here.
Context: ... - {ref}estuary - {ref}flink - {ref}hop - {ref}iceberg - {ref}influxdb - {ref}...

(QB_NEW_EN)


[grammar] ~248-~248: There might be a mistake here.
Context: ...ary - {ref}flink - {ref}hop - {ref}iceberg - {ref}influxdb - {ref}ingestr` - {ref}...

(QB_NEW_EN)


[grammar] ~249-~249: There might be a mistake here.
Context: ...k - {ref}hop - {ref}iceberg - {ref}influxdb - {ref}ingestr - {ref}kafka - {ref}ke...

(QB_NEW_EN)


[grammar] ~250-~250: There might be a mistake here.
Context: ...{ref}iceberg - {ref}influxdb - {ref}ingestr - {ref}kafka - {ref}kestra - {ref}`kin...

(QB_NEW_EN)

docs/integrate/ingestr/index.md

[grammar] ~11-~11: There might be a mistake here.
Context: ...ication for copying data from any source to any destination database. It supports...

(QB_NEW_EN)


[grammar] ~17-~17: There might be a mistake here.
Context: ... {ref}dlt. ::::{grid} :::{grid-item} - Single command: ingestr allows copying...

(QB_NEW_EN)


[grammar] ~25-~25: There might be a mistake here.
Context: ...refresh and incremental loading modes. ::: :::{grid-item} ![ingestr in a nutsh...

(QB_NEW_EN)


[grammar] ~28-~28: There might be a mistake here.
Context: ...ental loading modes. ::: :::{grid-item} ![ingestr in a nutshell](https://github....

(QB_NEW_EN)


[grammar] ~29-~29: There might be a mistake here.
Context: ...sources/demo.gif?raw=true){loading=lazy} ::: :::: ## Synopsis Invoke ingestr ...

(QB_NEW_EN)


[grammar] ~67-~67: There might be a mistake here.
Context: ...databases, data platforms, and analytics engines, including all [databases suppor...

(QB_NEW_EN)


[grammar] ~70-~70: There might be a mistake here.
Context: ...emy]. :::{rubric} Traditional Databases ::: CockroachDB, CrateDB, Firebird, Hype...

(QB_NEW_EN)


[grammar] ~71-~71: There might be a mistake here.
Context: .... :::{rubric} Traditional Databases ::: CockroachDB, CrateDB, Firebird, HyperSQL...

(QB_NEW_EN)


[grammar] ~77-~77: There might be a mistake here.
Context: ...ubric} Cloud Data Warehouses & Analytics ::: Amazon Athena, Amazon Redshift, Data...

(QB_NEW_EN)


[grammar] ~78-~78: There might be a mistake here.
Context: ...c} Cloud Data Warehouses & Analytics ::: Amazon Athena, Amazon Redshift, Databend...

(QB_NEW_EN)


[grammar] ~83-~83: There might be a mistake here.
Context: ...age :::{rubric} Specialized Data Stores ::: Apache Drill, Apache Druid, Apache H...

(QB_NEW_EN)


[grammar] ~84-~84: There might be a mistake here.
Context: ... :::{rubric} Specialized Data Stores ::: Apache Drill, Apache Druid, Apache Hive ...

(QB_NEW_EN)


[grammar] ~88-~88: There might be a mistake here.
Context: ... OpenSearch :::{rubric} Message Brokers ::: Amazon Kinesis, Apache Kafka (Amazon...

(QB_NEW_EN)


[grammar] ~89-~89: There might be a mistake here.
Context: ...nSearch :::{rubric} Message Brokers ::: Amazon Kinesis, Apache Kafka (Amazon MSK...

(QB_NEW_EN)


[grammar] ~92-~92: There might be a mistake here.
Context: ...nda, RobustMQ) :::{rubric} File Formats ::: CSV, JSONL/NDJSON, Parquet :::{rubr...

(QB_NEW_EN)


[grammar] ~93-~93: There might be a mistake here.
Context: ... RobustMQ) :::{rubric} File Formats ::: CSV, JSONL/NDJSON, Parquet :::{rubric} ...

(QB_NEW_EN)


[grammar] ~96-~96: There might be a mistake here.
Context: ...JSON, Parquet :::{rubric} Object Stores ::: Amazon S3, Google Cloud Storage :::...

(QB_NEW_EN)


[grammar] ~97-~97: There might be a mistake here.
Context: ..., Parquet :::{rubric} Object Stores ::: Amazon S3, Google Cloud Storage :::{rub...

(QB_NEW_EN)


[grammar] ~100-~100: There might be a mistake here.
Context: ...e :::{rubric} SaaS Platforms & Services ::: Airtable, Asana, GitHub, Google Ads,...

(QB_NEW_EN)


[grammar] ~101-~101: There might be a mistake here.
Context: ...::{rubric} SaaS Platforms & Services ::: Airtable, Asana, GitHub, Google Ads, Goo...

(QB_NEW_EN)


[grammar] ~102-~102: There might be a mistake here.
Context: ...oogle Analytics, Google Sheets, HubSpot, Notion, Personio, Salesforce, Slack, Str...

(QB_NEW_EN)


[grammar] ~122-~122: There might be a mistake here.
Context: ...card} Examples: Use ingestr with CrateDB 🔗 https://github.com/crate/cratedb-...

(QB_NEW_EN)


[grammar] ~123-~123: There might be a mistake here.
Context: ...b-examples/tree/main/application/ingestr :link-type: url Executable code examples...

(QB_NEW_EN)


[grammar] ~124-~124: There might be a mistake here.
Context: ...main/application/ingestr :link-type: url Executable code examples / rig that demo...

(QB_NEW_EN)


[grammar] ~125-~125: There might be a mistake here.
Context: ... that demonstrates how to use ingestr to load data from Kafka to CrateDB. ::: ::...

(QB_NEW_EN)


[grammar] ~126-~126: There might be a mistake here.
Context: ...estr to load data from Kafka to CrateDB. ::: :::: [databases supported by SQL...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/integrate/dlt/index.md

83-83: Bare URL used

(MD034, no-bare-urls)


89-89: Bare URL used

(MD034, no-bare-urls)


105-105: Link and image reference definitions should be needed
Unused link or image reference definition: "databases supported by sqlalchemy"

(MD053, link-image-reference-definitions)

docs/integrate/ingestr/index.md

111-111: Bare URL used

(MD034, no-bare-urls)


117-117: Bare URL used

(MD034, no-bare-urls)


123-123: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
🔇 Additional comments (5)
docs/integrate/ingestr/index.md (1)

37-44: CrateDB → DuckDB example matches upstream syntax (LGTM).

Flags and URI formats align with ingestr’s documented CrateDB source and DuckDB destination usage. Good choice of a read-only sys table. (bruin-data.github.io)

docs/integrate/dlt/index.md (1)

13-17: Statement “dlt is used by ingestr” is correct (LGTM).

ingestr explicitly states it uses dlt and SQLAlchemy internally. Keeping this cross-link is helpful context.

Source: ingestr Core Concepts. (bruin-data.github.io)

docs/ingest/etl/index.md (3)

41-45: New dlt entry reads clearly and fits the section (LGTM).

Concise and aligned with the rest of the descriptions.


51-55: ingestr entry is accurate and consistent with its integration page (LGTM).

Wording matches the new integration doc and avoids promising specific vendors.


237-265: Alphabetical additions are placed correctly (LGTM).

“dlt” after “dbt” and “ingestr” after “influxdb” is correct; anchors resolve to the new pages.
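
The ordering claim above can be spot-checked mechanically. The following is a minimal sketch, not part of the PR: the helper name `first_out_of_order` is hypothetical, and the anchor names are only the ones visible in the context excerpts quoted earlier in this review, not the full list from `docs/ingest/etl/index.md`.

```python
def first_out_of_order(names):
    """Return the first adjacent pair that breaks case-insensitive
    alphabetical order, or None if the whole list is sorted."""
    for left, right in zip(names, names[1:]):
        if left.casefold() > right.casefold():
            return (left, right)
    return None


# Anchor names copied from the excerpts quoted in this review (assumed subset).
anchors = [
    "dbt", "dlt", "dms", "dynamodb", "estuary", "flink",
    "hop", "iceberg", "influxdb", "ingestr", "kafka", "kestra",
]

print(first_out_of_order(anchors))
```

A result of `None` means no adjacent pair is out of order; otherwise the returned tuple names the first offending pair.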
