Integrate: Add sections about dlt and ingestr #268
Walkthrough
Adds documentation pages for dlt and ingestr, inserts them into the integrations index and toctree, and updates the ETL/tools page with a new grouping header and entries. Documentation-only changes, no code or API modifications.
Actionable comments posted: 0
🧹 Nitpick comments (12)
docs/ingest/etl/index.md (2)
41-45: Add dlt to the alphabetical section for consistency
You introduced a {ref}`dlt` entry in the Code-first card but it’s missing from the “Alphabetically sorted” list below.
Apply this diff in the alphabetical block:
    - - {ref}`dbt`
    + - {ref}`dbt`
    + - {ref}`dlt`
51-55: Add ingestr to the alphabetical section and ensure proper sort order
You added a {ref}`ingestr` entry in the Code-first card, but it’s not listed under “Alphabetically sorted”. It should appear directly after `influxdb` (i-n-f-l… sorts before i-n-g-e…).
Apply this diff in the alphabetical block:
    - - {ref}`iceberg`
    - - {ref}`influxdb`
    + - {ref}`iceberg`
    + - {ref}`influxdb`
    + - {ref}`ingestr`
docs/integrate/ingestr/index.md (5)
11-14: Tighten wording and fix pluralization
Improve flow and grammar in the intro sentence; also make it explicit that dlt is a dependency.
    -[ingestr] is a command-line application that allows copying data from any
    -source into any destination database. It supports CrateDB on the source
    -and the destination side. ingestr uses {ref}`dlt`.
    +[ingestr] is a command-line application for copying data from any source
    +to any destination database. It supports CrateDB on both the source and
    +destination sides. ingestr builds on {ref}`dlt`.
55-62: Grammar fix and clarify “source vs. target URL” note
Small grammar fix and a minor rephrase for clarity.
    -Please note there a subtle differences in the CrateDB source vs. target URL.
    +Please note there are subtle differences between the CrateDB source and target URLs.
     While `--source-uri=crate://...` addresses CrateDB's SQLAlchemy dialect,
     `--dest-uri=cratedb://...` is effectively a PostgreSQL connection URL
     with a protocol schema designating CrateDB. The source adapter uses
     CrateDB's HTTP protocol, while the destination adapter uses CrateDB's
     PostgreSQL interface.
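To make the scheme difference concrete, here is a minimal sketch that splits both URL forms with Python's standard library. The destination URL is the one quoted in the ingestr example elsewhere in this review; the source URL `crate://crate:@localhost:4200/` is an assumed illustration (4200 being CrateDB's default HTTP port), not taken from the pages under review.

```python
from urllib.parse import parse_qs, urlsplit

# Destination URL as used in the ingestr example (PostgreSQL interface).
dest = urlsplit("cratedb://crate:@localhost:5432/?sslmode=disable")

# Hypothetical source URL, assumed for illustration only (HTTP interface).
source = urlsplit("crate://crate:@localhost:4200/")

for label, url in (("source", source), ("dest", dest)):
    # The scheme selects the adapter; the port hints at the wire protocol.
    print(label, url.scheme, url.username, url.hostname, url.port)

# Query parameters such as sslmode travel with the destination URL.
dest_options = parse_qs(dest.query)
print(dest_options)
```

Only the scheme and port differ between the two forms in this sketch; which adapter and protocol ingestr actually selects for each scheme is described in its own documentation.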
67-69: Minor grammar: add a missing conjunction
Add “and” for readability.
    -ingestr supports migration from 20-plus databases, data platforms, analytics
    -engines, including all [databases supported by SQLAlchemy].
    +ingestr supports migration from 20-plus databases, data platforms, and analytics
    +engines, including all [databases supported by SQLAlchemy].
72-80: Avoid category drift and duplication in “Databases” list
Entries like “Amazon S3”, “Google Sheets”, “Elasticsearch”, and “Apache Solr” are not databases (and S3 is duplicated under Object stores). Consider keeping “Databases” strictly DB engines and moving others to their respective sections to avoid duplication and reduce maintenance burden.
Would you like me to propose a cleaned, de-duplicated set for each rubric based on the upstream ingestr docs?
126-129: Use the defined link or remove it to appease MD053
You defined “[sources supported by ingestr]” but never reference it. Either use it once in the Coverage section or remove it.
Option A (use it):
    -ingestr supports migration from 20-plus databases, data platforms, and analytics
    -engines, including all [databases supported by SQLAlchemy].
    +ingestr supports migration from 20-plus databases, data platforms, and analytics
    +engines, including all [databases supported by SQLAlchemy]. See the full list of
    +[sources supported by ingestr].
Option B (remove it):
    -[sources supported by ingestr]: https://bruin-data.github.io/ingestr/supported-sources/
docs/integrate/dlt/index.md (4)
13-17: Tone down superlative and keep phrasing consistent with ETL index
Avoid absolute popularity claims and keep wording consistent with the ETL page (“popular production-ready…”).
    -[dlt] (data load tool)--think ELT as Python code--is the most popular
    -production-ready Python library for moving data. It loads data from
    +[dlt] (data load tool)—think ELT as Python code—is a popular,
    +production-ready Python library for moving data. It loads data from
     various and often messy data sources into well-structured, live datasets.
     dlt is used by {ref}`ingestr`.
23-26: Avoid “AI code editor” phrasing
“AI code editor” is oddly specific and out of scope for docs. Keep it neutral.
    -  models. Simply import dlt in your favorite AI code editor, or add it to your Jupyter
    +  models. Simply import dlt in your favorite code editor, or add it to your Jupyter
       Notebook.
76-87: Helpful “Learn” resources; consider a quick install note
Optional: add a short note on installing the CrateDB destination (e.g., `pip install dlt-cratedb`) to reduce friction.
For example, after the Synopsis header:
     ## Synopsis
    +
    +Prerequisites:
    +- Install dlt and the CrateDB destination adapter:
    +  `pip install dlt dlt-cratedb`
4-9: Make the logo clickable
Minor UX: wrap the logo in a link instead of having the image followed by a separate “[dlt]” link token.
    -{loading=lazy}[dlt]
    +<a href="https://dlthub.com/" target="_blank" rel="noopener noreferrer">
    +  <img src="https://cdn.sanity.io/images/nsq559ov/production/7f85e56e715b847c5519848b7198db73f793448d-82x25.svg?w=2000&auto=format" alt="dlt logo" loading="lazy">
    +</a>
docs/integrate/index.md (1)
16-23: Include a Sphinx linkcheck step in your CI
I verified that there’s a `docs/Makefile` (but no top-level Makefile), so running `make linkcheck` at the repo root won’t work. To catch broken links or typos on your new pages, add a CI job that invokes the linkcheck target in the `docs` folder.
• File to update: your CI workflow (e.g. `.github/workflows/…yml`, `tox.ini`, or `noxfile.py`)
• Docs Makefile location: `docs/Makefile`
Suggested snippet:
    #!/bin/bash
    # from your project root
    make -C docs linkcheck || true
    # or:
    cd docs
    make linkcheck || true
Add this into your existing CI pipeline so that external URLs and internal refs in `docs/integrate/index.md` (and other new pages) are validated on every commit.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- docs/ingest/etl/index.md (1 hunks)
- docs/integrate/dlt/index.md (1 hunks)
- docs/integrate/index.md (2 hunks)
- docs/integrate/ingestr/index.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-09T16:19:43.966Z
Learnt from: amotl
PR: crate/cratedb-guide#238
File: docs/integrate/azure-functions/learn.rst:1-1
Timestamp: 2025-08-09T16:19:43.966Z
Learning: In the CrateDB Guide documentation, main integration anchors (e.g., `azure-functions`) are intentionally placed in the `index.md` files of their respective integration folders, while detailed tutorials use the `-learn` suffix (e.g., `azure-functions-learn`) in their `learn.rst` or `learn.md` files. This is a deliberate architectural pattern for the documentation restructuring.
Applied to files:
docs/integrate/ingestr/index.md
🪛 LanguageTool
docs/integrate/ingestr/index.md
[grammar] ~13-~13: Ensure spelling is correct
Context: ...on the source and the destination side. ingestr uses {ref}dlt. ::::{grid} :::{grid-...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~17-~17: There might be a mistake here.
Context: ... {ref}dlt. ::::{grid} :::{grid-item} - Single command: ingestr allows copying...
(QB_NEW_EN)
[grammar] ~25-~25: There might be a mistake here.
Context: ...refresh and incremental loading modes. ::: :::{grid-item} ![ingestr in a nutsh...
(QB_NEW_EN)
[grammar] ~28-~28: There might be a mistake here.
Context: ...ental loading modes. ::: :::{grid-item} 
[grammar] ~29-~29: There might be a mistake here.
Context: ...sources/demo.gif?raw=true){loading=lazy} ::: :::: ## Synopsis Invoke ingestr ...
(QB_NEW_EN)
[grammar] ~55-~55: There might be a mistake here.
Context: ...--dest-table 'doc.sample' ``` :::{note} Please note there a subtle differences i...
(QB_NEW_EN)
[grammar] ~56-~56: There might be a mistake here.
Context: ...es in the CrateDB source vs. target URL. While --source-uri=crate://... address...
(QB_NEW_EN)
[grammar] ~67-~67: There might be a mistake here.
Context: ...lus databases, data platforms, analytics engines, including all [databases suppor...
(QB_NEW_EN)
[grammar] ~70-~70: There might be a mistake here.
Context: ...d by SQLAlchemy]. :::{rubric} Databases ::: Actian Data Platform, Vector, Actian...
(QB_NEW_EN)
[grammar] ~71-~71: There might be a mistake here.
Context: ... SQLAlchemy]. :::{rubric} Databases ::: Actian Data Platform, Vector, Actian X, ...
(QB_NEW_EN)
[grammar] ~72-~72: There might be a mistake here.
Context: ... Ingres, Amazon Athena, Amazon Redshift, Amazon S3, Apache Drill, Apache Druid, A...
(QB_NEW_EN)
[grammar] ~74-~74: There might be a mistake here.
Context: ..., Databricks, Denodo, DuckDB, EXASOL DB, Elasticsearch, Firebird, Firebolt, Googl...
(QB_NEW_EN)
[grammar] ~78-~78: There might be a mistake here.
Context: ... PostgreSQL, Rockset, SAP ASE, SAP HANA, SAP Sybase SQL Anywhere, Snowflake, SQLi...
(QB_NEW_EN)
[grammar] ~81-~81: There might be a mistake here.
Context: ...B, YDB, YugabyteDB. :::{rubric} Brokers ::: Amazon Kinesis, Apache Kafka (Amazon...
(QB_NEW_EN)
[grammar] ~82-~82: There might be a mistake here.
Context: ...DB, YugabyteDB. :::{rubric} Brokers ::: Amazon Kinesis, Apache Kafka (Amazon MSK...
(QB_NEW_EN)
[grammar] ~85-~85: There might be a mistake here.
Context: ...nda, RobustMQ) :::{rubric} File formats ::: CSV, JSONL/NDJSON, Parquet :::{rubr...
(QB_NEW_EN)
[grammar] ~86-~86: There might be a mistake here.
Context: ... RobustMQ) :::{rubric} File formats ::: CSV, JSONL/NDJSON, Parquet :::{rubric} ...
(QB_NEW_EN)
[grammar] ~89-~89: There might be a mistake here.
Context: ...JSON, Parquet :::{rubric} Object stores ::: Amazon S3, Google Cloud Storage :::...
(QB_NEW_EN)
[grammar] ~90-~90: There might be a mistake here.
Context: ..., Parquet :::{rubric} Object stores ::: Amazon S3, Google Cloud Storage :::{rub...
(QB_NEW_EN)
[grammar] ~93-~93: There might be a mistake here.
Context: ...ogle Cloud Storage :::{rubric} Services ::: Airtable, Asana, GitHub, Google Ads,...
(QB_NEW_EN)
[grammar] ~94-~94: There might be a mistake here.
Context: ... Cloud Storage :::{rubric} Services ::: Airtable, Asana, GitHub, Google Ads, Goo...
(QB_NEW_EN)
[grammar] ~95-~95: There might be a mistake here.
Context: ...oogle Analytics, Google Sheets, HubSpot, Notion, Personio, Salesforce, Slack, Str...
(QB_NEW_EN)
[grammar] ~115-~115: There might be a mistake here.
Context: ...card} Examples: Use ingestr with CrateDB 🔗 https://github.com/crate/cratedb-...
(QB_NEW_EN)
[grammar] ~116-~116: There might be a mistake here.
Context: ...b-examples/tree/main/application/ingestr :link-type: url Executable code examples...
(QB_NEW_EN)
[grammar] ~117-~117: There might be a mistake here.
Context: ...main/application/ingestr :link-type: url Executable code examples / rig that demo...
(QB_NEW_EN)
[grammar] ~118-~118: There might be a mistake here.
Context: ... that demonstrates how to use ingestr to load data from Kafka to CrateDB. ::: ::...
(QB_NEW_EN)
[grammar] ~119-~119: There might be a mistake here.
Context: ...estr to load data from Kafka to CrateDB. ::: :::: [databases supported by SQL...
(QB_NEW_EN)
docs/integrate/dlt/index.md
[grammar] ~1-~1: There might be a mistake here.
Context: (dlt)= # dlt ```{div} .float-right .text-right !...
(QB_NEW_EN)
[grammar] ~13-~13: There might be a mistake here.
Context: ... ELT as Python code--is the most popular production-ready Python library for movi...
(QB_NEW_EN)
[grammar] ~20-~20: There might be a mistake here.
Context: ...f}ingestr. ::::{grid} :::{grid-item} - Just code: no need to use any backends...
(QB_NEW_EN)
[grammar] ~28-~28: There might be a mistake here.
Context: ...luding APIs, files, databases, and more. ::: :::: ## Synopsis Load data from ...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
docs/integrate/ingestr/index.md
104-104: Bare URL used
(MD034, no-bare-urls)
110-110: Bare URL used
(MD034, no-bare-urls)
116-116: Bare URL used
(MD034, no-bare-urls)
128-128: Link and image reference definitions should be needed
Unused link or image reference definition: "sources supported by ingestr"
(MD053, link-image-reference-definitions)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
🔇 Additional comments (6)
docs/ingest/etl/index.md (1)
75-77: Nice touch with the card footers
Using `+++` to add short category guidance improves scanability without cluttering the cards.
docs/integrate/ingestr/index.md (2)
115-121: Great cross-link to runnable examples
The examples card pointing to cratedb-examples is very helpful for users to get started quickly.
51-53: Remove the `postgresql://` fallback: `cratedb://` is correct for ingestr
According to the official ingestr documentation, when using the `ingestr ingest` CLI to target CrateDB over the PostgreSQL wire protocol, you must use the `cratedb://` URL scheme rather than `postgresql://` (supported in ingestr v0.13.61 and later). (cratedb-toolkit.readthedocs.io)
No changes needed in docs/integrate/ingestr/index.md: the existing
    --dest-uri 'cratedb://crate:@localhost:5432/?sslmode=disable' \
is valid and should remain as-is.
Likely an incorrect or invalid review comment.
docs/integrate/dlt/index.md (2)
36-53: Good, minimal working example
The filesystem-to-CrateDB snippet is clear and uses the cratedb destination consistently with the toctree page. Nice.
63-70: LGTM: SQL database → CrateDB example
Concise and mirrors the first example; consistent DSN and dataset usage.
docs/integrate/index.md (1)
29-41: LGTM: Added dlt and ingestr to the toctree
Entries are alphabetically placed and align with the new integration pages and anchors (`(dlt)=`, `(ingestr)=`).
- {ref}`dlt`

dlt is a popular production-ready Python library for moving data:
Think ELT as Python code.
Software-defined ELT 😄
[dlt] (data load tool)--think ELT as Python code--is the most popular
production-ready Python library for moving data. It loads data from
various and often messy data sources into well-structured, live datasets.
dlt is used by {ref}`ingestr`.
Is it relevant?
Happy to change the wording any time. For the first version, I usually copy the upstream slogan verbatim. Sometimes it is good, sometimes not. Feel free to submit any suggestions and patches how to do it differently.
Or are you specifically referring to how we cross-link to the page about ingestr?
Actionable comments posted: 0
🧹 Nitpick comments (10)
docs/integrate/ingestr/index.md (6)
46-53: CSV → CrateDB example is accurate; consider noting the required ingestr version.
The csv:// source and cratedb:// destination forms are correct and mirror upstream docs. To reduce “works on my machine” issues, add a short note that this requires ingestr ≥ 0.13.61.
Apply this diff to append a version hint right after the example:

```diff
 ingestr ingest \
     --source-uri 'csv://input.csv' \
     --source-table 'sample' \
     --dest-uri 'cratedb://crate:@localhost:5432/?sslmode=disable' \
     --dest-table 'doc.sample'
+:::{note}
+The CrateDB destination URL scheme (cratedb://) is supported starting with ingestr version 0.13.61 or newer.
+:::
```

Reference: ingestr’s CrateDB page and toolkit notes. ([bruin-data.github.io](https://bruin-data.github.io/ingestr/supported-sources/cratedb.html), [cratedb-toolkit.readthedocs.io](https://cratedb-toolkit.readthedocs.io/io/ingestr/index.html?utm_source=chatgpt.com))

55-62: Terminology: say “URI scheme” (not “protocol schema”) and link to the official doc.
Minor wording tweak for precision; also add a link to the upstream CrateDB source/destination page for readers to drill into options.

```diff
-While `--source-uri=crate://...` addresses CrateDB's SQLAlchemy dialect,
-`--dest-uri=cratedb://...` is effectively a PostgreSQL connection URL
-with a protocol schema designating CrateDB. The source adapter uses
+While `--source-uri=crate://...` addresses CrateDB's SQLAlchemy dialect,
+`--dest-uri=cratedb://...` is effectively a PostgreSQL connection URL
+with a distinct URI scheme designating CrateDB. The source adapter uses
 CrateDB's HTTP protocol, while the destination adapter uses CrateDB's
 PostgreSQL interface.
+
+See: the ingestr CrateDB source/destination reference.
```

You can add a footnote or inline link target to the reference. (bruin-data.github.io)
70-76: Brand/style nits in Coverage: fix a few proper names.
These are minor, but polishing brand capitalization avoids distractions.
    -CockroachDB, CrateDB, Firebird, HyperSQL (hsqldb), IBM DB2 and Informix,
    +CockroachDB, CrateDB, Firebird, HyperSQL (HSQLDB), IBM Db2 and Informix,
     Microsoft Access, Microsoft SQL Server, MonetDB, MySQL and MariaDB,
    -OpenGauss, Oracle, PostgreSQL, SAP ASE, SAP HANA, SAP Sybase SQL Anywhere,
    +openGauss, Oracle, PostgreSQL, SAP ASE, SAP HANA, SAP Sybase SQL Anywhere,
     SQLite, TiDB, YDB, YugabyteDB
    @@
    -Amazon Athena, Amazon Redshift, Databend, Databricks, Denodo, DuckDB,
    -EXASOL DB, Firebolt, Google BigQuery, Greenplum, IBM Netezza Performance Server,
    +Amazon Athena, Amazon Redshift, Databend, Databricks, Denodo, DuckDB,
    +Exasol, Firebolt, Google BigQuery, Greenplum, IBM Netezza Performance Server,
     Impala, Kinetica, Rockset, Snowflake, Teradata Vantage
    @@
    -Apache Drill, Apache Druid, Apache Hive and Presto, Clickhouse, Elasticsearch,
    +Apache Drill, Apache Druid, Apache Hive and Presto, ClickHouse, Elasticsearch,
     InfluxDB, MongoDB, OpenSearch
Also applies to: 79-86
88-91: Don’t overclaim Kafka distributions unless explicitly supported.
Upstream lists Kafka generically; it doesn’t enumerate MSK/Confluent/Redpanda in the ingestr docs. To avoid implying vendor-specific support nuances, keep this category to “Apache Kafka” only.
    -:::{rubric} Message Brokers
    -:::
    -Amazon Kinesis, Apache Kafka (Amazon MSK, Confluent Kafka, Redpanda, RobustMQ)
    +:::{rubric} Message Brokers
    +:::
    +Amazon Kinesis, Apache Kafka
Reference: Kafka support page and overall supported sources index. (bruin-data.github.io)
100-104: Avoid “etc.” in lists; point to the canonical sources list instead.
“etc.” ages poorly and weakens clarity. Suggest ending with “and more” plus a link to the Supported Sources list.
    -Airtable, Asana, GitHub, Google Ads, Google Analytics, Google Sheets, HubSpot,
    -Notion, Personio, Salesforce, Slack, Stripe, Zendesk, etc.
    +Airtable, Asana, GitHub, Google Ads, Google Analytics, Google Sheets, HubSpot,
    +Notion, Personio, Salesforce, Slack, Stripe, Zendesk, and more
    +([see the full list]).
Add a link definition at the bottom:
    +[see the full list]: https://bruin-data.github.io/ingestr/
Upstream landing page lists all categories. (bruin-data.github.io)
110-120: markdownlint MD034: bare URLs inside grid-item-card.
If the CI enforces MD034, two options: (1) convert to reference-style links in normal body copy, or (2) suppress MD034 for these directive blocks. MyST’s “:link:” fields often trigger MD034 even though they’re not markdown paragraphs.
If you opt for (1), replace the grid cards with a short bulleted list and use reference links:
    -:::{grid-item-card} Documentation: ingestr CrateDB source
    -:link: https://bruin-data.github.io/ingestr/supported-sources/cratedb.html#source
    -:link-type: url
    -Documentation about the CrateDB source adapter for ingestr.
    -:::
    +* Documentation: ingestr CrateDB source — see [ingestr CrateDB source].
Add definitions near the bottom:
    +[ingestr CrateDB source]: https://bruin-data.github.io/ingestr/supported-sources/cratedb.html#source
    +[ingestr CrateDB destination]: https://bruin-data.github.io/ingestr/supported-sources/cratedb.html#destination
    +[Examples: Use ingestr with CrateDB]: https://github.com/crate/cratedb-examples/tree/main/application/ingestr
If you prefer to keep the cards, consider disabling MD034 for these lines in the linter config.
Also applies to: 122-129
docs/integrate/dlt/index.md (4)
52-56: Avoid assuming a local CrateDB password in DSNs.
The built-in “crate” user on localhost typically has no password by default. Using an explicit “crate:crate@” can mislead users. Recommend removing the password (or adding a note that credentials depend on cluster auth settings).
    -    destination=dlt.destinations.cratedb("postgresql://crate:crate@localhost:5432/"),
    +    destination=dlt.destinations.cratedb("postgresql://crate@localhost:5432/"),
    @@
    -    destination=dlt.destinations.cratedb("postgresql://crate:crate@localhost:5432/"),
    +    destination=dlt.destinations.cratedb("postgresql://crate@localhost:5432/"),
Reference: CrateDB connection-string examples. (cratedb.com)
Also applies to: 69-73
36-41: Install step is fine; consider adding a one-liner about the adapter’s status.
Optional: add “The dlt-cratedb adapter is currently shipped as a separate package until it’s upstreamed into dlt” to set expectations for readers. Link to PyPI. (pypi.org)
78-87: markdownlint MD034 on bare URLs inside grid-item-card.
Same consideration as the ingestr page: either rework these into reference links in plain lists or suppress MD034 for directive fields if your pipeline flags them.
Also applies to: 88-94, 95-100
105-107: Remove unused link definition to satisfy MD053.
The link reference “[databases supported by SQLAlchemy]” is defined but not used in this page.
    -[databases supported by SQLAlchemy]: https://docs.sqlalchemy.org/en/20/dialects/
     [dlt]: https://dlthub.com/
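The MD053 condition (a link reference that is defined but never used) can be approximated with a few lines of Python. This is a rough sketch for illustration, not markdownlint's actual implementation: the function name `unused_link_definitions` and the sample text are made up here, and the regular expressions ignore edge cases such as definitions inside code fences.

```python
import re

# Matches a reference definition such as "[label]: https://example.org/".
DEFINITION = re.compile(r"^\[([^\]]+)\]:\s*\S+", re.MULTILINE)
# Matches any bracketed label in running text, e.g. "[label]".
USAGE = re.compile(r"\[([^\]]+)\]")


def unused_link_definitions(markdown: str) -> list[str]:
    """Return labels that are defined but never referenced (case-insensitive)."""
    defined = {m.group(1).lower() for m in DEFINITION.finditer(markdown)}
    # Drop the definition lines so they do not count as usages of themselves.
    body = DEFINITION.sub("", markdown)
    used = {m.group(1).lower() for m in USAGE.finditer(body)}
    return sorted(defined - used)


# Hypothetical page content mirroring the situation described above.
sample = """\
[dlt] is a Python library for moving data.

[databases supported by SQLAlchemy]: https://docs.sqlalchemy.org/en/20/dialects/
[dlt]: https://dlthub.com/
"""

print(unused_link_definitions(sample))
# → ['databases supported by sqlalchemy']
```

Labels are lowercased because markdownlint reports the label in lowercase, as seen in the linter output for this PR.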
📒 Files selected for processing (4)
- docs/ingest/etl/index.md (2 hunks)
- docs/integrate/dlt/index.md (1 hunks)
- docs/integrate/index.md (2 hunks)
- docs/integrate/ingestr/index.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/integrate/index.md
🧰 Additional context used
🪛 LanguageTool
docs/integrate/dlt/index.md
[grammar] ~1-~1: There might be a mistake here.
Context: (dlt)= # dlt ```{div} .float-right .text-right !...
(QB_NEW_EN)
[grammar] ~13-~13: There might be a mistake here.
Context: ...)—think ELT as Python code—is a popular, production-ready Python library for movi...
(QB_NEW_EN)
[grammar] ~20-~20: There might be a mistake here.
Context: ...f}ingestr. ::::{grid} :::{grid-item} - Just code: no need to use any backends...
(QB_NEW_EN)
[grammar] ~28-~28: There might be a mistake here.
Context: ...luding APIs, files, databases, and more. ::: :::: ## Synopsis Prerequisites: ...
(QB_NEW_EN)
[grammar] ~36-~36: There might be a mistake here.
Context: ... ::: :::: ## Synopsis Prerequisites: Install dlt and the CrateDB destination ...
(QB_NEW_EN)
docs/ingest/etl/index.md
[grammar] ~242-~242: There might be a mistake here.
Context: ...f}azure-functions - {ref}dbt - {ref}dlt - {ref}dms - {ref}dynamodb - {ref}`est...
(QB_NEW_EN)
[grammar] ~243-~243: There might be a mistake here.
Context: ...tions - {ref}dbt - {ref}dlt - {ref}dms - {ref}dynamodb - {ref}estuary` - {ref}...
(QB_NEW_EN)
[grammar] ~244-~244: There might be a mistake here.
Context: ...}dbt - {ref}dlt - {ref}dms - {ref}dynamodb - {ref}estuary - {ref}flink - {ref}`ho...
(QB_NEW_EN)
[grammar] ~245-~245: There might be a mistake here.
Context: ... - {ref}dms - {ref}dynamodb - {ref}estuary - {ref}flink - {ref}hop - {ref}iceber...
(QB_NEW_EN)
[grammar] ~246-~246: There might be a mistake here.
Context: ...{ref}dynamodb - {ref}estuary - {ref}flink - {ref}hop - {ref}iceberg - {ref}`infl...
(QB_NEW_EN)
[grammar] ~247-~247: There might be a mistake here.
Context: ... - {ref}estuary - {ref}flink - {ref}hop - {ref}iceberg - {ref}influxdb - {ref}...
(QB_NEW_EN)
[grammar] ~248-~248: There might be a mistake here.
Context: ...ary - {ref}flink - {ref}hop - {ref}iceberg - {ref}influxdb - {ref}ingestr` - {ref}...
(QB_NEW_EN)
[grammar] ~249-~249: There might be a mistake here.
Context: ...k - {ref}hop - {ref}iceberg - {ref}influxdb - {ref}ingestr - {ref}kafka - {ref}ke...
(QB_NEW_EN)
[grammar] ~250-~250: There might be a mistake here.
Context: ...{ref}iceberg - {ref}influxdb - {ref}ingestr - {ref}kafka - {ref}kestra - {ref}`kin...
(QB_NEW_EN)
docs/integrate/ingestr/index.md
[grammar] ~11-~11: There might be a mistake here.
Context: ...ication for copying data from any source to any destination database. It supports...
(QB_NEW_EN)
[grammar] ~17-~17: There might be a mistake here.
Context: ... {ref}dlt. ::::{grid} :::{grid-item} - Single command: ingestr allows copying...
(QB_NEW_EN)
[grammar] ~25-~25: There might be a mistake here.
Context: ...refresh and incremental loading modes. ::: :::{grid-item} ![ingestr in a nutsh...
(QB_NEW_EN)
[grammar] ~28-~28: There might be a mistake here.
Context: ...ental loading modes. ::: :::{grid-item} 
[grammar] ~29-~29: There might be a mistake here.
Context: ...sources/demo.gif?raw=true){loading=lazy} ::: :::: ## Synopsis Invoke ingestr ...
(QB_NEW_EN)
[grammar] ~67-~67: There might be a mistake here.
Context: ...databases, data platforms, and analytics engines, including all [databases suppor...
(QB_NEW_EN)
[grammar] ~70-~70: There might be a mistake here.
Context: ...emy]. :::{rubric} Traditional Databases ::: CockroachDB, CrateDB, Firebird, Hype...
(QB_NEW_EN)
[grammar] ~71-~71: There might be a mistake here.
Context: .... :::{rubric} Traditional Databases ::: CockroachDB, CrateDB, Firebird, HyperSQL...
(QB_NEW_EN)
[grammar] ~77-~77: There might be a mistake here.
Context: ...ubric} Cloud Data Warehouses & Analytics ::: Amazon Athena, Amazon Redshift, Data...
(QB_NEW_EN)
[grammar] ~78-~78: There might be a mistake here.
Context: ...c} Cloud Data Warehouses & Analytics ::: Amazon Athena, Amazon Redshift, Databend...
(QB_NEW_EN)
[grammar] ~83-~83: There might be a mistake here.
Context: ...age :::{rubric} Specialized Data Stores ::: Apache Drill, Apache Druid, Apache H...
(QB_NEW_EN)
[grammar] ~84-~84: There might be a mistake here.
Context: ... :::{rubric} Specialized Data Stores ::: Apache Drill, Apache Druid, Apache Hive ...
(QB_NEW_EN)
[grammar] ~88-~88: There might be a mistake here.
Context: ... OpenSearch :::{rubric} Message Brokers ::: Amazon Kinesis, Apache Kafka (Amazon...
(QB_NEW_EN)
[grammar] ~89-~89: There might be a mistake here.
Context: ...nSearch :::{rubric} Message Brokers ::: Amazon Kinesis, Apache Kafka (Amazon MSK...
(QB_NEW_EN)
[grammar] ~92-~92: There might be a mistake here.
Context: ...nda, RobustMQ) :::{rubric} File Formats ::: CSV, JSONL/NDJSON, Parquet :::{rubr...
(QB_NEW_EN)
[grammar] ~93-~93: There might be a mistake here.
Context: ... RobustMQ) :::{rubric} File Formats ::: CSV, JSONL/NDJSON, Parquet :::{rubric} ...
(QB_NEW_EN)
[grammar] ~96-~96: There might be a mistake here.
Context: ...JSON, Parquet :::{rubric} Object Stores ::: Amazon S3, Google Cloud Storage :::...
(QB_NEW_EN)
[grammar] ~97-~97: There might be a mistake here.
Context: ..., Parquet :::{rubric} Object Stores ::: Amazon S3, Google Cloud Storage :::{rub...
(QB_NEW_EN)
[grammar] ~100-~100: There might be a mistake here.
Context: ...e :::{rubric} SaaS Platforms & Services ::: Airtable, Asana, GitHub, Google Ads,...
(QB_NEW_EN)
[grammar] ~101-~101: There might be a mistake here.
Context: ...::{rubric} SaaS Platforms & Services ::: Airtable, Asana, GitHub, Google Ads, Goo...
(QB_NEW_EN)
[grammar] ~102-~102: There might be a mistake here.
Context: ...oogle Analytics, Google Sheets, HubSpot, Notion, Personio, Salesforce, Slack, Str...
(QB_NEW_EN)
[grammar] ~122-~122: There might be a mistake here.
Context: ...card} Examples: Use ingestr with CrateDB 🔗 https://github.com/crate/cratedb-...
(QB_NEW_EN)
[grammar] ~123-~123: There might be a mistake here.
Context: ...b-examples/tree/main/application/ingestr :link-type: url Executable code examples...
(QB_NEW_EN)
[grammar] ~124-~124: There might be a mistake here.
Context: ...main/application/ingestr :link-type: url Executable code examples / rig that demo...
(QB_NEW_EN)
[grammar] ~125-~125: There might be a mistake here.
Context: ... that demonstrates how to use ingestr to load data from Kafka to CrateDB. ::: ::...
(QB_NEW_EN)
[grammar] ~126-~126: There might be a mistake here.
Context: ...estr to load data from Kafka to CrateDB. ::: :::: [databases supported by SQL...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
docs/integrate/dlt/index.md
83-83: Bare URL used
(MD034, no-bare-urls)
89-89: Bare URL used
(MD034, no-bare-urls)
105-105: Link and image reference definitions should be needed
Unused link or image reference definition: "databases supported by sqlalchemy"
(MD053, link-image-reference-definitions)
docs/integrate/ingestr/index.md
111-111: Bare URL used
(MD034, no-bare-urls)
117-117: Bare URL used
(MD034, no-bare-urls)
123-123: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
🔇 Additional comments (5)
docs/integrate/ingestr/index.md (1)
37-44: CrateDB → DuckDB example matches upstream syntax (LGTM).
Flags and URI formats align with ingestr’s documented CrateDB source and DuckDB destination usage. Good choice of a read-only sys table. (bruin-data.github.io)
docs/integrate/dlt/index.md (1)
13-17: Statement “dlt is used by ingestr” is correct (LGTM).
ingestr explicitly states it uses dlt and SQLAlchemy internally. Keeping this cross-link is helpful context.
Source: ingestr Core Concepts. (bruin-data.github.io)
docs/ingest/etl/index.md (3)
41-45: New dlt entry reads clearly and fits the section (LGTM).
Concise and aligned with the rest of the descriptions.
51-55: ingestr entry is accurate and consistent with its integration page (LGTM).
Wording matches the new integration doc and avoids promising specific vendors.
237-265: Alphabetical additions are placed correctly (LGTM).
“dlt” after “dbt” and “ingestr” after “influxdb” is correct; anchors resolve to the new pages.
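The ordering can be checked mechanically. A minimal sketch, assuming a plain lexicographic sort over the anchor names is the intended meaning of "alphabetically sorted"; the list below is a small excerpt of names mentioned in this review, not the full index:

```python
# Excerpt of anchor names, deliberately out of order.
entries = ["iceberg", "ingestr", "influxdb", "dms", "dlt", "dbt"]

ordered = sorted(entries)
print(ordered)
# → ['dbt', 'dlt', 'dms', 'iceberg', 'influxdb', 'ingestr']

# "influxdb" sorts before "ingestr" because "f" precedes "g" at the third letter.
print(ordered.index("influxdb") < ordered.index("ingestr"))
# → True
```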
About
Using dlt and ingestr with CrateDB for data loading started working just recently.
Preview
References
/cc @hammerhead, @hlcianfagna