@kbatuigas kbatuigas commented Jul 12, 2025

Description

This pull request introduces updates to documentation and configuration related to Iceberg integration in Redpanda. The changes include support for JSON schemas, updates to Iceberg schema modes, and adjustments to topic configuration and query examples. Below is a summary of the most important changes:

Iceberg Schema and JSON Schema Support

  • Added support for JSON Schema Draft-07 in the Iceberg integration, including the requirement to declare the schema dialect and constraints on type definitions. Documented unsupported features such as $ref, default, and conditional typing.

  • Updated the value_schema_latest mode documentation to require fully qualified schema names for Protobuf, and clarified that the mode is incompatible with the Schema Registry wire format.
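The JSON Schema constraints summarized above could be pre-checked on the client before a schema is registered. The following Python sketch is hypothetical and is not Redpanda's validator: it assumes the required dialect declaration is the standard Draft-07 `$schema` URI, treats `if`/`then`/`else` as the "conditional typing" the summary mentions, and flags only the keywords listed there. The names `find_problems`, `DRAFT_07_URIS`, and the other constants are illustrative, and other type-definition constraints are not checked.

```python
import json

# Assumption: the Draft-07 dialect declaration Redpanda accepts (with or
# without the trailing "#"). Check the Redpanda docs for the exact rule.
DRAFT_07_URIS = {
    "http://json-schema.org/draft-07/schema#",
    "http://json-schema.org/draft-07/schema",
}

# Keywords the PR lists as unsupported; "if"/"then"/"else" is an assumed
# reading of "conditional typing".
UNSUPPORTED_KEYWORDS = {"$ref", "default", "if", "then", "else"}

# Keywords whose values are instance data, not subschemas.
DATA_KEYWORDS = {"const", "enum", "examples", "default"}

# Keywords whose values map user-chosen names to subschemas (or name lists).
SCHEMA_MAPS = {"properties", "patternProperties", "definitions", "dependencies"}


def find_problems(schema: dict) -> list[str]:
    """Return problems found in a JSON Schema document (hypothetical helper)."""
    problems = []
    if schema.get("$schema") not in DRAFT_07_URIS:
        problems.append("missing or non-Draft-07 $schema declaration")

    def walk(node, path):
        if isinstance(node, list):
            for i, item in enumerate(node):
                walk(item, f"{path}/{i}")
            return
        if not isinstance(node, dict):
            return
        for key, value in node.items():
            if key in UNSUPPORTED_KEYWORDS:
                problems.append(f"unsupported keyword {key!r} at {path or '/'}")
            if key in DATA_KEYWORDS:
                continue  # keys inside instance data are not keywords
            if key in SCHEMA_MAPS and isinstance(value, dict):
                # Property names such as "default" are names, not keywords.
                for name, sub in value.items():
                    walk(sub, f"{path}/{key}/{name}")
            elif isinstance(value, (dict, list)):
                walk(value, f"{path}/{key}")

    walk(schema, "")
    return problems


if __name__ == "__main__":
    schema = json.loads("""{
      "$schema": "http://json-schema.org/draft-07/schema#",
      "type": "object",
      "properties": {
        "default": {"type": "string"},
        "reading": {"type": "number", "default": 0}
      }
    }""")
    print(find_problems(schema))
    # ["unsupported keyword 'default' at /properties/reading"]
```

The walk skips data-valued keywords and treats entries under `properties` as named subschemas, so a property that happens to be called `default` is not reported while a `default` keyword inside a subschema is.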

Documentation Updates

  • Renamed choose-iceberg-mode.adoc to specify-iceberg-schema.adoc and updated its content to reflect the new schema support and integration details.

  • Updated navigation links in nav.adoc to point to the newly renamed and updated Iceberg schema documentation.

Configuration and Examples

  • Modified topic configuration examples to align with the new schema modes and clarified usage of the redpanda.iceberg.mode property.

  • Corrected examples for producing data to topics, including adjustments to JSON formatting and commands.

Additional Enhancements

  • Updated Iceberg type mappings for Protobuf and JSON Schema, including corrections to mapping terminology (e.g., "repeated values" to "list types").

  • Clarified error handling and dead-letter queue behavior for misconfigured topics, emphasizing that Redpanda pauses data translation instead of writing records to the DLQ.

Resolves https://redpandadata.atlassian.net/browse/
Review deadline: 23 July

Page previews

Specify Iceberg Schema
Query Iceberg Topics > Query examples > Topic with schema
What's New

Checks

  • New feature
  • Content gap
  • Support Follow-up
  • Small fix (typos, links, copyedits, etc)

@netlify
netlify bot commented Jul 12, 2025

Deploy Preview for redpanda-docs-preview ready!

🔨 Latest commit 09811b6
🔍 Latest deploy log https://app.netlify.com/projects/redpanda-docs-preview/deploys/68843946ec955c000880a38f
😎 Deploy Preview https://deploy-preview-1207--redpanda-docs-preview.netlify.app

@coderabbitai
coderabbitai bot commented Jul 12, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

@nvartolomei nvartolomei self-requested a review July 15, 2025 14:04
@paulohtb6 paulohtb6 force-pushed the DOC-1379-document-feature-iceberg-support-for-json-schema branch from 5e48557 to 9e0ef10 Compare July 15, 2025 16:03
Creates an Iceberg table whose structure matches the latest schema registered for the subject in the Schema Registry. You must register a schema in the Schema Registry. Unlike the `value_schema_id_prefix` mode, `value_schema_latest` does not require that producers use the wire format.
Creates an Iceberg table whose structure matches the latest schema registered for the subject in the Schema Registry. You must register a schema in the Schema Registry. For Protobuf, you must use the fully qualified schema name, which includes the package name, for example `com.example.manufacturing.SensorData`.

Producers cannot use the wire format in `value_schema_latest` mode.
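To make the distinction in this thread concrete: in `value_schema_id_prefix` mode the record value starts with the Schema Registry wire format header (a zero magic byte followed by a 4-byte big-endian schema ID), while in `value_schema_latest` mode the serialized message goes into the record value as-is. The following minimal Python sketch assumes an Avro or JSON Schema payload; Protobuf additionally encodes message indexes after the schema ID, which it omits. The helper name, schema ID, and payload are hypothetical.

```python
import struct

MAGIC_BYTE = 0  # first byte of the Schema Registry wire format


def frame_wire_format(schema_id: int, payload: bytes) -> bytes:
    """Build a value_schema_id_prefix record value (hypothetical helper).

    Layout: 1 magic byte (0) + 4-byte big-endian schema ID + serialized payload.
    """
    return struct.pack(">BI", MAGIC_BYTE, schema_id) + payload


# Hypothetical serialized JSON message and schema ID, for illustration only.
payload = b'{"sensor_id": 7, "reading": 21.5}'

id_prefix_value = frame_wire_format(42, payload)  # value_schema_id_prefix
latest_value = payload  # value_schema_latest: the message as-is, no header

print(id_prefix_value[:5].hex())  # 000000002a
```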
Contributor Author

Is there a certain format that is expected instead? What can producers use?

Contributor

@nvartolomei nvartolomei Jul 18, 2025

The previous wording was perfect here. Producers can use anything they like. You can link the "wire format" to https://docs.redpanda.com/current/manage/schema-reg/schema-reg-overview/#wire-format

Contributor

"Producers cannot use the wire format in value_schema_latest mode. Instead the serialized message is expected as-is in the record value." maybe

Contributor Author

Updated

@kbatuigas kbatuigas marked this pull request as ready for review July 18, 2025 06:27
@kbatuigas kbatuigas requested a review from a team as a code owner July 18, 2025 06:27
@kbatuigas kbatuigas force-pushed the DOC-1379-document-feature-iceberg-support-for-json-schema branch from 672755d to 260d1b7 Compare July 18, 2025 17:46
@kbatuigas kbatuigas requested a review from nvartolomei July 21, 2025 16:10
@kbatuigas kbatuigas force-pushed the DOC-1379-document-feature-iceberg-support-for-json-schema branch from 3267bec to 63e4a83 Compare July 22, 2025 22:33
Contributor

@nvartolomei nvartolomei left a comment

lgtm for the iceberg text

@kbatuigas kbatuigas requested a review from mattschumpert July 23, 2025 18:05

You can inspect the DLQ table for records that failed to write to the Iceberg table, and you can take further action on these records, such as transforming and reprocessing them, or debugging issues that occurred upstream.

NOTE: Topic property misconfiguration, such as setting `redpanda.iceberg.mode` to `value_schema_latest` but not specifying the fully qualified schema name, does not cause records to be written to the DLQ table. Instead, Redpanda pauses the topic data translation to the Iceberg table until you fix the misconfiguration.

You don't need to specify the schema name if your schema uses the TopicNameStrategy, though: https://docs.redpanda.com/current/manage/iceberg/choose-iceberg-mode/#override-value-schema-latest-default

IIUC you have a choice of doing that or specifying the fully qualified name, so this is misleading.

At least the docs say that. cc @rockwotj

Contributor Author

Updated to specify that it's for overriding the default, and added a cross reference to the doc that explains this override.

Contributor

Correct

=== Topic with schema (`value_schema_id_prefix` mode)

NOTE: The steps in this section also apply to the `value_schema_latest` mode, except for step 2. The `value_schema_latest` mode doesn't require the Schema Registry wire format, so you'll use your own producer code instead of xref:reference:rpk/rpk-topic/rpk-topic-produce[`rpk topic produce`].
NOTE: The steps in this section also apply to the `value_schema_latest` mode, except the produce step. The `value_schema_latest` mode is not compatible with the Schema Registry wire format. The xref:reference:rpk/rpk-topic/rpk-topic-produce[`rpk topic produce`] command embeds the wire format header, so you must use your own producer code with `value_schema_latest`.

@r-vasquez I think it would be really great if you could always produce without the wire format. A lot of our customers don't use it, so a profile setting that lets you choose how RPK produces (and maybe consumes) would help them a lot.

Contributor

@JakeSCahill JakeSCahill left a comment

Left some comments for your consideration.

+---------+--------------+--------------------------+
----

== Manage dead-letter queue
Contributor

I think users are more likely to be scanning for 'how do I debug errors with Iceberg writes'. This heading feels a little system-centric rather than goal-focused.

Errors may occur when translating records in the `value_schema_id_prefix` mode to the Iceberg table format; for example, if you do not use the Schema Registry wire format with the magic byte, if the schema ID in the record is not found in the Schema Registry, or if an Avro or Protobuf data type cannot be translated to an Iceberg type.
Errors may occur when translating records in the `value_schema_id_prefix` or `value_schema_latest` modes to the Iceberg table format; for example, if you do not use the Schema Registry wire format with the magic byte, if the schema ID in the record is not found in the Schema Registry, or if a schema data type cannot be translated to an Iceberg type.
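For records in `value_schema_id_prefix` mode, the first two example errors above can be detected from the record value alone. The following Python sketch is hypothetical and is not Redpanda's implementation: it checks a value against the wire format header layout (magic byte 0, then a 4-byte big-endian schema ID) and a stand-in set of registered schema IDs. It does not model the third error, a schema type with no Iceberg equivalent.

```python
import struct
from typing import Optional, Set

MAGIC_BYTE = 0
HEADER_LEN = 5  # 1 magic byte + 4-byte big-endian schema ID


def translation_error(value: bytes, registered_ids: Set[int]) -> Optional[str]:
    """Return why a value_schema_id_prefix record can't be decoded, or None.

    `registered_ids` is a hypothetical stand-in for a Schema Registry lookup.
    """
    if len(value) < HEADER_LEN or value[0] != MAGIC_BYTE:
        return "missing wire format header (magic byte)"
    (schema_id,) = struct.unpack_from(">I", value, 1)  # bytes 1..4
    if schema_id not in registered_ids:
        return f"schema ID {schema_id} not found in Schema Registry"
    return None  # header checks pass; type translation may still fail


registry = {1, 42}
print(translation_error(b'{"a": 1}', registry))
```

A value produced without the header, such as plain JSON, fails the first check because its first byte is not 0.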
Contributor

I would restructure this section so that users learn in this order:

  1. what happens when RP encounters an error
  2. examples of errors (bulleted list)
  3. what a DLQ is
  4. structure of DLQ
  5. how to inspect it (example SQL statement)
  6. how to drop invalid records instead

Contributor Author

Thanks for the suggestion, @JakeSCahill. I've restructured this section. I think it makes even more sense now to break this out into its own standalone doc. I'll do that in a later PR.

@kbatuigas kbatuigas force-pushed the DOC-1379-document-feature-iceberg-support-for-json-schema branch from 9d28d0f to 4d5f898 Compare July 26, 2025 02:10
@kbatuigas kbatuigas merged commit 88e640f into beta Jul 26, 2025
7 checks passed
@kbatuigas kbatuigas deleted the DOC-1379-document-feature-iceberg-support-for-json-schema branch July 26, 2025 02:21
paulohtb6 pushed a commit that referenced this pull request Jul 30, 2025

6 participants