Improve PerTableConfig's extensibility and make it format agnostic #297

@ashvina

This enhancement request stems from 1) the conversation in #293 and 2) emerging needs such as #296. The intent is to clean up the PerTableConfig.

OneTable's sync flow is driven by PerTableConfig, the input configuration provided by the user. The sync process translates the metadata of a single source table into the metadata of one or more target tables, and it cannot succeed unless the user provides this config.

The image below illustrates the current structure of PerTableConfig.
image

However, the use cases have changed over time and now require more flexibility and compatibility with different table formats. For example, the metadata of a target table may need to be generated in a different location, in which case the path to that location must also be provided. Additionally, a target table may be connected to another catalog instance. This means that a target table requires not just a format identifier, but also some of the configurations that are currently provided for a source table only.

The current configuration object also includes some configurations that are specific to the Iceberg and Hudi formats. These should be moved into input configuration classes that are specific to each format.

The following image shows the proposed PerTableConfig.
image

  1. Rename PerTableConfig to TableSyncConfig, because it is the configuration for synchronizing a table.
  2. Separate the configurations for the sync task, the common table configs, and the format-specific configs.
  3. Instead of identifying a target only by its format, represent each target as a table. Create a separate entity called ExternalTable that can be either a source table or a target table. ExternalTable also makes the distinction between the internal representation and an external table explicit.

I would like to discuss a possible class structure for representing the table config.
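
As one starting point for that discussion, the three points in the list above could be sketched roughly as follows. This is only an illustration and not the structure shown in the proposed image: apart from TableSyncConfig and ExternalTable, every type and field name here (FormatConfig, IcebergConfig, HudiConfig, SyncOptions, metadataPath, and so on) is a hypothetical placeholder, and the sketch assumes Java 16+ for records.

```java
import java.util.List;
import java.util.Map;

// Hypothetical marker for format-specific settings (point 2).
interface FormatConfig {}

// Illustrative fields only; these are assumptions, not existing OneTable options.
record IcebergConfig(String catalogImpl, Map<String, String> catalogOptions)
    implements FormatConfig {}

record HudiConfig(String partitionFieldSpec) implements FormatConfig {}

// A table outside the internal representation, usable as a source or a target (point 3).
record ExternalTable(
    String name,
    String format,               // e.g. "HUDI", "ICEBERG", "DELTA"
    String basePath,             // where the table data lives
    String metadataPath,         // null means "same as basePath"
    FormatConfig formatConfig) { // null when the format needs no extra settings

  // Location where the metadata should be read from or written to.
  String resolvedMetadataPath() {
    return metadataPath != null ? metadataPath : basePath;
  }
}

// Settings of the sync task itself, kept apart from table settings (point 2).
record SyncOptions(String syncMode, int retentionHours) {}

// The renamed PerTableConfig (point 1): one source table, one or more targets.
record TableSyncConfig(ExternalTable source, List<ExternalTable> targets, SyncOptions options) {
  TableSyncConfig {
    if (targets == null || targets.isEmpty()) {
      throw new IllegalArgumentException("at least one target table is required");
    }
    targets = List.copyOf(targets); // defensive, immutable copy
  }
}

class TableSyncConfigSketch {
  public static void main(String[] args) {
    // Placeholder names and paths, for illustration only.
    ExternalTable source =
        new ExternalTable("sales", "HUDI", "/data/sales", null, new HudiConfig("date"));
    ExternalTable target =
        new ExternalTable("sales", "ICEBERG", "/data/sales", "/metadata/sales",
            new IcebergConfig("example.Catalog", Map.of("uri", "placeholder")));
    TableSyncConfig config =
        new TableSyncConfig(source, List.of(target), new SyncOptions("INCREMENTAL", 168));
    System.out.println(config.targets().get(0).resolvedMetadataPath());
  }
}
```

Because source and target share the same ExternalTable type in this sketch, a target can carry its own metadata location and catalog settings without adding target-only fields to the top-level config.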
