
@steFaiz (Contributor) commented Nov 28, 2025

SST File format

Linked issue: None

The underlying file structure is based on org.apache.paimon.lookup.sort.SortLookupStoreReader.

This PR introduces an SST FileFormat for Paimon, which is useful in the following scenarios:

  1. Accelerating primary-key (PK) lookups
  2. BTree global index

An SST File is designed to serve:

  1. Point queries: look up a specified key
  2. Range queries: seek to the first record whose key is greater than or equal to the target key, then scan the remaining records (the caller decides when to stop)
  3. Random access: directly return the records specified by a selection
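To make the three access patterns concrete, here is a minimal in-memory sketch. It assumes records are already sorted by a unique int key and models a "selection" as a list of row positions; the class name SortedRecordsSketch is hypothetical and this is not the SortLookupStoreReader API. A real SST file would typically binary-search a block index first and then search within a single block, whereas this sketch searches one flat array.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical in-memory model of an SST file; illustrates access patterns only. */
final class SortedRecordsSketch {

    /** One key/value record. */
    public record Entry(int key, String value) {}

    // Caller must supply entries sorted ascending by unique key.
    private final Entry[] entries;

    SortedRecordsSketch(List<Entry> sortedEntries) {
        this.entries = sortedEntries.toArray(new Entry[0]);
    }

    /** Index of the first entry whose key is >= target, or entries.length if none. */
    private int lowerBound(int target) {
        int lo = 0;
        int hi = entries.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (entries[mid].key() < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Point query: the value stored under exactly this key, or null if absent. */
    String lookup(int key) {
        int i = lowerBound(key);
        return (i < entries.length && entries[i].key() == key) ? entries[i].value() : null;
    }

    /** Range query: a lazy iterator starting at the first key >= target. */
    Iterator<Entry> seek(int target) {
        int start = lowerBound(target);
        return new Iterator<>() {
            private int pos = start;

            @Override
            public boolean hasNext() {
                return pos < entries.length;
            }

            @Override
            public Entry next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return entries[pos++];
            }
        };
    }

    /** Random access: the records at the selected row positions, in selection order. */
    List<Entry> select(int... positions) {
        List<Entry> out = new ArrayList<>();
        for (int p : positions) {
            out.add(entries[p]);
        }
        return out;
    }
}
```

Because seek returns a lazy iterator, a range scan simply stops calling next() once it passes its upper bound, which matches "the caller decides when to stop".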

Tests

Please see:

  1. org.apache.paimon.format.sst.SstFileFormatTest
  2. org.apache.paimon.format.sst.SstFileTest

API and Format

This PR adds a pair of new methods to FileFormat:

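public abstract class FileFormat {

    // ... the existing createReaderFactory(dataSchemaRowType, projectedRowType, filters)
    // and createWriterFactory(type) overloads are unchanged and omitted here ...

    /** Key/value-aware reader factory; by default the key/value split is ignored. */
    public FormatReaderFactory createReaderFactory(
            RowType dataSchemaRowType,
            RowType projectedRowType,
            @Nullable List<Predicate> filters,
            RowType keyType,
            RowType valueType) {
        // Default: delegate to the existing three-argument overload.
        return createReaderFactory(dataSchemaRowType, projectedRowType, filters);
    }

    /** Key/value-aware writer factory; by default the key/value split is ignored. */
    public FormatWriterFactory createWriterFactory(
            RowType type, RowType keyType, RowType valueType) {
        // Default: delegate to the existing single-argument overload.
        return createWriterFactory(type);
    }
}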

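These two methods are meant to be overridden by file formats that need to distinguish keys from values in an input row. Their default implementations ignore keyType and valueType and delegate to the existing factory methods, so other formats are unaffected.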
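A format that overrides these methods typically has to locate the key fields and value fields inside the full row type before it can write or read them separately. The sketch below shows that step in isolation, assuming plain field-name lists stand in for RowType and rows are Object[] arrays; KeyValueSplitter is a hypothetical helper, not part of Paimon.

```java
import java.util.List;

/** Hypothetical helper: splits a full row into its key part and its value part. */
final class KeyValueSplitter {
    private final int[] keyIndexes;
    private final int[] valueIndexes;

    KeyValueSplitter(List<String> rowFields, List<String> keyFields, List<String> valueFields) {
        // Resolve each key/value field to its position in the full row once, up front.
        this.keyIndexes = indexesOf(rowFields, keyFields);
        this.valueIndexes = indexesOf(rowFields, valueFields);
    }

    private static int[] indexesOf(List<String> rowFields, List<String> wanted) {
        int[] idx = new int[wanted.size()];
        for (int i = 0; i < wanted.size(); i++) {
            int pos = rowFields.indexOf(wanted.get(i));
            if (pos < 0) {
                throw new IllegalArgumentException("Field not in row type: " + wanted.get(i));
            }
            idx[i] = pos;
        }
        return idx;
    }

    /** Projects the key fields out of a full row. */
    Object[] key(Object[] row) {
        return project(row, keyIndexes);
    }

    /** Projects the value fields out of a full row. */
    Object[] value(Object[] row) {
        return project(row, valueIndexes);
    }

    private static Object[] project(Object[] row, int[] indexes) {
        Object[] out = new Object[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            out[i] = row[indexes[i]];
        }
        return out;
    }
}
```

Resolving positions once in the constructor keeps the per-row work to a simple array copy, which matters when a writer processes every row of a file.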

Documentation

Documentation will be added as soon as possible.

@steFaiz marked this pull request as draft on November 28, 2025 07:26
@steFaiz (Contributor, Author) commented Nov 28, 2025

This PR is still a work in progress. Remaining items:

  1. Modify the block cache to read bytes from any SeekableInputStream
  2. Optimize memory usage for the scan scenario
  3. Optimize the file layout
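For the first item, one plausible shape is a block cache that does not care where its bytes come from: on a miss it seeks the underlying stream to the block's offset and reads the block fully. The sketch below assumes that design; SeekableSource, ByteArraySource and BlockCacheSketch are hypothetical stand-ins (not Paimon's SeekableInputStream or its block cache), blocks are keyed by start offset only, and eviction is plain LRU.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical abstraction over any seekable byte source. */
interface SeekableSource {
    void seek(long position) throws IOException;

    /** Reads up to len bytes; returns the number read, or -1 at end of data. */
    int read(byte[] buf, int off, int len) throws IOException;
}

/** In-memory SeekableSource, handy for tests. */
final class ByteArraySource implements SeekableSource {
    private final byte[] data;
    private int pos;

    ByteArraySource(byte[] data) {
        this.data = data;
    }

    @Override
    public void seek(long position) {
        pos = (int) position;
    }

    @Override
    public int read(byte[] buf, int off, int len) {
        int n = Math.min(len, data.length - pos);
        if (n <= 0) {
            return -1;
        }
        System.arraycopy(data, pos, buf, off, n);
        pos += n;
        return n;
    }
}

/** LRU cache of blocks keyed by start offset, loaded from any SeekableSource on a miss. */
final class BlockCacheSketch {
    private final SeekableSource source;
    private final LinkedHashMap<Long, byte[]> blocks;
    private int misses;

    BlockCacheSketch(SeekableSource source, int maxBlocks) {
        this.source = source;
        // accessOrder = true keeps entries in least-recently-used order,
        // so removeEldestEntry evicts the LRU block once capacity is exceeded.
        this.blocks = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    byte[] getBlock(long offset, int length) {
        byte[] block = blocks.get(offset);
        if (block == null) {
            misses++;
            block = load(offset, length);
            blocks.put(offset, block);
        }
        return block;
    }

    int misses() {
        return misses;
    }

    private byte[] load(long offset, int length) {
        byte[] block = new byte[length];
        try {
            source.seek(offset);
            int read = 0;
            while (read < length) { // a single read() may return fewer bytes than asked
                int n = source.read(block, read, length - read);
                if (n < 0) {
                    throw new IOException("Unexpected end of data at " + (offset + read));
                }
                read += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return block;
    }
}
```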

@steFaiz closed this on Dec 1, 2025
@steFaiz reopened this on Dec 1, 2025