appendRowGroup will lose pageIndex #2808

@asfimport

Description

Currently, org.apache.parquet.hadoop.ParquetFileWriter#appendFile(org.apache.parquet.io.InputFile) uses the appendRowGroup method to concatenate Parquet row groups. However, appendRowGroup does not copy the column index or the offset index, so the page index is lost for every appended row group.

  public void appendRowGroup(SeekableInputStream from, BlockMetaData rowGroup,
                             boolean dropColumns) throws IOException {
    ....
    // TODO: column/offset indexes are not copied
    // (it would require seeking to the end of the file for each row groups)
    currentColumnIndexes.add(null);
    currentOffsetIndexes.add(null);
  }

 
https://github.com/apache/parquet-mr/blob/f8465a274b42e0a96996c76f3be0b50cf85ecf15/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java#L1033C19-L1033C19

 

Looking forward to functionality that supports appending row groups while preserving the page index.
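
To make the difference concrete, here is a toy model of the two behaviors. It is only a sketch and does not use parquet-mr: `AppendIndexSketch`, `Chunk`, `appendDroppingIndexes`, and `appendKeepingIndexes` are hypothetical names invented for illustration, and the page index is reduced to a plain string. The first method mirrors the `add(null)` calls quoted above; the second shows roughly what the requested behavior would look like, assuming the index for each column chunk is available at append time.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: none of these types are real parquet-mr classes. */
public class AppendIndexSketch {

    /** Hypothetical stand-in for a column chunk plus its page index. */
    public static final class Chunk {
        public final String column;
        public final String pageIndex; // simplified: a real index holds per-page statistics

        public Chunk(String column, String pageIndex) {
            this.column = column;
            this.pageIndex = pageIndex;
        }
    }

    /** Models the reported behavior: one null index entry per copied chunk. */
    public static List<String> appendDroppingIndexes(List<Chunk> rowGroup) {
        List<String> indexes = new ArrayList<>();
        for (int i = 0; i < rowGroup.size(); i++) {
            indexes.add(null); // the chunk data is copied, its index is not
        }
        return indexes;
    }

    /** Models the requested behavior: each chunk's index is carried over. */
    public static List<String> appendKeepingIndexes(List<Chunk> rowGroup) {
        List<String> indexes = new ArrayList<>();
        for (Chunk chunk : rowGroup) {
            indexes.add(chunk.pageIndex);
        }
        return indexes;
    }

    public static void main(String[] args) {
        List<Chunk> rowGroup = new ArrayList<>();
        rowGroup.add(new Chunk("id", "pages-of-id"));
        rowGroup.add(new Chunk("name", "pages-of-name"));

        System.out.println("dropped: " + appendDroppingIndexes(rowGroup));
        System.out.println("kept:    " + appendKeepingIndexes(rowGroup));
    }
}
```

In the real writer the cost noted in the TODO still applies: reading the source indexes would require an extra seek to the end of the source file for each row group, which this model does not capture.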

 

Reporter: GANHONGNAN

Note: This issue was originally created as PARQUET-2340. Please see the migration documentation for further details.
