Open
Description
Describe the enhancement requested
When writing a V2 data page, it seems that compression is applied unconditionally, even when it yields no size benefit:
parquet-java/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java
Lines 305 to 311 in 0fea3e1

    boolean compressed = false;
    BytesInput compressedData = BytesInput.empty();
    if (data.size() > 0) {
      // TODO: decide if we compress
      compressedData = compressor.compress(data);
      compressed = true;
    }

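It would be relatively easy to add a hardcoded threshold on the compressed-to-uncompressed size ratio (for example 98%): when the compressed page exceeds that fraction of its original size, the page would be written uncompressed, which makes reading faster because the reader can skip decompression.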
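
A minimal sketch of what such a check could look like, written as standalone Java rather than as a patch to `ColumnChunkPageWriteStore`. Everything here is an assumption for illustration: the class `CompressionThresholdSketch`, the helpers `worthCompressing` and `choosePageBytes`, the `PageBytes` holder, and the use of `java.util.zip.Deflater` as a stand-in for the page compressor are hypothetical and not part of parquet-java. The 0.98 constant is simply the example value mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.zip.Deflater;

public class CompressionThresholdSketch {
    // Hypothetical constant: the 98% example value from the proposal.
    static final double MAX_COMPRESSED_RATIO = 0.98;

    /** Hypothetical holder for the bytes to write and whether they are compressed. */
    static final class PageBytes {
        final byte[] bytes;
        final boolean compressed;

        PageBytes(byte[] bytes, boolean compressed) {
            this.bytes = bytes;
            this.compressed = compressed;
        }
    }

    /** True when the compressed size is at most maxRatio of the uncompressed size. */
    static boolean worthCompressing(long uncompressedSize, long compressedSize, double maxRatio) {
        if (uncompressedSize <= 0) {
            return false; // nothing to compress
        }
        return compressedSize <= uncompressedSize * maxRatio;
    }

    /** Stand-in compressor (DEFLATE); the real writer would use its configured codec. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int written = deflater.deflate(buffer);
            out.write(buffer, 0, written);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Compresses the page, then falls back to the raw bytes if the gain is too small. */
    static PageBytes choosePageBytes(byte[] data) {
        if (data.length == 0) {
            return new PageBytes(data, false);
        }
        byte[] candidate = deflate(data);
        if (worthCompressing(data.length, candidate.length, MAX_COMPRESSED_RATIO)) {
            return new PageBytes(candidate, true);
        }
        return new PageBytes(data, false);
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[64 * 1024];   // all zeros, compresses well
        byte[] noise = new byte[64 * 1024];
        new Random(42).nextBytes(noise);           // pseudo-random, effectively incompressible

        System.out.println("repetitive compressed: " + choosePageBytes(repetitive).compressed);
        System.out.println("noise compressed: " + choosePageBytes(noise).compressed);
    }
}
```

Note that this sketch still pays the compression cost on the write path for every page; only the reader benefits when the raw bytes are kept. Whether the threshold should be hardcoded or configurable is left open, as in the proposal.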
Component(s)
Core