@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Jul 23, 2025

What changes were proposed in this pull request?

This PR aims to support orc.compression.zstd.strategy.

Why are the changes needed?

To allow a user to choose a proper strategy based on their data.

https://facebook.github.io/zstd/zstd_manual.html#Chapter5

typedef enum { ZSTD_fast=1,
               ZSTD_dfast=2,
               ZSTD_greedy=3,
               ZSTD_lazy=4,
               ZSTD_lazy2=5,
               ZSTD_btlazy2=6,
               ZSTD_btopt=7,
               ZSTD_btultra=8,
               ZSTD_btultra2=9
               /* note : new strategies _might_ be added in the future.
                         Only the order (from fast to strong) is guaranteed */
} ZSTD_strategy;
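
For illustration, the mapping between the numeric values accepted by orc.compression.zstd.strategy and the strategy names in the enum above can be sketched in Java. This is a minimal sketch, not the code from this PR: the `ZstdStrategy` enum and its `fromValue` and `fromSystemProperty` helpers are hypothetical names, and reading the value through `System.getProperty` only mirrors the `-D` flag used in the benchmark commands in this description, not necessarily how ORC resolves the setting internally.

```java
// Hypothetical helper (not part of ORC): mirrors the ZSTD_strategy enum
// from the zstd manual so a numeric setting can be validated and named.
enum ZstdStrategy {
  FAST(1), DFAST(2), GREEDY(3), LAZY(4), LAZY2(5),
  BTLAZY2(6), BTOPT(7), BTULTRA(8), BTULTRA2(9);

  // Property name taken from this PR.
  static final String PROPERTY = "orc.compression.zstd.strategy";

  final int value;

  ZstdStrategy(int value) {
    this.value = value;
  }

  // Maps a numeric strategy (1..9) to its name; rejects anything else.
  static ZstdStrategy fromValue(int value) {
    for (ZstdStrategy s : values()) {
      if (s.value == value) {
        return s;
      }
    }
    throw new IllegalArgumentException(
        "Unknown zstd strategy " + value + ", expected 1 to 9");
  }

  // Assumption: the setting is passed as a JVM system property (-D...).
  // Returns null when the property is unset so the caller can fall back
  // to whatever default it prefers.
  static ZstdStrategy fromSystemProperty() {
    String raw = System.getProperty(PROPERTY);
    if (raw == null || raw.trim().isEmpty()) {
      return null;
    }
    return fromValue(Integer.parseInt(raw.trim()));
  }
}
```

With `-Dorc.compression.zstd.strategy=9` on the command line, `ZstdStrategy.fromSystemProperty()` would return `BTULTRA2`. Because the zstd manual guarantees only the ordering from fast to strong, a lookup like this is best treated as a snapshot of the values listed above.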

How was this patch tested?

Pass the CIs.

$ cd java
$ mvn package -DskipTests -Pbenchmark
$ cd bench

$ time java -Dorc.compression.zstd.strategy=1 -jar core/target/orc-benchmarks-core-*-uber.jar generate data -d sales -c zstd -f orc
...
54.51s user 1.28s system 103% cpu 53.984 total

$ time java -Dorc.compression.zstd.strategy=9 -jar core/target/orc-benchmarks-core-*-uber.jar generate data -d sales -c zstd -f orc
...
148.21s user 1.75s system 101% cpu 2:28.13 total
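
To put the two timings side by side, the elapsed ("total") values can be converted to seconds and divided. The helper below is a small sketch for that arithmetic only; `WallClock`, `toSeconds`, and `slowdown` are hypothetical names and not part of ORC or its benchmark module.

```java
// Hypothetical helper for comparing the elapsed times printed by `time`;
// not part of ORC.
final class WallClock {
  private WallClock() {}

  // Parses elapsed times such as "53.984", "2:28.13", or "1:02:03.5"
  // (seconds, minutes:seconds, hours:minutes:seconds) into seconds.
  static double toSeconds(String elapsed) {
    double seconds = 0.0;
    for (String part : elapsed.trim().split(":")) {
      seconds = seconds * 60.0 + Double.parseDouble(part);
    }
    return seconds;
  }

  // Ratio of the candidate run's elapsed time to the baseline's.
  static double slowdown(String baseline, String candidate) {
    return toSeconds(candidate) / toSeconds(baseline);
  }
}
```

`WallClock.slowdown("53.984", "2:28.13")` comes out at roughly 2.74, so in this run strategy 9 took about 2.7 times as long as strategy 1. The output above reports only time, not the size of the generated files, so it does not show how much compression was gained in exchange.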

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Member Author

Could you review this PR, @williamhyun ?

@dongjoon-hyun
Member Author

Thank you, @cxzl25 .

Member

@williamhyun williamhyun left a comment

+1 LGTM, will be merging this now!

williamhyun pushed a commit that referenced this pull request Jul 24, 2025
Closes #2338 from dongjoon-hyun/ORC-1961.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: William Hyun <[email protected]>
(cherry picked from commit 725fbc5)
Signed-off-by: William Hyun <[email protected]>
@dongjoon-hyun
Member Author

Thank you, @williamhyun !

@dongjoon-hyun dongjoon-hyun deleted the ORC-1961 branch July 25, 2025 15:44
dongjoon-hyun pushed a commit to apache/spark that referenced this pull request Jul 30, 2025
### What changes were proposed in this pull request?

This PR aims to upgrade ORC to 2.2.0. 2.2.0 RC1 is currently under voting.

### Why are the changes needed?

Apache ORC 2.2.0 is a new feature release.
- https://github.com/apache/orc/releases/tag/v2.2.0
  - apache/orc#2032
  - apache/orc#2338
  - apache/orc#2269
  - apache/orc#2249
  - apache/orc#2144

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51676 from williamhyun/ORC-2.2.0.

Authored-by: William Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>