
@mbutrovich
Contributor

@mbutrovich mbutrovich commented Sep 23, 2025

Which issue does this PR close?

Closes #.

Rationale for this change

We want to add Parquet Modular Encryption (PME) support to the native readers when using a Spark KMS. This PR uses the encryption factory feature added in DataFusion 50 to register an encryption factory that calls back into Spark over JNI to get decryption keys.

What changes are included in this PR?

How are these changes tested?

  • Existing PME tests now also run against the new readers.
  • New tests that exercise PME options such as plaintext footer.
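
For context, the kind of PME options these tests exercise can be sketched with Spark's documented Parquet encryption properties. This is a minimal illustration and not code from this PR: the object name, temp path, column names, and demo master keys are assumptions, and it relies on the mock `InMemoryKMS` class, which to my knowledge ships in the parquet-hadoop tests jar rather than the regular release jars and must be on the classpath.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

// Hypothetical sketch: write and read back a PME-encrypted table with a plaintext footer.
object PmePlaintextFooterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pme-plaintext-footer-sketch")
      .getOrCreate()

    val hadoopConf = spark.sparkContext.hadoopConfiguration
    // Activate Parquet encryption, driven by Hadoop properties.
    hadoopConf.set(
      "parquet.crypto.factory.class",
      "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
    // Mock KMS for demonstration only; a real deployment would plug in its own KMS client.
    hadoopConf.set(
      "parquet.encryption.kms.client.class",
      "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")
    // Explicit base64 master keys, needed only by the mock KMS (demo values).
    hadoopConf.set(
      "parquet.encryption.key.list",
      "keyA:AAECAwQFBgcICQoLDA0ODw==, keyB:AAECAAECAAECAAECAAECAA==")

    import spark.implicits._
    val df = (1 to 10).map(i => (i, i * i)).toDF("value", "square")

    // Assumed scratch location for the encrypted table.
    val path = Files.createTempDirectory("pme-sketch")
      .resolve("table.parquet.encrypted")
      .toString

    df.write
      .option("parquet.encryption.column.keys", "keyA:square") // encrypt column "square" with keyA
      .option("parquet.encryption.footer.key", "keyB")         // footer key (signs a plaintext footer)
      .option("parquet.encryption.plaintext.footer", "true")   // keep the footer readable, columns encrypted
      .mode("overwrite")
      .parquet(path)

    // Reading goes through the same crypto factory to unwrap the column keys.
    spark.read.parquet(path).show()

    spark.stop()
  }
}
```

Setting `parquet.encryption.plaintext.footer` to `false` (the default) would encrypt the footer as well, which is the other mode a reader has to handle.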

@mbutrovich mbutrovich changed the title from "feat: Parquet Modular Encryption support for native_datafusion and native_iceberg_compat readers" to "feat: Parquet Modular Encryption with Spark KMS for native_datafusion and native_iceberg_compat readers" Sep 23, 2025
@mbutrovich mbutrovich changed the title from "feat: Parquet Modular Encryption with Spark KMS for native_datafusion and native_iceberg_compat readers" to "feat: Parquet Modular Encryption with Spark KMS for native readers" Sep 23, 2025
@codecov-commenter

codecov-commenter commented Sep 23, 2025

Codecov Report

❌ Patch coverage is 36.78161% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.92%. Comparing base (f09f8af) to head (257f163).
⚠️ Report is 579 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...rg/apache/comet/parquet/CometFileKeyUnwrapper.java    0.00%     18 Missing ⚠️
...a/org/apache/comet/parquet/CometParquetUtils.scala    0.00%     15 Missing ⚠️
...ain/scala/org/apache/comet/CometExecIterator.scala    33.33%    7 Missing and 1 partial ⚠️
...va/org/apache/comet/parquet/NativeBatchReader.java    0.00%     5 Missing ⚠️
...n/scala/org/apache/spark/sql/comet/operators.scala    80.76%    3 Missing and 2 partials ⚠️
...n/scala/org/apache/comet/rules/CometScanRule.scala    42.85%    3 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2447      +/-   ##
============================================
+ Coverage     56.12%   58.92%   +2.79%     
- Complexity      976     1457     +481     
============================================
  Files           119      147      +28     
  Lines         11743    13642    +1899     
  Branches       2251     2369     +118     
============================================
+ Hits           6591     8038    +1447     
- Misses         4012     4381     +369     
- Partials       1140     1223      +83     


@parthchandra
Contributor

@mbutrovich mbutrovich marked this pull request as ready for review September 26, 2025 20:31
# Conflicts:
#	spark/src/main/scala/org/apache/comet/CometExecIterator.scala
@mbutrovich
Contributor Author

Attached are the results from the benchmark I added to CometReadBenchmark, along with a small chart highlighting the overhead of encryption for the various readers.

[Chart: decryption]

benchmark_decryption.txt

Contributor

@parthchandra parthchandra left a comment


lgtm


// spotless:off
/*
* Architecture Overview:
Contributor


This diagram is super helpful, thanks a lot.

# Conflicts:
#	native/core/src/execution/jni_api.rs
#	spark/src/main/scala/org/apache/comet/CometExecIterator.scala
#	spark/src/main/scala/org/apache/comet/Native.scala
Contributor

@hsiang-c hsiang-c left a comment


LGTM

@mbutrovich mbutrovich merged commit c23dc25 into apache:main Oct 7, 2025
102 checks passed