Commit 6ba21c8

Merge pull request #5518 from gchq/5409-write-java-side-df-query-code
Issue 5409 - Write Java side of DataFusion query code
2 parents 93134c5 + cb0a7f3 commit 6ba21c8

85 files changed: 3955 additions, 888 deletions

.github/config/chunks.yaml

Lines changed: 2 additions & 1 deletion
@@ -22,6 +22,7 @@ chunks:
  - configuration
  - sketches
  - parquet
+ - arrow
  - common/common-job
  - common/common-task
  - common/common-invoke-tables
@@ -54,6 +55,7 @@ chunks:
  name: Rust
  workflow: chunk-rust.yaml
  modules:
+ - query/query-datafusion
  - compaction/compaction-datafusion
  - foreign-bridge
  ingest:
@@ -78,6 +80,5 @@ chunks:
  - query/query-core
  - query/query-runner
  - query/query-lambda
- - query/query-datafusion
  - athena
  - trino

.github/workflows/chunk-clients-cdk.yaml

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,7 @@ on:
  - 'java/ingest/ingest-batcher-store/**'
  - 'java/common/common-task/**'
  - 'java/query/query-runner/**'
+ - 'java/query/query-datafusion/**'
  - 'java/configuration/**'
  - 'java/common/dynamodb-tools/**'
  - 'java/bulk-import/bulk-import-core/**'
@@ -50,6 +51,7 @@ on:
  - 'java/common/localstack-test/**'
  - 'java/bulk-export/bulk-export-core/**'
  - 'java/foreign-bridge/**'
+ - 'java/arrow/**'

  jobs:
  chunk-workflow:

.github/workflows/chunk-common.yaml

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ on:
  - 'java/compaction/compaction-core/**'
  - 'java/ingest/ingest-tracker/**'
  - 'java/ingest/ingest-core/**'
+ - 'java/arrow/**'

  jobs:
  chunk-workflow:

.github/workflows/chunk-compaction.yaml

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ on:
  - 'java/example-iterators/**'
  - 'java/core/**'
  - 'java/foreign-bridge/**'
+ - 'java/arrow/**'

  jobs:
  chunk-workflow:

.github/workflows/chunk-ingest.yaml

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ on:
  - 'java/sketches/**'
  - 'java/example-iterators/**'
  - 'java/core/**'
+ - 'java/arrow/**'

  jobs:
  chunk-workflow:

.github/workflows/chunk-query.yaml

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,7 @@ on:
  - 'java/query/query-runner/**'
  - 'java/query/query-lambda/**'
  - 'java/query/query-datafusion/**'
+ - 'java/foreign-bridge/**'
  - 'java/athena/**'
  - 'java/trino/**'
  - 'java/partitions/splitter/**'
@@ -28,6 +29,7 @@ on:
  - 'java/example-iterators/**'
  - 'java/core/**'
  - 'java/common/localstack-test/**'
+ - 'java/arrow/**'

  jobs:
  chunk-workflow:

.github/workflows/chunk-rust.yaml

Lines changed: 15 additions & 3 deletions
@@ -9,14 +9,26 @@ on:
  - 'code-style/spotbugs*.xml'
  - 'rust/**'
  - 'java/pom.xml'
+ - 'java/query/pom.xml'
  - 'java/compaction/pom.xml'
+ - 'java/query/query-datafusion/**'
  - 'java/compaction/compaction-datafusion/**'
+ - 'java/foreign-bridge/**'
+ - 'java/query/query-core/**'
+ - 'java/ingest/ingest-runner/**'
+ - 'java/statestore/**'
+ - 'java/common/common-job/**'
+ - 'java/ingest/ingest-tracker/**'
+ - 'java/configuration/**'
+ - 'java/ingest/ingest-core/**'
+ - 'java/common/dynamodb-tools/**'
+ - 'java/example-iterators/**'
+ - 'java/parquet/**'
  - 'java/compaction/compaction-core/**'
  - 'java/sketches/**'
- - 'java/parquet/**'
- - 'java/core/**'
  - 'java/common/localstack-test/**'
- - 'java/foreign-bridge/**'
+ - 'java/core/**'
+ - 'java/arrow/**'

  jobs:
  chunk-workflow:

.github/workflows/java-status.yaml

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ jobs:
  working-directory: ./java
  run: |
  PROJECT_ROOT=$(cd .. && pwd)
- mvn install -Pquick -q -e -pl clients -am -Dmaven.repo.local=${{ runner.temp }}/.m2/repository
+ mvn install -Pquick,skipShade -q -e -pl clients -am -DskipRust -Dmaven.repo.local=${{ runner.temp }}/.m2/repository
  mvn exec:java -q -e -Dmaven.repo.local=${{ runner.temp }}/.m2/repository -pl clients \
  -Dexec.mainClass=sleeper.clients.deploy.documentation.GeneratePropertiesTemplates \
  -Dexec.args="$PROJECT_ROOT"

docs/usage/properties/instance/user/table_property_defaults.md

Lines changed: 1 addition & 1 deletion
@@ -47,4 +47,4 @@ The following instance properties relate to default values used by table propert
  | sleeper.default.gc.commit.async | This is the default for whether the garbage collector will record deleted files asynchronously via the state store committer, if asynchronous commit is enabled. Otherwise, the garbage collector will record this directly to the state store. | true | false |
  | sleeper.default.statestore.committer.update.every.commit | When using the transaction log state store, this sets whether to update from the transaction log before adding a transaction in the asynchronous state store committer.<br>If asynchronous commits are used for all or almost all state store updates, this can be false to avoid the extra queries.<br>If the state store is commonly updated directly outside of the asynchronous committer, this can be true to avoid conflicts and retries. | false | false |
  | sleeper.default.statestore.committer.update.every.batch | When using the transaction log state store, this sets whether to update from the transaction log before adding a batch of transactions in the asynchronous state store committer. | true | false |
- | sleeper.default.table.data.engine | Select which data engine to use for the table. Valid values are: [java, datafusion] | DATAFUSION | false |
+ | sleeper.default.table.data.engine | Select which data engine to use for the table. Valid values are: [java, datafusion, datafusion_experimental] | DATAFUSION | false |

docs/usage/properties/table/data_definition.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ The following table properties relate to the definition of data inside a table.
  | sleeper.table.id | A unique ID identifying this table, generated by Sleeper on table creation. | |
  | sleeper.table.online | A boolean flag representing whether this table is online or offline.<br>An offline table will not have any partition splitting or compaction jobs run automatically.<br>Note that taking a table offline will not stop any partitions that are being split or compaction jobs that are running. Additionally, you are still able to ingest data to offline tables and perform queries against them. | true |
  | sleeper.table.schema | The schema representing the structure of this table. | |
- | sleeper.table.data.engine | Select which data engine to use for the table. Valid values are: [java, datafusion] | DATAFUSION |
+ | sleeper.table.data.engine | Select which data engine to use for the table. Valid values are: [java, datafusion, datafusion_experimental] | DATAFUSION |
  | sleeper.table.iterator.class.name | Fully qualified class of a custom iterator to apply to this table. Defaults to nothing. This will be applied both during queries and during compaction, and will apply the results to the underlying table data persistently. This forces use of the Java data engine for compaction. This is not recommended, as the Java implementation is much slower and much more expensive. Consider using the aggregation and filtering properties instead. | |
  | sleeper.table.iterator.config | A configuration string to be passed to the iterator specified in `sleeper.table.iterator.class.name`. This will be read by the custom iterator object. | |
  | sleeper.table.filters | Sets how rows are filtered out and deleted from the table. This is applied every time the data is read, e.g. during compactions or queries. Defaults to retaining all rows.<br>Currently this can only be `ageOff(field,age)`, to age off old data. The first parameter is the name of the timestamp field to check against, which must be of type long, in milliseconds since the epoch. The second parameter is the maximum age in milliseconds, e.g. 1209600000 for 2 weeks. | |
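
For illustration only, the documented table properties above might be combined as in the fragment below. This is a sketch, not taken from the commit: it assumes the properties are set in a Java-style `.properties` file, that the engine value is written as listed in the valid values, and that the table schema has a long field named `timestamp` (a hypothetical name).

```properties
# Hypothetical table properties fragment; "timestamp" is an assumed field name.
# Opt in to the newly listed engine value (valid values: java, datafusion, datafusion_experimental).
sleeper.table.data.engine=datafusion_experimental
# Age off rows older than 2 weeks: 14 * 24 * 60 * 60 * 1000 = 1209600000 ms.
sleeper.table.filters=ageOff(timestamp,1209600000)
```

Using the filter property rather than `sleeper.table.iterator.class.name` avoids forcing the Java data engine for compaction, as the property description notes.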
