Is your feature request related to a problem? Please describe.
We need to be able to write multiple files to object storage in an efficient and concurrent way, which requires uniquely naming and organizing them.
Hive-style partitioning + snowflake ID names was picked.
Describe the solution you'd like
This feature works in the exact way I described a few months ago. There are two new function arguments: partition_strategy='wildcard|hive' and partition_columns_in_data_file=0|1.
The user is responsible for setting only the table root: no wildcard/macro for the partition id, and not even the filename.
ClickHouse should produce filepaths of the following form: <prefix>/<key1=val1/key2=val2...>/<snowflakeid>.<toLower(file_format)>.
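For contrast, here is a minimal sketch of the two strategies (my illustration, not taken from the tests; it assumes 'wildcard' refers to the existing {_partition_id}-macro behavior):
-- wildcard: the user spells out the partition macro and the filename in the path themselves
CREATE TABLE t_wildcard (year UInt16, counter UInt8)
ENGINE = S3(s3_conn, filename = 't_wildcard/{_partition_id}.parquet', format = Parquet, partition_strategy = 'wildcard')
PARTITION BY year;
-- hive: only the table root is given; ClickHouse derives <prefix>/year=<val>/<snowflakeid>.parquet on its own
CREATE TABLE t_hive (year UInt16, counter UInt8)
ENGINE = S3(s3_conn, filename = 't_hive', format = Parquet, partition_strategy = 'hive')
PARTITION BY year;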
The partition_strategy='hive' argument requires the schema to be specified AND a PARTITION BY expression.
Examples can be found in the tests, but just a recap:
arthur :) CREATE TABLE t_03363_parquet (year UInt16, country String, counter UInt8)
ENGINE = S3(s3_conn, filename = 't_03363_parquet', format = Parquet, partition_strategy='hive')
PARTITION BY (year, country);
INSERT INTO t_03363_parquet VALUES
(2022, 'USA', 1),
(2022, 'Canada', 2),
(2023, 'USA', 3),
(2023, 'Mexico', 4),
(2024, 'France', 5),
(2024, 'Germany', 6),
(2024, 'Germany', 7),
(1999, 'Brazil', 8),
(2100, 'Japan', 9),
(2024, 'CN', 10),
(2025, '', 11);
CREATE TABLE t_03363_parquet
(
`year` UInt16,
`country` String,
`counter` UInt8
)
ENGINE = S3(s3_conn, filename = 't_03363_parquet', format = Parquet, partition_strategy = 'hive')
PARTITION BY (year, country)
Query id: 3f5b959d-978c-42bf-9c13-c9cff7590372
Ok.
0 rows in set. Elapsed: 0.028 sec.
INSERT INTO t_03363_parquet FORMAT Values
Query id: 3ccbf8d1-bb3c-49e3-9b64-813d9b03e64f
Ok.
11 rows in set. Elapsed: 0.204 sec.
arthur :) select _path, * from t_03363_parquet;
SELECT
_path,
*
FROM t_03363_parquet
Query id: c211dca1-2dff-4aae-a5e8-2583d2428c8b
┌─_path──────────────────────────────────────────────────────────────────────┬─year─┬─country─┬─counter─┐
1. │ test/t_03363_parquet/year=2100/country=Japan/7329604473272971264.parquet │ 2100 │ Japan │ 9 │
2. │ test/t_03363_parquet/year=2024/country=France/7329604473323302912.parquet │ 2024 │ France │ 5 │
3. │ test/t_03363_parquet/year=2022/country=Canada/7329604473314914304.parquet │ 2022 │ Canada │ 2 │
4. │ test/t_03363_parquet/year=1999/country=Brazil/7329604473289748480.parquet │ 1999 │ Brazil │ 8 │
5. │ test/t_03363_parquet/year=2023/country=Mexico/7329604473293942784.parquet │ 2023 │ Mexico │ 4 │
6. │ test/t_03363_parquet/year=2023/country=USA/7329604473319108608.parquet │ 2023 │ USA │ 3 │
7. │ test/t_03363_parquet/year=2025/country=/7329604473327497216.parquet │ 2025 │ │ 11 │
8. │ test/t_03363_parquet/year=2024/country=CN/7329604473310720000.parquet │ 2024 │ CN │ 10 │
9. │ test/t_03363_parquet/year=2022/country=USA/7329604473298137088.parquet │ 2022 │ USA │ 1 │
10. │ test/t_03363_parquet/year=2024/country=Germany/7329604473306525696.parquet │ 2024 │ Germany │ 6 │
11. │ test/t_03363_parquet/year=2024/country=Germany/7329604473306525696.parquet │ 2024 │ Germany │ 7 │
└────────────────────────────────────────────────────────────────────────────┴──────┴─────────┴─────────┘
11 rows in set. Elapsed: 0.015 sec.
The partition_columns_in_data_file argument controls whether ClickHouse writes the partition columns into the data files. Not only that, it also tells ClickHouse whether to expect those columns in the files upon reading. It can only be used with partition_strategy='hive'.
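As a hedged sketch (hypothetical table name, same s3_conn as above), turning the partition columns off would look like this, with year and country living only in the path and the parquet files carrying just counter:
CREATE TABLE t_03363_parquet_slim (year UInt16, country String, counter UInt8)
ENGINE = S3(s3_conn, filename = 't_03363_parquet_slim', format = Parquet, partition_strategy = 'hive', partition_columns_in_data_file = 0)
PARTITION BY (year, country);
-- On read, ClickHouse knows not to expect year/country inside the files
-- and reconstructs them from the year=<val>/country=<val> path segments.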
I have used function arguments instead of settings because these are not ephemeral and are tied to the table itself. It makes more sense imho.
Refactoring: Treating hive columns as actual columns instead of virtual
One of the main problems I faced when implementing native reads and writes was the fact that use_hive_partitioning was implemented using virtual columns. This is problematic if we want to have S3 tables associated with hive.
In the existing use_hive_partitioning implementation, whenever a key-value pair is found in the filepath, that column is removed from the table schema. Now consider a table with three columns: year, country and counter, and a PARTITION BY expression of (year, country).
The following query will result in clickhouse-client asking for the table metadata (i.e., StorageMetadataPtr) so that it knows the schema and what to expect from the input:
clickhouse-client -q "insert into s3_hive_table values (2025, 'Japan', 9)"
With the existing use_hive_partitioning implementation, both the year and country columns are not part of the schema. Thus, clickhouse-client will only see counter. You can see where I am going, right?
Therefore, I refactored the hive partition columns to be part of the table metadata instead of virtual columns (for now, this is just a PoC that seems to work ok). I think it also makes more sense, as they are actually part of the schema, especially if we have partition_columns_in_data_file=1.
BUT in case it is ok to have the partition columns exist both as virtual columns and in storage, then I guess this refactoring is not needed?
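To make the difference concrete, here is a hedged sketch of what each approach exposes as the table schema (my illustration, output abbreviated):
DESCRIBE TABLE t_03363_parquet;
-- partition columns as real schema columns (this refactoring):
--   year     UInt16
--   country  String
--   counter  UInt8
-- partition columns as virtuals (existing use_hive_partitioning behavior):
--   counter  UInt8
-- i.e. clickhouse-client would have no way to validate the three-value rows in the INSERT above.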
Backwards compatibility with use_hive_partitioning
This setting is not optimal imo, and the name might be confusing. Still, I believe I had to make it backwards compatible. So, here's how use_hive_partitioning and partition_strategy interact:
partition_strategy='hive' is the dominant control. It is like a request for hive style: it allows native reads and writes in hive-style format. The use_hive_partitioning setting should have no effect here.
On the other hand, use_hive_partitioning is more like a hint for reads that only works if the path is globbed. It should work just like it does nowadays.
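For illustration, a hedged sketch of the two read paths (the glob and the exact table function spelling are assumptions, reusing the table from the recap above):
-- partition_strategy='hive': the table reads and writes hive-style natively, no setting involved
SELECT * FROM t_03363_parquet WHERE year = 2024;
-- use_hive_partitioning: a read-side hint that only kicks in for globbed paths,
-- e.g. when going through the s3 table function directly
SELECT *
FROM s3(s3_conn, filename = 't_03363_parquet/**.parquet', format = Parquet)
SETTINGS use_hive_partitioning = 1;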