
Object storage hive partitioned writes with snowflake id #820

@arthurpassos

Description


Is your feature request related to a problem? Please describe.
We need to be able to write multiple files to object storage in an efficient and concurrent way, which requires uniquely naming and organizing them.

Hive-style partitioning + snowflake ID file names were picked.

Describe the solution you'd like

This feature works exactly the way I described a few months ago. There are two new function arguments: partition_strategy='wildcard|hive' and partition_columns_in_data_file=0|1.

The user is responsible for setting only the table root: no wildcard or macro for the partition id, and not even the filename.

ClickHouse should produce filepaths in the following form: <prefix>/<key1=val1/key2=val2...>/<snowflakeid>.<toLower(file_format)>.

The partition_strategy='hive' argument requires both the schema and a PARTITION BY expression to be specified.
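
For contrast, if I read the naming right, partition_strategy='wildcard' covers the existing behavior where the user embeds the {_partition_id} macro in the key themselves. A rough sketch, with placeholder table and collection names:

-- Hypothetical contrast: today's {_partition_id}-based writes, labeled 'wildcard'
CREATE TABLE t_wildcard (year UInt16, country String, counter UInt8)
ENGINE = S3(s3_conn, filename = 't_wildcard/{_partition_id}.parquet', format = Parquet, partition_strategy = 'wildcard')
PARTITION BY year;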

Examples can be found in the tests, but here is a quick recap:

arthur :) CREATE TABLE t_03363_parquet (year UInt16, country String, counter UInt8)
ENGINE = S3(s3_conn, filename = 't_03363_parquet', format = Parquet, partition_strategy='hive')
PARTITION BY (year, country);

INSERT INTO t_03363_parquet VALUES
    (2022, 'USA', 1),
    (2022, 'Canada', 2),
    (2023, 'USA', 3),
    (2023, 'Mexico', 4),
    (2024, 'France', 5),
    (2024, 'Germany', 6),
    (2024, 'Germany', 7),
    (1999, 'Brazil', 8),
    (2100, 'Japan', 9),
    (2024, 'CN', 10),
    (2025, '', 11);

CREATE TABLE t_03363_parquet
(
    `year` UInt16,
    `country` String,
    `counter` UInt8
)
ENGINE = S3(s3_conn, filename = 't_03363_parquet', format = Parquet, partition_strategy = 'hive')
PARTITION BY (year, country)

Query id: 3f5b959d-978c-42bf-9c13-c9cff7590372

Ok.

0 rows in set. Elapsed: 0.028 sec. 


INSERT INTO t_03363_parquet FORMAT Values

Query id: 3ccbf8d1-bb3c-49e3-9b64-813d9b03e64f

Ok.

11 rows in set. Elapsed: 0.204 sec. 

arthur :) select _path, * from t_03363_parquet;

SELECT
    _path,
    *
FROM t_03363_parquet

Query id: c211dca1-2dff-4aae-a5e8-2583d2428c8b

    ┌─_path──────────────────────────────────────────────────────────────────────┬─year─┬─country─┬─counter─┐
 1. │ test/t_03363_parquet/year=2100/country=Japan/7329604473272971264.parquet   │ 2100 │ Japan   │       9 │
 2. │ test/t_03363_parquet/year=2024/country=France/7329604473323302912.parquet  │ 2024 │ France  │       5 │
 3. │ test/t_03363_parquet/year=2022/country=Canada/7329604473314914304.parquet  │ 2022 │ Canada  │       2 │
 4. │ test/t_03363_parquet/year=1999/country=Brazil/7329604473289748480.parquet  │ 1999 │ Brazil  │       8 │
 5. │ test/t_03363_parquet/year=2023/country=Mexico/7329604473293942784.parquet  │ 2023 │ Mexico  │       4 │
 6. │ test/t_03363_parquet/year=2023/country=USA/7329604473319108608.parquet     │ 2023 │ USA     │       3 │
 7. │ test/t_03363_parquet/year=2025/country=/7329604473327497216.parquet        │ 2025 │         │      11 │
 8. │ test/t_03363_parquet/year=2024/country=CN/7329604473310720000.parquet      │ 2024 │ CN      │      10 │
 9. │ test/t_03363_parquet/year=2022/country=USA/7329604473298137088.parquet     │ 2022 │ USA     │       1 │
10. │ test/t_03363_parquet/year=2024/country=Germany/7329604473306525696.parquet │ 2024 │ Germany │       6 │
11. │ test/t_03363_parquet/year=2024/country=Germany/7329604473306525696.parquet │ 2024 │ Germany │       7 │
    └────────────────────────────────────────────────────────────────────────────┴──────┴─────────┴─────────┘

11 rows in set. Elapsed: 0.015 sec. 

The partition_columns_in_data_file argument controls whether ClickHouse writes the partition columns into the data files. It also tells ClickHouse whether to expect those columns in the files upon reading. It can only be used with partition_strategy='hive'.
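
For illustration, a minimal sketch of a table whose data files omit the partition columns (the table name is made up; s3_conn is the named collection from the example above):

CREATE TABLE t_03363_no_partition_cols (year UInt16, country String, counter UInt8)
ENGINE = S3(s3_conn, filename = 't_03363_no_partition_cols', format = Parquet,
            partition_strategy = 'hive', partition_columns_in_data_file = 0)
PARTITION BY (year, country);

-- year and country are encoded only in the path (year=.../country=...),
-- so the Parquet files are expected to contain just counter.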

I used function arguments instead of settings because these are not ephemeral and are tied to the table itself. It makes more sense imho.

Refactoring: Treating hive columns as actual columns instead of virtual

One of the main problems I faced when implementing native reads and writes was that use_hive_partitioning was implemented using virtual columns. This is problematic if we want S3 tables associated with hive-style partitioning.

In the existing use_hive_partitioning implementation, whenever a key-value pair is found in the filepath, that column is removed from the table schema. Now consider a table with three columns (year, country and counter) and a PARTITION BY expression of (year, country).

The following query results in clickhouse-client asking for the table metadata (i.e., StorageMetadataPtr) so that it knows the schema and what to expect from the input.

clickhouse-client -q "insert into s3_hive_table values (2025, 'Japan', 9)"

With the existing use_hive_partitioning implementation, neither the year nor the country column is part of the schema, so clickhouse-client will only see counter. You can see where I am going, right?

Therefore, I refactored the hive partition columns to live in the table metadata instead of being virtual (for now, this is just a PoC that seems to work OK). I think it also makes more sense, as they are actually part of the schema, especially if we have partition_columns_in_data_file=1.
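
To make the difference concrete, here is how I would expect the schema to look in both cases. This is a sketch of the expectation, not actual output:

-- Virtual-column approach: partition keys stripped from the schema
DESCRIBE TABLE s3_hive_table;
-- counter   UInt8

-- Partition columns kept in the table metadata: the full schema is visible,
-- so clickhouse-client knows what to expect for the INSERT above
DESCRIBE TABLE s3_hive_table;
-- year      UInt16
-- country   String
-- counter   UInt8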

BUT if it is OK to have the partition columns exist both as virtual columns and in storage, then I guess this refactoring is not needed?

Backwards compatibility with use_hive_partitioning

This setting is not optimal imo, and the name might be confusing. Still, I believe it has to remain backwards compatible. So, here's how use_hive_partitioning and partition_strategy interact:

partition_strategy='hive' is the dominant control. It is an explicit request for hive style: it allows native reads and writes in hive-style format, and the use_hive_partitioning setting has no effect here.

On the other hand, use_hive_partitioning is more of a hint for reads and only works if the path is globbed. It should keep working just like it does today.
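
For reference, a hedged sketch of the kind of globbed read that use_hive_partitioning targets (the URL is a placeholder):

SELECT _path, year, country, counter
FROM s3('https://<bucket>.s3.amazonaws.com/test/t_03363_parquet/**.parquet', Parquet)
SETTINGS use_hive_partitioning = 1;

-- year and country are extracted from the key=value pairs in the path
-- and exposed to the query, just as the existing implementation does today.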
