Allow DynamicFileCatalog to query partitioned files #12671

@goldmedal

Is your feature request related to a problem or challenge?

#11035 added support for querying files by their URL. However, if the target dataset is partitioned, DynamicFileCatalog does not recognize the partition columns.
Given a file structure like:

partitioned_table
     c_date=2018-11-13
         data.csv 
     c_date=2018-12-13
         data.csv

If we try to query it through the dynamic file catalog:

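    use datafusion::execution::session_state::SessionStateBuilder;
    use datafusion::prelude::{SessionConfig, SessionContext};

    // Query the partitioned directory directly by its path.
    let sql = "SELECT * FROM 'datafusion/core/tests/data/partitioned_table'";
    // The CSV files have no header row.
    let session_config =
        SessionConfig::new().set_str("datafusion.catalog.has_header", "false");
    let state = SessionStateBuilder::default()
        .with_default_features()
        .with_config(session_config)
        .build();
    // enable_url_table() lets table names be resolved as file paths or URLs.
    let ctx = SessionContext::new_with_state(state).enable_url_table();
    ctx.sql(sql).await?.show().await?;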

The result is:

+----------+-------------------------+
| column_1 | column_2                |
+----------+-------------------------+
| Jorge    | 2018-12-13T12:12:10.011 |
| Andrew   | 2018-11-13T17:11:10.011 |
| Jorge    | 2018-12-13T12:12:10.011 |
| Andrew   | 2018-11-13T17:11:10.011 |
+----------+-------------------------+

The partition column c_date is missing from the result.

Describe the solution you'd like

When inferring the ListingTableConfig, we can register the table partition columns automatically. I think we can invoke ListingOptions::infer_partitions to infer the required partition columns at runtime.
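
To illustrate the idea, here is a minimal, stdlib-only sketch of what inferring partition columns from Hive-style paths could look like. It is not DataFusion's implementation: the helper names `partition_segments` and `infer_partition_columns` are hypothetical, the paths are assumed to be relative to the table root, and returning an empty list when files disagree is a simplification chosen for this example.

```rust
/// Extracts Hive-style `key=value` directory segments from a file path
/// that is relative to the table root. The last segment (the file name)
/// is ignored. Hypothetical helper, for illustration only.
fn partition_segments(relative_path: &str) -> Vec<(String, String)> {
    let mut segments: Vec<&str> = relative_path
        .split('/')
        .filter(|s| !s.is_empty())
        .collect();
    segments.pop(); // drop the file name
    segments
        .into_iter()
        .filter_map(|seg| seg.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// Returns the partition column names shared by every file, in path order.
/// Returns an empty list if there are no files or the files disagree
/// (a simplification made for this sketch).
fn infer_partition_columns(paths: &[&str]) -> Vec<String> {
    let mut columns_per_file = paths.iter().map(|p| {
        partition_segments(p)
            .into_iter()
            .map(|(k, _)| k)
            .collect::<Vec<String>>()
    });
    let first = match columns_per_file.next() {
        Some(cols) => cols,
        None => return Vec::new(),
    };
    if columns_per_file.all(|cols| cols == first) {
        first
    } else {
        Vec::new()
    }
}

fn main() {
    // The layout from this issue, relative to `partitioned_table`.
    let files = ["c_date=2018-11-13/data.csv", "c_date=2018-12-13/data.csv"];
    let columns = infer_partition_columns(&files);
    println!("{:?}", columns); // prints ["c_date"]
}
```

With the inferred names in hand, the dynamic file catalog could register them as table partition columns when it builds the listing table, so that c_date shows up in the query result.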

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Labels

enhancement (New feature or request)