Consider using the FileFormat::get_ext when creating ListingOptions #12378

@waruto210

Description

Is your feature request related to a problem or challenge?

When I create an external table with either the SQL or the Rust API code below, a "Corrupt footer" error occurs. I eventually found the cause: the partitioned directory contained files in other formats, and DataFusion was reading them as data files. I find this behavior confusing, because the SQL specifies the Parquet format and the Rust code explicitly creates a new ParquetFormat.

CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 'partitioned/';

  // Equivalent registration through the Rust API
  let mut opts = TableParquetOptions::default();
  opts.set("pushdown_filters", "true").unwrap();
  let format = ParquetFormat::new().with_options(opts);
  let options = ListingOptions::new(Arc::new(format))
      .with_table_partition_cols(vec![("A".to_owned(), DataType::Int32)]);
  ctx.register_listing_table(
      "hits",
      "partitioned/",
      options,
      None,
      None,
  )
  .await
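
To illustrate why stray files get picked up, here is a minimal sketch that assumes the listing step keeps a file when its path ends with the configured extension. The helper `select_files` and the file names are hypothetical and are not part of DataFusion. Because every string ends with the empty string, an empty filter keeps everything in the directory.

```rust
/// Hypothetical helper (not a DataFusion API): keeps the paths whose
/// name ends with `file_extension`, mimicking a suffix-based filter.
fn select_files(paths: &[&str], file_extension: &str) -> Vec<String> {
    paths
        .iter()
        // `ends_with("")` is true for every string, so an empty
        // extension filter lets every file through.
        .filter(|p| p.ends_with(file_extension))
        .map(|p| p.to_string())
        .collect()
}
```

With paths such as `partitioned/A=1/data.parquet` and `partitioned/A=1/_SUCCESS`, calling `select_files(&paths, "")` returns both, while `select_files(&paths, ".parquet")` returns only the Parquet file.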

Describe the solution you'd like

Currently, ListingOptions::new does not call format.get_ext, so no file extension filter is applied by default. I believe format.get_ext should be used as the default file extension filter, which would also better align with the definition of the FileFormat trait. If you agree, I can submit a PR to make that change.

impl ListingOptions {
    /// Creates an options instance with the given format
    /// Default values:
    /// - no file extension filter
    /// - no input partition to discover
    /// - one target partition
    /// - stat collection
    pub fn new(format: Arc<dyn FileFormat>) -> Self {
        Self {
            // Empty string: no extension filter is applied by default
            file_extension: String::new(),
            format,
            table_partition_cols: vec![],
            collect_stat: true,
            target_partitions: 1,
            file_sort_order: vec![],
        }
    }
}
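
The proposed default could look roughly like the sketch below. It uses simplified stand-ins rather than the real DataFusion types: the `FileFormat` trait here has only `get_ext`, `ListingOptions` has only two fields, and the value returned by `ParquetFormat::get_ext` (`.parquet`) is an assumption made for illustration. The point is only that `new` seeds `file_extension` from the format, while an explicit override stays possible.

```rust
use std::sync::Arc;

/// Simplified stand-in for the `FileFormat` trait (only `get_ext`).
trait FileFormat {
    fn get_ext(&self) -> String;
}

/// Stand-in format; the returned extension is an illustrative assumption.
struct ParquetFormat;

impl FileFormat for ParquetFormat {
    fn get_ext(&self) -> String {
        ".parquet".to_string()
    }
}

/// Reduced stand-in for `ListingOptions` with just the relevant fields.
#[allow(dead_code)]
struct ListingOptions {
    file_extension: String,
    format: Arc<dyn FileFormat>,
}

impl ListingOptions {
    /// Proposed behavior: default the extension filter to the format's own.
    fn new(format: Arc<dyn FileFormat>) -> Self {
        Self {
            file_extension: format.get_ext(),
            format,
        }
    }

    /// Callers can still override the filter, e.g. with "" to match all files.
    fn with_file_extension(mut self, file_extension: impl Into<String>) -> Self {
        self.file_extension = file_extension.into();
        self
    }
}
```

Under this sketch, `ListingOptions::new(Arc::new(ParquetFormat))` starts with the format's extension as its filter, and `.with_file_extension("")` restores the current match-everything behavior for users who rely on it.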

Describe alternatives you've considered

No response

Additional context

No response

Metadata

    Labels

    enhancement: New feature or request
