Closed
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
When I try to create an external table with the following code, using SQL and the Rust API respectively, a "Corrupt footer" error occurs. I eventually discovered that the partitioned directory contained files of other formats, and DataFusion was reading them as data files. I find this behavior confusing, because the SQL specifies the Parquet format and the Rust code explicitly creates a new ParquetFormat.
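    CREATE EXTERNAL TABLE hits
    STORED AS PARQUET
    LOCATION 'partitioned/';

    use std::sync::Arc;
    use datafusion::arrow::datatypes::DataType;
    use datafusion::config::{ConfigField, TableParquetOptions};
    use datafusion::datasource::file_format::parquet::ParquetFormat;
    use datafusion::datasource::listing::ListingOptions;

    // Assumes `ctx` is a SessionContext and the enclosing async fn returns a Result.
    let mut opts = TableParquetOptions::default();
    opts.set("pushdown_filters", "true").unwrap();
    let format = ParquetFormat::new().with_options(opts);
    let options = ListingOptions::new(Arc::new(format))
        .with_table_partition_cols(vec![("A".to_owned(), DataType::Int32)]);
    ctx.register_listing_table(
        "hits",
        "partitioned/",
        options,
        None,
        None,
    )
    .await?;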
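
To illustrate why stray files end up being read as Parquet, here is a minimal sketch. It assumes the listing step keeps every file whose path ends with the configured extension, and that the extension defaults to an empty string. The helper names `matches_extension` and `select_files` and the file names are hypothetical, not DataFusion APIs.

```rust
/// Hypothetical stand-in for the listing filter: keep a file only if its
/// path ends with the configured extension (assumed suffix match).
fn matches_extension(path: &str, file_extension: &str) -> bool {
    path.ends_with(file_extension)
}

/// Returns the paths that would be treated as data files.
fn select_files<'a>(paths: &[&'a str], file_extension: &str) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| matches_extension(p, file_extension))
        .collect()
}

fn main() {
    // Made-up directory contents for illustration.
    let paths = [
        "partitioned/A=1/part-0.parquet",
        "partitioned/A=1/notes.txt",
        "partitioned/A=2/part-0.parquet",
    ];

    // An empty extension is a suffix of every path, so nothing is filtered out.
    let unfiltered = select_files(&paths, "");
    // An explicit extension excludes the non-Parquet file.
    let parquet_only = select_files(&paths, ".parquet");

    println!("empty filter: {} files", unfiltered.len());
    println!(".parquet filter: {} files", parquet_only.len());
}
```

Under these assumptions, the empty filter lets `notes.txt` through, and the Parquet reader then fails when it tries to parse that file's footer.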
Describe the solution you'd like
Currently, when ListingOptions is created from a format, format.get_ext is not called, so the file extension filter defaults to an empty string. I believe format.get_ext should be used as the default file extension filter, which would also align better with the definition of the FileFormat trait. If you agree, I can submit a PR to make that change.
    impl ListingOptions {
        /// Creates an options instance with the given format
        /// Default values:
        /// - no file extension filter
        /// - no input partition to discover
        /// - one target partition
        /// - stat collection
        pub fn new(format: Arc<dyn FileFormat>) -> Self {
            Self {
                file_extension: String::new(),
                format,
                table_partition_cols: vec![],
                collect_stat: true,
                target_partitions: 1,
                file_sort_order: vec![],
            }
        }
    }
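
The proposed change could look roughly like the sketch below. It uses simplified stand-in types, not the real DataFusion definitions: the `FileFormat` trait is reduced to `get_ext`, `ListingOptions` keeps only two fields, and the value `".parquet"` returned by the stand-in `ParquetFormat` is an assumption made for illustration.

```rust
use std::sync::Arc;

/// Simplified stand-in for the FileFormat trait (only `get_ext` is modeled).
trait FileFormat {
    fn get_ext(&self) -> String;
}

/// Stand-in Parquet format; the returned extension is an assumed value.
struct ParquetFormat;

impl FileFormat for ParquetFormat {
    fn get_ext(&self) -> String {
        ".parquet".to_owned()
    }
}

/// Reduced ListingOptions holding only the fields relevant to the proposal.
#[allow(dead_code)]
struct ListingOptions {
    file_extension: String,
    format: Arc<dyn FileFormat>,
}

impl ListingOptions {
    /// Proposed default: take the extension filter from the format
    /// instead of starting with an empty string.
    fn new(format: Arc<dyn FileFormat>) -> Self {
        Self {
            file_extension: format.get_ext(),
            format,
        }
    }

    /// Callers can still override the default, e.g. with "" to match all files.
    fn with_file_extension(mut self, file_extension: impl Into<String>) -> Self {
        self.file_extension = file_extension.into();
        self
    }
}

fn main() {
    let defaulted = ListingOptions::new(Arc::new(ParquetFormat));
    println!("default filter: {:?}", defaulted.file_extension);

    let match_all = ListingOptions::new(Arc::new(ParquetFormat)).with_file_extension("");
    println!("overridden filter: {:?}", match_all.file_extension);
}
```

With this shape, users who rely on the current match-everything behavior would have to opt in by setting an empty extension explicitly, which is a compatibility trade-off to weigh.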
Describe alternatives you've considered
No response
Additional context
No response