Closed
Labels
enhancement (New feature or request), good first issue (Good for newcomers)
Description
Is your feature request related to a problem or challenge?
Part of #11752
While working to enable StringView in #12092, I found that columns read as `StringView` and `BinaryView` do not take advantage of Bloom filters.
Specifically, this code doesn't handle `StringView`:

datafusion/datafusion/core/src/datasource/physical_plan/parquet/row_group_filter.rs
Lines 267 to 272 in a08f923

    ScalarValue::Utf8(Some(v)) => sbbf.check(&v.as_str()),
    ScalarValue::Binary(Some(v)) => sbbf.check(v),
    ScalarValue::FixedSizeBinary(_size, Some(v)) => sbbf.check(v),
    ScalarValue::Boolean(Some(v)) => sbbf.check(v),
    ScalarValue::Float64(Some(v)) => sbbf.check(v),
    ScalarValue::Float32(Some(v)) => sbbf.check(v),
Describe the solution you'd like
Support applying parquet bloom filters to StringView columns
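To illustrate the shape of the change, here is a minimal, self-contained sketch. It does not use the real DataFusion or parquet types: `ToyBloom` is a hypothetical stand-in for the Parquet Bloom filter (`sbbf` in the snippet above) and `Literal` is a hypothetical mirror of a few `ScalarValue` variants. The only point it makes is that the view variants can be checked against the same bytes as their non-view counterparts. The actual fix may look different.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a Parquet Bloom filter (NOT the real `Sbbf`).
struct ToyBloom {
    bits: Vec<bool>,
}

impl ToyBloom {
    fn new(num_bits: usize) -> Self {
        Self { bits: vec![false; num_bits] }
    }

    /// Derive two bit positions from the raw bytes of a value.
    fn positions(&self, bytes: &[u8]) -> [usize; 2] {
        let mut out = [0usize; 2];
        for (seed, slot) in out.iter_mut().enumerate() {
            let mut hasher = DefaultHasher::new();
            (seed as u64).hash(&mut hasher);
            bytes.hash(&mut hasher);
            *slot = (hasher.finish() % self.bits.len() as u64) as usize;
        }
        out
    }

    fn insert(&mut self, bytes: &[u8]) {
        for p in self.positions(bytes) {
            self.bits[p] = true;
        }
    }

    /// Returns `false` only when the value is definitely absent.
    fn check(&self, bytes: &[u8]) -> bool {
        self.positions(bytes).iter().all(|&p| self.bits[p])
    }
}

/// Hypothetical mirror of a few `ScalarValue` variants (illustration only).
enum Literal {
    Utf8(Option<String>),
    Utf8View(Option<String>),
    Binary(Option<Vec<u8>>),
    BinaryView(Option<Vec<u8>>),
}

/// `true` means the row group may contain the literal and must be kept.
fn may_contain(filter: &ToyBloom, lit: &Literal) -> bool {
    match lit {
        // View variants hash the same bytes as their non-view counterparts.
        Literal::Utf8(Some(v)) | Literal::Utf8View(Some(v)) => filter.check(v.as_bytes()),
        Literal::Binary(Some(v)) | Literal::BinaryView(Some(v)) => filter.check(v),
        // Null literals give the filter nothing to test, so do not prune.
        _ => true,
    }
}
```

In this sketch, after `filter.insert(b"foo")`, `may_contain` gives the same answer for `Literal::Utf8` and `Literal::Utf8View` holding `"foo"`, which is the behavior this issue asks for.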
Describe alternatives you've considered
Basically:
- Make the code changes for bloom filters in #12092 (Enable reading `StringViewArray` by default from Parquet)
- Write a test
In terms of testing, I think the easiest thing to do would be to follow the model of the existing tests for Utf8/Binary columns and pass the `schema_force_view_types` config flag.
Additional context
No response