
Enable reading StringView from Parquet (schema_force_string_view) by default #11682

Closed
@alamb

Description

Part of #11752

Is your feature request related to a problem or challenge?

As part of #10918, @XiangpengHao has threaded the use of StringView through the parquet crate, arrow-rs, and then into DataFusion.

When the datafusion.execution.parquet.schema_force_string_view option is enabled, the DataFusion Parquet reader reads all Utf8 columns as StringView instead, which results in significantly faster performance (details TBD; we will write them up in #11603).

However, when #11667 is initially merged, this setting will be off by default.

This ticket tracks what it would take to turn the setting on by default.

Describe the solution you'd like

Change the default value of datafusion.execution.parquet.schema_force_string_view to true.

Describe alternatives you've considered

We should enable the flag by default and then run benchmarks to ensure that performance does not change significantly.

Labels: enhancement (New feature or request)