Blog post about datafusion 16 release #4804

@alamb

As part of the 16.0.0 release, I would like to write a blog post about DataFusion for https://arrow.apache.org/blog/ (source at https://github.com/apache/arrow-site).

I am thinking of a basic theme like "DataFusion is leading the charge to bring advanced OLAP technology everywhere".

I would like to highlight the theme summarized by Andy Pavlo in https://ottertune.com/blog/2022-databases-retrospective/:

"The long-term trend to watch is the proliferation of frameworks like Velox, DataFusion, and Polars. Along with projects like Substrait, the commoditization of these query execution components means that all OLAP DBMSs will be roughly equivalent in the next five years. Instead of building a new DBMS entirely from scratch or hard forking an existing system (e.g., how Firebolt forked Clickhouse), people are better off using an extensible framework like Velox. This means that every DBMS will have the same vectorized execution capabilities that were unique to Snowflake ten years ago. And since in the cloud, the storage layer is the same for everyone (e.g., Amazon controls EBS/S3), the critical differentiator between DBMS offerings will be things that are difficult to quantify, like UI/UX stuff and query optimization."

Some supporting evidence:

  • Several new databases built on DataFusion (synnada.ai, GreptimeDB, probably others)
  • GA of InfluxDB IOx

New features:

  • Advanced window functions (like unbounded windows)
  • Join support (TODO gather more details)
  • Optimizer advancements

Future directions:
1. Improved grouping / sorting performance
2. RLE (Run End Encoding) support

etc.

Here is the most recent blog post about DataFusion that I know of: https://arrow.apache.org/blog/2022/10/25/datafusion-13.0.0 (source at https://github.com/apache/arrow-site/blob/master/_posts/2022-10-25-datafusion-13.0.0.md)

Please leave comments with your suggestions / ideas!
