Clarify DataFusion similarities and differences with duckdb, pola.rs and other similar systems #5498

@alamb

Please comment if you have any thoughts on these ideas:

I think it would be good to update the text here: https://github.com/apache/arrow-datafusion/blob/main/README.md#comparisons-with-other-projects

In terms of competition / optics of DuckDB vs DataFusion (vs pola.rs) -- I think the best approach is to define the areas each is best at rather than try to "compete" head to head. I would be quite happy for DataFusion to have performance comparable to DuckDB and pola.rs; it does not need to be faster.

Some thoughts on the benefits of DataFusion where it has clear differentiation:

  1. Target audience is different (developers rather than end users / data scientists)
  2. Designed to be embedded (rather than designed to be a file-based SQL engine)
  3. Community / ASF (rather than being tightly controlled in Amsterdam)
  4. Rust implementation (all the cool kids want Rust, I hear!)

Labels: documentation (Improvements or additions to documentation)
