Please comment if you have any thoughts on these ideas:
I think it would be good to update the text here: https://github.com/apache/arrow-datafusion/blob/main/README.md#comparisons-with-other-projects
In terms of competition / optics of DuckDB vs DataFusion (vs Polars) -- I think the best approach is to define the areas each is best at rather than try to "compete" head to head. I would be quite happy for DataFusion to have performance comparable to (not faster than) DuckDB and Polars.
Some thoughts on where DataFusion is clearly differentiated:
- Target audience is different (developers rather than end users / data scientists)
- Designed to be embedded (rather than designed to be a file-based SQL engine)
- Community / ASF (rather than being tightly controlled in Amsterdam)
- Rust implementation (all the cool kids want Rust, I hear!)