Closed
Labels
documentation, enhancement
Description
“Write cool software and tell people about it” – Paul Dix @pauldix (Founder and CTO of InfluxData)
Call to action:
The DataFusion community has invested a lot in the cool software; now is the time to do better on the “tell people about it” part.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
DataFusion is too difficult to learn for new users. See https://towardsdev.com/writing-a-data-pipeline-in-rust-with-datafusion-25b5e45410ca for one user's experience, which is summarized here:
- The API documentation is pretty bad: less frequently used functions have no documentation or incomplete documentation, and lack examples showing how to use them. So I had to guess and try out many things to use some of the functions, such as to_timestamp, date_part, and when(...).otherwise().
- The data reading documentation also lacks examples; only 2–3 examples are mentioned in the API doc. But you might need another way of reading the data: for example, first defining the schema and then using it to read the data, without having the framework infer the schema. This is not in the doc.
- There is no tutorial-like example in the doc; this is also true for the Rust API for the Polars DataFrame.
- From a user's point of view, I think documentation is the weakest part of the framework; on top of that, Rust itself is not that easy.
Describe the solution you'd like
TBD. This issue is an EPIC to track tasks to improve the situation.
User Guide
- Document all scalar SQL functions in user guide #3065
- Add DataFrame section to user guide #3066
- Update SQL reference in user guide to cover all supported syntax #3091
- Add documentation on querying against files in object store such as S3 #3399
- Clarify DataFusion similarities and differences with duckdb, pola.rs and other similar systems #5498
- Automate build and publish of the user guide #5500
- Consolidate README.md and main documentation site #5755
- Improve expressions.md #5977
- Document Window functions #6338
Rust Docs (docs.rs)
- Automate production of SQL and DataFrame references for SQL functions / Expressions #3092
- Include the latest rustdocs (e.g. `cargo doc` output on https://arrow.apache.org/datafusion/) #5981
- rustdoc searches don't find all relevant classes #6648
Developer/Contributor Guide
- Improve the docs for developers (Create a Developers / Hackers guide for DataFusion) #5501
- Create a presentation about DataFusion's architecture #5499
- Fix rustdoc errors #6042
Python Docs
Blog posts
- Blog post about datafusion 16 release #4804
- Write a blog about parquet predicate pushdown #3464
- Blog post with DataFusion Jan - April 2023 #5812
Older Issues To Be Reviewed