forked from h2oai/db-benchmark
Open
Description
Thank you DuckDB team for keeping this benchmark going!!!
I see there are a lot of variations on group bys and joins; however, I think it would be highly beneficial to incorporate additional data wrangling methods. A few that come to mind (others should feel free to add to this list) include:
- Unions
- Subsetting data
- Sampling data
- Rolling joins (see data.table)
- Pivots long and wide
- Rolling / windowing operations by groups over time, such as lags and moving averages
- Differencing data by groups based on a time column
- Updating records in a data frame / table
- Categorical encoding methods: target encoding, James-Stein encoding
- Column type conversions
I believe a broader set of operations serves several purposes. First, I would like to know whether a particular framework can actually do the operation. Second, I would like to see benchmarks of its performance. Lastly, I think it would be a huge community benefit to see what the code ends up looking like when written for the greatest performance, which isn't always available through documentation or Stack Overflow.
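To make a few of the list items concrete, here is a minimal, framework-neutral sketch in plain Python of what the grouped lag, differencing, and moving average operations could compute. It is only an illustration of the intended semantics, not a proposed benchmark implementation. The function name `grouped_window_features`, the `(group, time, value)` row layout, the window size, and the choice to average over partial windows at the start of each group are all my own assumptions.

```python
from collections import defaultdict, deque


def grouped_window_features(rows, window=2):
    """Compute lag-1, first difference, and a trailing moving average per group.

    rows: iterable of (group, time, value) tuples (hypothetical layout).
    window: number of most recent values in the moving average (assumed).
    """
    # Bucket the observations by group key.
    by_group = defaultdict(list)
    for group, time, value in rows:
        by_group[group].append((time, value))

    out = []
    for group in sorted(by_group):
        recent = deque(maxlen=window)  # holds at most `window` latest values
        prev = None                    # previous value within this group
        # Order each group's observations by the time column.
        for time, value in sorted(by_group[group]):
            recent.append(value)
            out.append({
                "group": group,
                "time": time,
                "value": value,
                "lag1": prev,
                "diff1": None if prev is None else value - prev,
                # Assumption: average over however many values are available,
                # so the first rows of a group use a shorter window.
                "moving_avg": sum(recent) / len(recent),
            })
            prev = value
    return out


rows = [
    ("a", 1, 10.0),
    ("b", 1, 5.0),
    ("a", 2, 14.0),
    ("a", 3, 12.0),
    ("b", 2, 9.0),
]
result = grouped_window_features(rows, window=2)
for record in result:
    print(record)
```

A benchmark task along these lines would presumably run the equivalent operation in each framework on the existing generated datasets and compare both timings and the code needed to express it.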
Thanks in advance,
Adrian