Add additional data wrangling methods

Thank you DuckDB team for keeping this benchmark going!!!

I see there are a lot of variations on group bys and joins; however, I think it would be highly beneficial to incorporate additional data wrangling methods. A few that come to mind (others should feel free to add to this list):
- Unions
- Subsetting data
- Sampling data
- Rolling joins (see data.table)
- Pivots long and wide
- Rolling / windowing operations by groups over time, such as lags and moving averages
- Differencing data by groups based on a time column
- Updating records in a data frame / table
- Categorical encoding methods: target encoding, James-Stein encoding
- Column type conversions
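
To make one of these concrete, here is a minimal pure-Python sketch of what a "lag and moving average by group over time" task could compute. The function name, the `(group, time, value)` tuple layout, and the window size are all hypothetical choices for illustration; this is meant as reference semantics a benchmark task might check results against, not as the way any particular framework should implement it.

```python
from collections import defaultdict, deque


def grouped_lag_and_rolling_mean(rows, window=2):
    """Hypothetical reference for a grouped lag-1 and trailing moving average.

    rows: iterable of (group, time, value) tuples, in any order.
    Returns a list of (group, time, value, lag1, rolling_mean),
    sorted by group and then by time.
    """
    by_group = defaultdict(list)
    for group, time, value in rows:
        by_group[group].append((time, value))

    out = []
    for group in sorted(by_group):
        recent = deque(maxlen=window)  # holds at most the last `window` values
        prev = None                    # lag-1 value; None for the first row of a group
        for time, value in sorted(by_group[group]):
            recent.append(value)
            # Assumption: partial windows at the start of a group are averaged as-is.
            rolling = sum(recent) / len(recent)
            out.append((group, time, value, prev, rolling))
            prev = value
    return out


rows = [("a", 2, 30.0), ("a", 1, 10.0), ("b", 1, 5.0), ("a", 3, 50.0), ("b", 2, 7.0)]
result = grouped_lag_and_rolling_mean(rows, window=2)
```

A benchmark task would also need to pin down choices such as how partial windows and ties in the time column are handled, since frameworks differ on those defaults.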

I believe a broader set of operations serves several purposes. First, I would like to know whether a particular framework can actually do the operation. Second, I would like to see benchmarks of its performance. Lastly, I think it would be a huge community benefit to see what the code ends up looking like when written for the best performance, which isn't always available through documentation or Stack Overflow.

Thanks in advance,
Adrian



Add additional data wrangling methods #6
