Add additional data wrangling methods #6

@AdrianAntico

Thank you DuckDB team for keeping this benchmark going!!!

I see there are a lot of variations on group bys and joins. However, I think it would be highly beneficial to incorporate additional data wrangling methods. A few that come to mind are listed below; others should feel free to add to this list:

  • Unions
  • Subsetting data
  • Sampling data
  • Rolling joins (see data.table)
  • Pivots long and wide
  • Rolling / windowing operations by groups over time, such as lags and moving averages
  • Differencing data by groups based on a time column
  • Updating records in a data frame / table
  • Categorical encoding methods: target encoding, James-Stein encoding
  • Column type conversions
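To make two of the less common items concrete, here is a minimal plain-Python sketch of the semantics I have in mind for grouped lags / differences / moving averages and for a rolling join (match each row to the most recent earlier-or-equal row with the same key, which is my understanding of data.table's `roll = TRUE`). The function names `grouped_window_features` and `rolling_join` and the tuple layouts are hypothetical, chosen only for illustration. This is a reference for the expected results, not benchmark code and not how any particular framework implements it.

```python
from bisect import bisect_right
from collections import defaultdict, deque


def grouped_window_features(rows, window=2):
    """rows: iterable of (group, time, value) tuples (hypothetical layout).

    Returns one dict per row, ordered by group and then time, holding the
    lag-1 value, the first difference, and a trailing moving average
    computed within each group.
    """
    by_group = defaultdict(list)
    for group, time, value in rows:
        by_group[group].append((time, value))

    out = []
    for group in sorted(by_group):
        prev = None
        recent = deque(maxlen=window)  # last `window` values seen in this group
        for time, value in sorted(by_group[group], key=lambda tv: tv[0]):
            recent.append(value)
            out.append({
                "group": group,
                "time": time,
                "value": value,
                "lag1": prev,
                "diff1": None if prev is None else value - prev,
                "ma": sum(recent) / len(recent),  # partial window at group start
            })
            prev = value
    return out


def rolling_join(left, right):
    """left: list of (key, time); right: list of (key, time, value).

    For each left row, pick the value of the latest right row with the same
    key whose time is <= the left time; None when no such row exists.
    """
    times = defaultdict(list)
    values = defaultdict(list)
    for key, time, value in sorted(right, key=lambda r: (r[0], r[1])):
        times[key].append(time)
        values[key].append(value)

    result = []
    for key, time in left:
        i = bisect_right(times[key], time)  # count of right rows with time <= left time
        result.append((key, time, values[key][i - 1] if i else None))
    return result
```

A benchmark task for either item could check a framework's output against a reference like this on a small input before timing it on the large one.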

I believe a broader set of operations serves several purposes. First, I would like to know whether a particular framework can actually do the operation. Second, I would like to see benchmarks of its performance. Lastly, I think it would be a huge community benefit to see what the code that achieves the greatest performance actually looks like, which isn't always available through documentation or Stack Overflow.

Thanks in advance,
Adrian
