Skip to content

Support array argument for percentile_cont aggregate function #18600

@2010YOUY01

Description

@2010YOUY01

Is your feature request related to a problem or challenge?

#17988 has added support for percentile_cont, and it is a commonly used function to inspect the distribution of a dataset. Currently, it only supports computing a single percentile per call.

This proposal extends percentile_cont to accept an array of percentile arguments, allowing multiple percentiles to be computed in a single aggregation. I think this feature is quite ergonomic and also makes execution more efficient.

DuckDB has already support it:

-- Calculating 1%, 5%, 10%, 50%, 90% percentile in one expression
D SELECT
    percentile_cont([0.01, 0.05, 0.1, 0.5, 0.9])
    WITHIN GROUP (ORDER BY l_orderkey)
  FROM '/Users/yongting/data/tpch_sf1/lineitem.parquet';
┌───────────────────────────────────────────────────────────────────────────────┐
│ quantile_cont(main.list_value(0.01, 0.05, 0.1, 0.5, 0.9) ORDER BY l_orderkey) │
│                                   double[]                                    │
├───────────────────────────────────────────────────────────────────────────────┤
│ [59815.0, 300256.0, 599552.0, 3000961.0, 5400486.0]                           │
└───────────────────────────────────────────────────────────────────────────────┘

(Data is generated with https://github.com/clflushopt/tpchgen-rs/tree/main/tpchgen-cli)

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions