-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Labels
enhancementNew feature or requestNew feature or request
Description
Is your feature request related to a problem or challenge?
#17988 has added support for percentile_cont, and it is a commonly used function to inspect the distribution of a dataset. Currently, it only supports computing a single percentile per call.
This proposal extends percentile_cont to accept an array of percentile arguments, allowing multiple percentiles to be computed in a single aggregation. I think this feature is quite ergonomic and also makes execution more efficient.
DuckDB has already support it:
-- Calculating 1%, 5%, 10%, 50%, 90% percentile in one expression
D SELECT
percentile_cont([0.01, 0.05, 0.1, 0.5, 0.9])
WITHIN GROUP (ORDER BY l_orderkey)
FROM '/Users/yongting/data/tpch_sf1/lineitem.parquet';
┌───────────────────────────────────────────────────────────────────────────────┐
│ quantile_cont(main.list_value(0.01, 0.05, 0.1, 0.5, 0.9) ORDER BY l_orderkey) │
│ double[] │
├───────────────────────────────────────────────────────────────────────────────┤
│ [59815.0, 300256.0, 599552.0, 3000961.0, 5400486.0] │
└───────────────────────────────────────────────────────────────────────────────┘
(Data is generated with https://github.com/clflushopt/tpchgen-rs/tree/main/tpchgen-cli)
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request