Description
Since sorting (and, I assume, ranking) will be part of stdlib, here is some related functionality to consider. In another issue I mentioned the median. The median is a special case of a trimmed mean: for a vector with 2N+1 or 2N+2 elements, you trim the top N and bottom N values and compute the mean of the remaining 1 or 2 values. Trimmed means are a commonly used robust statistic: one computes the mean after removing the N largest and N smallest values. So a function
function trimmed_mean(x,ntrim_high,ntrim_low) result(xmean)
real, intent(in) :: x(:)
integer, intent(in) :: ntrim_high,ntrim_low
real :: xmean ! mean after removing the ntrim_high largest and ntrim_low smallest values
end function trimmed_mean
could be created. In quantitative finance, insurance, and probably other domains, you often want to know the mean of the N largest or N smallest observations. A complement to the function above would be
function extreme_mean(x,nhigh,nlow) result(xmean)
real, intent(in) :: x(:)
integer, intent(in), optional :: nhigh,nlow
real :: xmean ! mean of the nhigh largest and nlow smallest values
end function extreme_mean
Usually only one of nhigh and nlow would be specified by the user, but the case where both are specified can also be handled. In quantitative finance, the mean of the N lowest returns of a sample is an estimator of the Conditional Value at Risk.
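To make the two interfaces concrete, here is a minimal sketch of how they might be implemented. Everything beyond the two function signatures is an assumption made for illustration: the module name trimmed_stats, the helper sorted_copy (a plain insertion sort standing in for whatever sorting routine stdlib ends up providing), and the choice to return NaN for invalid arguments are hypothetical, not a proposed stdlib design.

```fortran
module trimmed_stats
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: trimmed_mean, extreme_mean
contains

  ! Hypothetical helper: ascending sorted copy of x (insertion sort, O(n**2)).
  pure function sorted_copy(x) result(y)
    real, intent(in) :: x(:)
    real :: y(size(x))
    real :: key
    integer :: i, j
    y = x
    do i = 2, size(y)
      key = y(i)
      j = i - 1
      do while (j >= 1)
        if (y(j) <= key) exit
        y(j+1) = y(j)
        j = j - 1
      end do
      y(j+1) = key
    end do
  end function sorted_copy

  ! Mean after removing the ntrim_high largest and ntrim_low smallest values.
  pure function trimmed_mean(x, ntrim_high, ntrim_low) result(xmean)
    real, intent(in) :: x(:)
    integer, intent(in) :: ntrim_high, ntrim_low
    real :: xmean
    real :: y(size(x))
    integer :: n, nkeep
    n = size(x)
    nkeep = n - ntrim_high - ntrim_low
    ! Assumed convention: NaN when the trim counts are invalid or leave no data.
    if (ntrim_high < 0 .or. ntrim_low < 0 .or. nkeep < 1) then
      xmean = ieee_value(0.0, ieee_quiet_nan)
      return
    end if
    y = sorted_copy(x)
    xmean = sum(y(ntrim_low+1:n-ntrim_high)) / nkeep
  end function trimmed_mean

  ! Mean of the nhigh largest and nlow smallest values, pooled together.
  pure function extreme_mean(x, nhigh, nlow) result(xmean)
    real, intent(in) :: x(:)
    integer, intent(in), optional :: nhigh, nlow
    real :: xmean
    real :: y(size(x))
    integer :: n, nh, nl
    n = size(x)
    nh = 0
    nl = 0
    if (present(nhigh)) nh = nhigh
    if (present(nlow)) nl = nlow
    ! Assumed convention: NaN when nothing is selected or the two tails overlap.
    if (nh < 0 .or. nl < 0 .or. nh + nl < 1 .or. nh + nl > n) then
      xmean = ieee_value(0.0, ieee_quiet_nan)
      return
    end if
    y = sorted_copy(x)
    xmean = (sum(y(1:nl)) + sum(y(n-nh+1:n))) / (nh + nl)
  end function extreme_mean

end module trimmed_stats
```

With x = [5., 1., 9., 3., 7., 100.], trimmed_mean(x, 1, 1) drops 1 and 100 and averages the remaining four values, while trimmed_mean(x, 2, 2) averages the middle two values and so reproduces the median. extreme_mean(x, nlow=2) averages the two smallest values, which is the shape of the Conditional Value at Risk estimator mentioned above. A real implementation would likely use a partial sort or selection algorithm rather than sorting the whole array.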
The most commonly computed average is just the sample mean, which is in stdlib. Quantities such as the trimmed mean are less trivial to program and would perhaps provide a greater benefit by being in stdlib.
The descriptive statistics section has var to compute variance, which is the square of the standard deviation. A robust alternative to the standard deviation for measuring the spread of data is the interquartile range (IQR), the difference between the 75th and 25th percentiles. Having percentiles and IQR in stdlib would be nice. If you ask R to summarize 1000 normal variates, it says
> x = rnorm(1000)
> summary(x)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-2.84938 -0.65930  0.06237  0.06395  0.75097  3.22734
R's default summary reports quartiles and no standard deviation, so statisticians evidently think percentile measures are as important as the standard deviation for understanding spread.
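As a rough illustration of what percentile and IQR routines could look like, here is a minimal sketch. The names quantile_stats, percentile, iqr, and sorted_copy are hypothetical, as is the NaN return for invalid input. The interpolation rule is also an assumption: the sketch interpolates linearly at position 1 + (n-1)p/100 in the sorted data, which matches R's default quantile definition (type 7), but other definitions are in common use and stdlib might prefer to make the rule selectable.

```fortran
module quantile_stats
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: percentile, iqr
contains

  ! Hypothetical helper: ascending sorted copy of x (insertion sort, O(n**2)).
  pure function sorted_copy(x) result(y)
    real, intent(in) :: x(:)
    real :: y(size(x))
    real :: key
    integer :: i, j
    y = x
    do i = 2, size(y)
      key = y(i)
      j = i - 1
      do while (j >= 1)
        if (y(j) <= key) exit
        y(j+1) = y(j)
        j = j - 1
      end do
      y(j+1) = key
    end do
  end function sorted_copy

  ! p-th percentile of x (p in percent, 0 to 100), linear interpolation
  ! between order statistics at position 1 + (n-1)*p/100.
  pure function percentile(x, p) result(q)
    real, intent(in) :: x(:)
    real, intent(in) :: p
    real :: q
    real :: y(size(x)), h, frac
    integer :: n, lo, hi
    n = size(x)
    ! Assumed convention: NaN for empty input or p outside [0, 100].
    if (n < 1 .or. p < 0.0 .or. p > 100.0) then
      q = ieee_value(0.0, ieee_quiet_nan)
      return
    end if
    y = sorted_copy(x)
    h = 1.0 + (n - 1) * p / 100.0
    lo = min(floor(h), n)   ! index of the order statistic at or below h
    hi = min(lo + 1, n)     ! next order statistic, clamped at the maximum
    frac = h - lo
    q = y(lo) + frac * (y(hi) - y(lo))
  end function percentile

  ! Interquartile range: 75th percentile minus 25th percentile.
  pure function iqr(x) result(r)
    real, intent(in) :: x(:)
    real :: r
    r = percentile(x, 75.0) - percentile(x, 25.0)
  end function iqr

end module quantile_stats
```

For x = [1., 2., 3., 4., 5.] this gives a 25th percentile of 2, a 75th percentile of 4, and an IQR of 2. The six numbers in R's summary could then be assembled from minval, percentile at 25, 50, and 75, the existing mean, and maxval.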