Robust statistics: trimmed mean and interquartile range

Since sorting (and, I assume, ranking) will be part of stdlib, here is some related functionality to consider. In another issue I mentioned the median. The median is a special case of a trimmed mean: for a vector with 2N+1 or 2N+2 elements, you trim the top N and bottom N values and compute the mean of the remaining 1 or 2 values. Trimmed means are a commonly used robust statistic: one computes the mean after removing the top N and bottom N values. So a function

```
function trimmed_mean(x, ntrim_high, ntrim_low) result(xmean)
real, intent(in) :: x(:) ! data (an array, not a scalar)
integer, intent(in) :: ntrim_high, ntrim_low ! number of largest / smallest values to drop
real :: xmean ! mean after removing the ntrim_high largest and ntrim_low smallest values
end function trimmed_mean
```

could be created. In quantitative finance, insurance, and probably other domains, you often want to know the mean of the N largest or smallest observations. A complement to the function above would be

```
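function extreme_mean(x, nhigh, nlow) result(xmean)
real, intent(in) :: x(:) ! data (an array, not a scalar)
integer, intent(in), optional :: nhigh, nlow ! number of largest / smallest values to average
real :: xmean ! mean of the nhigh largest and/or nlow smallest values
end function extreme_mean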
```
Usually only one of nhigh and nlow would be specified by the user, but the case where both are specified can also be handled. In quantitative finance, the mean of the N lowest returns of a sample is an estimator of the Conditional Value at Risk.
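To make the handling of the optional arguments concrete, here is a minimal sketch of how `extreme_mean` might be written. It is only an illustration: the module name `extreme_mean_sketch`, the private `insertion_sort` helper, and the choice to return NaN for invalid counts are all my assumptions, and a real stdlib version would presumably use stdlib's own sorting and error handling.

```fortran
module extreme_mean_sketch
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: extreme_mean
contains

  ! Hypothetical helper: sort a real array into ascending order.
  ! Plain insertion sort, O(n**2); stands in for a library sort.
  pure subroutine insertion_sort(a)
    real, intent(inout) :: a(:)
    integer :: i, j
    real :: key
    do i = 2, size(a)
      key = a(i)
      j = i - 1
      do while (j >= 1)
        if (a(j) <= key) exit
        a(j+1) = a(j)
        j = j - 1
      end do
      a(j+1) = key
    end do
  end subroutine insertion_sort

  ! Mean of the nhigh largest and/or nlow smallest values of x.
  ! An absent count is treated as zero.
  pure function extreme_mean(x, nhigh, nlow) result(xmean)
    real, intent(in) :: x(:)
    integer, intent(in), optional :: nhigh, nlow
    real :: xmean
    real :: sorted(size(x))
    integer :: n, nh, nl

    n = size(x)
    nh = 0
    nl = 0
    if (present(nhigh)) nh = nhigh
    if (present(nlow)) nl = nlow

    ! Assumed convention: invalid counts yield NaN.
    if (nh < 0 .or. nl < 0 .or. nh + nl < 1 .or. nh + nl > n) then
      xmean = ieee_value(xmean, ieee_quiet_nan)
      return
    end if

    sorted = x
    call insertion_sort(sorted)
    ! Zero-sized sections sum to zero, so one expression covers all cases.
    xmean = (sum(sorted(1:nl)) + sum(sorted(n-nh+1:n))) / real(nh + nl)
  end function extreme_mean

end module extreme_mean_sketch
```

With this sketch, `extreme_mean(x, nlow=5)` would average the five smallest values of `x`, which is the Conditional Value at Risk style estimate described above when `x` holds returns.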

The most commonly computed average is just the sample mean, which is in stdlib. Quantities such as the trimmed mean are less trivial to program and would perhaps provide a greater benefit by being in stdlib.
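As a rough illustration of what is involved, a trimmed mean can be sketched as "sort a copy, then average the middle section". The module name `trimmed_mean_sketch`, the private `insertion_sort` helper, and the NaN return for invalid trim counts are assumptions made for this example and not a proposed stdlib API.

```fortran
module trimmed_mean_sketch
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: trimmed_mean
contains

  ! Hypothetical helper: sort a real array into ascending order.
  ! Plain insertion sort, O(n**2); stands in for a library sort.
  pure subroutine insertion_sort(a)
    real, intent(inout) :: a(:)
    integer :: i, j
    real :: key
    do i = 2, size(a)
      key = a(i)
      j = i - 1
      do while (j >= 1)
        if (a(j) <= key) exit
        a(j+1) = a(j)
        j = j - 1
      end do
      a(j+1) = key
    end do
  end subroutine insertion_sort

  ! Mean of x after dropping the ntrim_high largest and
  ! ntrim_low smallest values.
  pure function trimmed_mean(x, ntrim_high, ntrim_low) result(xmean)
    real, intent(in) :: x(:)
    integer, intent(in) :: ntrim_high, ntrim_low
    real :: xmean
    real :: sorted(size(x))
    integer :: n, nkeep

    n = size(x)
    nkeep = n - ntrim_high - ntrim_low

    ! Assumed convention: negative counts, or trimming everything, yield NaN.
    if (ntrim_high < 0 .or. ntrim_low < 0 .or. nkeep < 1) then
      xmean = ieee_value(xmean, ieee_quiet_nan)
      return
    end if

    sorted = x
    call insertion_sort(sorted)
    xmean = sum(sorted(ntrim_low+1:n-ntrim_high)) / real(nkeep)
  end function trimmed_mean

end module trimmed_mean_sketch
```

For a vector of 2N+1 elements, `trimmed_mean(x, N, N)` in this sketch reduces to the median, as noted at the top of this issue.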

The descriptive statistics section has var to compute the variance, which is the square of the standard deviation. A robust alternative to the standard deviation for measuring the spread of data is the interquartile range (IQR), the difference between the 75th and 25th percentiles. Having percentiles and IQR in stdlib would be nice. If you ask R to summarize 1000 normal variates, it says

```
> x = rnorm(1000)
> summary(x)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-2.84938 -0.65930  0.06237  0.06395  0.75097  3.22734 
```

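R's default summary reports the quartiles and not the standard deviation, which suggests that statisticians consider percentile measures at least as important as the standard deviation for understanding spread.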
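A percentile function and an IQR built on top of it could look roughly like the sketch below. Several definitions of sample quantiles exist; this one assumes linear interpolation between order statistics at position (n-1)p+1, which is the rule R's `quantile` uses by default (type 7). The names `quantile_sketch`, `quantile`, `iqr`, and `insertion_sort` are hypothetical, as is returning NaN for invalid input.

```fortran
module quantile_sketch
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: quantile, iqr
contains

  ! Hypothetical helper: sort a real array into ascending order.
  ! Plain insertion sort, O(n**2); stands in for a library sort.
  pure subroutine insertion_sort(a)
    real, intent(inout) :: a(:)
    integer :: i, j
    real :: key
    do i = 2, size(a)
      key = a(i)
      j = i - 1
      do while (j >= 1)
        if (a(j) <= key) exit
        a(j+1) = a(j)
        j = j - 1
      end do
      a(j+1) = key
    end do
  end subroutine insertion_sort

  ! Sample quantile of x at probability p (0 <= p <= 1), using
  ! linear interpolation between neighbouring order statistics.
  pure function quantile(x, p) result(q)
    real, intent(in) :: x(:)
    real, intent(in) :: p
    real :: q
    real :: sorted(size(x)), h
    integer :: n, lo, hi

    n = size(x)
    ! Assumed convention: empty input or p outside [0, 1] yields NaN.
    if (n < 1 .or. p < 0.0 .or. p > 1.0) then
      q = ieee_value(q, ieee_quiet_nan)
      return
    end if

    sorted = x
    call insertion_sort(sorted)
    h = real(n - 1) * p + 1.0      ! fractional 1-based position
    lo = floor(h)
    hi = min(lo + 1, n)            ! guard the upper end when p = 1
    q = sorted(lo) + (h - real(lo)) * (sorted(hi) - sorted(lo))
  end function quantile

  ! Interquartile range: 75th percentile minus 25th percentile.
  pure function iqr(x) result(r)
    real, intent(in) :: x(:)
    real :: r
    r = quantile(x, 0.75) - quantile(x, 0.25)
  end function iqr

end module quantile_sketch
```

In this sketch `quantile(x, 0.5)` gives the median, and `quantile(x, 0.25)` and `quantile(x, 0.75)` correspond to the "1st Qu." and "3rd Qu." columns of the R output above, under the stated interpolation rule.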



Robust statistics: trimmed mean and interquartile range #379
