Skip to content

Order statistics #461

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
98 commits
Select commit Hold shift + click to select a range
ee0d54d
Added function signature.
LukeMathWalker May 14, 2018
9b775d0
Determine the index of the desired element if the array were to be
LukeMathWalker May 14, 2018
a4bec9d
Added macro use for slice
LukeMathWalker May 14, 2018
470af6c
Barebone implementation of randomized_select with the signature of
LukeMathWalker May 14, 2018
f2eec96
Added rand as a dependency.
LukeMathWalker May 14, 2018
92d3433
Added extern crate rand to lib.rs.
LukeMathWalker May 14, 2018
8e25ae0
Basic implementation of randomized_partition first section. Missing
LukeMathWalker May 14, 2018
fc3e570
Using swap where needed, instead of manually doing the swap using clone.
LukeMathWalker May 15, 2018
0f5fb47
Cleaned comments and debugging prints.
LukeMathWalker May 15, 2018
177e03b
Fixed index overflow bug.
LukeMathWalker May 15, 2018
811e84f
Fixed issues with index overflow.
LukeMathWalker May 15, 2018
b618377
Remove useless type annotation.
LukeMathWalker May 15, 2018
8994eae
Refactored randomized_partition and partition to properly use
LukeMathWalker May 15, 2018
9359134
Restored previous version of randomized_select.
LukeMathWalker May 15, 2018
43d765e
Modified all function signatures to accept mutable views instead of
LukeMathWalker May 15, 2018
0dec699
Added macro_use to get azip! in numerics.
LukeMathWalker May 21, 2018
9315413
Added macro_use to macrozip to get azip! in numeric.
LukeMathWalker May 21, 2018
212d1b7
Added an implementation of percentile_axis that requires a mutable
LukeMathWalker May 21, 2018
ccc8b04
We need to increment the index by 1 to get the correct result
LukeMathWalker May 21, 2018
54f227a
Mapping passes i instead of i+1 to randomized_select, but we use ceil to
LukeMathWalker May 21, 2018
461e258
Added some documentation for percentile_axis_mut.
LukeMathWalker May 21, 2018
184d65d
Added an assert to prevent invalid q values.
LukeMathWalker May 21, 2018
e290bff
Changed end of recursion condition - it needs to check for n==1 not n…
LukeMathWalker May 21, 2018
911a656
Added a new test for the maximum, as well as testing it on arr1.
LukeMathWalker May 21, 2018
c57fb25
Split test into two more meaningful tests.
LukeMathWalker May 21, 2018
2107f56
Added more documentation
LukeMathWalker May 24, 2018
07bc655
Updated to version 0.5.0 of rand.
LukeMathWalker May 30, 2018
891fe95
Updated to use the proper way of generating random integers.
LukeMathWalker May 30, 2018
540c495
Updated docs.
LukeMathWalker May 30, 2018
e22f889
Updated docs.
LukeMathWalker May 30, 2018
54b1878
Changing tests for randomized_select - we want to get i to be the index
LukeMathWalker Jun 2, 2018
a54b696
Renamed randomized_select: now nth_mut. Modified all functions to mak…
LukeMathWalker Jun 2, 2018
337d389
nth_mut renamed ith_mut, to have coherence with argument naming strat…
LukeMathWalker Jun 2, 2018
14f1605
Improved variable naming.
LukeMathWalker Jun 2, 2018
10603a2
partition renamed partition_mut.
LukeMathWalker Jun 2, 2018
cbcf69a
Added documentation to partition. Renamed variable for better
LukeMathWalker Jun 2, 2018
4d71df7
Shortened function signature.
LukeMathWalker Jun 2, 2018
6e06f13
Using more expressive variable names in partition_mut.
LukeMathWalker Jun 2, 2018
31f0f28
Removed unnecessary auxiliary variable, to achieve more meaningful
LukeMathWalker Jun 2, 2018
77629ca
Updated test_partition into test_partition_mut.
LukeMathWalker Jun 2, 2018
eee88ce
Updated test_randomized_select into test_ith_mut.
LukeMathWalker Jun 2, 2018
5a29f47
Fixed bug in test_partition_mut.
LukeMathWalker Jun 2, 2018
310b53e
Fixed comparison to make documentation aligned to actual function
LukeMathWalker Jun 2, 2018
6351601
Fixed edge case
LukeMathWalker Jun 2, 2018
5defbcc
Moved tests to test module.
LukeMathWalker Jun 2, 2018
a3122e9
Fixed erroneous complexity indication. Added link to algorithm implem…
LukeMathWalker Jun 2, 2018
243f12c
Added link to algorithm description
LukeMathWalker Jun 2, 2018
2933b28
Merged
LukeMathWalker Jun 2, 2018
71ad410
Fixed unclosed braces.
LukeMathWalker Jun 2, 2018
517621c
Properly reorganized functions to be used as method whenever possible
LukeMathWalker Jun 2, 2018
08ccac9
Updated function calls in tests.
LukeMathWalker Jun 2, 2018
f9ed81e
Improved test formatting
LukeMathWalker Jun 2, 2018
fd11be0
Fixed doc typo.
LukeMathWalker Jun 5, 2018
e4325a1
Removed unnecessary referencing for cloning.
LukeMathWalker Jun 5, 2018
35dd35b
Removed unnecessary call to view_mut.
LukeMathWalker Jun 5, 2018
7475610
Added Hoare routine, with docs and proper reworking of ith_mut
LukeMathWalker Jun 11, 2018
5240811
Added some tests for hoare_partition_mut
LukeMathWalker Jun 11, 2018
ca2f3f9
Improved docs and tests
LukeMathWalker Jun 12, 2018
ddc0c65
Merge remote-tracking branch 'upstream/master' into order-statistics
LukeMathWalker Jun 15, 2018
3999031
Using map_axis_mut to simplify percentile_axis_mut
LukeMathWalker Jun 15, 2018
ee4eb05
Remove patch number specification for rand.
LukeMathWalker Jun 24, 2018
4be2146
Refactored ith_mut. Removed random_pivot.
LukeMathWalker Jun 24, 2018
37e90a9
Refactored partition_hoare_mut.
LukeMathWalker Jun 24, 2018
5145b8c
Renamed hoare_partition_mut to partition_mut.
LukeMathWalker Jun 24, 2018
630106b
Reworked docs.
LukeMathWalker Jun 24, 2018
d3cc1ac
Modified signature. Simplified index calculation
LukeMathWalker Jun 24, 2018
a6fecc7
Inlined closure.
LukeMathWalker Jun 24, 2018
90774bc
Modified docs to include NaN cases.
LukeMathWalker Jun 24, 2018
3a7f314
Fixing visibility typo.
LukeMathWalker Jun 24, 2018
22415cd
Fixed edge case
LukeMathWalker Jun 24, 2018
b1ce4cc
Merged tests. Changed test assertions
LukeMathWalker Jun 24, 2018
fcc9f82
Added test for edge case
LukeMathWalker Jun 24, 2018
7e1b503
Renamed ith_mut to sorted_get_mut
LukeMathWalker Jun 24, 2018
393b6bb
Added different interpolation strategies for the percentile index.
LukeMathWalker Jun 24, 2018
a9f148d
Renamed Highest to Higher, for coherence. Fixed docs, added docs to I…
LukeMathWalker Jun 24, 2018
0bd4c71
Fixed visibility of InterpolationStrategy
LukeMathWalker Jun 24, 2018
ded4e94
Updated tests to add new argument to percentile_axis_mut
LukeMathWalker Jun 24, 2018
c6c459d
Merge branch 'master' of https://github.com/bluss/ndarray into order-…
LukeMathWalker Sep 1, 2018
1ec4e42
Implementing interpolation strategies as trait bounds.
LukeMathWalker Sep 1, 2018
63bbb07
Providing some default methods.
LukeMathWalker Sep 1, 2018
3d1a196
Mysterious type annotation error.
LukeMathWalker Sep 1, 2018
10a72d0
Fixed.
LukeMathWalker Sep 2, 2018
f8b7713
Making Lower, Upper, Nearest public.
LukeMathWalker Sep 2, 2018
e7d1f61
Removed empty line.
LukeMathWalker Sep 2, 2018
cec96ca
Re-exporting functions, structs and traits.
LukeMathWalker Sep 2, 2018
c786eac
Fixed import.
LukeMathWalker Sep 2, 2018
744a427
Fixed test cases.
LukeMathWalker Sep 2, 2018
bdc8c3a
Updated docs.
LukeMathWalker Sep 2, 2018
b526445
Fixed docs.
LukeMathWalker Sep 2, 2018
1b39c61
Fixed warnings on unused variables.
LukeMathWalker Sep 2, 2018
00ddf37
Merge pull request #1 from LukeMathWalker/interpolation-strategy
LukeMathWalker Sep 2, 2018
e90555e
Implemented Midpoint. Refactored code to change types of variables.
LukeMathWalker Sep 2, 2018
1d094e8
Defined Linear.
LukeMathWalker Sep 2, 2018
f5f77ea
Added Interpolate implementation for Linear.
LukeMathWalker Sep 2, 2018
ff51479
Fixed docs!
LukeMathWalker Sep 2, 2018
15aa626
Merge pull request #2 from LukeMathWalker/add-more-interpolation-stra…
LukeMathWalker Sep 2, 2018
d4e7de7
Removed unused constraint on A.
LukeMathWalker Sep 2, 2018
caf58ee
Reducing the scope of search for upper_index element if lower_index was
LukeMathWalker Sep 2, 2018
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ num-traits = "0.2"
num-complex = "0.2"
rustc-serialize = { version = "0.3.20", optional = true }
itertools = { version = "0.7.0", default-features = false }
rand = { version = "0.5" }

# Use via the `blas` crate feature!
cblas-sys = { version = "0.1.4", optional = true, default-features = false }
Expand Down
95 changes: 94 additions & 1 deletion src/impl_1d.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,9 @@
//! Methods for one-dimensional arrays.
use imp_prelude::*;

use rand::prelude::*;
use rand::thread_rng;

impl<A, S> ArrayBase<S, Ix1>
where S: Data<Elem=A>,
{
Expand All @@ -23,5 +26,95 @@ impl<A, S> ArrayBase<S, Ix1>
::iterators::to_vec(self.iter().map(|x| x.clone()))
}
}
}

/// Return the element that would occupy the `i`-th position if
/// the array were sorted in increasing order.
///
/// The array is shuffled **in place** to retrieve the desired element:
/// no copy of the array is allocated.
/// After the shuffling, all elements with an index smaller than `i`
/// are smaller than the desired element, while all elements with
/// an index greater or equal than `i` are greater than or equal
/// to the desired element.
///
/// No other assumptions should be made on the ordering of the
/// elements after this computation.
///
/// Complexity ([quickselect](https://en.wikipedia.org/wiki/Quickselect)):
/// - average case: O(`n`);
/// - worst case: O(`n`^2);
/// where n is the number of elements in the array.
///
/// **Panics** if `i` is greater than or equal to `n`.
pub fn sorted_get_mut(&mut self, i: usize) -> A
where A: Ord + Clone,
S: DataMut,
{
let n = self.len();
if n == 1 {
self[0].clone()
} else {
let mut rng = thread_rng();
let pivot_index = rng.gen_range(0, n);
let partition_index = self.partition_mut(pivot_index);
if i < partition_index {
self.slice_mut(s![..partition_index]).sorted_get_mut(i)
} else if i == partition_index {
self[i].clone()
} else {
self.slice_mut(s![partition_index+1..]).sorted_get_mut(i - (partition_index+1))
}
}
}

/// Return the index of `self[partition_index]` if `self` were to be sorted
/// in increasing order.
///
/// `self` elements are rearranged in such a way that `self[partition_index]`
/// is in the position it would be in an array sorted in increasing order.
/// All elements smaller than `self[partition_index]` are moved to its
/// left and all elements equal or greater than `self[partition_index]`
/// are moved to its right.
/// The ordering of the elements in the two partitions is undefined.
///
/// `self` is shuffled **in place** to operate the desired partition:
/// no copy of the array is allocated.
///
/// The method uses Hoare's partition algorithm.
/// Complexity: O(`n`), where `n` is the number of elements in the array.
/// Average number of element swaps: n/6 - 1/3 (see
/// (link)[https://cs.stackexchange.com/questions/11458/quicksort-partitioning-hoare-vs-lomuto/11550])
///
/// **Panics** if `partition_index` is greater than or equal to `n`.
pub fn partition_mut(&mut self, pivot_index: usize) -> usize
where A: Ord + Clone,
S: DataMut
{
let pivot_value = self[pivot_index].clone();
self.swap(pivot_index, 0);

let n = self.len();
let mut i = 1;
let mut j = n - 1;
loop {
loop {
if i > j { break }
if self[i] >= pivot_value { break }
i += 1;
}
while pivot_value <= self[j] {
if j == 1 { break }
j -= 1;
}
if i >= j {
break
} else {
self.swap(i, j);
i += 1;
j -= 1;
}
}
self.swap(0, i-1);
i-1
}
}
5 changes: 4 additions & 1 deletion src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,8 @@ extern crate matrixmultiply;
extern crate num_traits as libnum;
extern crate num_complex;

extern crate rand;

#[cfg(feature = "docs")]
pub mod doc;

Expand Down Expand Up @@ -158,6 +160,7 @@ mod free_functions;
pub use free_functions::*;
pub use iterators::iter;

#[macro_use]
mod slice;
mod layout;
mod indexes;
Expand Down Expand Up @@ -893,7 +896,7 @@ impl<A, S, D> ArrayBase<S, D>
mod impl_1d;
mod impl_2d;

mod numeric;
pub mod numeric;

pub mod linalg;

Expand Down
201 changes: 198 additions & 3 deletions src/numeric/impl_numeric.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,8 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

use std::ops::Add;
use libnum::{self, Zero, Float};
use std::ops::{Add, Sub, Div, Mul};
use libnum::{self, Zero, Float, FromPrimitive};
use itertools::free::enumerate;

use imp_prelude::*;
Expand All @@ -19,6 +19,133 @@ use {
Zip,
};

/// Used to provide an interpolation strategy to [`percentile_axis_mut`].
///
/// [`percentile_axis_mut`]: struct.ArrayBase.html#method.percentile_axis_mut
pub trait Interpolate<T> {
fn float_percentile_index(q: f64, len: usize) -> f64 {
((len - 1) as f64) * q
}

fn lower_index(q: f64, len: usize) -> usize {
Self::float_percentile_index(q, len).floor() as usize
}

fn upper_index(q: f64, len: usize) -> usize {
Self::float_percentile_index(q, len).ceil() as usize
}

fn float_percentile_index_fraction(q: f64, len: usize) -> f64 {
Self::float_percentile_index(q, len) - (Self::lower_index(q, len) as f64)
}

fn needs_lower(q: f64, len: usize) -> bool;
fn needs_upper(q: f64, len: usize) -> bool;
fn interpolate<D>(lower: Option<Array<T, D>>,
upper: Option<Array<T, D>>,
q: f64,
len: usize) -> Array<T, D>
where D: Dimension;
}

pub struct Upper;
pub struct Lower;
pub struct Nearest;
pub struct Midpoint;
pub struct Linear;

impl<T> Interpolate<T> for Upper {
fn needs_lower(_q: f64, _len: usize) -> bool {
false
}
fn needs_upper(_q: f64, _len: usize) -> bool {
true
}
fn interpolate<D>(_lower: Option<Array<T, D>>,
upper: Option<Array<T, D>>,
_q: f64,
_len: usize) -> Array<T, D> {
upper.unwrap()
}
}

impl<T> Interpolate<T> for Lower {
fn needs_lower(_q: f64, _len: usize) -> bool {
true
}
fn needs_upper(_q: f64, _len: usize) -> bool {
false
}
fn interpolate<D>(lower: Option<Array<T, D>>,
_upper: Option<Array<T, D>>,
_q: f64,
_len: usize) -> Array<T, D> {
lower.unwrap()
}
}

impl<T> Interpolate<T> for Nearest {
fn needs_lower(q: f64, len: usize) -> bool {
let lower = <Self as Interpolate<T>>::lower_index(q, len);
((lower as f64) - <Self as Interpolate<T>>::float_percentile_index(q, len)) <= 0.
}
fn needs_upper(q: f64, len: usize) -> bool {
!<Self as Interpolate<T>>::needs_lower(q, len)
}
fn interpolate<D>(lower: Option<Array<T, D>>,
upper: Option<Array<T, D>>,
q: f64,
len: usize) -> Array<T, D> {
if <Self as Interpolate<T>>::needs_lower(q, len) {
lower.unwrap()
} else {
upper.unwrap()
}
}
}

impl<T> Interpolate<T> for Midpoint
where T: Add<T, Output = T> + Div<T, Output = T> + Clone + FromPrimitive
{
fn needs_lower(_q: f64, _len: usize) -> bool {
true
}
fn needs_upper(_q: f64, _len: usize) -> bool {
true
}
fn interpolate<D>(lower: Option<Array<T, D>>,
upper: Option<Array<T, D>>,
_q: f64, _len: usize) -> Array<T, D>
where D: Dimension
{
let denom = T::from_u8(2).unwrap();
(lower.unwrap() + upper.unwrap()).mapv_into(|x| x / denom.clone())
}
}

impl<T> Interpolate<T> for Linear
where T: Add<T, Output = T> + Sub<T, Output = T> + Mul<T, Output = T> + Clone + FromPrimitive
{
fn needs_lower(_q: f64, _len: usize) -> bool {
true
}
fn needs_upper(_q: f64, _len: usize) -> bool {
true
}
fn interpolate<D>(lower: Option<Array<T, D>>,
upper: Option<Array<T, D>>,
q: f64, len: usize) -> Array<T, D>
where D: Dimension
{
let fraction = T::from_f64(
<Self as Interpolate<T>>::float_percentile_index_fraction(q, len)
).unwrap();
let a = lower.unwrap().mapv_into(|x| x * fraction.clone());
let b = upper.unwrap().mapv_into(|x| x * (T::from_u8(1).unwrap() - fraction.clone()));
a + b
}
}

/// Numerical methods for arrays.
impl<A, S, D> ArrayBase<S, D>
where S: Data<Elem=A>,
Expand Down Expand Up @@ -115,6 +242,75 @@ impl<A, S, D> ArrayBase<S, D>
sum / &aview0(&cnt)
}

/// Return the qth percentile of the data along the specified axis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, put an empty line after the short description on the first line of doc comments. It doesn't really make a significant difference for instance methods like this, but it does affect the docs of implementors of trait methods, and it's nice to be consistent. See here for an example of the effect of an empty line. Note the "Read more" text.

///
/// `q` needs to be a float between 0 and 1, bounds included.
/// The qth percentile for a 1-dimensional lane of length `N` is defined
/// as the element that would be indexed as `(N-1)q` if the lane were to be sorted
/// in increasing order.
/// If `(N-1)q` is not an integer the desired percentile lies between
/// two data points: we return the lower, nearest, higher or interpolated
/// value depending on the type `Interpolate` bound `I`.
///
/// Some examples:
/// - `q=0.` returns the minimum along each 1-dimensional lane;
/// - `q=0.5` returns the median along each 1-dimensional lane;
/// - `q=1.` returns the maximum along each 1-dimensional lane.
/// (`q=0` and `q=1` are considered improper percentiles)
///
/// The array is shuffled **in place** along each 1-dimensional lane in
/// order to produce the required percentile without allocating a copy
/// of the original array. Each 1-dimensional lane is shuffled independently
/// from the others.
/// No assumptions should be made on the ordering of the array elements
/// after this computation.
///
/// Complexity ([quickselect](https://en.wikipedia.org/wiki/Quickselect)):
/// - average case: O(`m`);
/// - worst case: O(`m`^2);
/// where `m` is the number of elements in the array.
///
/// **Panics** if `axis` is out of bounds or if `q` is not between
/// `0.` and `1.` (inclusive).
pub fn percentile_axis_mut<I>(&mut self, axis: Axis, q: f64) -> Array<A, D::Smaller>
where D: RemoveAxis,
A: Ord + Clone,
S: DataMut,
I: Interpolate<A>,
{
assert!((0. <= q) && (q <= 1.));
let mut lower = None;
let mut upper = None;
let axis_len = self.len_of(axis);
if I::needs_lower(q, axis_len) {
let lower_index = I::lower_index(q, axis_len);
lower = Some(
self.map_axis_mut(
axis,
|mut x| x.sorted_get_mut(lower_index)
)
);
if I::needs_upper(q, axis_len) {
let upper_index = I::upper_index(q, axis_len);
let relative_upper_index = upper_index - lower_index;
upper = Some(
self.map_axis_mut(
axis,
|mut x| x.slice_mut(s![lower_index..]).sorted_get_mut(relative_upper_index)
)
);
};
} else {
upper = Some(
self.map_axis_mut(
axis,
|mut x| x.sorted_get_mut(I::upper_index(q, axis_len))
)
);
};
I::interpolate(lower, upper, q, axis_len)
}

/// Return variance along `axis`.
///
/// The variance is computed using the [Welford one-pass
Expand Down Expand Up @@ -201,4 +397,3 @@ impl<A, S, D> ArrayBase<S, D>
}).is_done()
}
}

2 changes: 1 addition & 1 deletion src/numeric/mod.rs
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
pub use self::impl_numeric::*;

mod impl_numeric;

Loading