
Bulk quantiles #26

Merged on Apr 6, 2019 (84 commits)
Commits
06b82be
Promoted module to directory
LukeMathWalker Feb 2, 2019
47c1696
Moved interpolate to separate file
LukeMathWalker Feb 2, 2019
8f1e7cd
Re-implemented quantile_axis_mut to get closer to something we can us…
LukeMathWalker Feb 2, 2019
c81f6be
Use a set instead of a vec to avoid repeating computations
LukeMathWalker Feb 2, 2019
7aee452
Use bulk method for single quantile
LukeMathWalker Feb 2, 2019
745e45b
Implement bulk method to get sorted
LukeMathWalker Feb 4, 2019
74eda81
Refactored quantiles_axis_mut to use sorted_get_many_mut
LukeMathWalker Feb 6, 2019
93531de
Avoid recomputing index value
LukeMathWalker Feb 6, 2019
c00620d
Add quantiles_mut to 1d trait
LukeMathWalker Feb 6, 2019
a7111e9
Return hashmaps from bulk methods
LukeMathWalker Feb 9, 2019
36284d2
Fixed tests
LukeMathWalker Feb 9, 2019
fc56ca4
Use IndexSet to preserve insertion order
LukeMathWalker Feb 9, 2019
67a4477
Fix indentation
LukeMathWalker Feb 9, 2019
ac0ca03
IndexMap provides a more intuitive behaviour
LukeMathWalker Feb 9, 2019
a4c1508
Remove prints
LukeMathWalker Feb 9, 2019
aa3a157
Renamed methods
LukeMathWalker Feb 9, 2019
2ea9233
Docs for get_many_from_sorted_mut
LukeMathWalker Feb 9, 2019
12a7944
Added docs for private free function
LukeMathWalker Feb 9, 2019
ac93a1e
Docs for quantiles_mut
LukeMathWalker Feb 9, 2019
c408c67
Fixed several typos in docs
LukeMathWalker Feb 9, 2019
c471955
More robust test
LukeMathWalker Feb 11, 2019
1411f15
Added test for quantiles
LukeMathWalker Feb 11, 2019
48f2bf0
Test quantiles_axis_mut
LukeMathWalker Feb 11, 2019
c27feb1
Add comments
LukeMathWalker Feb 11, 2019
00e14f7
Return options when the lane we are computing against is empty
LukeMathWalker Feb 11, 2019
846c336
Fixed docs
LukeMathWalker Feb 11, 2019
ab8d701
Fixed tests
LukeMathWalker Feb 11, 2019
8b38345
Move *index* functions out of Interpolate trait
jturner314 Mar 9, 2019
5771514
Reduce indentation in quantiles_axis_mut
jturner314 Mar 9, 2019
e0eb686
Reduce indentation in quantile_axis_skipnan_mut
jturner314 Mar 9, 2019
6c51145
Use .into_scalar() method
jturner314 Mar 9, 2019
30c3466
Improve docs of partition_mut
jturner314 Mar 9, 2019
dca9c7b
Reformat quantiles_axis_mut
jturner314 Mar 9, 2019
92f08a4
Cargo fmt
LukeMathWalker Mar 10, 2019
35d2094
Fmt
LukeMathWalker Mar 10, 2019
1021507
Formatting
LukeMathWalker Mar 10, 2019
c2ed805
Log version works
LukeMathWalker Mar 12, 2019
9dc5eef
Refactor
LukeMathWalker Mar 15, 2019
c49ad04
Fix indexes
LukeMathWalker Mar 15, 2019
cf7b362
Working implementation
LukeMathWalker Mar 16, 2019
cb1d9f8
Shorter syntax
LukeMathWalker Mar 16, 2019
75d7d55
Formatting
LukeMathWalker Mar 16, 2019
ca951cf
Better docs
LukeMathWalker Mar 16, 2019
45e84cd
Comments
LukeMathWalker Mar 16, 2019
46a6834
Typo
LukeMathWalker Mar 16, 2019
01e794c
Don't lose pieces after rebase
LukeMathWalker Mar 16, 2019
0c70bbb
Fmt
LukeMathWalker Mar 16, 2019
1ba922a
Reduce code duplication
LukeMathWalker Mar 17, 2019
d5ab45c
Fmt
LukeMathWalker Mar 18, 2019
1d8c671
Merge branch 'master' into bulk-quantiles
LukeMathWalker Mar 26, 2019
7b4e0de
Clarify docs of get_many_from_sorted_mut_unchecked
jturner314 Apr 1, 2019
64ed72b
Add get_many_from_sorted_mut benchmark
jturner314 Apr 1, 2019
2c90309
Add get_from_sorted_mut benchmark
jturner314 Apr 1, 2019
3a4ea2e
Simplify get_many_from_sorted_mut_unchecked
jturner314 Apr 1, 2019
e5c9474
Eliminate allocations from _get_many_from_sorted_mut_unchecked
jturner314 Apr 1, 2019
24ee710
Call slice_axis_mut instead of slice_mut
jturner314 Apr 1, 2019
8739c3b
Replace iter::repeat with vec!
jturner314 Apr 1, 2019
88d896f
Fix typo in comment
jturner314 Apr 1, 2019
29d507b
Remove unnecessary type annotation
jturner314 Apr 1, 2019
d0879c8
Simplify quantiles tests
jturner314 Apr 1, 2019
54c11be
Check keys in test_sorted_get_many_mut
jturner314 Apr 1, 2019
847fcd5
Simplify sort tests
jturner314 Apr 1, 2019
c6f762a
Improve sort and quantiles docs
jturner314 Apr 1, 2019
1685095
Make Interpolate::interpolate operate elementwise
jturner314 Apr 2, 2019
e965e85
Make quantiles_* return Array instead of IndexMap
jturner314 Apr 2, 2019
cfc408f
Add interpolate parameter to quantile*
jturner314 Apr 2, 2019
b5d8a08
Make get_many_from_sorted_mut take array of indexes
jturner314 Apr 2, 2019
00a21c0
Make quantiles* take array instead of slice
jturner314 Apr 2, 2019
8f9f0b6
Remove unnecessary IndexSet
jturner314 Apr 2, 2019
a4e8c5d
Merge pull request #5 from jturner314/bulk-quantiles
LukeMathWalker Apr 2, 2019
beec7ae
Merge master
LukeMathWalker Apr 2, 2019
5ff4430
Return EmptyInput instead of None
LukeMathWalker Apr 2, 2019
ca9f3db
Fix tests
LukeMathWalker Apr 2, 2019
7ca6b7f
Match output type for argmin/max_skipnan
LukeMathWalker Apr 2, 2019
22cbfbb
Fix tests
LukeMathWalker Apr 2, 2019
56906cf
Fmt
LukeMathWalker Apr 2, 2019
950cd44
Update src/lib.rs
jturner314 Apr 5, 2019
37b3b19
Add quantile error
LukeMathWalker Apr 5, 2019
1f37d44
Renamed InvalidFraction to InvalidQuantile
LukeMathWalker Apr 5, 2019
1e9ba18
Return QuantileError
LukeMathWalker Apr 5, 2019
caad47d
Fix tests
LukeMathWalker Apr 5, 2019
fab842c
Fix docs
LukeMathWalker Apr 5, 2019
a32d9a8
Fmt
LukeMathWalker Apr 5, 2019
a315f70
Simplify and deduplicate
LukeMathWalker Apr 6, 2019
9 changes: 8 additions & 1 deletion Cargo.toml
@@ -20,8 +20,15 @@ noisy_float = "0.1.8"
 num-traits = "0.2"
 rand = "0.6"
 itertools = { version = "0.7.0", default-features = false }
+indexmap = "1.0"

 [dev-dependencies]
-quickcheck = "0.7"
+criterion = "0.2"
+quickcheck = { version = "0.8.1", default-features = false }
 ndarray-rand = "0.9"
 approx = "0.3"
+quickcheck_macros = "0.8"
+
+[[bench]]
+name = "sort"
+harness = false
67 changes: 67 additions & 0 deletions benches/sort.rs
@@ -0,0 +1,67 @@
extern crate criterion;
extern crate ndarray;
extern crate ndarray_stats;
extern crate rand;

use criterion::{
    black_box, criterion_group, criterion_main, AxisScale, BatchSize, Criterion,
    ParameterizedBenchmark, PlotConfiguration,
};
use ndarray::prelude::*;
use ndarray_stats::Sort1dExt;
use rand::prelude::*;

fn get_from_sorted_mut(c: &mut Criterion) {
    let lens = vec![10, 100, 1000, 10000];
    let benchmark = ParameterizedBenchmark::new(
        "get_from_sorted_mut",
        |bencher, &len| {
            let mut rng = StdRng::seed_from_u64(42);
            let mut data: Vec<_> = (0..len).collect();
            data.shuffle(&mut rng);
            let indices: Vec<_> = (0..len).step_by(len / 10).collect();
            bencher.iter_batched(
                || Array1::from(data.clone()),
                |mut arr| {
                    for &i in &indices {
                        black_box(arr.get_from_sorted_mut(i));
                    }
                },
                BatchSize::SmallInput,
            )
        },
        lens,
    )
    .plot_config(PlotConfiguration::default().summary_scale(AxisScale::Logarithmic));
    c.bench("get_from_sorted_mut", benchmark);
}

fn get_many_from_sorted_mut(c: &mut Criterion) {
    let lens = vec![10, 100, 1000, 10000];
    let benchmark = ParameterizedBenchmark::new(
        "get_many_from_sorted_mut",
        |bencher, &len| {
            let mut rng = StdRng::seed_from_u64(42);
            let mut data: Vec<_> = (0..len).collect();
            data.shuffle(&mut rng);
            let indices: Vec<_> = (0..len).step_by(len / 10).collect();
            bencher.iter_batched(
                || Array1::from(data.clone()),
                |mut arr| {
                    black_box(arr.get_many_from_sorted_mut(&indices));
                },
                BatchSize::SmallInput,
            )
        },
        lens,
    )
    .plot_config(PlotConfiguration::default().summary_scale(AxisScale::Logarithmic));
    c.bench("get_many_from_sorted_mut", benchmark);
}

criterion_group! {
    name = benches;
    config = Criterion::default();
    targets = get_from_sorted_mut, get_many_from_sorted_mut
}
criterion_main!(benches);
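The two benchmarks above compare one `get_from_sorted_mut` call per index against a single `get_many_from_sorted_mut` call. As a rough illustration of why a bulk lookup can share partitioning work, here is a minimal std-only sketch. It is not the crate's implementation: `select_many` is a hypothetical helper, it works on a plain `i32` slice rather than an `ndarray` array, and it relies on the standard library's `select_nth_unstable` (Rust 1.49 or later) for partitioning.

```rust
/// Hypothetical helper (not part of ndarray-stats): rearranges `data` so that,
/// for every index in `indexes`, the element that a full sort would place at
/// that index ends up there.
///
/// Assumptions: `indexes` is sorted ascending, has no duplicates, and every
/// index is in bounds. `offset` is the absolute position of `data[0]` in the
/// original slice (pass 0 at the top level).
fn select_many(data: &mut [i32], indexes: &[usize], offset: usize) {
    if indexes.is_empty() || data.is_empty() {
        return;
    }
    // Partition around the middle requested index so that both halves of
    // the remaining indexes can be handled on smaller sub-slices.
    let mid = indexes.len() / 2;
    let pivot = indexes[mid] - offset;
    // After this call `data[pivot]` is in its sorted position; everything to
    // its left is <= it and everything to its right is >= it.
    let (left, _, right) = data.select_nth_unstable(pivot);
    select_many(left, &indexes[..mid], offset);
    select_many(right, &indexes[mid + 1..], offset + pivot + 1);
}

fn main() {
    let mut data = vec![9, 1, 8, 2, 7, 3, 6, 4, 5, 0];
    let indexes = [0, 4, 9];
    select_many(&mut data, &indexes, 0);
    for &i in &indexes {
        println!("sorted position {} holds {}", i, data[i]);
    }
}
```

Each recursive call only touches the sub-slice that can still contain a requested index, whereas a loop of independent single-index lookups starts from the whole array every time.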
29 changes: 29 additions & 0 deletions src/errors.rs
@@ -1,4 +1,5 @@
//! Custom errors returned from our methods and functions.
use noisy_float::types::N64;
use std::error::Error;
use std::fmt;

@@ -112,3 +113,31 @@ impl From<ShapeMismatch> for MultiInputError {
        MultiInputError::ShapeMismatch(err)
    }
}

/// An error computing a quantile.
#[derive(Debug, Clone, Eq, PartialEq)]
pub enum QuantileError {
    /// The input was empty.
    EmptyInput,
    /// The `q` was not between `0.` and `1.` (inclusive).
    InvalidQuantile(N64),
}

impl fmt::Display for QuantileError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            QuantileError::EmptyInput => write!(f, "Empty input."),
            QuantileError::InvalidQuantile(q) => {
                write!(f, "{:} is not between 0. and 1. (inclusive).", q)
            }
        }
    }
}

impl Error for QuantileError {}

impl From<EmptyInput> for QuantileError {
    fn from(_: EmptyInput) -> QuantileError {
        QuantileError::EmptyInput
    }
}
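To show how a caller might tell the two `QuantileError` variants apart, here is a minimal std-only sketch. It mirrors the enum above with two assumptions: `q` is stored as a plain `f64` instead of `N64` (so `Eq` is not derived), and `validate` is a hypothetical helper whose check order is a guess, not taken from the crate.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for the `QuantileError` added in this PR, using `f64` instead of
/// `N64` (an assumption made to keep the sketch free of external crates).
#[derive(Debug, Clone, PartialEq)]
enum QuantileError {
    EmptyInput,
    InvalidQuantile(f64),
}

impl fmt::Display for QuantileError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            QuantileError::EmptyInput => write!(f, "Empty input."),
            QuantileError::InvalidQuantile(q) => {
                write!(f, "{} is not between 0. and 1. (inclusive).", q)
            }
        }
    }
}

impl Error for QuantileError {}

/// Hypothetical validation helper: rejects `q` outside `[0, 1]` (including
/// NaN) and empty inputs. The order of the two checks is an assumption.
fn validate(q: f64, len: usize) -> Result<(), QuantileError> {
    if !(0.0..=1.0).contains(&q) {
        return Err(QuantileError::InvalidQuantile(q));
    }
    if len == 0 {
        return Err(QuantileError::EmptyInput);
    }
    Ok(())
}

fn main() {
    for &(q, len) in &[(0.5, 3), (1.5, 3), (0.5, 0)] {
        match validate(q, len) {
            Ok(()) => println!("q = {}, len = {}: ok", q, len),
            Err(e) => println!("q = {}, len = {}: error: {}", q, len, e),
        }
    }
}
```

Returning a dedicated enum rather than `Option` lets callers distinguish an empty lane from an out-of-range `q`, which is the motivation suggested by the "Return QuantileError" commits.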
5 changes: 3 additions & 2 deletions src/histogram/strategies.rs
@@ -24,6 +24,7 @@ use super::errors::BinsBuildError;
use super::{Bins, Edges};
use ndarray::prelude::*;
use ndarray::Data;
use noisy_float::types::n64;
use num_traits::{FromPrimitive, NumOps, Zero};

/// A trait implemented by all strategies to build [`Bins`]
@@ -334,8 +335,8 @@ where
         }

         let mut a_copy = a.to_owned();
-        let first_quartile = a_copy.quantile_mut::<Nearest>(0.25).unwrap();
-        let third_quartile = a_copy.quantile_mut::<Nearest>(0.75).unwrap();
+        let first_quartile = a_copy.quantile_mut(n64(0.25), &Nearest).unwrap();
+        let third_quartile = a_copy.quantile_mut(n64(0.75), &Nearest).unwrap();
         let iqr = third_quartile - first_quartile;

         let bin_width = FreedmanDiaconis::compute_bin_width(n_points, iqr);
1 change: 1 addition & 0 deletions src/lib.rs
@@ -25,6 +25,7 @@
//! [`NumPy`]: https://docs.scipy.org/doc/numpy-1.14.1/reference/routines.statistics.html
//! [`StatsBase.jl`]: https://juliastats.github.io/StatsBase.jl/latest/

extern crate indexmap;
extern crate itertools;
extern crate ndarray;
extern crate noisy_float;
138 changes: 138 additions & 0 deletions src/quantile/interpolate.rs
@@ -0,0 +1,138 @@
//! Interpolation strategies.
use noisy_float::types::N64;
use num_traits::{Float, FromPrimitive, NumOps, ToPrimitive};

fn float_quantile_index(q: N64, len: usize) -> N64 {
    q * ((len - 1) as f64)
}

/// Returns the fraction that the quantile is between the lower and higher indices.
///
/// This ranges from 0, where the quantile exactly corresponds to the lower index,
/// to 1, where the quantile exactly corresponds to the higher index.
fn float_quantile_index_fraction(q: N64, len: usize) -> N64 {
    float_quantile_index(q, len).fract()
}

/// Returns the index of the value on the lower side of the quantile.
pub(crate) fn lower_index(q: N64, len: usize) -> usize {
    float_quantile_index(q, len).floor().to_usize().unwrap()
}

/// Returns the index of the value on the higher side of the quantile.
pub(crate) fn higher_index(q: N64, len: usize) -> usize {
    float_quantile_index(q, len).ceil().to_usize().unwrap()
}

/// Used to provide an interpolation strategy to [`quantile_axis_mut`].
///
/// [`quantile_axis_mut`]: ../trait.QuantileExt.html#tymethod.quantile_axis_mut
pub trait Interpolate<T> {
    /// Returns `true` iff the lower value is needed to compute the
    /// interpolated value.
    #[doc(hidden)]
    fn needs_lower(q: N64, len: usize) -> bool;

    /// Returns `true` iff the higher value is needed to compute the
    /// interpolated value.
    #[doc(hidden)]
    fn needs_higher(q: N64, len: usize) -> bool;

    /// Computes the interpolated value.
    ///
    /// **Panics** if `None` is provided for the lower value when it's needed
    /// or if `None` is provided for the higher value when it's needed.
    #[doc(hidden)]
    fn interpolate(lower: Option<T>, higher: Option<T>, q: N64, len: usize) -> T;
}

/// Select the higher value.
pub struct Higher;
/// Select the lower value.
pub struct Lower;
/// Select the nearest value.
pub struct Nearest;
/// Select the midpoint of the two values (`(lower + higher) / 2`).
pub struct Midpoint;
/// Linearly interpolate between the two values
/// (`lower + (higher - lower) * fraction`, where `fraction` is the
/// fractional part of the index surrounded by `lower` and `higher`).
pub struct Linear;

impl<T> Interpolate<T> for Higher {
    fn needs_lower(_q: N64, _len: usize) -> bool {
        false
    }
    fn needs_higher(_q: N64, _len: usize) -> bool {
        true
    }
    fn interpolate(_lower: Option<T>, higher: Option<T>, _q: N64, _len: usize) -> T {
        higher.unwrap()
    }
}

impl<T> Interpolate<T> for Lower {
    fn needs_lower(_q: N64, _len: usize) -> bool {
        true
    }
    fn needs_higher(_q: N64, _len: usize) -> bool {
        false
    }
    fn interpolate(lower: Option<T>, _higher: Option<T>, _q: N64, _len: usize) -> T {
        lower.unwrap()
    }
}

impl<T> Interpolate<T> for Nearest {
    fn needs_lower(q: N64, len: usize) -> bool {
        float_quantile_index_fraction(q, len) < 0.5
    }
    fn needs_higher(q: N64, len: usize) -> bool {
        !<Self as Interpolate<T>>::needs_lower(q, len)
    }
    fn interpolate(lower: Option<T>, higher: Option<T>, q: N64, len: usize) -> T {
        if <Self as Interpolate<T>>::needs_lower(q, len) {
            lower.unwrap()
        } else {
            higher.unwrap()
        }
    }
}

impl<T> Interpolate<T> for Midpoint
where
    T: NumOps + Clone + FromPrimitive,
{
    fn needs_lower(_q: N64, _len: usize) -> bool {
        true
    }
    fn needs_higher(_q: N64, _len: usize) -> bool {
        true
    }
    fn interpolate(lower: Option<T>, higher: Option<T>, _q: N64, _len: usize) -> T {
        let denom = T::from_u8(2).unwrap();
        let lower = lower.unwrap();
        let higher = higher.unwrap();
        lower.clone() + (higher.clone() - lower.clone()) / denom.clone()
    }
}

impl<T> Interpolate<T> for Linear
where
    T: NumOps + Clone + FromPrimitive + ToPrimitive,
{
    fn needs_lower(_q: N64, _len: usize) -> bool {
        true
    }
    fn needs_higher(_q: N64, _len: usize) -> bool {
        true
    }
    fn interpolate(lower: Option<T>, higher: Option<T>, q: N64, len: usize) -> T {
        let fraction = float_quantile_index_fraction(q, len).to_f64().unwrap();
        let lower = lower.unwrap();
        let higher = higher.unwrap();
        let lower_f64 = lower.to_f64().unwrap();
        let higher_f64 = higher.to_f64().unwrap();
        lower.clone() + T::from_f64(fraction * (higher_f64 - lower_f64)).unwrap()
    }
}
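As a worked example of the index arithmetic and the five strategies in this file, here is a minimal std-only sketch on an already-sorted `f64` slice. It is only an illustration under stated assumptions: the `Strategy` enum and `quantile_sorted` are hypothetical names, `q` is a plain `f64` instead of `N64`, and the crate itself dispatches through the `Interpolate` trait rather than an enum.

```rust
/// Fractional index of quantile `q` in a sorted lane of `len` elements,
/// mirroring `float_quantile_index` with plain `f64` (an assumption).
fn float_index(q: f64, len: usize) -> f64 {
    q * ((len - 1) as f64)
}

/// Hypothetical enum standing in for the `Lower`, `Higher`, `Nearest`,
/// `Midpoint` and `Linear` marker structs.
#[derive(Clone, Copy, Debug)]
enum Strategy {
    Lower,
    Higher,
    Nearest,
    Midpoint,
    Linear,
}

/// Hypothetical helper: computes the `q`-quantile of a non-empty, sorted
/// slice, with `q` in `[0, 1]`, using the formulas documented above.
fn quantile_sorted(sorted: &[f64], q: f64, strategy: Strategy) -> f64 {
    let idx = float_index(q, sorted.len());
    let lower = sorted[idx.floor() as usize];
    let higher = sorted[idx.ceil() as usize];
    let fraction = idx.fract();
    match strategy {
        Strategy::Lower => lower,
        Strategy::Higher => higher,
        // A fraction below 0.5 picks the lower value, otherwise the higher.
        Strategy::Nearest => {
            if fraction < 0.5 {
                lower
            } else {
                higher
            }
        }
        Strategy::Midpoint => lower + (higher - lower) / 2.0,
        Strategy::Linear => lower + fraction * (higher - lower),
    }
}

fn main() {
    let sorted = [10.0, 20.0, 30.0, 40.0, 50.0];
    // q = 0.5625 with len = 5 gives index 0.5625 * 4 = 2.25, so the
    // quantile lies between sorted[2] = 30 and sorted[3] = 40.
    let q = 0.5625;
    for &s in &[
        Strategy::Lower,
        Strategy::Higher,
        Strategy::Nearest,
        Strategy::Midpoint,
        Strategy::Linear,
    ] {
        println!("{:?}: {}", s, quantile_sorted(&sorted, q, s));
    }
}
```

When the fractional index is a whole number (for example `q = 0.25` with five elements, index 1), the lower and higher values coincide and all five strategies return the same element.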