Commit 783b527

Joe Hamman authored and fujiisoup committed
WIP: Feature/interpolate (#1640)
* initial interpolate commit first working version of interpolate method/module pep8 after merge master update interpolate, more pandas compat tests add interpolate_na to api encoding fix for py2 in dataarray.py working... checkin, roughed in interpolator classes move tests and some mods to apply_ufunc usage in interp_na add method to kwargs fixes for scipy and some docs cleanup scipy vs numpy interpolator selection cleanups add limit to interpolate_na bfill/ffill parallelized new interface with use_coordinate kwarg use partial function to wrap interpolate class a few fixes for ffill/bfill, placeholder for interpolate_at method add some tests fix test
* fix to interpolate wrapper function
* remove duplicate limit handling in ffill/bfill
* tests are passing
* more docs, more tests
* backward compat and add benchmarks
* skip tests for numpy versions before 1.12
* test fixes for py27 fixture
* try reordering decorators
* minor reorg of travis to make the flake8 check useful
* cleanup following @fujiisoup's comments
* dataset missing methods, some more docs, and more scipy interpolators
* workaround for parameterized tests that are skipped in missing.py module
* requires_np112 for dataset interpolate test
* remove req for np 112
* fix typo in docs
* @requires_np112 for methods that use apply_ufunc in missing.py
* reuse type in apply over vars with dim
* rework the fill value convention for linear interpolation, no longer match pandas -- adjusted tests and docs to reflect this change
* flake8
1 parent 6eac857 commit 783b527

10 files changed: +1150 / -7 lines

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -93,9 +93,9 @@ install:
   - python xarray/util/print_versions.py
 
 script:
+  - git diff upstream/master xarray/**/*py | flake8 --diff --exit-zero || true
   - python -OO -c "import xarray"
   - py.test xarray --cov=xarray --cov-config ci/.coveragerc --cov-report term-missing --verbose $EXTRA_FLAGS
-  - git diff upstream/master **/*py | flake8 --diff --exit-zero || true
 
 after_success:
   - coveralls
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import pandas as pd
+
+try:
+    import dask
+except ImportError:
+    pass
+
+import xarray as xr
+
+from . import randn, requires_dask
+
+
+def make_bench_data(shape, frac_nan, chunks):
+    vals = randn(shape, frac_nan)
+    coords = {'time': pd.date_range('2000-01-01', freq='D',
+                                    periods=shape[0])}
+    da = xr.DataArray(vals, dims=('time', 'x', 'y'), coords=coords)
+
+    if chunks is not None:
+        da = da.chunk(chunks)
+
+    return da
+
+
+def time_interpolate_na(shape, chunks, method, limit):
+    if chunks is not None:
+        requires_dask()
+    da = make_bench_data(shape, 0.1, chunks=chunks)
+    actual = da.interpolate_na(dim='time', method='linear', limit=limit)
+
+    if chunks is not None:
+        actual = actual.compute()
+
+
+time_interpolate_na.param_names = ['shape', 'chunks', 'method', 'limit']
+time_interpolate_na.params = ([(3650, 200, 400), (100, 25, 25)],
+                              [None, {'x': 25, 'y': 25}],
+                              ['linear', 'spline', 'quadratic', 'cubic'],
+                              [None, 3])
+
+
+def time_ffill(shape, chunks, limit):
+
+    da = make_bench_data(shape, 0.1, chunks=chunks)
+    actual = da.ffill(dim='time', limit=limit)
+
+    if chunks is not None:
+        actual = actual.compute()
+
+
+time_ffill.param_names = ['shape', 'chunks', 'limit']
+time_ffill.params = ([(3650, 200, 400), (100, 25, 25)],
+                     [None, {'x': 25, 'y': 25}],
+                     [None, 3])
+
+
+def time_bfill(shape, chunks, limit):
+
+    da = make_bench_data(shape, 0.1, chunks=chunks)
+    actual = da.bfill(dim='time', limit=limit)
+
+    if chunks is not None:
+        actual = actual.compute()
+
+
+time_bfill.param_names = ['shape', 'chunks', 'limit']
+time_bfill.params = ([(3650, 200, 400), (100, 25, 25)],
+                     [None, {'x': 25, 'y': 25}],
+                     [None, 3])
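The benchmark module imports `randn` and `requires_dask` from its parent package, and neither body is part of this diff. Below is a hypothetical stand-in for `randn(shape, frac_nan)`. It assumes the helper returns standard-normal floats and replaces roughly `frac_nan` of the entries with NaN, so that `make_bench_data` gets gaps to fill. The `seed` parameter is our own addition for reproducibility.

```python
import numpy as np


def randn(shape, frac_nan=None, seed=0):
    """Hypothetical stand-in for the benchmark helper: normal data with NaNs.

    Draws standard-normal values of the given shape, then sets each entry to
    NaN independently with probability ``frac_nan`` (if given).
    """
    rng = np.random.RandomState(seed)
    values = rng.standard_normal(shape)
    if frac_nan is not None:
        # Boolean mask selecting roughly frac_nan of the entries.
        mask = rng.random_sample(shape) < frac_nan
        values[mask] = np.nan
    return values


data = randn((100, 25, 25), frac_nan=0.1)
print(data.shape, float(np.isnan(data).mean()))
```

Drawing the NaN mask independently per element scatters gaps of varying length along `time`. That pattern exercises both the `limit=None` and `limit=3` benchmark cases.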

doc/api.rst

Lines changed: 6 additions & 0 deletions
@@ -148,6 +148,9 @@ Computation
 :py:attr:`~Dataset.count`
 :py:attr:`~Dataset.dropna`
 :py:attr:`~Dataset.fillna`
+:py:attr:`~Dataset.ffill`
+:py:attr:`~Dataset.bfill`
+:py:attr:`~Dataset.interpolate_na`
 :py:attr:`~Dataset.where`
 
 **ndarray methods**:
@@ -299,6 +302,9 @@ Computation
 :py:attr:`~DataArray.count`
 :py:attr:`~DataArray.dropna`
 :py:attr:`~DataArray.fillna`
+:py:attr:`~DataArray.ffill`
+:py:attr:`~DataArray.bfill`
+:py:attr:`~DataArray.interpolate_na`
 :py:attr:`~DataArray.where`
 
 **ndarray methods**:

doc/computation.rst

Lines changed: 18 additions & 2 deletions
@@ -59,8 +59,9 @@ Missing values
 
 xarray objects borrow the :py:meth:`~xarray.DataArray.isnull`,
 :py:meth:`~xarray.DataArray.notnull`, :py:meth:`~xarray.DataArray.count`,
-:py:meth:`~xarray.DataArray.dropna` and :py:meth:`~xarray.DataArray.fillna` methods
-for working with missing data from pandas:
+:py:meth:`~xarray.DataArray.dropna`, :py:meth:`~xarray.DataArray.fillna`,
+:py:meth:`~xarray.DataArray.ffill`, and :py:meth:`~xarray.DataArray.bfill`
+methods for working with missing data from pandas:
 
 .. ipython:: python
 
@@ -70,10 +71,25 @@ for working with missing data from pandas:
     x.count()
     x.dropna(dim='x')
     x.fillna(-1)
+    x.ffill(dim='x')
+    x.bfill(dim='x')
 
 Like pandas, xarray uses the float value ``np.nan`` (not-a-number) to represent
 missing values.
 
+xarray objects also have an :py:meth:`~xarray.DataArray.interpolate_na` method
+for filling missing values via 1D interpolation.
+
+.. ipython:: python
+
+    x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=['x'],
+                     coords={'xx': xr.Variable('x', [0, 1, 1.1, 1.9, 3])})
+    x.interpolate_na(dim='x', method='linear', use_coordinate='xx')
+
+Note that xarray slightly diverges from the pandas ``interpolate`` syntax by
+providing the ``use_coordinate`` keyword, which states explicitly which values
+to use as the index in the interpolation.
+
 Aggregation
 ===========
 
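The `interpolate_na` example above fills the gap against the `xx` coordinate rather than against array positions. The sketch below reproduces that idea with plain `numpy.interp`, which the docstring names as the backend for `'linear'`. It is our own approximation, not xarray's implementation. Passing `x` mimics `use_coordinate='xx'`, and omitting it treats the points as equally spaced, like `use_coordinate=False`. It also assumes that leading and trailing NaNs are left unfilled, which is done by setting `left`/`right` to NaN.

```python
import numpy as np


def fill_linear(y, x=None):
    """Fill NaNs in 1D ``y`` by linear interpolation against ``x``.

    If ``x`` is None, points are treated as equally spaced (integer
    positions). NaNs before the first or after the last valid point
    are left as NaN.
    """
    y = np.asarray(y, dtype=float)
    if x is None:
        x = np.arange(y.size, dtype=float)
    else:
        x = np.asarray(x, dtype=float)
    valid = ~np.isnan(y)
    out = y.copy()
    # Interpolate only at the missing positions, using valid points as knots.
    out[~valid] = np.interp(x[~valid], x[valid], y[valid],
                            left=np.nan, right=np.nan)
    return out


y = [0, 1, np.nan, np.nan, 2]
print(fill_linear(y, x=[0, 1, 1.1, 1.9, 3]))  # against the 'xx' coordinate
print(fill_linear(y))                          # equally spaced positions
```

With the coordinate, the gap fills to about 1.05 and 1.45, because 1.1 and 1.9 sit near the left end of the interval [1, 3]. When the points are treated as equally spaced, the same gap fills to about 1.33 and 1.67.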

xarray/core/dataarray.py

Lines changed: 94 additions & 3 deletions
@@ -1228,6 +1228,97 @@ def fillna(self, value):
         out = ops.fillna(self, value)
         return out
 
+    def interpolate_na(self, dim=None, method='linear', limit=None,
+                       use_coordinate=True,
+                       **kwargs):
+        """Interpolate values according to different methods.
+
+        Parameters
+        ----------
+        dim : str
+            Specifies the dimension along which to interpolate.
+        method : {'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
+                  'polynomial', 'barycentric', 'krog', 'pchip',
+                  'spline', 'akima'}, optional
+            String indicating which method to use for interpolation:
+
+            - 'linear': linear interpolation (Default). Additional keyword
+              arguments are passed to ``numpy.interp``
+            - 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
+              'polynomial': are passed to ``scipy.interpolate.interp1d``. If
+              method=='polynomial', the ``order`` keyword argument must also be
+              provided.
+            - 'barycentric', 'krog', 'pchip', 'spline', and 'akima': use their
+              respective ``scipy.interpolate`` classes.
+        use_coordinate : boolean or str, default True
+            Specifies which index to use as the x values in the interpolation
+            formulated as `y = f(x)`. If False, values are treated as if
+            equally-spaced along `dim`. If True, the IndexVariable `dim` is
+            used. If use_coordinate is a string, it specifies the name of a
+            coordinate variable to use as the index.
+        limit : int, default None
+            Maximum number of consecutive NaNs to fill. Must be greater than 0
+            or None for no limit.
+
+        Returns
+        -------
+        DataArray
+
+        See also
+        --------
+        numpy.interp
+        scipy.interpolate
+        """
+        from .missing import interp_na
+        return interp_na(self, dim=dim, method=method, limit=limit,
+                         use_coordinate=use_coordinate, **kwargs)
+
+    def ffill(self, dim, limit=None):
+        '''Fill NaN values by propagating values forward
+
+        *Requires bottleneck.*
+
+        Parameters
+        ----------
+        dim : str
+            Specifies the dimension along which to propagate values when
+            filling.
+        limit : int, default None
+            The maximum number of consecutive NaN values to forward fill. In
+            other words, if there is a gap with more than this number of
+            consecutive NaNs, it will only be partially filled. Must be greater
+            than 0 or None for no limit.
+
+        Returns
+        -------
+        DataArray
+        '''
+        from .missing import ffill
+        return ffill(self, dim, limit=limit)
+
+    def bfill(self, dim, limit=None):
+        '''Fill NaN values by propagating values backward
+
+        *Requires bottleneck.*
+
+        Parameters
+        ----------
+        dim : str
+            Specifies the dimension along which to propagate values when
+            filling.
+        limit : int, default None
+            The maximum number of consecutive NaN values to backward fill. In
+            other words, if there is a gap with more than this number of
+            consecutive NaNs, it will only be partially filled. Must be greater
+            than 0 or None for no limit.
+
+        Returns
+        -------
+        DataArray
+        '''
+        from .missing import bfill
+        return bfill(self, dim, limit=limit)
+
     def combine_first(self, other):
         """Combine two DataArray objects, with union of coordinates.
 
@@ -1935,10 +2026,10 @@ def sortby(self, variables, ascending=True):
         sorted: DataArray
             A new dataarray where all the specified dims are sorted by dim
             labels.
-
+
         Examples
         --------
-
+
         >>> da = xr.DataArray(np.random.rand(5),
         ...                   coords=[pd.date_range('1/1/2000', periods=5)],
         ...                   dims='time')
@@ -1952,7 +2043,7 @@ def sortby(self, variables, ascending=True):
         <xarray.DataArray (time: 5)>
         array([ 0.26532 , 0.270962, 0.552878, 0.615637, 0.965471])
         Coordinates:
-          * time (time) datetime64[ns] 2000-01-03 2000-01-04 2000-01-05 ...
+          * time (time) datetime64[ns] 2000-01-03 2000-01-04 2000-01-05 ...
         """
         ds = self._to_temp_dataset().sortby(variables, ascending=ascending)
         return self._from_temp_dataset(ds)
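The `limit` semantics in the `ffill`/`bfill` docstrings can be illustrated without bottleneck. In this pure-Python sketch, each valid value is carried forward across at most `limit` following NaNs, so a longer gap is only partially filled. Backward fill is implemented as a forward fill on the reversed sequence. The helpers `ffill_1d` and `bfill_1d` are hypothetical names for illustration only. The DataArray methods themselves delegate to `xarray.core.missing` and, per their docstrings, require bottleneck, whose `push(a, n=None, axis=-1)` performs limited forward filling on arrays.

```python
import math


def ffill_1d(values, limit=None):
    """Forward-fill NaNs, filling at most ``limit`` consecutive NaNs per gap."""
    if limit is not None and limit <= 0:
        raise ValueError('limit must be greater than 0 or None')
    out = list(values)
    last = None  # most recent non-NaN value seen so far
    run = 0      # number of NaNs seen since that value
    for i, v in enumerate(out):
        if isinstance(v, float) and math.isnan(v):
            run += 1
            if last is not None and (limit is None or run <= limit):
                out[i] = last
        else:
            last, run = v, 0
    return out


def bfill_1d(values, limit=None):
    """Backward fill: forward-fill the reversed sequence, then reverse back."""
    return ffill_1d(list(values)[::-1], limit=limit)[::-1]


nan = float('nan')
data = [1.0, nan, nan, nan, 2.0, nan]
print(ffill_1d(data, limit=2))  # [1.0, 1.0, 1.0, nan, 2.0, 2.0]
print(bfill_1d(data, limit=2))  # [1.0, nan, 2.0, 2.0, 2.0, nan]
```

With `limit=2`, the three-NaN gap keeps its last (forward) or first (backward) NaN. Leading NaNs cannot be forward filled, and trailing NaNs cannot be backward filled.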

xarray/core/dataset.py

Lines changed: 99 additions & 0 deletions
@@ -2410,6 +2410,105 @@ def fillna(self, value):
         out = ops.fillna(self, value)
         return out
 
+    def interpolate_na(self, dim=None, method='linear', limit=None,
+                       use_coordinate=True,
+                       **kwargs):
+        """Interpolate values according to different methods.
+
+        Parameters
+        ----------
+        dim : str
+            Specifies the dimension along which to interpolate.
+        method : {'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
+                  'polynomial', 'barycentric', 'krog', 'pchip',
+                  'spline'}, optional
+            String indicating which method to use for interpolation:
+
+            - 'linear': linear interpolation (Default). Additional keyword
+              arguments are passed to ``numpy.interp``
+            - 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
+              'polynomial': are passed to ``scipy.interpolate.interp1d``. If
+              method=='polynomial', the ``order`` keyword argument must also be
+              provided.
+            - 'barycentric', 'krog', 'pchip', 'spline': use their respective
+              ``scipy.interpolate`` classes.
+        use_coordinate : boolean or str, default True
+            Specifies which index to use as the x values in the interpolation
+            formulated as `y = f(x)`. If False, values are treated as if
+            equally-spaced along `dim`. If True, the IndexVariable `dim` is
+            used. If use_coordinate is a string, it specifies the name of a
+            coordinate variable to use as the index.
+        limit : int, default None
+            Maximum number of consecutive NaNs to fill. Must be greater than 0
+            or None for no limit.
+
+        Returns
+        -------
+        Dataset
+
+        See also
+        --------
+        numpy.interp
+        scipy.interpolate
+        """
+        from .missing import interp_na, _apply_over_vars_with_dim
+
+        new = _apply_over_vars_with_dim(interp_na, self, dim=dim,
+                                        method=method, limit=limit,
+                                        use_coordinate=use_coordinate,
+                                        **kwargs)
+        return new
+
+    def ffill(self, dim, limit=None):
+        '''Fill NaN values by propagating values forward
+
+        *Requires bottleneck.*
+
+        Parameters
+        ----------
+        dim : str
+            Specifies the dimension along which to propagate values when
+            filling.
+        limit : int, default None
+            The maximum number of consecutive NaN values to forward fill. In
+            other words, if there is a gap with more than this number of
+            consecutive NaNs, it will only be partially filled. Must be greater
+            than 0 or None for no limit.
+
+        Returns
+        -------
+        Dataset
+        '''
+        from .missing import ffill, _apply_over_vars_with_dim
+
+        new = _apply_over_vars_with_dim(ffill, self, dim=dim, limit=limit)
+        return new
+
+    def bfill(self, dim, limit=None):
+        '''Fill NaN values by propagating values backward
+
+        *Requires bottleneck.*
+
+        Parameters
+        ----------
+        dim : str
+            Specifies the dimension along which to propagate values when
+            filling.
+        limit : int, default None
+            The maximum number of consecutive NaN values to backward fill. In
+            other words, if there is a gap with more than this number of
+            consecutive NaNs, it will only be partially filled. Must be greater
+            than 0 or None for no limit.
+
+        Returns
+        -------
+        Dataset
+        '''
+        from .missing import bfill, _apply_over_vars_with_dim
+
+        new = _apply_over_vars_with_dim(bfill, self, dim=dim, limit=limit)
+        return new
+
     def combine_first(self, other):
         """Combine two Datasets, default to data_vars of self.
 
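On `Dataset`, all three methods route through `_apply_over_vars_with_dim`, whose body is not part of this diff. The sketch below illustrates the dispatch pattern under one assumption: the helper applies the fill function only to data variables that have `dim` and passes the others through unchanged. The stand-in Dataset is a plain dict of `(dims, values)` pairs. `apply_over_vars_with_dim` and `ffill_1d` are hypothetical names, not xarray API.

```python
def ffill_1d(values):
    """Minimal forward fill for a 1D list (no limit)."""
    out, last = [], None
    for v in values:
        if v != v:  # NaN is the only value not equal to itself
            v = last if last is not None else v
        else:
            last = v
        out.append(v)
    return out


def apply_over_vars_with_dim(func, data_vars, dim, **kwargs):
    """Apply ``func`` to every variable that has ``dim``; copy the rest.

    ``data_vars`` maps a variable name to a ``(dims, values)`` pair.
    """
    result = {}
    for name, (dims, values) in data_vars.items():
        if dim in dims:
            result[name] = (dims, func(values, **kwargs))
        else:
            result[name] = (dims, values)
    return result


nan = float('nan')
ds = {
    'temperature': (('time',), [10.0, nan, 12.0]),
    'station_height': (('station',), [100.0, nan]),
}
filled = apply_over_vars_with_dim(ffill_1d, ds, dim='time')
print(filled['temperature'])     # (('time',), [10.0, 10.0, 12.0])
print(filled['station_height'])  # (('station',), [100.0, nan])
```

Skipping variables that lack `dim` lets a single call such as `ds.ffill('time')` succeed on a Dataset that mixes time series with static per-station fields.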
