
Commit a31e0e5

Merge pull request #291 from shoyer/faster-isel
Support using dictionaries for labeled indexing
2 parents 4122158 + 967809a commit a31e0e5

14 files changed: +180 / -81 lines

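In short, this change makes a dictionary an accepted key for both positional and label based indexing. Schematically, the following spellings become equivalent (``arr`` and ``ds`` stand for any DataArray and Dataset; see the doc/indexing.rst changes below):

    arr[dict(space=0)]         is equivalent to   arr.isel(space=0)
    arr.loc[dict(space='IA')]  is equivalent to   arr.sel(space='IA')
    ds[dict(space=[0])]        is equivalent to   ds.isel(space=[0])
    ds.loc[dict(space='IA')]   is equivalent to   ds.sel(space='IA')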
doc/api.rst

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,6 @@ Top-level functions
 
    align
    concat
-   decode_cf
 
 Dataset
 =======
@@ -29,6 +28,7 @@ Creating a dataset
 
    Dataset
   open_dataset
+   decode_cf
 
 Attributes
 ----------
@@ -85,6 +85,7 @@ Indexing
 .. autosummary::
    :toctree: generated/
 
+   Dataset.loc
   Dataset.isel
   Dataset.sel
   Dataset.squeeze

doc/indexing.rst

Lines changed: 37 additions & 21 deletions
@@ -28,8 +28,8 @@ By name By integer ``arr.isel(space=0)`` ``ds.isel(space=0)``
 By name          By label     ``arr.sel(space='IA')``  ``ds.sel(space='IA')``
 ================ ============ ======================= ======================
 
-Array indexing
---------------
+Positional indexing
+-------------------
 
 Indexing a :py:class:`~xray.DataArray` directly works (mostly) just like it
 does for numpy arrays, except that the returned object is always another
@@ -70,16 +70,29 @@ Indexing with labeled dimensions
 --------------------------------
 
 With labeled dimensions, we do not have to rely on dimension order and can
-use them explicitly to slice data with the :py:meth:`~xray.DataArray.sel`
-and :py:meth:`~xray.DataArray.isel` methods:
+use them explicitly to slice data. There are two ways to do this:
 
-.. ipython:: python
+1. Use a dictionary as the argument for positional or label based array
+   indexing:
+
+   .. ipython:: python
+
+       # index by integer array indices
+       arr[dict(space=0, time=slice(None, 2))]
+
+       # index by dimension coordinate labels
+       arr.loc[dict(time=slice('2000-01-01', '2000-01-02'))]
+
+2. Use the :py:meth:`~xray.DataArray.sel` and :py:meth:`~xray.DataArray.isel`
+   convenience methods:
 
-    # index by integer array indices
-    arr.isel(space=0, time=slice(None, 2))
+   .. ipython:: python
 
-    # index by dimension coordinate labels
-    arr.sel(time=slice('2000-01-01', '2000-01-02'))
+       # index by integer array indices
+       arr.isel(space=0, time=slice(None, 2))
+
+       # index by dimension coordinate labels
+       arr.sel(time=slice('2000-01-01', '2000-01-02'))
 
 The arguments to these methods can be any objects that could index the array
 along the dimension given by the keyword, e.g., labels for an individual value,
@@ -88,10 +101,8 @@ Python :py:func:`slice` objects or 1-dimensional arrays.
 .. note::
 
     We would love to be able to do indexing with labeled dimension names inside
-    brackets, but Python `does yet not support`__ indexing with keyword
-    arguments like ``arr[space=0]``. One alternative we are considering is
-    allowing for indexing with a dictionary, ``arr[{'space': 0}]``
-    (see :issue:`187`).
+    brackets, but unfortunately, Python `does not yet support`__ indexing with
+    keyword arguments like ``arr[space=0]``.
 
     __ http://legacy.python.org/dev/peps/pep-0472/
 
@@ -100,17 +111,14 @@ __ http://legacy.python.org/dev/peps/pep-0472/
     Do not try to assign values when using ``isel`` or ``sel``::
 
        # DO NOT do this
-       arr.isel(space='0') = 0
+       arr.isel(space=0) = 0
 
     Depending on whether the underlying numpy indexing returns a copy or a
     view, the method will fail, and when it fails, **it will fail
-    silently**. Until we support indexing with dictionaries (see the note
-    above), you should explicitly construct a tuple to do positional indexing
-    if you want to do assignment with labeled dimensions::
+    silently**. Instead, you should use normal index assignment::
 
-       # this is safer
-       indexer = tuple(0 if d == 'space' else slice(None) for d in arr.dims)
-       arr[indexer] = 0
+       # this is safe
+       arr[dict(space=0)] = 0
 
 Dataset indexing
 ----------------
@@ -126,7 +134,15 @@ simultaneously, returning a new dataset:
 
 Positional indexing on a dataset is not supported because the ordering of
 dimensions in a dataset is somewhat ambiguous (it can vary between different
-arrays).
+arrays). However, you can do normal indexing with labeled dimensions:
+
+.. ipython:: python
+
+    ds[dict(space=[0], time=[0])]
+    ds.loc[dict(time='2000-01-01')]
+
+Using indexing to *assign* values to a subset of a dataset (e.g.,
+``ds[dict(space=0)] = 1``) is not yet supported.
 
 Indexing details
 ----------------

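The examples in doc/indexing.rst above rely on ``arr`` and ``ds`` objects created earlier in that file. A rough, self-contained sketch of such a setup and of the new dictionary keys in action; the constructor calls here are illustrative of the docs' usual pattern, not code from this commit:

    import numpy as np
    import pandas as pd
    import xray

    # illustrative setup: a 4x3 array with 'time' and 'space' dimensions
    arr = xray.DataArray(np.random.randn(4, 3),
                         [('time', pd.date_range('2000-01-01', periods=4)),
                          ('space', ['IA', 'IL', 'IN'])])
    ds = xray.Dataset({'foo': arr})

    # dictionary keys now work for positional and label based indexing alike
    arr[dict(space=0, time=slice(None, 2))]                 # like arr.isel(...)
    arr.loc[dict(time=slice('2000-01-01', '2000-01-02'))]   # like arr.sel(...)
    ds[dict(space=[0], time=[0])]                           # like ds.isel(...)
    ds.loc[dict(time='2000-01-01')]                         # like ds.sel(...)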
xray/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -3,6 +3,6 @@
 from .core.dataset import Dataset, open_dataset
 from .core.dataarray import DataArray
 
-from .conventions import cf_decode
+from .conventions import decode_cf
 
 from .version import version as __version__

xray/conventions.py

Lines changed: 3 additions & 2 deletions
@@ -638,9 +638,10 @@ def stackable(dim):
     return new_vars, attributes, coord_names
 
 
-def cf_decode(obj, concat_characters=True, mask_and_scale=True,
+def decode_cf(obj, concat_characters=True, mask_and_scale=True,
               decode_times=True, decode_coords=True):
-    """Decode the given object according to CF conventions into a new Dataset.
+    """Decode the given Dataset or Datastore according to CF conventions into
+    a new Dataset.
 
     Parameters
     ----------

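Usage of the renamed function is unchanged apart from the name. A minimal sketch, assuming a NetCDF file named ``example.nc`` (the file name is hypothetical; ``open_dataset`` and the keyword arguments match the signatures shown in this diff):

    import xray

    # open without CF decoding, then decode explicitly
    raw = xray.open_dataset('example.nc', decode_cf=False)
    decoded = xray.decode_cf(raw, mask_and_scale=True, decode_times=True)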
xray/core/dataarray.py

Lines changed: 21 additions & 10 deletions
@@ -62,12 +62,20 @@ def __init__(self, data_array):
         self.data_array = data_array
 
     def _remap_key(self, key):
-        label_indexers = self.data_array._key_to_indexers(key)
-        indexers = []
-        for dim, label in iteritems(label_indexers):
+        def lookup_positions(dim, labels):
             index = self.data_array.indexes[dim]
-            indexers.append(indexing.convert_label_indexer(index, label))
-        return tuple(indexers)
+            return indexing.convert_label_indexer(index, labels)
+
+        if utils.is_dict_like(key):
+            return dict((dim, lookup_positions(dim, labels))
+                        for dim, labels in iteritems(key))
+        else:
+            if not isinstance(key, tuple):
+                key = (key,)
+            # note: it's OK if there are fewer keys than dimensions: zip will
+            # finish early in that case (we don't need to insert colons)
+            return tuple(lookup_positions(dim, labels) for dim, labels
+                         in zip(self.data_array.dims, key))
 
     def __getitem__(self, key):
         return self.data_array[self._remap_key(key)]
@@ -326,16 +334,19 @@ def dimensions(self):
         utils.alias_warning('dimensions', 'dims')
         return self.dims
 
-    def _key_to_indexers(self, key):
-        return OrderedDict(
-            zip(self.dims, indexing.expanded_indexer(key, self.ndim)))
+    def _item_key_to_dict(self, key):
+        if utils.is_dict_like(key):
+            return key
+        else:
+            key = indexing.expanded_indexer(key, self.ndim)
+            return dict(zip(self.dims, key))
 
     def __getitem__(self, key):
         if isinstance(key, basestring):
             return self.coords[key]
         else:
             # orthogonal array indexing
-            return self.isel(**self._key_to_indexers(key))
+            return self.isel(**self._item_key_to_dict(key))
 
     def __setitem__(self, key, value):
         if isinstance(key, basestring):
@@ -352,7 +363,7 @@ def __contains__(self, key):
 
     @property
     def loc(self):
-        """Attribute for location based indexing like pandas..
+        """Attribute for location based indexing like pandas.
         """
         return _LocIndexer(self)
 

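The rewritten ``_remap_key`` boils down to: translate labels into positions per dimension, with the key either keyed by dimension name (a dict) or given positionally (a tuple). A standalone sketch of that idea, using plain ``pandas.Index.get_loc`` as a stand-in for xray's ``indexing.convert_label_indexer`` (the function and variable names below are illustrative, not from the codebase):

    import pandas as pd

    def remap_labels(dims, indexes, key):
        # dims: ordered tuple of dimension names
        # indexes: {dim: pandas.Index of labels}
        # key: either {dim: label} or a positional tuple of labels
        def lookup(dim, label):
            # stand-in for xray.core.indexing.convert_label_indexer
            return indexes[dim].get_loc(label)

        if hasattr(key, 'keys'):   # dict-like, mirroring utils.is_dict_like
            return dict((dim, lookup(dim, label)) for dim, label in key.items())
        if not isinstance(key, tuple):
            key = (key,)
        # fewer keys than dims is fine: zip stops early for the rest
        return tuple(lookup(dim, label) for dim, label in zip(dims, key))

    indexes = {'space': pd.Index(['IA', 'IL', 'IN']),
               'time': pd.Index(['2000-01-01', '2000-01-02'])}
    print(remap_labels(('time', 'space'), indexes, {'space': 'IL'}))   # {'space': 1}
    print(remap_labels(('time', 'space'), indexes, ('2000-01-02',)))   # (1,)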
xray/core/dataset.py

Lines changed: 34 additions & 7 deletions
@@ -85,7 +85,7 @@ def open_dataset(filename_or_obj, decode_cf=True, mask_and_scale=True,
         store = backends.ScipyDataStore(filename_or_obj)
 
     if decode_cf:
-        return conventions.cf_decode(
+        return conventions.decode_cf(
             store, mask_and_scale=mask_and_scale,
             decode_times=decode_times, concat_characters=concat_characters,
             decode_coords=decode_coords)
@@ -292,6 +292,16 @@ def __repr__(self):
         return formatting.vars_repr(self)
 
 
+class _LocIndexer(object):
+    def __init__(self, dataset):
+        self.dataset = dataset
+
+    def __getitem__(self, key):
+        if not utils.is_dict_like(key):
+            raise TypeError('can only lookup dictionaries from Dataset.loc')
+        return self.dataset.sel(**key)
+
+
 class Dataset(Mapping, common.ImplementsDatasetReduce):
     """A multi-dimensional, in memory, array database.
 
@@ -606,6 +616,13 @@ def __len__(self):
     def __iter__(self):
         return iter(self._arrays)
 
+    @property
+    def loc(self):
+        """Attribute for location based indexing. Only supports __getitem__,
+        and only when the key is a dict of the form {dim: labels}.
+        """
+        return _LocIndexer(self)
+
     @property
     def virtual_variables(self):
         """A frozenset of names that don't exist in this dataset but for which
@@ -633,6 +650,9 @@ def __getitem__(self, key):
         """
         from .dataarray import DataArray
 
+        if utils.is_dict_like(key):
+            return self.isel(**key)
+
         key = np.asarray(key)
         if key.ndim == 0:
             return DataArray._new_from_dataset(self, key.item())
@@ -650,6 +670,9 @@ def __setitem__(self, key, value):
         ``(dims, data[, attrs])``), add it to this dataset as a new
         variable.
         """
+        if utils.is_dict_like(key):
+            raise NotImplementedError('cannot yet use a dictionary as a key '
+                                      'to set Dataset values')
         self.merge({key: value}, inplace=True, overwrite_vars=[key])
 
     def __delitem__(self, key):
@@ -967,21 +990,23 @@ def reindex_like(self, other, copy=True):
         """
         return self.reindex(copy=copy, **other.indexes)
 
-    def reindex(self, copy=True, **indexers):
+    def reindex(self, indexers=None, copy=True, **kw_indexers):
         """Conform this object onto a new set of indexes, filling in
         missing values with NaN.
 
         Parameters
         ----------
-        copy : bool, optional
-            If `copy=True`, the returned dataset contains only copied
-            variables. If `copy=False` and no reindexing is required then
-            original variables from this dataset are returned.
-        **indexers : dict
+        indexers : dict, optional
             Dictionary with keys given by dimension names and values given by
             arrays of coordinates tick labels. Any mis-matched coordinate values
             will be filled in with NaN, and any mis-matched dimension names will
             simply be ignored.
+        copy : bool, optional
+            If `copy=True`, the returned dataset contains only copied
+            variables. If `copy=False` and no reindexing is required then
+            original variables from this dataset are returned.
+        **kw_indexers : optional
+            Keyword arguments in the same form as ``indexers``.
 
         Returns
         -------
@@ -993,6 +1018,8 @@ def reindex(self, copy=True, **indexers):
         Dataset.reindex_like
         align
         """
+        indexers = utils.combine_pos_and_kw_args(indexers, kw_indexers,
+                                                 'reindex')
         if not indexers:
             # shortcut
             return self.copy(deep=True) if copy else self

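``Dataset.loc`` follows the same pattern as the DataArray version: a property returns a small helper object so that ``ds.loc[...]`` can forward a dictionary key to ``sel``. A toy stand-in illustrating just that forwarding (these classes are illustrative, not the real ones):

    class _Loc(object):
        # mirrors the new Dataset._LocIndexer: __getitem__ only, dict keys only
        def __init__(self, dataset):
            self.dataset = dataset

        def __getitem__(self, key):
            if not hasattr(key, 'keys'):
                raise TypeError('can only lookup dictionaries from Dataset.loc')
            return self.dataset.sel(**key)

    class FakeDataset(object):
        # stand-in with a sel() that just reports what it was asked for
        def sel(self, **indexers):
            return 'sel(%r)' % (indexers,)

        @property
        def loc(self):
            return _Loc(self)

    ds = FakeDataset()
    print(ds.loc[dict(time='2000-01-01')])   # -> "sel({'time': '2000-01-01'})"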
xray/core/utils.py

Lines changed: 13 additions & 0 deletions
@@ -189,6 +189,19 @@ def is_dict_like(value):
     return hasattr(value, '__getitem__') and hasattr(value, 'keys')
 
 
+def combine_pos_and_kw_args(pos_kwargs, kw_kwargs, func_name):
+    if pos_kwargs is not None:
+        if not is_dict_like(pos_kwargs):
+            raise ValueError('the first argument to .%s must be a dictionary'
+                             % func_name)
+        if kw_kwargs:
+            raise ValueError('cannot specify both keyword and positional '
+                             'arguments to .%s' % func_name)
+        return pos_kwargs
+    else:
+        return kw_kwargs
+
+
 def is_scalar(value):
     """np.isscalar only work on primitive numeric types and (bizarrely)
     excludes 0-d ndarrays; this version does more comprehensive checks

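The new helper simply arbitrates between a positional dictionary and keyword arguments, as ``Dataset.reindex`` now needs. Its behaviour can be exercised in isolation; the two definitions below are copied from the hunks above so the snippet runs on its own:

    def is_dict_like(value):
        return hasattr(value, '__getitem__') and hasattr(value, 'keys')

    def combine_pos_and_kw_args(pos_kwargs, kw_kwargs, func_name):
        if pos_kwargs is not None:
            if not is_dict_like(pos_kwargs):
                raise ValueError('the first argument to .%s must be a dictionary'
                                 % func_name)
            if kw_kwargs:
                raise ValueError('cannot specify both keyword and positional '
                                 'arguments to .%s' % func_name)
            return pos_kwargs
        else:
            return kw_kwargs

    print(combine_pos_and_kw_args({'space': [0]}, {}, 'reindex'))    # {'space': [0]}
    print(combine_pos_and_kw_args(None, {'time': [1]}, 'reindex'))   # {'time': [1]}
    # mixing both raises ValueError:
    # combine_pos_and_kw_args({'space': [0]}, {'time': [1]}, 'reindex')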
xray/core/variable.py

Lines changed: 10 additions & 1 deletion
@@ -357,6 +357,12 @@ def _parse_dimensions(self, dims):
     def dims(self, value):
         self._dims = self._parse_dimensions(value)
 
+    def _item_key_to_tuple(self, key):
+        if utils.is_dict_like(key):
+            return tuple(key.get(dim, slice(None)) for dim in self.dims)
+        else:
+            return key
+
     def __getitem__(self, key):
         """Return a new Array object whose contents are consistent with
         getting the provided key from the underlying data.
@@ -374,9 +380,10 @@ def __getitem__(self, key):
         If you really want to do indexing like `x[x > 0]`, manipulate the numpy
         array `x.values` directly.
         """
+        key = self._item_key_to_tuple(key)
         key = indexing.expanded_indexer(key, self.ndim)
         dims = [dim for k, dim in zip(key, self.dims)
-                if not isinstance(k, (int, np.integer))]
+                if not isinstance(k, (int, np.integer))]
         values = self._data[key]
         # orthogonal indexing should ensure the dimensionality is consistent
         if hasattr(values, 'ndim'):
@@ -391,6 +398,7 @@ def __setitem__(self, key, value):
 
         See __getitem__ for more details.
         """
+        key = self._item_key_to_tuple(key)
         self._data_cached()[key] = value
 
     @property
@@ -817,6 +825,7 @@ def __init__(self, name, data, attrs=None, encoding=None):
                 type(self).__name__)
 
     def __getitem__(self, key):
+        key = self._item_key_to_tuple(key)
         values = self._data[key]
         if not hasattr(values, 'ndim') or values.ndim == 0:
             return Variable((), values, self.attrs, self.encoding)

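The key normalization added to ``Variable`` is small: a dict key becomes a full positional tuple, with ``slice(None)`` filling every dimension the dict does not mention. A minimal standalone illustration (the helper name and dimension names below are made up):

    def item_key_to_tuple(dims, key):
        # mirrors Variable._item_key_to_tuple from the hunk above
        if hasattr(key, 'keys'):   # dict-like, as in utils.is_dict_like
            return tuple(key.get(dim, slice(None)) for dim in dims)
        return key

    dims = ('time', 'space')
    print(item_key_to_tuple(dims, {'space': 0}))
    # -> (slice(None, None, None), 0)
    print(item_key_to_tuple(dims, (0, slice(2))))
    # -> unchanged: (0, slice(None, 2, None))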
xray/test/__init__.py

Lines changed: 2 additions & 3 deletions
@@ -97,9 +97,8 @@ def assertDatasetIdentical(self, d1, d2):
         assert d1.identical(d2), (d1, d2)
 
     def assertDatasetAllClose(self, d1, d2, rtol=1e-05, atol=1e-08):
-        # for now, *don't* check coordinates vs variables
-        self.assertEqual(sorted(d1, key=str),
-                         sorted(d2, key=str))
+        self.assertEqual(sorted(d1, key=str), sorted(d2, key=str))
+        self.assertItemsEqual(d1.coords, d2.coords)
         for k in d1:
             v1 = d1._arrays[k]
             v2 = d2._arrays[k]
