Commit 8608451

Merge pull request #4 from shoyer/indexing_broadcasting
indexing.rst edits
2 parents 969f9cf + d0d6a6f commit 8608451

1 file changed

+72
-52
lines changed


doc/indexing.rst

Lines changed: 72 additions & 52 deletions
Original file line number | Diff line number | Diff line change
@@ -11,19 +11,19 @@ Indexing and selecting data
1111
import xarray as xr
1212
np.random.seed(123456)
1313
14+
xarray offers extremely flexible indexing routines that combine the best
15+
features of NumPy and pandas for data selection.
1416

15-
The point of xarray is to introduce a numpy-ndarray-like multidimensional array object into a powerful pandas's flexible data handling scheme.
16-
We provide several (say, numpy-like, pandas-like, and more advanced type) indexing functionalities.
17-
18-
The most basic way to access each element of xarray's multi-dimensional
19-
object is to use Python ``[obj]`` syntax, such as ``array[i, j]``, where ``i`` and ``j`` are both integers.
20-
As xarray objects can store coordinates corresponding to each dimension of the
17+
The most basic way to access elements of a :py:class:`~xarray.DataArray`
18+
object is to use Python's ``[]`` syntax, such as ``array[i, j]``, where
19+
``i`` and ``j`` are both integers.
20+
As xarray objects can store coordinates corresponding to each dimension of an
2121
array, label-based indexing similar to ``pandas.DataFrame.loc`` is also possible.
2222
In label-based indexing, the element position ``i`` is automatically
2323
looked up from the coordinate values.
2424

25-
Dimensions of xarray object have names and you can also lookup the dimensions
26-
by name, instead of remembering the positional ordering of dimensions by yourself.
25+
Dimensions of xarray objects have names, so you can also look up the dimensions
26+
by name, instead of remembering their positional order.
2727

2828
Thus in total, xarray supports four different kinds of indexing, as described
2929
below and summarized in this table:
@@ -271,27 +271,27 @@ elements that are fully masked:
271271
Vectorized Indexing
272272
-------------------
273273

274-
xarray supports many types of indexing with a `vectorized` manner.
274+
Like numpy and pandas, xarray supports indexing many array elements at once in a
275+
`vectorized` manner.
275276

276-
If you provide an integer, slice, or unlabeled array (array without dimension names, such as ``np.ndarray``, ``list``, but not :py:meth:`~xarray.DataArray` or :py:meth:`~xarray.Variable`)
277-
our indexing is basically orthogonal.
278-
For example,
279-
if you pass multiple integer sequences to an array, they work independently
280-
along each dimension (similar to the way vector subscripts work in fortran).
277+
If you only provide integers, slices, or unlabeled arrays (arrays without
278+
dimension names, such as ``np.ndarray``, ``list``, but not
279+
:py:class:`~xarray.DataArray` or :py:class:`~xarray.Variable`), indexing can be
280+
understood as orthogonal. Each indexer component selects independently along
281+
the corresponding dimension, similar to how vector indexing works in Fortran or
282+
MATLAB, or after using the :py:func:`numpy.ix_` helper:
281283

282284
.. ipython:: python
283285
284286
da = xr.DataArray(np.arange(12).reshape((3, 4)), dims=['x', 'y'],
285287
coords={'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']})
286288
da
287289
da[[0, 1], [1, 1]]
288-
# Sequential indexing gives the same result.
289-
da[[0, 1], [1, 1]] == da[[0, 1]][:, [1, 1]]
290290
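The orthogonal behaviour described in this hunk can be cross-checked against plain NumPy. The following is a minimal sketch that assumes only NumPy is available. `values` is an illustrative, unlabeled copy of the data held by `da` and is not part of the commit. It shows that `numpy.ix_` and one-axis-at-a-time indexing agree, while NumPy's default advanced indexing pairs the lists element by element instead.

```python
import numpy as np

# Same values as the DataArray `da` in the example above, without labels.
values = np.arange(12).reshape((3, 4))

rows = [0, 1]
cols = [1, 1]

# Orthogonal (outer) indexing: each list selects independently along its axis.
orthogonal = values[np.ix_(rows, cols)]

# Indexing one axis at a time yields the same 2x2 block.
sequential = values[rows][:, cols]

# Plain NumPy advanced indexing pairs the two lists element by element.
pointwise = values[rows, cols]

print(orthogonal)
print(pointwise)
```

With unlabeled lists, `da[[0, 1], [1, 1]]` corresponds to the `orthogonal` result here, not to `pointwise`.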
291-
In order to make more advanced indexing, you can supply
292-
:py:meth:`~xarray.DataArray` as indexers.
293-
In this case, the dimension of the resultant array is determined
294-
by the indexers' dimension names,
291+
For more flexibility, you can supply :py:class:`~xarray.DataArray` objects
292+
as indexers.
293+
Dimensions on resultant arrays are given by the ordered union of the indexers'
294+
dimensions:
295295

296296
.. ipython:: python
297297
@@ -300,9 +300,8 @@ by the indexers' dimension names,
300300
da[ind_x, ind_y] # orthogonal indexing
301301
da[ind_x, ind_x] # vectorized indexing
302302
303-
Slices or sequences, which do not have named-dimensions,
304-
as a manner of fact,
305-
will be understood as the same dimension which is indexed along.
303+
Slices or sequences/arrays without named dimensions are treated as if they had
304+
the dimension along which they index:
306305

307306
.. ipython:: python
308307
@@ -312,17 +311,21 @@ will be understood as the same dimension which is indexed along.
312311
313312
Furthermore, you can use multi-dimensional :py:meth:`~xarray.DataArray`
314313
as indexers, where the resultant array dimension is also determined by
315-
indexers' dimension,
314+
the indexers' dimensions:
316315

317316
.. ipython:: python
318317
319318
ind = xr.DataArray([[0, 1], [0, 1]], dims=['a', 'b'])
320319
da[ind]
321320
322-
To summarize, our advanced indexing is based on our broadcasting scheme.
323-
See :ref:`xarray_indexing_rules` for the full list of our indexing rule.
321+
In brief, similar to how NumPy's `advanced indexing`_ works, vectorized
322+
indexing for xarray is based on our
323+
:ref:`broadcasting rules <compute.broadcasting>`.
324+
See :ref:`indexing.rules` for the complete specification.
325+
326+
.. _advanced indexing: https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html
324327

325-
These vectorized indexing also works with ``isel``, ``loc``, and ``sel``.
328+
Vectorized indexing also works with ``isel``, ``loc``, and ``sel``:
326329

327330
.. ipython:: python
328331
@@ -332,22 +335,28 @@ These vectorized indexing also works with ``isel``, ``loc``, and ``sel``.
332335
ind = xr.DataArray([['a', 'b'], ['b', 'a']], dims=['a', 'b'])
333336
da.loc[:, ind] # same as da.sel(y=ind)
334337
335-
336-
and also for Dataset
338+
and also for ``Dataset``
337339

338340
.. ipython:: python
339341
340342
ds2 = da.to_dataset(name='bar')
341343
ds2.isel(x=xr.DataArray([0, 1, 2], dims=['points']))
342344
345+
.. tip::
346+
347+
If you are lazily loading your data from disk, not every form of vectorized
348+
indexing is supported (or if supported, may not be supported efficiently).
349+
You may find increased performance by loading your data into memory first,
350+
e.g., with :py:meth:`~xarray.Dataset.load`.
351+
343352
.. note::
344-
This advanced indexing was newly added in v.0.10.
345-
In the older version of xarray, dimensions of indexers are not used.
346-
Special methods to realize some advanced indexing,
353+
354+
Vectorized indexing is a new feature in v0.10.
355+
In older versions of xarray, dimensions of indexers are ignored.
356+
Dedicated methods for some advanced indexing use cases,
347357
``isel_points`` and ``sel_points`` are now deprecated.
348358
See :ref:`more_advanced_indexing` for their alternative.
349359

350-
351360
.. _assigning_values:
352361

353362
Assigning values with indexing
@@ -416,8 +425,8 @@ __ https://docs.scipy.org/doc/numpy/user/basics.indexing.html#assigning-values-t
416425
More advanced indexing
417426
-----------------------
418427

419-
The use of :py:meth:`~xarray.DataArray` as indexers enables very flexible indexing.
420-
The following is an example of the pointwise indexing,
428+
The use of :py:class:`~xarray.DataArray` objects as indexers enables very
429+
flexible indexing. The following is an example of pointwise indexing:
421430

422431
.. ipython:: python
423432
@@ -438,8 +447,8 @@ you can supply a :py:meth:`~xarray.DataArray` with a coordinate,
438447
coords={'z': ['a', 'b', 'c']}),
439448
y=xr.DataArray([0, 1, 0], dims='z'))
440449
441-
442-
Analogously, label-based pointwise-indexing is also possible by ``.sel`` method,
450+
Analogously, label-based pointwise indexing is also possible with the ``.sel``
451+
method:
443452

444453
.. ipython:: python
445454
@@ -448,7 +457,6 @@ Analogously, label-based pointwise-indexing is also possible by ``.sel`` method,
448457
arr.sel(space=xr.DataArray(['IA', 'IL', 'IN'], dims=['new_time']),
449458
time=times)
450459
451-
452460
.. _align and reindex:
453461

454462
Align and reindex
@@ -648,28 +656,40 @@ dimensions or use the ellipsis in the ``loc`` specifier, e.g. in the example
648656
above, ``mda.loc[{'one': 'a', 'two': 0}, :]`` or ``mda.loc[('a', 0), ...]``.
649657

650658

651-
.. _xarray_indexing_rules:
659+
.. _indexing.rules:
652660

653-
xarray indexing rules
654-
---------------------
661+
Indexing rules
662+
--------------
655663

656-
The detailed indexing scheme in xarray is as follows.
657-
(Note that it is for the explanation purpose and the actual implementation is differ.)
664+
Here we describe the full rules xarray uses for vectorized indexing. Note that
665+
this is for the purposes of explanation: for the sake of efficiency and to
666+
support various backends, the actual implementation is different.
658667

659-
0. (Only for label based indexing.) Look up positional indexes along each dimension based on :py:class:`pandas.Index`.
668+
0. (Only for label-based indexing.) Look up positional indexes along each
669+
dimension from the corresponding :py:class:`pandas.Index`.
660670

661-
1. ``slice`` is converted to an array, such that ``np.arange(*slice.indices(...))``.
671+
1. A full slice object ``:`` is inserted for each dimension without an indexer.
662672

663-
2. Assume dimension names of array indexers without dimension, such as ``np.ndarray`` and ``list``, from the dimensions to be indexed along. For example, ``v.isel(x=[0, 1])`` is understood as ``v.isel(x=xr.DataArray([0, 1], dims=['x']))``.
673+
2. ``slice`` objects are converted into arrays, given by
674+
``np.arange(*slice.indices(...))``.
664675

665-
3. Broadcast all the indexers based on their dimension names (see :ref:`compute.broadcasting` for our name-based broadcasting).
676+
3. Assume dimension names for array indexers without dimensions, such as
677+
``np.ndarray`` and ``list``, from the dimensions to be indexed along.
678+
For example, ``v.isel(x=[0, 1])`` is understood as
679+
``v.isel(x=xr.DataArray([0, 1], dims=['x']))``.
666680

667-
4. Index the object by the broadcasted indexers.
681+
4. For each variable in a ``Dataset`` or ``DataArray`` (the array and its
682+
coordinates):
668683

669-
5. If an indexer-DataArray has coordinates, attached them to the indexed object.
684+
a. Broadcast all relevant indexers based on their dimension names
685+
(see :ref:`compute.broadcasting` for full details).
670686

671-
.. note::
687+
b. Index the underlying array by the broadcast indexers, using NumPy's
688+
advanced indexing rules.
689+
690+
5. If any indexer DataArray has coordinates and no coordinate with the
691+
same name exists, attach them to the indexed object.
672692

673-
+ There should not be a conflict between the coordinates of indexer- and indexed- DataArrays. In v.0.10.0, xarray raises ``FutureWarning`` if there is such a conflict, but in the next major release, it will raise an Error.
693+
.. note::
674694

675-
+ Only 1-dimensional boolean array can be used as an indexer.
695+
Only 1-dimensional boolean arrays can be used as indexers.
