Commit c836fbd

Merge branch 'main' into pre-commit-ci-update-config
* main:
  add backend intro and how-to diagram (#9175)
  Fix copybutton for multi line examples in double digit ipython cells (#9264)
  Update signature for _arrayfunction.__array__ (#9237)
  Add encode_cf_datetime benchmark (#9262)
  groupby, resample: Deprecate some positional args (#9236)
  Delete ``base`` and ``loffset`` parameters to resample (#9233)
  Update dropna docstring (#9257)
  Grouper, Resampler as public api (#8840)
  Fix mypy on main (#9252)
  fix fallback isdtype method (#9250)
  Enable pandas type checking (#9213)
  Per-variable specification of boolean parameters in open_dataset (#9218)
  test push Added a space to the documentation (#9247)
  Fix typing for test_plot.py (#9234)
2 parents 25bf152 + 95e67b6 commit c836fbd

53 files changed: +1312 -1067 lines changed

asv_bench/benchmarks/coding.py

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+import numpy as np
+
+import xarray as xr
+
+from . import parameterized
+
+
+@parameterized(["calendar"], [("standard", "noleap")])
+class EncodeCFDatetime:
+    def setup(self, calendar):
+        self.units = "days since 2000-01-01"
+        self.dtype = np.dtype("int64")
+        self.times = xr.date_range(
+            "2000", freq="D", periods=10000, calendar=calendar
+        ).values
+
+    def time_encode_cf_datetime(self, calendar):
+        xr.coding.times.encode_cf_datetime(self.times, self.units, calendar, self.dtype)

doc/api-hidden.rst

Lines changed: 4 additions & 0 deletions
@@ -693,3 +693,7 @@

    coding.times.CFTimedeltaCoder
    coding.times.CFDatetimeCoder
+
+   core.groupers.Grouper
+   core.groupers.Resampler
+   core.groupers.EncodedGroups

doc/api.rst

Lines changed: 19 additions & 4 deletions
@@ -803,6 +803,18 @@ DataArray
    DataArrayGroupBy.dims
    DataArrayGroupBy.groups

+Grouper Objects
+---------------
+
+.. currentmodule:: xarray.core
+
+.. autosummary::
+   :toctree: generated/
+
+   groupers.BinGrouper
+   groupers.UniqueGrouper
+   groupers.TimeResampler
+

 Rolling objects
 ===============
@@ -1028,17 +1040,20 @@ DataArray
 Accessors
 =========

-.. currentmodule:: xarray
+.. currentmodule:: xarray.core

 .. autosummary::
    :toctree: generated/

-   core.accessor_dt.DatetimeAccessor
-   core.accessor_dt.TimedeltaAccessor
-   core.accessor_str.StringAccessor
+   accessor_dt.DatetimeAccessor
+   accessor_dt.TimedeltaAccessor
+   accessor_str.StringAccessor
+

 Custom Indexes
 ==============
+.. currentmodule:: xarray
+
 .. autosummary::
    :toctree: generated/

doc/conf.py

Lines changed: 4 additions & 1 deletion
@@ -97,7 +97,7 @@
 }

 # sphinx-copybutton configurations
-copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: "
+copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.{3,}: | {5,8}: "
 copybutton_prompt_is_regexp = True

 # nbsphinx configurations
@@ -158,6 +158,8 @@
     "Variable": "~xarray.Variable",
     "DatasetGroupBy": "~xarray.core.groupby.DatasetGroupBy",
     "DataArrayGroupBy": "~xarray.core.groupby.DataArrayGroupBy",
+    "Grouper": "~xarray.core.groupers.Grouper",
+    "Resampler": "~xarray.core.groupers.Resampler",
     # objects without namespace: numpy
     "ndarray": "~numpy.ndarray",
     "MaskedArray": "~numpy.ma.MaskedArray",
@@ -169,6 +171,7 @@
     "CategoricalIndex": "~pandas.CategoricalIndex",
     "TimedeltaIndex": "~pandas.TimedeltaIndex",
     "DatetimeIndex": "~pandas.DatetimeIndex",
+    "IntervalIndex": "~pandas.IntervalIndex",
     "Series": "~pandas.Series",
     "DataFrame": "~pandas.DataFrame",
     "Categorical": "~pandas.Categorical",
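
Note on the copybutton_prompt_text change in doc/conf.py: the only difference between the two patterns is that the continuation-prompt alternative goes from exactly three dots (`\.\.\.`) to three or more (`\.{3,}`). The sketch below is not part of the commit. It applies both patterns with Python's `re` module to show the effect. `strip_prompt` is a hypothetical helper that only approximates how a copy button might strip a leading prompt, and the four-dot line is an assumed example of the continuation prompt under a double-digit cell such as `In [12]:`.

```python
import re

# Patterns copied from the removed and added lines of the doc/conf.py hunk.
OLD = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: "
NEW = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.{3,}: | {5,8}: "


def strip_prompt(line, pattern):
    """Hypothetical helper: drop one leading prompt if the pattern matches."""
    return re.sub(rf"^(?:{pattern})", "", line)


single_digit = "   ...: return x + 1"   # continuation line with three dots
double_digit = "   ....: return x + 1"  # assumed continuation line with four dots

for name, pattern in (("old", OLD), ("new", NEW)):
    print(name, repr(strip_prompt(single_digit, pattern)))
    print(name, repr(strip_prompt(double_digit, pattern)))
```

With the old pattern the four-dot prompt is left in the copied text; with the new pattern both continuation lines reduce to the bare code.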

doc/internals/how-to-add-new-backend.rst

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ How to add a new backend
 ------------------------

 Adding a new backend for read support to Xarray does not require
-to integrate any code in Xarray; all you need to do is:
+one to integrate any code in Xarray; all you need to do is:

 - Create a class that inherits from Xarray :py:class:`~xarray.backends.BackendEntrypoint`
   and implements the method ``open_dataset`` see :ref:`RST backend_entrypoint`

doc/user-guide/groupby.rst

Lines changed: 82 additions & 11 deletions
@@ -1,3 +1,5 @@
+.. currentmodule:: xarray
+
 .. _groupby:

 GroupBy: Group and Bin Data
@@ -15,19 +17,20 @@ __ https://www.jstatsoft.org/v40/i01/paper
 - Apply some function to each group.
 - Combine your groups back into a single data object.

-Group by operations work on both :py:class:`~xarray.Dataset` and
-:py:class:`~xarray.DataArray` objects. Most of the examples focus on grouping by
+Group by operations work on both :py:class:`Dataset` and
+:py:class:`DataArray` objects. Most of the examples focus on grouping by
 a single one-dimensional variable, although support for grouping
 over a multi-dimensional variable has recently been implemented. Note that for
 one-dimensional data, it is usually faster to rely on pandas' implementation of
 the same pipeline.

 .. tip::

-   To substantially improve the performance of GroupBy operations, particularly
-   with dask `install the flox package <https://flox.readthedocs.io>`_. flox
+   `Install the flox package <https://flox.readthedocs.io>`_ to substantially improve the performance
+   of GroupBy operations, particularly with dask. flox
    `extends Xarray's in-built GroupBy capabilities <https://flox.readthedocs.io/en/latest/xarray.html>`_
-   by allowing grouping by multiple variables, and lazy grouping by dask arrays. If installed, Xarray will automatically use flox by default.
+   by allowing grouping by multiple variables, and lazy grouping by dask arrays.
+   If installed, Xarray will automatically use flox by default.

 Split
 ~~~~~
@@ -87,7 +90,7 @@ Binning
 Sometimes you don't want to use all the unique values to determine the groups
 but instead want to "bin" the data into coarser groups. You could always create
 a customized coordinate, but xarray facilitates this via the
-:py:meth:`~xarray.Dataset.groupby_bins` method.
+:py:meth:`Dataset.groupby_bins` method.

 .. ipython:: python

@@ -110,7 +113,7 @@ Apply
 ~~~~~

 To apply a function to each group, you can use the flexible
-:py:meth:`~xarray.core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
+:py:meth:`core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
 concatenated back together along the group axis:

 .. ipython:: python
@@ -121,8 +124,8 @@ concatenated back together along the group axis:

     arr.groupby("letters").map(standardize)

-GroupBy objects also have a :py:meth:`~xarray.core.groupby.DatasetGroupBy.reduce` method and
-methods like :py:meth:`~xarray.core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
+GroupBy objects also have a :py:meth:`core.groupby.DatasetGroupBy.reduce` method and
+methods like :py:meth:`core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
 aggregation function:

 .. ipython:: python
@@ -183,7 +186,7 @@ Iterating and Squeezing
 Previously, Xarray defaulted to squeezing out dimensions of size one when iterating over
 a GroupBy object. This behaviour is being removed.
 You can always squeeze explicitly later with the Dataset or DataArray
-:py:meth:`~xarray.DataArray.squeeze` methods.
+:py:meth:`DataArray.squeeze` methods.

 .. ipython:: python

@@ -217,7 +220,7 @@ __ https://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_two_dime
     da.groupby("lon").map(lambda x: x - x.mean(), shortcut=False)

 Because multidimensional groups have the ability to generate a very large
-number of bins, coarse-binning via :py:meth:`~xarray.Dataset.groupby_bins`
+number of bins, coarse-binning via :py:meth:`Dataset.groupby_bins`
 may be desirable:

 .. ipython:: python
@@ -232,3 +235,71 @@ applying your function, and then unstacking the result:

     stacked = da.stack(gridcell=["ny", "nx"])
     stacked.groupby("gridcell").sum(...).unstack("gridcell")
+
+.. _groupby.groupers:
+
+Grouper Objects
+~~~~~~~~~~~~~~~
+
+Both ``groupby_bins`` and ``resample`` are specializations of the core ``groupby`` operation for binning,
+and time resampling. Many problems demand more complex GroupBy application: for example, grouping by multiple
+variables with a combination of categorical grouping, binning, and resampling; or more specializations like
+spatial resampling; or more complex time grouping like special handling of seasons, or the ability to specify
+custom seasons. To handle these use-cases and more, Xarray is evolving to providing an
+extension point using ``Grouper`` objects.
+
+.. tip::
+
+   See the `grouper design`_ doc for more detail on the motivation and design ideas behind
+   Grouper objects.
+
+.. _grouper design: https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md
+
+For now Xarray provides three specialized Grouper objects:
+
+1. :py:class:`groupers.UniqueGrouper` for categorical grouping
+2. :py:class:`groupers.BinGrouper` for binned grouping
+3. :py:class:`groupers.TimeResampler` for resampling along a datetime coordinate
+
+These provide functionality identical to the existing ``groupby``, ``groupby_bins``, and ``resample`` methods.
+That is,
+
+.. code-block:: python
+
+    ds.groupby("x")
+
+is identical to
+
+.. code-block:: python
+
+    from xarray.groupers import UniqueGrouper
+
+    ds.groupby(x=UniqueGrouper())
+
+; and
+
+.. code-block:: python
+
+    ds.groupby_bins("x", bins=bins)
+
+is identical to
+
+.. code-block:: python
+
+    from xarray.groupers import BinGrouper
+
+    ds.groupby(x=BinGrouper(bins))
+
+and
+
+.. code-block:: python
+
+    ds.resample(time="ME")
+
+is identical to
+
+.. code-block:: python
+
+    from xarray.groupers import TimeResampler
+
+    ds.resample(time=TimeResampler("ME"))
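
Note on the Grouper Objects section added to doc/user-guide/groupby.rst: the idea is that one `groupby` entry point can be specialized by handing it an object that decides which group each value belongs to. The sketch below is a standard-library toy that illustrates the idea only. It is not xarray's implementation, and every name in it (`ToyUniqueGrouper`, `ToyBinGrouper`, `toy_groupby`, the `label` method) is hypothetical. The right-closed bin convention is also an assumption made for the example.

```python
from bisect import bisect_left
from collections import defaultdict


class ToyUniqueGrouper:
    """Hypothetical stand-in: one group per distinct value."""

    def label(self, value):
        return value


class ToyBinGrouper:
    """Hypothetical stand-in: one group per interval (lo, hi] between edges."""

    def __init__(self, bins):
        self.bins = sorted(bins)

    def label(self, value):
        # Index of the first edge >= value; values outside the edges get None.
        i = bisect_left(self.bins, value)
        if i == 0 or i == len(self.bins):
            return None
        return (self.bins[i - 1], self.bins[i])


def toy_groupby(values, grouper):
    """Map each group label to the positions of the values that fall in it."""
    groups = defaultdict(list)
    for position, value in enumerate(values):
        key = grouper.label(value)
        if key is not None:
            groups[key].append(position)
    return dict(groups)


x = [1, 5, 1, 12, 7, 5]
by_value = toy_groupby(x, ToyUniqueGrouper())
by_bin = toy_groupby(x, ToyBinGrouper([0, 6, 12]))
print(by_value)
print(by_bin)
```

The same `toy_groupby` call serves both cases; only the grouper object changes, which mirrors the equivalences listed in the new documentation.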

doc/user-guide/io.rst

Lines changed: 75 additions & 0 deletions
@@ -19,6 +19,81 @@ format (recommended).

     np.random.seed(123456)

+You can `read different types of files <https://docs.xarray.dev/en/stable/user-guide/io.html>`_
+in `xr.open_dataset` by specifying the engine to be used:
+
+.. ipython:: python
+    :okexcept:
+    :suppress:
+
+    import xarray as xr
+
+    xr.open_dataset("my_file.grib", engine="cfgrib")
+
+The "engine" provides a set of instructions that tells xarray how
+to read the data and pack them into a `dataset` (or `dataarray`).
+These instructions are stored in an underlying "backend".
+
+Xarray comes with several backends that cover many common data formats.
+Many more backends are available via external libraries, or you can `write your own <https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html>`_.
+This diagram aims to help you determine - based on the format of the file you'd like to read -
+which type of backend you're using and how to use it.
+
+Text and boxes are clickable for more information.
+Following the diagram is detailed information on many popular backends.
+You can learn more about using and developing backends in the
+`Xarray tutorial JupyterBook <https://tutorial.xarray.dev/advanced/backends/backends.html>`_.
+
+.. mermaid::
+    :alt: Flowchart illustrating how to choose the right backend engine to read your data
+
+    flowchart LR
+        built-in-eng["""Is your data stored in one of these formats?
+            - netCDF4 (<code>netcdf4</code>)
+            - netCDF3 (<code>scipy</code>)
+            - Zarr (<code>zarr</code>)
+            - DODS/OPeNDAP (<code>pydap</code>)
+            - HDF5 (<code>h5netcdf</code>)
+            """]
+
+        built-in("""You're in luck! Xarray bundles a backend for this format.
+            Open data using <code>xr.open_dataset()</code>. We recommend
+            always setting the engine you want to use.""")
+
+        installed-eng["""One of these formats?
+            - <a href='https://github.com/ecmwf/cfgrib'>GRIB (<code>cfgrib</code>)
+            - <a href='https://tiledb-inc.github.io/TileDB-CF-Py/documentation/index.html'>TileDB (<code>tiledb</code>)
+            - <a href='https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html#rioxarray'>GeoTIFF, JPEG-2000, ESRI-hdf (<code>rioxarray</code>, via GDAL)
+            - <a href='https://www.bopen.eu/xarray-sentinel-open-source-library/'>Sentinel-1 SAFE (<code>xarray-sentinel</code>)
+            """]
+
+        installed("""Install the package indicated in parentheses to your
+            Python environment. Restart the kernel and use
+            <code>xr.open_dataset(files, engine='rioxarray')</code>.""")
+
+        other("""Ask around to see if someone in your data community
+            has created an Xarray backend for your data type.
+            If not, you may need to create your own or consider
+            exporting your data to a more common format.""")
+
+        built-in-eng -->|Yes| built-in
+        built-in-eng -->|No| installed-eng
+
+        installed-eng -->|Yes| installed
+        installed-eng -->|No| other
+
+        click built-in-eng "https://docs.xarray.dev/en/stable/getting-started-guide/faq.html#how-do-i-open-format-x-file-as-an-xarray-dataset"
+        click other "https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html"
+
+        classDef quesNodefmt fill:#9DEEF4,stroke:#206C89,text-align:left
+        class built-in-eng,installed-eng quesNodefmt
+
+        classDef ansNodefmt fill:#FFAA05,stroke:#E37F17,text-align:left,white-space:nowrap
+        class built-in,installed,other ansNodefmt
+
+        linkStyle default font-size:20pt,color:#206C89
+
+
 .. _io.netcdf:

 netCDF
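
Note on the flowchart added to doc/user-guide/io.rst: it describes a two-step lookup, first against formats with a bundled backend, then against formats served by an installable package, with everything else falling through to "ask around or write your own". The sketch below restates that decision as a plain lookup, assuming the names in parentheses in the flowchart are what a reader would act on. `suggest_engine` and the two dictionaries are hypothetical and exist only for this illustration; nothing here calls xarray.

```python
# Names in parentheses in the flowchart, keyed by the format they are listed with.
BUNDLED = {
    "netCDF4": "netcdf4",
    "netCDF3": "scipy",
    "Zarr": "zarr",
    "DODS/OPeNDAP": "pydap",
    "HDF5": "h5netcdf",
}
INSTALLABLE = {
    "GRIB": "cfgrib",
    "TileDB": "tiledb",
    "GeoTIFF": "rioxarray",
    "JPEG-2000": "rioxarray",
    "ESRI-hdf": "rioxarray",
    "Sentinel-1 SAFE": "xarray-sentinel",
}


def suggest_engine(data_format):
    """Hypothetical helper: follow the flowchart's Yes/No branches in order."""
    if data_format in BUNDLED:
        # First question answered "Yes": a backend ships with xarray.
        return ("built-in", BUNDLED[data_format])
    if data_format in INSTALLABLE:
        # Second question answered "Yes": install the named package first.
        return ("installed", INSTALLABLE[data_format])
    # Both answered "No": look for a community backend or write one.
    return ("other", None)


for fmt in ("Zarr", "GRIB", "CSV"):
    print(fmt, suggest_engine(fmt))
```

"CSV" is used only as an example of a format the flowchart does not list.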

doc/user-guide/terminology.rst

Lines changed: 1 addition & 1 deletion
@@ -221,7 +221,7 @@ complete examples, please consult the relevant documentation.*
             combined_ds

     lazy
-        Lazily-evaluated operations do not load data into memory until necessary.Instead of doing calculations
+        Lazily-evaluated operations do not load data into memory until necessary. Instead of doing calculations
         right away, xarray lets you plan what calculations you want to do, like finding the
         average temperature in a dataset.This planning is called "lazy evaluation." Later, when
         you're ready to see the final result, you tell xarray, "Okay, go ahead and do those calculations now!"

doc/whats-new.rst

Lines changed: 21 additions & 2 deletions
@@ -22,6 +22,15 @@ v2024.06.1 (unreleased)

 New Features
 ~~~~~~~~~~~~
+- Introduce new :py:class:`groupers.UniqueGrouper`, :py:class:`groupers.BinGrouper`, and
+  :py:class:`groupers.TimeResampler` objects as a step towards supporting grouping by
+  multiple variables. See the `docs <groupby.groupers_>` and the
+  `grouper design doc <https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md>`_ for more.
+  (:issue:`6610`, :pull:`8840`).
+  By `Deepak Cherian <https://github.com/dcherian>`_.
+- Allow per-variable specification of ``mask_and_scale``, ``decode_times``, ``decode_timedelta``
+  ``use_cftime`` and ``concat_characters`` params in :py:func:`~xarray.open_dataset` (:pull:`9218`).
+  By `Mathijs Verhaegh <https://github.com/Ostheer>`_.
 - Allow chunking for arrays with duplicated dimension names (:issue:`8759`, :pull:`9099`).
   By `Martin Raspaud <https://github.com/mraspaud>`_.
 - Extract the source url from fsspec objects (:issue:`9142`, :pull:`8923`).
@@ -33,6 +42,11 @@ New Features

 Breaking changes
 ~~~~~~~~~~~~~~~~
+- The ``base`` and ``loffset`` parameters to :py:meth:`Dataset.resample` and :py:meth:`DataArray.resample`
+  is now removed. These parameters has been deprecated since v2023.03.0. Using the
+  ``origin`` or ``offset`` parameters is recommended as a replacement for using
+  the ``base`` parameter and using time offset arithmetic is recommended as a
+  replacement for using the ``loffset`` parameter.


 Deprecations
@@ -70,15 +84,20 @@ Bug fixes
 Documentation
 ~~~~~~~~~~~~~

-- Adds a flow-chart diagram to help users navigate help resources (`Discussion #8990 <https://github.com/pydata/xarray/discussions/8990>`_).
+- Adds intro to backend section of docs, including a flow-chart to navigate types of backends (:pull:`9175`).
+  By `Jessica Scheick <https://github.com/jessicas11>`_.
+- Adds a flow-chart diagram to help users navigate help resources (`Discussion #8990 <https://github.com/pydata/xarray/discussions/8990>`_, :pull:`9147`).
   By `Jessica Scheick <https://github.com/jessicas11>`_.
 - Improvements to Zarr & chunking docs (:pull:`9139`, :pull:`9140`, :pull:`9132`)
   By `Maximilian Roos <https://github.com/max-sixty>`_.
-
+- Fix copybutton for multi line examples and double digit ipython cell numbers (:pull:`9264`).
+  By `Moritz Schreiber <https://github.com/mosc9575>`_.

 Internal Changes
 ~~~~~~~~~~~~~~~~

+- Enable typing checks of pandas (:pull:`9213`).
+  By `Michael Niklas <https://github.com/headtr1ck>`_.

 .. _whats-new.2024.06.0:
