Commit 2db94eb (2 parents: 8703fe2 + 0e8debf)

Merge remote-tracking branch 'upstream/master' into fix/upstream-dev-tests

* upstream/master:
  - drop_vars; deprecate drop for variables (pydata#3475)
  - uamiv test using only raw uamiv variables (pydata#3485)
  - Optimize dask array equality checks. (pydata#3453)

19 files changed: +516 / -260 lines

doc/data-structures.rst (2 additions & 2 deletions)

```diff
@@ -393,14 +393,14 @@ methods (like pandas) for transforming datasets into new objects.
 
 For removing variables, you can select and drop an explicit list of
 variables by indexing with a list of names or using the
-:py:meth:`~xarray.Dataset.drop` methods to return a new ``Dataset``. These
+:py:meth:`~xarray.Dataset.drop_vars` methods to return a new ``Dataset``. These
 operations keep around coordinates:
 
 .. ipython:: python
 
     ds[['temperature']]
     ds[['temperature', 'temperature_double']]
-    ds.drop('temperature')
+    ds.drop_vars('temperature')
 
 To remove a dimension, you can use :py:meth:`~xarray.Dataset.drop_dims` method.
 Any variables using that dimension are dropped:
```
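The doc change above renames the variable-dropping spelling from ``drop`` to ``drop_vars``: named variables are removed, everything else (including coordinates) is kept, and a new object is returned. A minimal sketch of that contract, using a plain dict of variables as a hypothetical stand-in for a ``Dataset`` (this is not xarray's implementation):

```python
def drop_vars(variables, names, errors="raise"):
    """Return a new dict of variables without the named ones.

    Mirrors the documented contract: errors='raise' complains about
    unknown names, errors='ignore' silently skips them.
    """
    if isinstance(names, str):
        names = [names]
    names = set(names)
    missing = names - set(variables)
    if errors == "raise" and missing:
        raise ValueError(f"variables not found: {sorted(missing)}")
    return {k: v for k, v in variables.items() if k not in names}


# mirrors ds.drop_vars('temperature') from the docs snippet
ds = {"temperature": [280.0, 281.5], "precipitation": [0.1, 0.0]}
assert drop_vars(ds, "temperature") == {"precipitation": [0.1, 0.0]}
```

Note the original mapping is untouched; like xarray, the sketch returns a new object rather than mutating in place.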

doc/indexing.rst (3 additions & 3 deletions)

```diff
@@ -232,14 +232,14 @@ Using indexing to *assign* values to a subset of dataset (e.g.,
 Dropping labels and dimensions
 ------------------------------
 
-The :py:meth:`~xarray.Dataset.drop` method returns a new object with the listed
+The :py:meth:`~xarray.Dataset.drop_sel` method returns a new object with the listed
 index labels along a dimension dropped:
 
 .. ipython:: python
 
-    ds.drop(space=['IN', 'IL'])
+    ds.drop_sel(space=['IN', 'IL'])
 
-``drop`` is both a ``Dataset`` and ``DataArray`` method.
+``drop_sel`` is both a ``Dataset`` and ``DataArray`` method.
 
 Use :py:meth:`~xarray.Dataset.drop_dims` to drop a full dimension from a Dataset.
 Any variables with these dimensions are also dropped:
```
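Label dropping, in turn, moves to ``drop_sel``: given labels along a dimension, it returns a new object with those index labels removed. A minimal sketch of the semantics over a single coordinate, with a hypothetical ``drop_sel`` helper standing in for the real method (not xarray's implementation):

```python
def drop_sel(index, labels_to_drop):
    """Return the labels kept after dropping the given index labels.

    index: the labels along one dimension (e.g. a 'space' coordinate).
    labels_to_drop: labels to remove, as in ds.drop_sel(space=[...]).
    """
    to_drop = set(labels_to_drop)
    missing = to_drop - set(index)
    if missing:  # mirrors the default errors='raise' behaviour
        raise ValueError(f"labels not found in index: {sorted(missing)}")
    return [lab for lab in index if lab not in to_drop]


# mirrors ds.drop_sel(space=['IN', 'IL']) on a 'space' coordinate
space = ["IA", "IL", "IN"]
assert drop_sel(space, ["IN", "IL"]) == ["IA"]
```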

doc/whats-new.rst (10 additions & 0 deletions)

```diff
@@ -38,6 +38,12 @@ Breaking changes
 
 New Features
 ~~~~~~~~~~~~
+- :py:meth:`Dataset.drop_sel` & :py:meth:`DataArray.drop_sel` have been added for dropping labels.
+  :py:meth:`Dataset.drop_vars` & :py:meth:`DataArray.drop_vars` have been added for
+  dropping variables (including coordinates). The existing ``drop`` methods remain as a backward compatible
+  option for dropping either labels or variables, but using the more specific methods is encouraged.
+  (:pull:`3475`)
+  By `Maximilian Roos <https://github.com/max-sixty>`_
 - :py:meth:`Dataset.transpose` and :py:meth:`DataArray.transpose` now support an ellipsis (`...`)
   to represent all 'other' dimensions. For example, to move one dimension to the front,
   use `.transpose('x', ...)`. (:pull:`3421`)
@@ -70,6 +76,9 @@ Bug fixes
   but cloudpickle isn't (:issue:`3401`) by `Rhys Doyle <https://github.com/rdoyle45>`_
 - Fix grouping over variables with NaNs. (:issue:`2383`, :pull:`3406`).
   By `Deepak Cherian <https://github.com/dcherian>`_.
+- Use dask names to compare dask objects prior to comparing values after computation.
+  (:issue:`3068`, :issue:`3311`, :issue:`3454`, :pull:`3453`).
+  By `Deepak Cherian <https://github.com/dcherian>`_.
 - Sync with cftime by removing `dayofwk=-1` for cftime>=1.0.4.
   By `Anderson Banihirwe <https://github.com/andersy005>`_.
 - Fix :py:meth:`xarray.core.groupby.DataArrayGroupBy.reduce` and
@@ -3749,6 +3758,7 @@ Enhancements
   explicitly listed variables or index labels:
 
   .. ipython:: python
+      :okwarning:
 
       # drop variables
      ds = xray.Dataset({'x': 0, 'y': 1})
```

xarray/core/concat.py (38 additions & 20 deletions)

```diff
@@ -2,6 +2,7 @@
 
 from . import dtypes, utils
 from .alignment import align
+from .duck_array_ops import lazy_array_equiv
 from .merge import _VALID_COMPAT, unique_variable
 from .variable import IndexVariable, Variable, as_variable
 from .variable import concat as concat_vars
@@ -189,26 +190,43 @@ def process_subset_opt(opt, subset):
                 # all nonindexes that are not the same in each dataset
                 for k in getattr(datasets[0], subset):
                     if k not in concat_over:
-                        # Compare the variable of all datasets vs. the one
-                        # of the first dataset. Perform the minimum amount of
-                        # loads in order to avoid multiple loads from disk
-                        # while keeping the RAM footprint low.
-                        v_lhs = datasets[0].variables[k].load()
-                        # We'll need to know later on if variables are equal.
-                        computed = []
-                        for ds_rhs in datasets[1:]:
-                            v_rhs = ds_rhs.variables[k].compute()
-                            computed.append(v_rhs)
-                            if not getattr(v_lhs, compat)(v_rhs):
-                                concat_over.add(k)
-                                equals[k] = False
-                                # computed variables are not to be re-computed
-                                # again in the future
-                                for ds, v in zip(datasets[1:], computed):
-                                    ds.variables[k].data = v.data
+                        equals[k] = None
+                        variables = [ds.variables[k] for ds in datasets]
+                        # first check without comparing values i.e. no computes
+                        for var in variables[1:]:
+                            equals[k] = getattr(variables[0], compat)(
+                                var, equiv=lazy_array_equiv
+                            )
+                            if equals[k] is not True:
+                                # exit early if we know these are not equal or that
+                                # equality cannot be determined i.e. one or all of
+                                # the variables wraps a numpy array
                                 break
-                        else:
-                            equals[k] = True
+
+                        if equals[k] is False:
+                            concat_over.add(k)
+
+                        elif equals[k] is None:
+                            # Compare the variable of all datasets vs. the one
+                            # of the first dataset. Perform the minimum amount of
+                            # loads in order to avoid multiple loads from disk
+                            # while keeping the RAM footprint low.
+                            v_lhs = datasets[0].variables[k].load()
+                            # We'll need to know later on if variables are equal.
+                            computed = []
+                            for ds_rhs in datasets[1:]:
+                                v_rhs = ds_rhs.variables[k].compute()
+                                computed.append(v_rhs)
+                                if not getattr(v_lhs, compat)(v_rhs):
+                                    concat_over.add(k)
+                                    equals[k] = False
+                                    # computed variables are not to be re-computed
+                                    # again in the future
+                                    for ds, v in zip(datasets[1:], computed):
+                                        ds.variables[k].data = v.data
+                                    break
+                            else:
+                                equals[k] = True
 
         elif opt == "all":
             concat_over.update(
@@ -370,7 +388,7 @@ def ensure_common_dims(vars):
     result = result.set_coords(coord_names)
     result.encoding = result_encoding
 
-    result = result.drop(unlabeled_dims, errors="ignore")
+    result = result.drop_vars(unlabeled_dims, errors="ignore")
 
     if coord is not None:
         # add concat dimension last to ensure that its in the final Dataset
```
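The rewritten block above first asks for equality *without* computing: a lazy check that can answer True, False, or None ("cannot tell without computing"), falling back to the expensive load/compute path only when the answer is None. A self-contained sketch of that three-valued short-circuit; ``LazyVar`` and ``lazy_equiv`` are illustrative stand-ins for dask-backed variables and ``lazy_array_equiv``, not xarray's actual classes:

```python
class LazyVar:
    """Stand-in for a variable wrapping a lazy (named) or eager array."""

    def __init__(self, data, name=None):
        self.data = data
        self.name = name  # dask-like task name; None means eager/unknown


def lazy_equiv(a, b):
    """Three-valued equality check that never computes values."""
    if len(a.data) != len(b.data):
        return False  # different shapes: provably unequal, no compute
    if a.name is not None and a.name == b.name:
        return True  # same lazy name: provably equal, no compute
    return None  # cannot tell without computing values


def vars_equal(variables):
    """Lazy check first; compute-and-compare only when undecidable."""
    first, rest = variables[0], variables[1:]
    result = None
    for var in rest:
        result = lazy_equiv(first, var)
        if result is not True:
            break  # unequal, or undecidable without computing
    if result is None:
        # fallback: compare actual values (the "load/compute" path)
        result = all(var.data == first.data for var in rest)
    return result


a = LazyVar([1, 2], name="t1")
assert vars_equal([a, LazyVar([1, 2], name="t1")]) is True  # no compute needed
assert vars_equal([a, LazyVar([1, 2, 3], name="t2")]) is False  # shape mismatch
assert vars_equal([a, LazyVar([1, 2])]) is True  # had to compare values
```

The payoff mirrors the diff: when all inputs share the same dask name, equality is settled from graph metadata alone and no chunk is ever loaded.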

xarray/core/dataarray.py (54 additions & 24 deletions)

```diff
@@ -16,7 +16,6 @@
     TypeVar,
     Union,
     cast,
-    overload,
 )
 
 import numpy as np
@@ -54,7 +53,7 @@
 from .indexes import Indexes, default_indexes
 from .merge import PANDAS_TYPES
 from .options import OPTIONS
-from .utils import Default, ReprObject, _default, _check_inplace, either_dict_or_kwargs
+from .utils import Default, ReprObject, _check_inplace, _default, either_dict_or_kwargs
 from .variable import (
     IndexVariable,
     Variable,
@@ -250,7 +249,7 @@ class DataArray(AbstractArray, DataWithCoords):
         Dictionary for holding arbitrary metadata.
     """
 
-    _accessors: Optional[Dict[str, Any]]
+    _accessors: Optional[Dict[str, Any]]  # noqa
     _coords: Dict[Any, Variable]
     _indexes: Optional[Dict[Hashable, pd.Index]]
     _name: Optional[Hashable]
@@ -1891,41 +1890,72 @@ def transpose(self, *dims: Hashable, transpose_coords: bool = None) -> "DataArra
     def T(self) -> "DataArray":
         return self.transpose()
 
-    # Drop coords
-    @overload
-    def drop(
-        self, labels: Union[Hashable, Iterable[Hashable]], *, errors: str = "raise"
+    def drop_vars(
+        self, names: Union[Hashable, Iterable[Hashable]], *, errors: str = "raise"
     ) -> "DataArray":
-        ...
+        """Drop variables from this DataArray.
+
+        Parameters
+        ----------
+        names : hashable or iterable of hashables
+            Name(s) of variables to drop.
+        errors: {'raise', 'ignore'}, optional
+            If 'raise' (default), raises a ValueError error if any of the variables
+            passed are not in the dataset. If 'ignore', any given names that are in the
+            DataArray are dropped and no error is raised.
+
+        Returns
+        -------
+        dropped : DataArray
+
+        """
+        ds = self._to_temp_dataset().drop_vars(names, errors=errors)
+        return self._from_temp_dataset(ds)
 
-    # Drop index labels along dimension
-    @overload  # noqa: F811
     def drop(
-        self, labels: Any, dim: Hashable, *, errors: str = "raise"  # array-like
+        self,
+        labels: Mapping = None,
+        dim: Hashable = None,
+        *,
+        errors: str = "raise",
+        **labels_kwargs,
     ) -> "DataArray":
-        ...
+        """Backward compatible method based on `drop_vars` and `drop_sel`
 
-    def drop(self, labels, dim=None, *, errors="raise"):  # noqa: F811
-        """Drop coordinates or index labels from this DataArray.
+        Using either `drop_vars` or `drop_sel` is encouraged
+        """
+        ds = self._to_temp_dataset().drop(labels, dim, errors=errors)
+        return self._from_temp_dataset(ds)
+
+    def drop_sel(
+        self,
+        labels: Mapping[Hashable, Any] = None,
+        *,
+        errors: str = "raise",
+        **labels_kwargs,
+    ) -> "DataArray":
+        """Drop index labels from this DataArray.
 
         Parameters
         ----------
-        labels : hashable or sequence of hashables
-            Name(s) of coordinates or index labels to drop.
-            If dim is not None, labels can be any array-like.
-        dim : hashable, optional
-            Dimension along which to drop index labels. By default (if
-            ``dim is None``), drops coordinates rather than index labels.
+        labels : Mapping[Hashable, Any]
+            Index labels to drop
         errors: {'raise', 'ignore'}, optional
             If 'raise' (default), raises a ValueError error if
-            any of the coordinates or index labels passed are not
-            in the array. If 'ignore', any given labels that are in the
-            array are dropped and no error is raised.
+            any of the index labels passed are not
+            in the dataset. If 'ignore', any given labels that are in the
+            dataset are dropped and no error is raised.
+        **labels_kwargs : {dim: label, ...}, optional
+            The keyword arguments form of ``dim`` and ``labels``
+
         Returns
         -------
         dropped : DataArray
         """
-        ds = self._to_temp_dataset().drop(labels, dim, errors=errors)
+        if labels_kwargs or isinstance(labels, dict):
+            labels = either_dict_or_kwargs(labels, labels_kwargs, "drop")
+
+        ds = self._to_temp_dataset().drop_sel(labels, errors=errors)
         return self._from_temp_dataset(ds)
 
     def dropna(
```
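The backward-compatible ``drop`` above is now essentially a dispatcher: dict-like or keyword labels mean "drop index labels" (``drop_sel``), while a bare name or list of names means "drop variables" (``drop_vars``). A sketch of that dispatch rule in isolation; ``drop_dispatch`` is a hypothetical helper that only classifies the call, it is not the method body from the diff:

```python
def drop_dispatch(labels=None, dim=None, **labels_kwargs):
    """Classify a legacy drop() call as drop_sel or drop_vars.

    Keyword labels or a mapping select index labels; a dim argument
    also implies label dropping; anything else names variables.
    """
    if labels_kwargs or isinstance(labels, dict):
        return "drop_sel"  # e.g. drop(space=['IN', 'IL']) or drop({'space': [...]})
    if dim is not None:
        return "drop_sel"  # e.g. drop(['IN', 'IL'], dim='space')
    return "drop_vars"  # e.g. drop('temperature')


assert drop_dispatch(space=["IN", "IL"]) == "drop_sel"
assert drop_dispatch("temperature") == "drop_vars"
```

This is why the deprecation can be gradual: every existing call shape maps unambiguously onto one of the two new, more specific methods.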
