
ds.mean('dim') drops string DataArrays, even when 'dim' is not a dimension of the string DataArray #5368


Closed
rcaneill opened this issue May 25, 2021 · 2 comments · Fixed by #5393

@rcaneill
Contributor

What happened:

I have a dataset with many dimensions, e.g. time and experiments. Some of the DataArrays contain only strings and have only the experiments dimension. When I call ds.mean('time'), these DataArrays are dropped.

What you expected to happen:

I would expect the mean over the full dataset to behave as it does for float DataArrays that don't contain the dimension: return them unchanged. As the minimal example below shows, ds.min produces the result I expect.

Minimal Complete Verifiable Example:

Here da1 corresponds to the DataArrays that are lost. da3 gives the result I expect, and I would expect the same regardless of the data type.

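```python
import xarray as xr

# String variable with only dim_0: this is the one mean drops
da1 = xr.DataArray(['a', 'b']).rename('da1')
print(da1, '\n')

# Numeric variable with only dim_0: returned unchanged, as expected
da3 = xr.DataArray([-1, -2]).rename('da3')
print(da3, '\n')

# Numeric variable with both dim_0 and dim_1
da2 = xr.DataArray([[0, 1], [2, 3]]).rename('da2')
print(da2, '\n')

ds = xr.merge([da1, da2, da3])
print(ds, '\n')

# Reduce over dim_1, which neither da1 nor da3 has
print('mean:', ds.mean('dim_1'))
print('min:', ds.min('dim_1'))
```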

And the output is:

```
<xarray.DataArray 'da1' (dim_0: 2)>
array(['a', 'b'], dtype='<U1')
Dimensions without coordinates: dim_0

<xarray.DataArray 'da3' (dim_0: 2)>
array([-1, -2])
Dimensions without coordinates: dim_0

<xarray.DataArray 'da2' (dim_0: 2, dim_1: 2)>
array([[0, 1],
       [2, 3]])
Dimensions without coordinates: dim_0, dim_1

<xarray.Dataset>
Dimensions:  (dim_0: 2, dim_1: 2)
Dimensions without coordinates: dim_0, dim_1
Data variables:
    da1      (dim_0) <U1 'a' 'b'
    da2      (dim_0, dim_1) int64 0 1 2 3
    da3      (dim_0) int64 -1 -2

mean: <xarray.Dataset>
Dimensions:  (dim_0: 2)
Dimensions without coordinates: dim_0
Data variables:
    da2      (dim_0) float64 0.5 2.5
    da3      (dim_0) int64 -1 -2
min: <xarray.Dataset>
Dimensions:  (dim_0: 2)
Dimensions without coordinates: dim_0
Data variables:
    da1      (dim_0) <U1 'a' 'b'
    da2      (dim_0) int64 0 2
    da3      (dim_0) int64 -1 -2
```

I searched the open issues but didn't find a similar one. I hope I did not miss anything there or in the docs.
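Until this is fixed, one possible workaround is to apply the mean only to the variables that actually have the dimension and pass the others through untouched. The sketch below rebuilds the dataset from the minimal example; `mean_over` is a hypothetical helper built on `Dataset.map`, not an existing xarray function, and it only deals with data variables, leaving coordinate handling to `Dataset.map`.

```python
import xarray as xr

# Dataset from the minimal example: da1 (strings) and da3 (ints) lack dim_1
ds = xr.merge([
    xr.DataArray(["a", "b"]).rename("da1"),
    xr.DataArray([[0, 1], [2, 3]]).rename("da2"),
    xr.DataArray([-1, -2]).rename("da3"),
])


def mean_over(ds, dim):
    """Hypothetical workaround (not an xarray function): average the data
    variables that have `dim` and return every other data variable unchanged."""
    # Dataset.map calls the function once per data variable (as a DataArray)
    # and assembles the results into a new Dataset.
    return ds.map(lambda da: da.mean(dim) if dim in da.dims else da)


result = mean_over(ds, "dim_1")
print(result)
```

With the example dataset this keeps da1 and da3 unchanged and reduces only da2, to [0.5, 2.5].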

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-50-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.18.0
pandas: 1.2.4
numpy: 1.20.3
scipy: 1.6.3
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.05.0
distributed: 2021.05.0
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: 0.11.1
numbagg: None
pint: None
setuptools: 44.0.0
pip: 20.0.2
conda: None
pytest: 6.2.4
IPython: 7.23.1
sphinx: None

@max-sixty
Collaborator

max-sixty commented May 26, 2021

Thanks for the clear issue and example, @rcaneill.

I agree this looks like a bug. It's puzzling why it would only occur on mean rather than min.
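To narrow down which reductions are affected, a small diagnostic can compare the data variables before and after each reduction. `dropped_by` below is a hypothetical helper, not part of xarray; it rebuilds the dataset from the minimal example and by default checks only `mean` and `min`, the two methods in the report. Going by the output in the report, on xarray 0.18.0 it should list `da1` under `mean` and nothing under `min`.

```python
import xarray as xr


def dropped_by(ds, dim, methods=("mean", "min")):
    """Hypothetical diagnostic (not part of xarray): for each reduction method
    name, list the data variables of `ds` that are missing after reducing
    over `dim`. Assumes each method accepts the dimension as its first argument."""
    report = {}
    for method in methods:
        # e.g. ds.mean(dim) or ds.min(dim)
        reduced = getattr(ds, method)(dim)
        report[method] = [
            name for name in ds.data_vars if name not in reduced.data_vars
        ]
    return report


# Same dataset as in the minimal example in the report
ds = xr.merge([
    xr.DataArray(["a", "b"]).rename("da1"),
    xr.DataArray([[0, 1], [2, 3]]).rename("da2"),
    xr.DataArray([-1, -2]).rename("da3"),
])
print(dropped_by(ds, "dim_1"))
```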

@malmans2
Contributor

Just ran into this bug, see #5393.
Hopefully no one was already working on it...
