
apply_ufunc erroneously operating on an empty array when dask used #3168


Closed
TomNicholas opened this issue Jul 29, 2019 · 3 comments

@TomNicholas
Member

Problem description

apply_ufunc with dask='parallelized' appears to call the wrapped function on an empty numpy array when the computation is defined, before .compute() is called. For example, a ufunc that just prints the shape of its argument prints (0, 0) when apply_ufunc is called, then prints the correct shape once .compute() is called.

Minimum working example

import numpy as np
import xarray as xr


def example_ufunc(x):
    print(x.shape)
    return np.mean(x, axis=-1)

def new_mean(da, dim):
    result = xr.apply_ufunc(example_ufunc, da,
                            input_core_dims=[[dim]], dask='parallelized',
                            output_dtypes=[da.dtype])
    return result


shape = {'t': 2, 'x': 3}
data = xr.DataArray(data=np.random.rand(*shape.values()), dims=list(shape))
unchunked = data
chunked = data.chunk(shape)


actual = new_mean(chunked, dim='x')  # raises the warning
print(actual)

print(actual.compute())  # does the computation correctly

Result

(0, 0)
/home/tnichol/anaconda3/envs/py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
<xarray.DataArray (t: 2)>
dask.array<shape=(2,), dtype=float64, chunksize=(2,)>
Dimensions without coordinates: t
(2, 3)
<xarray.DataArray (t: 2)>
array([0.147205, 0.402913])
Dimensions without coordinates: t

Expected result

Same thing without the (0,0) or the numpy warning.
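One possible workaround on the user side is to make the wrapped function tolerate a zero-size input. The sketch below assumes that the extra call always passes a zero-size array, as the (0, 0) in the output above suggests, and that the input is floating point; `safe_mean_ufunc` is a hypothetical name, not part of xarray or dask.

```python
import warnings

import numpy as np


def safe_mean_ufunc(x):
    # Assumption: the extra call made while the graph is built passes a
    # zero-size array. Return an empty result with the reduced shape and
    # skip np.mean, which would warn "Mean of empty slice."
    if x.size == 0:
        return np.empty(x.shape[:-1], dtype=x.dtype)
    return np.mean(x, axis=-1)


# The zero-size call no longer triggers the RuntimeWarning.
with warnings.catch_warnings():
    warnings.simplefilter("error")  # turn any warning into an exception
    probe_result = safe_mean_ufunc(np.empty((0, 0)))

# Real data is reduced along the last axis as before.
real_result = safe_mean_ufunc(np.arange(6.0).reshape(2, 3))
```

This only silences the warning; the function is still called one extra time with the empty array.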

Output of xr.show_versions()

(my xarray is up-to-date with master)

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.14.4.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.1

xarray: 0.12.3+23.g1d7bcbd
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.1.0
distributed: 2.1.0
matplotlib: 3.1.0
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 40.6.2
pip: 18.1
conda: None
pytest: 4.0.0
IPython: 7.1.1
sphinx: 1.8.2

@shoyer
Member

shoyer commented Jul 29, 2019

The warning may have been fixed in the development version of dask by dask/dask#5103.

The computation on empty arrays is due to the new meta tracking in dask 2.0, which is used to track the underlying type of the arrays inside a dask array. Dask supports setting meta explicitly in blockwise. We might consider exposing this from xarray as well.
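As a rough illustration of setting meta explicitly on the dask side, the sketch below uses dask.array.map_blocks rather than xarray. It assumes a dask version whose map_blocks accepts a `meta` argument and skips the call on an empty array when both `meta` and `dtype` are given; `record_shape_mean` and `seen_shapes` are hypothetical names introduced only for this example.

```python
import dask.array as da
import numpy as np

seen_shapes = []  # shapes the function was actually called with


def record_shape_mean(x):
    seen_shapes.append(x.shape)
    return np.mean(x, axis=-1)


arr = da.from_array(np.random.rand(2, 3), chunks=(2, 3))

# meta is an empty array with the type and number of dimensions expected
# for the output. Assumption: when it is supplied along with dtype, dask
# does not need to call the function on an empty array to infer it.
lazy = da.map_blocks(
    record_shape_mean,
    arr,
    dtype=arr.dtype,
    drop_axis=1,  # the last axis is reduced away
    meta=np.empty((0,), dtype=arr.dtype),
)

result = lazy.compute()
print(seen_shapes)
```

Whether xarray should forward such an argument through apply_ufunc is the open question raised above.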

@rbavery

rbavery commented Aug 13, 2019

I am not sure whether this is related, but my dask-backed DataArray has a different shape before and after computing. After converting the result to a numpy array, the time dimension (44) is still present. That is expected, but I would also expect it to show up in the xarray metadata.

result
<xarray.DataArray 'reflectance' (y: 1082, x: 1084)>
dask.array<shape=(1082, 1084), dtype=uint16, chunksize=(1082, 1084)>
Coordinates:
    band     int64 1
  * y        (y) float64 9.705e+05 9.705e+05 9.705e+05 ... 9.673e+05 9.672e+05
  * x        (x) float64 4.889e+05 4.889e+05 4.889e+05 ... 4.922e+05 4.922e+05

# The shapes of the DataArray and the converted numpy array do not match: the time dimension reappears
np.array(result).shape
(1082, 1084, 44)

See: https://stackoverflow.com/questions/57419541/how-to-use-apply-ufunc-with-numpy-digitize-for-each-image-along-time-dimension-o

@dcherian
Contributor

Closed by #3660
