align() outer join returns DataArrays that are all NaNs #2215

jjpr-mit · 2018-06-05T12:42:53Z

Code Sample, a copy-pastable example if possible

The problem occurs for me in the midst of a data-processing pipeline that starts with some ~40MB netCDF files. I've tried to create pasteable code that reproduces the behavior from scratch, but I haven't succeeded.

Problem description

I pass two DataArrays to xr.align() with join="outer". The DataArrays are dtype float64, and contain a mix of NaNs and floats. They are 2D and have MultiIndexes with some numeric and some string levels.

The DataArrays in the tuple returned by align() have the correct shape and the expected indexes, but the contents of the arrays are all NaNs. The original float values are gone. np.nonzero(~np.isnan(da)) returns an empty array.

I've set breakpoints and delved into the code. On line 656 in xarray.core.variable.Variable._getitem_with_mask, self contains non-NaN values, but the data returned by as_indexable(self._data)[actual_indexer] evaluates as all NaNs. However, data.array at that point (which is xarray.backends.netCDF4_.NetCDF4ArrayWrapper) has non-NaNs. So it's some sort of masking caused by the indexing that makes it look like data is all NaNs.

Expected Output

A tuple of DataArrays which contain some non-NaN values.

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-116-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.4
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: None
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: 3.3.2
IPython: 6.2.1
sphinx: None

shoyer · 2018-06-05T15:28:45Z

Are you sure the indexes along the aligned dimensions match exactly? Small differences in floats are the most common source of this issue.

Try using second.reindex_like(first, method='nearest') instead of xarray.align(first, second).
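A minimal sketch of the float-mismatch case described above, with made-up coordinate values that are assumptions for illustration and not taken from the reporter's data. It shows how a tiny difference in a float label makes an outer join treat the two labels as distinct, and how reindex_like with method='nearest' snaps one array onto the other's labels:

```python
import xarray as xr

# Two arrays whose "x" labels agree except for a tiny float perturbation
# (illustrative values only).
first = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [0.0, 0.1, 0.2]})
second = xr.DataArray(
    [10.0, 20.0, 30.0], dims="x", coords={"x": [0.0, 0.1 + 1e-12, 0.2]}
)

# An outer join keeps 0.1 and 0.1 + 1e-12 as separate labels, so each
# aligned array gains one NaN-filled position.
first_aligned, second_aligned = xr.align(first, second, join="outer")
n_labels = first_aligned.sizes["x"]
n_missing = int(first_aligned.isnull().sum())

# Snapping `second` onto the labels of `first` avoids the extra label.
second_on_first = second.reindex_like(first, method="nearest")
print(n_labels, n_missing, second_on_first.values)
```

If the aligned result has more labels than either input along a dimension that should match, near-duplicate float labels are a likely cause.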

jjpr-mit · 2018-06-05T15:42:15Z

I found a way to reproduce the error. One of the MultiIndex levels on the DataArrays has NaNs in it. If I remove that level, the correct values appear in the result. Should the presence of that MultiIndex level cause this behavior?

import string
import numpy as np
import xarray as xr

dims = ("x", "y")
shape = (10, 5)
das = []
for j in (0, 1):
    # Mostly-NaN data with a single float per row
    data = np.full(shape, np.nan, dtype="float64")
    for i in range(shape[0]):
        data[i, i % shape[1]] = float(i)
    coords_d = {
        "ints": ("x", range(j*shape[0], (j+1)*shape[0])),
        # This all-NaN coordinate becomes the problematic MultiIndex level
        "nans": ("x", np.array([np.nan] * shape[0], dtype="float64")),
        "lower": ("y", list(string.ascii_lowercase[:shape[1]]))
    }
    da = xr.DataArray(data=data, dims=dims, coords=coords_d)
    # Assign the result instead of passing inplace=True, which later xarray releases removed
    da = da.set_index(append=True, x=["ints", "nans"], y=["lower"])
    das.append(da)
nonzeros_raw = [np.nonzero(~np.isnan(da)) for da in das]
print("nonzeros_raw: ")
print(nonzeros_raw)
aligned = xr.align(*das, join="outer")
nonzeros_aligned = [np.nonzero(~np.isnan(da)) for da in aligned]
print("nonzeros_aligned: ")
print(nonzeros_aligned)
assert nonzeros_raw[0].shape == nonzeros_aligned[0].shape

shoyer · 2018-06-05T15:50:04Z

Thanks for the example. Can you please identify exactly which behavior you find surprising, and what you think the result should be?

jjpr-mit · 2018-06-05T15:52:57Z

Since the align is an outer join, I would expect all the non-NaN values in the original DataArrays to also appear in the aligned DataArrays. Perhaps I am misinterpreting the behavior of join="outer".
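The expectation stated above can be written as a small check. This is a sketch on plain integer indexes (no MultiIndex, so it does not reproduce the bug); `count_valid` is a hypothetical helper introduced here, not part of xarray:

```python
import numpy as np
import xarray as xr


def count_valid(da):
    """Return the number of non-NaN entries in a DataArray."""
    return int(da.notnull().sum())


# Two arrays with disjoint integer labels along "x".
left = xr.DataArray([1.0, np.nan, 3.0], dims="x", coords={"x": [0, 1, 2]})
right = xr.DataArray([4.0, 5.0, np.nan], dims="x", coords={"x": [3, 4, 5]})

before = [count_valid(da) for da in (left, right)]
aligned = xr.align(left, right, join="outer")
after = [count_valid(da) for da in aligned]

# An outer join only adds NaN padding for labels an array lacks, so the
# count of valid values per array should not change.
print(before, after, [da.sizes["x"] for da in aligned])
```

Comparing the counts before and after alignment makes the all-NaN result easy to detect without inspecting the printed arrays.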

jjpr-mit · 2018-06-05T15:59:57Z

For clarity, here are the prints of the arrays before and after alignment:

Before alignment:

[<xarray.DataArray (x: 10, y: 5)>
 array([[ 0., nan, nan, nan, nan],
        [nan,  1., nan, nan, nan],
        [nan, nan,  2., nan, nan],
        [nan, nan, nan,  3., nan],
        [nan, nan, nan, nan,  4.],
        [ 5., nan, nan, nan, nan],
        [nan,  6., nan, nan, nan],
        [nan, nan,  7., nan, nan],
        [nan, nan, nan,  8., nan],
        [nan, nan, nan, nan,  9.]])
 Coordinates:
   * x        (x) MultiIndex
   - ints     (x) int64 0 1 2 3 4 5 6 7 8 9
   - nans     (x) float64 nan nan nan nan nan nan nan nan nan nan
   * y        (y) object 'a' 'b' 'c' 'd' 'e', <xarray.DataArray (x: 10, y: 5)>
 array([[ 0., nan, nan, nan, nan],
        [nan,  1., nan, nan, nan],
        [nan, nan,  2., nan, nan],
        [nan, nan, nan,  3., nan],
        [nan, nan, nan, nan,  4.],
        [ 5., nan, nan, nan, nan],
        [nan,  6., nan, nan, nan],
        [nan, nan,  7., nan, nan],
        [nan, nan, nan,  8., nan],
        [nan, nan, nan, nan,  9.]])
 Coordinates:
   * x        (x) MultiIndex
   - ints     (x) int64 10 11 12 13 14 15 16 17 18 19
   - nans     (x) float64 nan nan nan nan nan nan nan nan nan nan
   * y        (y) object 'a' 'b' 'c' 'd' 'e']

After alignment:

(<xarray.DataArray (x: 20, y: 5)>
 array([[nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan]])
 Coordinates:
   * x        (x) MultiIndex
   - ints     (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
   - nans     (x) object nan nan nan nan nan nan nan nan nan nan nan nan nan ...
   * y        (y) object 'a' 'b' 'c' 'd' 'e', <xarray.DataArray (x: 20, y: 5)>
 array([[nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan]])
 Coordinates:
   * x        (x) MultiIndex
   - ints     (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
   - nans     (x) object nan nan nan nan nan nan nan nan nan nan nan nan nan ...
   * y        (y) object 'a' 'b' 'c' 'd' 'e')

shoyer · 2018-06-05T16:01:04Z

> Since the align is an outer join, I would expect all the non-NaN values in the original DataArrays to also appear in the aligned DataArrays.

Sorry, I'm not quite following -- can you please give a specific example of which output from your example looks wrong, and print how it should look instead?

jjpr-mit · 2018-06-05T16:12:30Z

This is what I would expect to see returned by align():

(<xarray.DataArray (x: 20, y: 5)>
 array([[ 0., nan, nan, nan, nan],
        [nan,  1., nan, nan, nan],
        [nan, nan,  2., nan, nan],
        [nan, nan, nan,  3., nan],
        [nan, nan, nan, nan,  4.],
        [ 5., nan, nan, nan, nan],
        [nan,  6., nan, nan, nan],
        [nan, nan,  7., nan, nan],
        [nan, nan, nan,  8., nan],
        [nan, nan, nan, nan,  9.],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan]])
 Coordinates:
 * x        (x) MultiIndex
 - ints     (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
 - nans     (x) object nan nan nan nan nan nan nan nan nan nan nan nan nan ...
 * y        (y) object 'a' 'b' 'c' 'd' 'e', <xarray.DataArray (x: 20, y: 5)>
 array([[nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [ 0., nan, nan, nan, nan],
        [nan,  1., nan, nan, nan],
        [nan, nan,  2., nan, nan],
        [nan, nan, nan,  3., nan],
        [nan, nan, nan, nan,  4.],
        [ 5., nan, nan, nan, nan],
        [nan,  6., nan, nan, nan],
        [nan, nan,  7., nan, nan],
        [nan, nan, nan,  8., nan],
        [nan, nan, nan, nan,  9.]])
 Coordinates:
 * x        (x) MultiIndex
 - ints     (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
 - nans     (x) object nan nan nan nan nan nan nan nan nan nan nan nan nan ...
 * y        (y) object 'a' 'b' 'c' 'd' 'e')

I see something very similar, but with the nans level removed, if I do this:
xr.align(*[da.reset_index("nans", drop=True) for da in das], join="outer")

shoyer · 2018-06-06T01:30:51Z

This is what I see when printing aligned from your example:

In [26]: aligned
Out[26]:
(<xarray.DataArray (x: 20, y: 5)>
 array([[ 0., nan, nan, nan, nan],
        [nan,  1., nan, nan, nan],
        [nan, nan,  2., nan, nan],
        [nan, nan, nan,  3., nan],
        [nan, nan, nan, nan,  4.],
        [ 5., nan, nan, nan, nan],
        [nan,  6., nan, nan, nan],
        [nan, nan,  7., nan, nan],
        [nan, nan, nan,  8., nan],
        [nan, nan, nan, nan,  9.],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan]])
 Coordinates:
   * x        (x) MultiIndex
   - ints     (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
   - nans     (x) float64 nan nan nan nan nan nan nan nan nan nan nan nan nan ...
   * y        (y) object 'a' 'b' 'c' 'd' 'e', <xarray.DataArray (x: 20, y: 5)>
 array([[nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan],
        [ 0., nan, nan, nan, nan],
        [nan,  1., nan, nan, nan],
        [nan, nan,  2., nan, nan],
        [nan, nan, nan,  3., nan],
        [nan, nan, nan, nan,  4.],
        [ 5., nan, nan, nan, nan],
        [nan,  6., nan, nan, nan],
        [nan, nan,  7., nan, nan],
        [nan, nan, nan,  8., nan],
        [nan, nan, nan, nan,  9.]])
 Coordinates:
   * x        (x) MultiIndex
   - ints     (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
   - nans     (x) float64 nan nan nan nan nan nan nan nan nan nan nan nan nan ...
   * y        (y) object 'a' 'b' 'c' 'd' 'e')

The only material difference I can see in our environments is that I'm running pandas 0.23 and you're running pandas 0.22. Can you try updating pandas and see if that fixes the issue?

jjpr-mit · 2018-06-13T14:16:19Z

@shoyer That did it. Under pandas 0.22, the DataArrays in aligned are all NaNs. I updated to pandas 0.23, and the non-NaN values were there as expected. To double-check, I downgraded to 0.22 again and got all NaNs again.

shoyer · 2018-06-13T21:02:44Z

OK, great. I'm going to close this then, and simply recommend that anyone who encounters this issue try upgrading pandas.
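For pipelines that cannot pin their environment, a guard along these lines could flag the affected pandas versions early. This is a sketch that is not from the thread; `pandas_at_least` is a hypothetical helper, and the 0.23 threshold is taken only from the versions compared above:

```python
import warnings

import pandas as pd


def pandas_at_least(major, minor, version=None):
    """Return True if the pandas version is at least major.minor."""
    version = pd.__version__ if version is None else version
    parts = version.split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)


# Warn when running a pandas release older than the one reported to work.
if not pandas_at_least(0, 23):
    warnings.warn(
        "pandas < 0.23 detected: xr.align(join='outer') may return all-NaN "
        "data for MultiIndexes containing NaN levels (see xarray issue #2215)."
    )
```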

naomi-henderson mentioned this issue Jun 5, 2018

tolerance for alignment #2217

Open

shoyer closed this as completed Jun 13, 2018

jjpr-mit added a commit to brain-score/brainio_contrib that referenced this issue Oct 31, 2018

A much simplified way to reproduce the bug. Submitted as part of pyda…

dfb152f

…ta/xarray#2215.
