align() outer join returns DataArrays that are all NaNs #2215
Comments
Are you sure the indexes along the aligned dimensions match exactly? Small differences in floats are the most common source of this issue. Try using |
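A minimal sketch of the float-mismatch case mentioned above, assuming plain NumPy arrays stand in for the index labels and `np.union1d` stands in for the label union that an outer join performs. The arrays and variable names are illustrative, not taken from the reporter's data.

```python
import numpy as np

# Two sets of index labels that look identical when printed at low precision.
labels_a = np.array([0.1, 0.2, 0.3])
labels_b = np.array([0.1, 0.2, 0.1 + 0.2])  # 0.1 + 0.2 is not exactly 0.3

# Exact comparison fails, tolerance-based comparison succeeds.
exact_match = np.array_equal(labels_a, labels_b)
close_match = np.allclose(labels_a, labels_b)

# An outer join keeps the union of labels, so the near-duplicate survives
# as a separate label, and each input would be NaN at the other's label.
outer_labels = np.union1d(labels_a, labels_b)

print(exact_match, close_match, len(outer_labels))
```

If the exact comparison fails while the tolerance-based one passes, rounding the labels to a common precision before aligning is one possible workaround.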
I found a way to reproduce the error. One of the MultiIndex levels on the DataArrays has NaNs in it. If I remove that level, the correct values appear in the result. Should the presence of that MultiIndex level cause this behavior?
|
Thanks for the example. Can you please identify exactly which behavior you find surprising, and what you think the result should be? |
Since the align is an outer join, I would expect all the non-NaN values in the original DataArrays to also appear in the aligned DataArrays. Perhaps I am misinterpreting the behavior of |
For clarity, here are the prints of the arrays before and after alignment:
Before alignment:
After alignment:
|
Sorry, I'm not quite following -- can you please give a specific example of which output from your example looks wrong, and print how it should look instead? |
This is what I would expect to see returned by align():
I see something very similar, but with the |
This is what I see when printing
The only material difference I can see in our environments is that I'm running pandas 0.23 and you're running pandas 0.22. Can you try updating pandas and see if that fixes the issue? |
@shoyer That did it. Under pandas 0.22, the DataArrays in |
OK, great. I'm going to close this then, and simply recommend that anyone who encounters this issue try upgrading pandas. |
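A small sketch for checking whether an environment predates the pandas release that resolved the problem in this thread (0.22 showed the all-NaN result, 0.23 did not). The helper names `major_minor` and `predates_fix` are hypothetical and not part of pandas or xarray.

```python
import re


def major_minor(version):
    """Extract (major, minor) from a version string such as '0.22.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def predates_fix(version, fixed_in=(0, 23)):
    """Return True if `version` is older than the release reported to work."""
    return major_minor(version) < fixed_in


print(predates_fix("0.22.0"))  # version from the original report
print(predates_fix("0.23.0"))  # version that resolved it in this thread
```

With pandas installed, the same check can be run as `predates_fix(pandas.__version__)`.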
Code Sample, a copy-pastable example if possible
The problem occurs for me in the midst of a data-processing pipeline that starts with some ~40MB netCDF files. I've tried to create pasteable code that reproduces the behavior from scratch, but I haven't succeeded.
Problem description
I pass two DataArrays to xr.align() with join="outer". The DataArrays are dtype float64, and contain a mix of NaNs and floats. They are 2D and have MultiIndexes with some numeric and some string levels.

The tuple of DataArrays returned by align() have the correct shape and expected indexes, but the contents of the arrays are all NaNs. The original float values are gone. np.nonzero(~np.isnan(da)) returns an empty array.

I've set breakpoints and delved into the code. On line 656 in xarray.core.variable.Variable._getitem_with_mask, self contains non-NaN values, but the data returned by as_indexable(self._data)[actual_indexer] evaluates as all NaNs. However, data.array at that point (which is xarray.backends.netCDF4_.NetCDF4ArrayWrapper) has non-NaNs. So it's some sort of masking caused by the indexing that makes it look like data is all NaNs.

Expected Output
A tuple of DataArrays which contain some non-NaN values.
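The expectation can be sketched as a check that an outer join preserves the number of non-NaN values in each input. This is a simplified illustration with small 1D arrays and a plain integer index, not the reporter's 2D MultiIndexed data; the helper `count_valid` and the sample values are assumptions made for the example.

```python
import numpy as np
import xarray as xr


def count_valid(da):
    """Number of non-NaN entries in a DataArray."""
    return int(np.count_nonzero(~np.isnan(da.values)))


# Two float64 arrays with partially overlapping labels along "x".
a = xr.DataArray([1.0, np.nan, 3.0], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([10.0, 20.0], dims="x", coords={"x": [2, 3]})

# The outer join reindexes both arrays onto the union of labels,
# filling positions that were absent from an input with NaN.
a_aligned, b_aligned = xr.align(a, b, join="outer")

print(a_aligned.sizes["x"], b_aligned.sizes["x"])
print(count_valid(a), count_valid(a_aligned))
print(count_valid(b), count_valid(b_aligned))
```

If the counts before and after alignment differ, values were lost during reindexing, which is the symptom described in this report.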
Output of xr.show_versions()
xarray: 0.10.4
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: None
cyordereddict: None
dask: None
distributed: None
matplotlib: 2.1.2
cartopy: None
seaborn: None
setuptools: 38.4.0
pip: 9.0.1
conda: None
pytest: 3.3.2
IPython: 6.2.1
sphinx: None