Coord name not set when concating along a DataArray #5240

Closed
darikg opened this issue May 1, 2021 · 4 comments · Fixed by #5611
Labels
topic-combine combine/concat/merge

Comments

@darikg
Contributor

darikg commented May 1, 2021

from xarray import DataArray, concat
a = DataArray([0], dims='a')
out = concat([a, a], dim=DataArray([0, 1], dims='b'))
print(out.coords)

Coordinates:
    None     (b) int32 0 1

I would've expected the name of the new coordinate to be the name of the DataArray variable but instead it's None.

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 04:59:43) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: English_United States.1252
libhdf5: None
libnetcdf: None

xarray: 0.17.0
pandas: 1.2.4
numpy: 1.20.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.1.1
conda: None
pytest: None
IPython: 7.23.0
sphinx: None

@keewis
Collaborator

keewis commented May 1, 2021

> would've expected the name of the new coordinate to be the name of the DataArray variable

that's exactly what it's doing: it uses the name of DataArray([0, 1], dims='b'), which is None. To fix that, you can assign a name to the DataArray:

In [3]: a = xr.DataArray([0], dims='a')
   ...: out = xr.concat([a, a], dim=xr.DataArray([0, 1], dims='b', name="x"))
   ...: out
Out[3]: 
<xarray.DataArray (b: 2, a: 1)>
array([[0],
       [0]])
Coordinates:
    x        (b) int64 0 1
Dimensions without coordinates: b, a

Since None is not a particularly good name, we could raise for unnamed DataArray objects (not sure if falling back to dims for 1D DataArray objects would be a good idea)
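The two options mentioned above (raising for unnamed DataArray objects, or falling back to the dimension name for 1D arrays) can be sketched at the user level. This is only an illustration: `ensure_named_dim` and its `on_unnamed` parameter are hypothetical names, not part of xarray, and the helper merely pre-processes the `dim` argument before it reaches `concat`.

```python
import xarray as xr


def ensure_named_dim(dim, on_unnamed="fallback"):
    """Hypothetical helper (not part of xarray): name an unnamed 1D DataArray.

    It only pre-processes the ``dim`` argument before it is passed to
    ``xr.concat``; it does not change xarray's own behaviour.
    """
    if not isinstance(dim, xr.DataArray) or dim.name is not None:
        # Strings, named DataArrays, etc. are passed through untouched.
        return dim
    if on_unnamed == "raise":
        raise ValueError("cannot concatenate along an unnamed DataArray")
    if dim.ndim != 1:
        raise ValueError("can only fall back to the dimension name for 1D arrays")
    # Fall back to the name of the array's only dimension.
    (dim_name,) = dim.dims
    return dim.rename(dim_name)


a = xr.DataArray([0], dims="a")
out = xr.concat([a, a], dim=ensure_named_dim(xr.DataArray([0, 1], dims="b")))
print(out.coords)
```

With the fallback, the new coordinate should be listed under the name `b` rather than `None`; passing `on_unnamed="raise"` turns the same situation into an explicit error instead.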

@darikg
Contributor Author

darikg commented May 1, 2021

Thanks @keewis! I was confusing dimension names and variable names. I would support raising or falling back to a reasonably sane default. The reason I stumbled on this was that a None coord name was breaking the _repr_html_ in Jupyter and causing a much more confusing error message.

@keewis
Collaborator

keewis commented May 1, 2021

The repr will be fixed by #5149, but we should probably still fix the None coordinate name itself. Since dim is the dimension to concatenate along, I guess we should fall back to its name (not sure though). Thoughts, @pydata/xarray?

For reference, the conversion happens in the final else branch of _calc_concat_dim_coord:

def _calc_concat_dim_coord(dim):
    """
    Infer the dimension name and 1d coordinate variable (if appropriate)
    for concatenating along the new dimension.
    """
    from .dataarray import DataArray

    if isinstance(dim, str):
        coord = None
    elif not isinstance(dim, (DataArray, Variable)):
        dim_name = getattr(dim, "name", None)
        if dim_name is None:
            dim_name = "concat_dim"
        coord = IndexVariable(dim_name, dim)
        dim = dim_name
    elif not isinstance(dim, DataArray):
        coord = as_variable(dim).to_index_variable()
        (dim,) = coord.dims
    else:
        coord = dim
        (dim,) = coord.dims
    return dim, coord

@max-sixty
Collaborator

Agree that this should coerce to the name of the dim if the array has no name.

Technically should the kwarg of concat be coord rather than dim? I generally supply dim names there, rather than coords vars. (though I'm not suggesting we actually change it given the backward-compat).

Note that we already do this in the DataArray constructor:

In [12]: a = xr.DataArray([0,1], coords=[xr.DataArray([1,2])], dims='b')   # No name on the coord

In [13]: a
Out[13]:
<xarray.DataArray (b: 2)>
array([0, 1])
Coordinates:
  * b        (b) int64 1 2    # Yes name on the coord
