drop_indexes is reversed by assign_coords of unrelated coord #7885

Closed
4 tasks done
mjwillson opened this issue May 30, 2023 · 2 comments · Fixed by #8094
Comments

@mjwillson
What happened?

I dropped an index on one coord, then later called assign_coords to change another, unrelated coord.
The index on the original coord was silently created again.

What did you expect to happen?

I expected the index on the original coord to stay dropped.

Minimal Complete Verifiable Example

import xarray
import numpy as np
ds = xarray.Dataset(
    {'foo': (('x','y'), np.ones((3,5)))},
    coords={'x': [1,2,3], 'y': [4,5,6,7,8]})
ds = ds.drop_indexes('x')
assert 'x' not in ds.indexes
ds = ds.assign_coords(y=ds.y+1)
assert 'x' not in ds.indexes  # Fails

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

In general it would be nice if xarray made it easier to avoid indexes being automatically created.

E.g. right now, as far as I can tell there's no way to avoid an index being created when you construct a DataArray or Dataset with a coordinate of the same name as a dimension.

Admittedly I have a slightly niche use case: I'm using xarray with wrapped JAX arrays, which can't be converted into pandas indexes. In these cases, indexes being (re-)created isn't just an inconvenience; it actually causes a crash.
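
On affected versions, one possible stopgap is to re-drop whatever indexes assign_coords brings back. The sketch below is not from the original report: assign_coords_keep_dropped is a hypothetical helper name, and it assumes that only coordinates you did not explicitly assign should stay unindexed.

```python
import numpy as np
import xarray as xr


def assign_coords_keep_dropped(ds, **new_coords):
    """Hypothetical helper (not part of xarray): assign coords, then
    re-drop any index that reappeared on a coord we did not assign."""
    indexed_before = set(ds.xindexes)
    out = ds.assign_coords(**new_coords)
    # Indexes that exist now, did not exist before, and belong to
    # coordinates that were not part of this assignment.
    recreated = [
        name
        for name in out.xindexes
        if name not in indexed_before and name not in new_coords
    ]
    return out.drop_indexes(recreated) if recreated else out


ds = xr.Dataset(
    {"foo": (("x", "y"), np.ones((3, 5)))},
    coords={"x": [1, 2, 3], "y": [4, 5, 6, 7, 8]},
)
ds = ds.drop_indexes("x")
ds = assign_coords_keep_dropped(ds, y=ds.y + 1)
assert "x" not in ds.indexes
```

Note that this only helps when re-creating the index does not itself fail. If the index cannot be built at all (as with the wrapped JAX arrays), the crash would happen inside assign_coords before the helper gets a chance to drop anything.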

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 (main, Dec 7 2022, 13:47:07) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 6.1.20-2rodete1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.8
libnetcdf: 4.9.0

xarray: 999
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.0
netCDF4: 1.6.2
pydap: None
h5netcdf: 1.1.0
h5py: 3.7.0
Nio: None
zarr: 2.13.6+ds
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.4
cfgrib: 0.9.10.3
iris: None
bottleneck: 1.3.5
dask: None
distributed: None
matplotlib: 3.6.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.6.3
pip3: None
conda: None
pytest: 7.2.1
mypy: None
IPython: 8.5.0
sphinx: 5.3.0

@mjwillson mjwillson added the bug and needs triage labels May 30, 2023
@dcherian dcherian added the topic-indexing label and removed the needs triage label Jun 1, 2023
@benbovy
Member

benbovy commented Jun 20, 2023

Thanks for the report @mjwillson. I suspect that assign_coords() re-creates default indexes for the dimension coordinates, which indeed it shouldn't do.

E.g. right now, as far as I can tell there's no way to avoid an index being created when you construct a DataArray or Dataset with a coordinate of the same name as a dimension.

#7368 allows constructing a new Dataset or DataArray with no default index created for the dimension coordinates. I need to finish it (it is almost ready).
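
For illustration, here is a minimal sketch of index-free construction. It assumes a later xarray release in which the xr.Coordinates class accepts indexes={} to skip default index creation; whether this matches the final form of #7368 exactly is an assumption, not something stated in this thread.

```python
import numpy as np
import xarray as xr

# Assumption: xr.Coordinates with indexes={} skips default
# (pandas) index creation for the dimension coordinates.
coords = xr.Coordinates(
    {"x": [1, 2, 3], "y": [4, 5, 6, 7, 8]},
    indexes={},
)
ds = xr.Dataset({"foo": (("x", "y"), np.ones((3, 5)))}, coords=coords)

# The coordinates are present, but none of them is indexed.
print(list(ds.coords), list(ds.xindexes))
```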

@benbovy
Member

benbovy commented Jun 20, 2023

Admittedly I have a slightly niche use case: I'm using xarray with wrapped JAX arrays, which can't be converted into pandas indexes. In these cases, indexes being (re-)created isn't just an inconvenience; it actually causes a crash.

I'm curious, would your use case benefit from a custom (non-pandas) index built from the JAX array(s)? Or do you not need any index at all? Depending on that, it might be worth adding your use case to #7041.
