Set one-dimensional data variable as dimension coordinate? #2461

Closed
nedclimaterisk opened this issue Oct 4, 2018 · 13 comments

Comments

@nedclimaterisk
Contributor

Code Sample

I have this dataset, and I'd like to make it indexable by time:

<xarray.Dataset>
Dimensions:                (station_observations: 46862)
Dimensions without coordinates: station_observations
Data variables:
    time                   (station_observations) datetime64[ns] ...
    SNOW_ON_THE_GROUND     (station_observations) float64 ...
    ONE_DAY_SNOW           (station_observations) float64 ...
    ONE_DAY_RAIN           (station_observations) float64 ...
    ONE_DAY_PRECIPITATION  (station_observations) float64 ...
    MIN_TEMP               (station_observations) float64 ...
    MAX_TEMP               (station_observations) float64 ...
Attributes:
    elevation:     15.0

Problem description

I expected to be able to use ds.set_coords to make the time variable an indexable coordinate. The variable IS converted to a coordinate, but it is not a dimension coordinate, so I can't index with it. I can use assign_coords(station_observations=ds.time) to make station_observations indexable by time, but then the name is semantically wrong, and the time variable still exists, which makes the code harder to maintain.

Expected Output

ds.set_coords('time', inplace=True)
<xarray.Dataset>
Dimensions:                (station_observations: 46862)
Coordinates:
    time                   (station_observations) datetime64[ns] ...
Dimensions without coordinates: station_observations
Data variables:
    SNOW_ON_THE_GROUND     (station_observations) float64 ...
    ONE_DAY_SNOW           (station_observations) float64 ...
    ONE_DAY_RAIN           (station_observations) float64 ...
    ONE_DAY_PRECIPITATION  (station_observations) float64 ...
    MIN_TEMP               (station_observations) float64 ...
    MAX_TEMP               (station_observations) float64 ...
Attributes:
    elevation:     15.0

In [95]: ds.sel(time='1896')
ValueError: dimensions or multi-index levels ['time'] do not exist

with assign_coords:

In [97]: ds=ds.assign_coords(station_observations=ds.time)

In [98]: ds.sel(station_observations='1896')
Out[98]: 
<xarray.Dataset>
Dimensions:                (station_observations: 366)
Coordinates:
  * station_observations   (station_observations) datetime64[ns] 1896-01-01 ...
Data variables:
    time                   (station_observations) datetime64[ns] ...
    SNOW_ON_THE_GROUND     (station_observations) float64 ...
    ONE_DAY_SNOW           (station_observations) float64 ...
    ONE_DAY_RAIN           (station_observations) float64 ...
    ONE_DAY_PRECIPITATION  (station_observations) float64 ...
    MIN_TEMP               (station_observations) float64 ...
    MAX_TEMP               (station_observations) float64 ...
Attributes:
    elevation:     15.0

This works correctly, but looks ugly. It would be nice if the time variable could be assigned as a dimension directly. I can drop the time variable and rename station_observations, but it's a little annoying to do so.
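
For reference, the drop-and-rename workaround described above could look roughly like the sketch below. The five-row dataset is made up for illustration, and `drop_vars` is assumed to be available (it was added in later xarray releases than the 0.10.2 reported here; older versions would use `drop`).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small stand-in for the station dataset (values are invented).
times = pd.date_range("1896-01-01", periods=5, freq="D")
ds = xr.Dataset(
    {
        "time": ("station_observations", times),
        "MAX_TEMP": ("station_observations", np.arange(5.0)),
    }
)

# 1. Index the existing dimension by the time values.
# 2. Drop the now-redundant "time" data variable.
# 3. Rename the dimension (and its coordinate) to "time".
workaround = (
    ds.assign_coords(station_observations=ds["time"])
    .drop_vars("time")
    .rename({"station_observations": "time"})
)

subset = workaround.sel(time="1896")
```

It gets the job done, but it takes three chained calls for what is conceptually a single operation.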

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.0-041600-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.13.3
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: 1.2.0
cyordereddict: None
dask: 0.16.0
distributed: None
matplotlib: 2.1.1
cartopy: None
seaborn: None
setuptools: 39.0.1
pip: 9.0.1
conda: None
pytest: None
IPython: 5.5.0
sphinx: None

@fujiisoup
Member

Hi @nedclimaterisk.
Thanks for raising an issue.

In that case, you can use swap_dims:

In [1]: import xarray as xr
   ...: ds = xr.Dataset({'x': ('i', [0, 1, 2]), 'y': ('i', ['a', 'b', 'c'])})
   ...: ds
   ...: 
   ...: 
Out[1]: 
<xarray.Dataset>
Dimensions:  (i: 3)
Dimensions without coordinates: i
Data variables:
    x        (i) int64 0 1 2
    y        (i) <U1 'a' 'b' 'c'

In [2]: ds.swap_dims({'i': 'x'})
Out[2]: 
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 0 1 2
Data variables:
    y        (x) <U1 'a' 'b' 'c'
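
Applied to a dataset shaped like the one in the original question, the same idea might look like the following sketch. The data here is synthetic (400 daily timestamps starting in 1896, with invented temperatures); only the variable and dimension names are taken from the issue.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in: "time" is a plain data variable along an
# unlabeled dimension, as in the original dataset.
times = pd.date_range("1896-01-01", periods=400, freq="D")
ds = xr.Dataset(
    {
        "time": ("station_observations", times),
        "MAX_TEMP": ("station_observations", np.arange(400.0)),
    }
)

# Replace the "station_observations" dimension with "time";
# "time" becomes a dimension coordinate with an index.
indexed = ds.swap_dims({"station_observations": "time"})

# Partial-string selection now works on the time index.
year_1896 = indexed.sel(time="1896")
print(year_1896.sizes["time"])  # 1896 is a leap year, so 366 days
```

No separate drop or rename is needed, because swap_dims promotes the existing variable instead of copying it.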

@fujiisoup
Member

I'm closing this issue, but if you have further issues, do not hesitate to reopen it.

@nedclimaterisk
Contributor Author

Awesome, thank you @fujiisoup.

It might be worth putting a "see also" note in the assign_coords and set_coords documentation for this. I tried searching quite a bit, but did not find this.

@fujiisoup
Member

> It might be worth putting a "see also" note in the assign_coords and set_coords documentation for this. I tried searching quite a bit, but did not find this.

Thanks for the suggestion. It sounds like a good idea. Would you mind sending a PR for this?

@M-Harrington

M-Harrington commented Nov 21, 2019

@fujiisoup This method works when trying to add a single coordinate, but what about when you're trying to add multiple coordinates? Example:

import xarray as xr
ds = xr.Dataset({'x': ('i', [0, 1, 2]), 'y': ('i', ['a', 'b', 'c']),'z': ('i', ['a', 'b', 'c'])})
ds= ds.set_coords(['y','z'])
ds.swap_dims({'i':'y','i':'z'}) #doesn't work

ds = xr.Dataset({'x': ('i', [0, 1, 2]), 'y': ('i', ['a', 'b', 'c']),'z': ('i', ['a', 'b', 'c'])})
ds= ds.set_coords('y')
ds=ds.swap_dims({'i':'y'}) 
ds.set_coords('z') #doesn't work either

Is there any reason that this is the default behavior? This is a bit frustrating to work with after creating an xarray dataset from pandas.
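
One detail about the first attempt may be worth spelling out: a Python dict literal with a repeated key keeps only the last value, so swap_dims never sees the `'i': 'y'` entry at all. The sketch below illustrates this using the same toy dataset; it is an illustration of plain Python dict behavior, not a statement about how xarray is meant to be used.

```python
import xarray as xr

# A repeated key in a dict literal is silently collapsed: last one wins.
mapping = {"i": "y", "i": "z"}
print(mapping)  # {'i': 'z'}

ds = xr.Dataset(
    {
        "x": ("i", [0, 1, 2]),
        "y": ("i", ["a", "b", "c"]),
        "z": ("i", ["a", "b", "c"]),
    }
).set_coords(["y", "z"])

# Only "z" replaces the dimension; "y" stays a non-index coordinate
# that now lies along the new "z" dimension.
swapped = ds.swap_dims(mapping)
print(swapped["y"].dims)  # ('z',)
```

So a single dimension can be swapped for one variable only, which is consistent with the explanation given further down the thread.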

@dcherian
Contributor

ds.rename_dims({"i": "y"})

[screenshot of the resulting output]

Is this what you want?

@M-Harrington

M-Harrington commented Nov 21, 2019

@dcherian not quite because I want z and y to both have the star next to them (or bolded in your screenshot) so that they're proper coordinates. I likewise thought that the answer would be as simple as:

data=pd.DataFrame({'x': [0, 1, 2], 'y': ['a', 'b', 'c'],'z': ['a', 'b', 'c']})
xr.DataArray(data.x, coords={'y': data.y, 'x':data.z})

But again, I haven't gotten anywhere this way either, and I'm getting the error ValueError: coordinate y has dimensions ('y',), but these are not a subset of the DataArray dimensions ('dim_0',)

@dcherian
Contributor

So that's not possible. You can't have both y and z be 1D coordinate variables for x since x is 1D.

What are you ultimately trying to do after the conversion to DataArray?

@M-Harrington

M-Harrington commented Nov 21, 2019

This also doesn't work feeding y and z as data.y.values and data.z.values which are 1d arrays.

Ultimately, I want to merge this with another dataset that has the same coordinates. Seems like there's something obvious I'm missing here but I haven't been able to figure out what it is.

Ah I see in this example I need a dataset that's 3x3, let me fix the example and see if it's still relevant to my issue

@keewis
Collaborator

keewis commented Nov 21, 2019

to get your example to work, use this:

data=pd.DataFrame({'x': [0, 1, 2], 'y': ['a', 'b', 'c'],'z': ['a', 'b', 'c']})
xr.DataArray(data.x, coords={'y': data.y, 'x':data.z}, dims="y")

to get both as dimensions, use

df = pd.DataFrame({'x': [0, 1, 2], 'y': ['a', 'b', 'c'],'z': ['a', 'b', 'c']})
ds = df.set_index(["y", "z"]).to_xarray()
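
If a 1D result is wanted, one possible variant is to name the dimension explicitly and attach the second column as a non-index coordinate along it. This is only a sketch under that assumption: the `("y", values)` tuple form tells xarray which dimension the `z` coordinate belongs to, and the name `da` is just for illustration.

```python
import pandas as pd
import xarray as xr

df = pd.DataFrame({"x": [0, 1, 2], "y": ["a", "b", "c"], "z": ["a", "b", "c"]})

# "y" is the dimension coordinate; "z" is an extra coordinate
# that lives on the same "y" dimension but carries no index.
da = xr.DataArray(
    df["x"].to_numpy(),
    dims="y",
    coords={"y": df["y"].to_numpy(), "z": ("y", df["z"].to_numpy())},
)

by_y = da.sel(y="b")                       # label-based lookup via the index
by_z = da.where(da["z"] == "c", drop=True)  # mask-based lookup on "z"
```

Here only `y` supports `.sel`; selecting on `z` has to go through a boolean mask as shown.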

@M-Harrington

This worked perfectly, thanks so much!

@keewis
Collaborator

keewis commented Nov 21, 2019

just note that in the end the result is still 2D with the missing values filled with nan
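
To make the point about the 2D result concrete, here is a small sketch that inspects what the set_index/to_xarray route produces for the toy DataFrame from this thread. The variable names `n_present` and `n_missing` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [0, 1, 2], "y": ["a", "b", "c"], "z": ["a", "b", "c"]})
ds = df.set_index(["y", "z"]).to_xarray()

# Both index levels become dimensions, so "x" is a 3x3 grid.
print(ds["x"].dims, ds["x"].shape)

# Only the (y, z) pairs present in the DataFrame hold data;
# every other cell is NaN, which also forces a float dtype.
n_present = int(ds["x"].notnull().sum())
n_missing = int(np.isnan(ds["x"].values).sum())
print(n_present, n_missing)
```

For long tables with many distinct (y, z) pairs the grid can become much larger than the original data, so this route fits best when a dense 2D layout is actually what is wanted.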

@M-Harrington

Right that's actually desired behavior to begin with so this works out
