13 changes: 8 additions & 5 deletions src/xray/conventions.py
@@ -247,13 +247,13 @@ def encode_cf_variable(array):
dimensions = array.dimensions
data = array.data
attributes = array.attributes.copy()
encoding = array.encoding
encoding = array.encoding.copy()

if isinstance(data, pd.DatetimeIndex):
if np.issubdtype(data.dtype, np.datetime64):
# DatetimeIndex objects need to be encoded into numeric arrays
(data, units, calendar) = utils.datetimeindex2num(data,
units=encoding.get('units', None),
calendar=encoding.get('calendar', None))
units=encoding.pop('units', None),
Member:
Why are you destructively updating the encoding? If we're doing that, we should definitely make a copy first (e.g., encoding = array.encoding.copy() above).
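The hazard the reviewer is pointing at can be sketched in isolation (plain dicts, not xray's actual `Variable.encoding`): `dict.pop` mutates the caller's dict, so without a `.copy()` first, the units/calendar keys silently disappear from the original object.

```python
# Minimal sketch of destructive vs. non-destructive encoding reads.
# `encode_without_copy` / `encode_with_copy` are illustrative names,
# not functions from xray.

def encode_without_copy(encoding):
    # pop() removes the key from the caller's dict in place
    return encoding.pop('units', None)

def encode_with_copy(encoding):
    encoding = encoding.copy()  # caller's dict is left intact
    return encoding.pop('units', None)

original = {'units': 'days since 2000-01-01', 'calendar': 'standard'}

encode_with_copy(original)
assert 'units' in original       # unaffected: we popped from a copy

encode_without_copy(original)
assert 'units' not in original   # the caller's encoding lost its key
```

This is why the diff above changes `encoding = array.encoding` to `encoding = array.encoding.copy()` alongside the switch from `get` to `pop`.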

Contributor Author:
It seemed unfortunate to have redundant information in both attributes and encoding, but you're right: I assumed encoding was being copied first.

calendar=encoding.pop('calendar', None))
attributes['units'] = units
attributes['calendar'] = calendar
elif data.dtype == np.dtype('O'):
@@ -327,7 +327,10 @@ def pop_to(source, dest, k):
if 'dtype' in encoding:
if var.data.dtype != encoding['dtype']:
raise ValueError("Refused to overwrite dtype")
encoding['dtype'] = data.dtype
if not isinstance(data, pd.Index):
Member:
How are we decoding pd.Index objects? Are you decoding CF variables after you've already put them in a Dataset? Let's discuss how this came up.

Member:
If we are going to add a check here, I would prefer to write something more general like if data.dtype.kind != 'O'. But it's not entirely clear to me why that's the right choice...
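The more general check suggested here can be sketched as follows (`has_concrete_dtype` is an illustrative name, not xray code): `dtype.kind == 'O'` marks object arrays, whose element type NumPy cannot describe, which is presumably why recording that dtype into `encoding` would be unhelpful.

```python
import numpy as np

def has_concrete_dtype(data):
    # kind 'O' means a generic object array; anything else
    # (int, float, fixed-width string, datetime, ...) is a
    # concrete dtype worth recording in the encoding
    return data.dtype.kind != 'O'

assert has_concrete_dtype(np.array([1, 2, 3]))           # kind 'i'
assert has_concrete_dtype(np.array([b'a'], dtype='S1'))  # kind 'S'
assert not has_concrete_dtype(np.array([object()]))      # kind 'O'
```

A `pd.Index` holding Python objects also reports dtype `object`, so this check subsumes the `isinstance(data, pd.Index)` special case in the diff above while covering plain object ndarrays too.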

Contributor Author:
I'm accessing an OpenDAP dataset and don't want to download all the coordinates, so I slice out only the variables/subsets I'm interested in and then decode. I imagine there are other use cases where this issue would crop up.

Contributor Author:
Using indices handles this specific situation of a Dataset being decoded after some processing. If we're going to add a check along the lines of dtype.kind != 'O', then I would argue that the best check would be whether the dtype is a valid NetCDF datatype.

Also, had I decoded the dataset directly from OpenDAP, it would have downloaded ALL the data first (because of some of the .data.dtype checks), not just the coordinates, so we're going to need some sort of fix along these lines.
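The "valid NetCDF datatype" check proposed here might look like the sketch below. The helper name and the dtype set are assumptions (the set lists the classic NetCDF-3 data model: byte, short, int, float, double, char), not anything xray defines.

```python
import numpy as np

# Hypothetical helper: dtypes representable in the classic
# NetCDF-3 data model (NC_BYTE, NC_SHORT, NC_INT, NC_FLOAT,
# NC_DOUBLE, NC_CHAR).
_NETCDF3_DTYPES = {np.dtype(t) for t in
                   ('int8', 'int16', 'int32',
                    'float32', 'float64', 'S1')}

def is_valid_netcdf3_dtype(dtype):
    return np.dtype(dtype) in _NETCDF3_DTYPES

assert is_valid_netcdf3_dtype('float64')
assert is_valid_netcdf3_dtype(np.dtype('int16'))
assert not is_valid_netcdf3_dtype('O')        # object arrays
assert not is_valid_netcdf3_dtype('int64')    # not in classic NetCDF-3
```

Under this check, `encoding['dtype']` would only be recorded when round-tripping to a NetCDF file could actually honor it, which sidesteps the `pd.Index` question entirely.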

# When data is a pandas Index we assume the dtype will be
# inferred during encode_cf_variable.
encoding['dtype'] = data.dtype
if np.issubdtype(data.dtype, (str, unicode)):
# TODO: add some sort of check instead of just assuming that the last
# dimension on a character array is always the string dimension