-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Encodings for object data types are not saved. #40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -247,13 +247,13 @@ def encode_cf_variable(array): | |
dimensions = array.dimensions | ||
data = array.data | ||
attributes = array.attributes.copy() | ||
encoding = array.encoding | ||
encoding = array.encoding.copy() | ||
|
||
if isinstance(data, pd.DatetimeIndex): | ||
if np.issubdtype(data.dtype, np.datetime64): | ||
# DatetimeIndex objects need to be encoded into numeric arrays | ||
(data, units, calendar) = utils.datetimeindex2num(data, | ||
units=encoding.get('units', None), | ||
calendar=encoding.get('calendar', None)) | ||
units=encoding.pop('units', None), | ||
calendar=encoding.pop('calendar', None)) | ||
attributes['units'] = units | ||
attributes['calendar'] = calendar | ||
elif data.dtype == np.dtype('O'): | ||
|
@@ -327,7 +327,10 @@ def pop_to(source, dest, k): | |
if 'dtype' in encoding: | ||
if var.data.dtype != encoding['dtype']: | ||
raise ValueError("Refused to overwrite dtype") | ||
encoding['dtype'] = data.dtype | ||
if not isinstance(data, pd.Index): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How are we decoding There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we are going to add a check here, I would prefer to write something more general like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm accessing an OpenDAP dataset and don't want to need to download all coordinates, so I slice out only the variables/subsets I'm interested in then decode. I imagine there are other use cases where this issue would crop up. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Using indices handles this specific situation of a Dataset being decode after some processing, if we're going to add a check of the dtype.kind != 'O' type then I would actually I would argue that the best check would be to check if the dtype is a valid NetCDF datatype. Also, had I decoded the dataset directly from OpenDAP it would download ALL the data first (because of some of the .data.dtype checks), not just coordinates so we're going to need some sort of fix along these lines. |
||
# When data is a pandas Index we assume the dtype will be | ||
# inferred during encode_cf_variable. | ||
encoding['dtype'] = data.dtype | ||
if np.issubdtype(data.dtype, (str, unicode)): | ||
# TODO: add some sort of check instead of just assuming that the last | ||
# dimension on a character array is always the string dimension | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you destructively updating the encoding? If we're doing that, we should definitely make a copy first (e.g.,
encoding = array.encoding.copy()
above).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seemed unfortunate to have redundant information in both attributes and encodings, but you're right I assumed encodings was being copied first.