@gerritholl gerritholl commented Oct 5, 2020

The h5netcdf engine exposes single-valued attributes as arrays of shape
(1,), which is correct according to the NetCDF standard, but may cause
a problem when reading a value of shape () before the scale_factor and
add_offset have been applied. This PR checks the dimensionality of
add_offset and scale_factor and ensures they are scalar before they are
used for further processing, adds a unit test to verify that this works
correctly, and adds a note to the documentation warning users of this
difference between the h5netcdf and netcdf4 engines.
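
For illustration, here is a minimal sketch of the shape mismatch described above and of the kind of scalar check this PR adds, using plain NumPy. The helper name `ensure_scalar` and the sample values are hypothetical and are not taken from the PR diff.

```python
import numpy as np


def ensure_scalar(attr):
    """Hypothetical helper: unwrap a single-element array into a scalar.

    Values that are already 0-dimensional are returned unchanged.
    ndarray.item() raises ValueError if the array holds more than one element.
    """
    if np.ndim(attr) > 0:
        return np.asarray(attr).item()
    return attr


# A packed value of shape (), as it might be read from disk.
raw = np.int16(10)

# Attributes of shape (1,), as the h5netcdf engine exposes them.
scale_factor = np.array([0.5])
add_offset = np.array([2.0])

# Without the check, broadcasting turns the shape-() value into shape (1,).
naive = raw * scale_factor + add_offset

# With the check, the decoded value stays 0-dimensional.
fixed = raw * ensure_scalar(scale_factor) + ensure_scalar(add_offset)

print(naive.shape, np.ndim(fixed))
```

In this sketch `naive` ends up with shape `(1,)` while `fixed` remains 0-dimensional, which is the difference the check is meant to remove.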

Fixes pydata#4471.
@gerritholl
Contributor Author

Is this bugfix notable enough to need a whats-new.rst entry?

For the unit test, I tried to construct an object that would emulate what is produced when reading a NetCDF4 file with the h5netcdf engine, but I gave up and settled for a temporary file instead. If this approach is undesirable, I could use some guidance on how to construct an appropriate object that exposes the problem.

Contributor

@dcherian dcherian left a comment

@gerritholl I think a whats-new entry would be appropriate.

Comment on lines 108 to 112
There may be minor differences in the :py:class:`Dataset` object returned
when reading a NetCDF file with different engines. For example,
single-valued attributes are returned as scalars by the default
``engine=netcdf4``, but as arrays of size ``(1,)`` when reading with
``engine=h5netcdf``.
Member

I'm not 100% sure we need to mention this, I think it sort of goes without saying that different backends may differ in minor ways.

Contributor Author

@gerritholl gerritholl Oct 6, 2020

As a user who understands much less deeply how backends interact with xarray, it did surprise me. Should I keep this note or remove it?

Collaborator

It makes total sense but I wasn't aware either, so I'd leave it.

Add a whats-new entry for the fix to issue pydata#4471, corresponding to PR pydata#4485.
@gerritholl
Contributor Author

If this makes more sense as an integration test than as a unit test (for which I need help, see other comment), should I mark the current test in some way and/or move it to a different source file?

Co-authored-by: Mathias Hauser <[email protected]>
@dcherian
Contributor

Thanks @gerritholl

@dcherian dcherian merged commit 569a4da into pydata:master Oct 11, 2020
@gerritholl gerritholl deleted the decode-with-array-attributes branch October 16, 2020 21:20
4 participants