Skip to content

Allow .attrs to use dict-likes #5655

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Illviljan opened this issue Jul 31, 2021 · 2 comments · May be fixed by #5667
Open

Allow .attrs to use dict-likes #5655

Illviljan opened this issue Jul 31, 2021 · 2 comments · May be fixed by #5667
Labels
topic-metadata Relating to the handling of metadata (i.e. attrs and encoding)

Comments

@Illviljan
Copy link
Contributor

Is your feature request related to a problem? Please describe.
Reading attributes from h5py-files is rather slow. So instead of retrieving it immediately I wanted to create a lazy dict-class that only retrieves the attribute values when necessary. But this is difficult to achieve since xarray keeps forcing the attrs to dicts in a lot of places.

Describe the solution you'd like

  • Replace in
    self._attrs = dict(value)
    and
    self._attrs = dict(value)
    with a asdict(value) function that checks if the input is a valid dict-like, if not convert to dict. Things that might be good to check:
    • MutableMapping
    • hasattr(dict_like, "copy")
    • isinstance(dict_like, dict) == True
  • Remove unneccessary conversions to dict. For example
    return dict(variable_attrs[0])
    should not be necessary as attrs from variables/dataarrays/datasets have already been forced to dicts when they were initialized.

Describe alternatives you've considered

  • One could lazify with dicts as well, for example by replacing the value with a function. This however won't look good in reprs, that's why having a convienence class is nice.
  • dict(LazyDict) always forces to dict, it does not let it pass through unchanged even if isinstance(LazyDict, dict) == True.

Interesting reading:
https://stackoverflow.com/questions/16669367/setup-dictionary-lazily
https://stackoverflow.com/questions/3387691/how-to-perfectly-override-a-dict

@Illviljan Illviljan linked a pull request Aug 3, 2021 that will close this issue
5 tasks
@shoyer
Copy link
Member

shoyer commented Aug 4, 2021

I appreciate the concern here, but I'm not sure we want to relax this constraint. Using built-in Python dict objects simplifies Xarray's internal logic considerably.

Could you talk a little bit more about your use-case and why you need lazy attributes? How many attributes are in your HDF5 files and how slow are they to load? Have you considered alternative file formats?

@Illviljan
Copy link
Contributor Author

I'm not so sure it simplifies that considerably. The linked PR is the minimal changes I had to do to get it working for my use cases and most of the changes were just removing unneccessary dict(x). I admittedly haven't checked every part of the code yet though.

My files have 2000+ variables with each variable having like 8 attributes. It starts taking a while when you have to read each one of those.
At the moment, reading from file to Dataset takes about 2s, 600ms of those were reading attributes.
With the PR I got it down to 200ms. Not as much as I'd hoped but I think I can get my LazyDict implementation much faster.

Changing file formats is too large of a change. We have used hdf5-files for many years and just switching to a different file format is just not something you do in painless way without (fast) backwards compatible alternative. It's hard to motivate a switch to xarray if the old alternative reads in files faster.

@andersy005 andersy005 added the topic-metadata Relating to the handling of metadata (i.e. attrs and encoding) label Jan 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-metadata Relating to the handling of metadata (i.e. attrs and encoding)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants