Is your feature request related to a problem? Please describe.
Reading attributes from h5py files is rather slow. So instead of retrieving them immediately, I wanted to create a lazy dict class that only retrieves the attribute values when necessary. But this is difficult to achieve since xarray keeps forcing attrs to dicts in a lot of places.
should not be necessary as attrs from variables/dataarrays/datasets have already been forced to dicts when they were initialized.
Describe alternatives you've considered
One could lazify with plain dicts as well, for example by replacing each value with a function. This, however, won't look good in reprs, which is why having a convenience class is nice.
dict(LazyDict) always forces a conversion to a plain dict; it does not let the object pass through unchanged, even if isinstance(LazyDict, dict) == True.
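To make the idea concrete, here is a minimal sketch of what such a lazy mapping could look like. The LazyDict class, its loaders argument, and the load_count counter are all hypothetical names made up for illustration; this is not the implementation from the linked PR, and it builds on MutableMapping rather than subclassing dict. It only shows that values are fetched on first access and that calling dict() on it reads every remaining value.

```python
from collections.abc import MutableMapping


class LazyDict(MutableMapping):
    """Hypothetical mapping that calls a loader the first time a key is read."""

    def __init__(self, loaders):
        # loaders: key -> zero-argument callable returning the real value
        self._loaders = dict(loaders)
        self._cache = {}
        self.load_count = 0  # counts loader calls, for illustration only

    def __getitem__(self, key):
        if key not in self._cache:
            self._cache[key] = self._loaders[key]()  # KeyError if unknown
            self.load_count += 1
        return self._cache[key]

    def __setitem__(self, key, value):
        self._loaders[key] = None  # placeholder; the cached value wins
        self._cache[key] = value

    def __delitem__(self, key):
        del self._loaders[key]
        self._cache.pop(key, None)

    def __iter__(self):
        return iter(self._loaders)

    def __len__(self):
        return len(self._loaders)

    def __repr__(self):
        # Show already-loaded values only; do not trigger any loads
        items = ", ".join(
            f"{k!r}: {self._cache[k]!r}" if k in self._cache else f"{k!r}: <lazy>"
            for k in self._loaders
        )
        return f"LazyDict({{{items}}})"


# Stand-ins for slow attribute reads from a file
attrs = LazyDict({"units": lambda: "m", "long_name": lambda: "height"})
first = attrs["units"]           # loads a single value
loads_before = attrs.load_count
forced = dict(attrs)             # plain dict; reads every remaining value
loads_after = attrs.load_count
```

The custom __repr__ is the part that a plain dict of functions cannot offer: unloaded entries can be shown with a placeholder instead of a function object, and printing does not trigger any reads.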
I appreciate the concern here, but I'm not sure we want to relax this constraint. Using built-in Python dict objects simplifies Xarray's internal logic considerably.
Could you talk a little bit more about your use-case and why you need lazy attributes? How many attributes are in your HDF5 files and how slow are they to load? Have you considered alternative file formats?
I'm not so sure it simplifies things that considerably. The linked PR contains the minimal changes I had to make to get it working for my use cases, and most of the changes were just removing unnecessary dict(x) calls. I admittedly haven't checked every part of the code yet, though.
My files have 2000+ variables, each with about 8 attributes. It starts taking a while when you have to read each one of those.
At the moment, reading from file to Dataset takes about 2s, of which 600ms is spent reading attributes.
With the PR I got it down to 200ms. Not as much as I'd hoped but I think I can get my LazyDict implementation much faster.
Changing file formats is too large a change. We have used HDF5 files for many years, and switching to a different file format is not something you can do painlessly without a (fast) backwards-compatible alternative. It's hard to motivate a switch to xarray if the old alternative reads files faster.
Describe the solution you'd like
xarray/xarray/core/variable.py
Line 865 in dddac11
xarray/xarray/core/dataset.py
Line 798 in dddac11
asdict(value): a function that checks if the input is a valid dict-like and, if not, converts it to a dict. Things that might be good to check:
MutableMapping
hasattr(dict_like, "copy")
isinstance(dict_like, dict) == True
xarray/xarray/core/merge.py
Line 523 in dddac11
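A rough sketch of how the proposed asdict(value) helper might look, combining the three checks listed above. The function body is an assumption written for illustration; it is not code from xarray or from the linked PR. collections.UserDict is used in the usage lines only as a convenient stand-in for a custom dict-like that has a copy method.

```python
from collections import UserDict
from collections.abc import MutableMapping
from types import MappingProxyType


def asdict(value):
    """Hypothetical helper: pass valid dict-likes through, convert the rest."""
    # Real dicts and dict subclasses are returned unchanged
    if isinstance(value, dict):
        return value
    # Other mutable mappings pass through only if they can be copied
    if isinstance(value, MutableMapping) and hasattr(value, "copy"):
        return value
    # Everything else (pairs, read-only mappings, ...) becomes a plain dict
    return dict(value)


plain = {"units": "m"}
custom = UserDict(units="m")

same_plain = asdict(plain) is plain            # passed through unchanged
same_custom = asdict(custom) is custom         # passed through unchanged
from_pairs = asdict([("units", "m")])          # converted to a dict
from_proxy = asdict(MappingProxyType(plain))   # read-only, so converted
```

One design point to keep in mind: dict(value) also makes a shallow copy, whereas this pass-through returns the very same object, so any caller that relied on getting a fresh copy would need an explicit .copy().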
Describe alternatives you've considered
dict(LazyDict) always forces to dict, it does not let it pass through unchanged even if isinstance(LazyDict, dict) == True.
Interesting reading:
https://stackoverflow.com/questions/16669367/setup-dictionary-lazily
https://stackoverflow.com/questions/3387691/how-to-perfectly-override-a-dict