-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Allow .attrs to support any dict-likes #5667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Allow .attrs to support any dict-likes #5667
Conversation
Hello @Illviljan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-10-31 09:57:53 UTC |
xarray/core/merge.py
Outdated
@@ -520,9 +520,9 @@ def merge_attrs(variable_attrs, combine_attrs, context=None): | |||
elif combine_attrs == "drop": | |||
return {} | |||
elif combine_attrs == "override": | |||
return dict(variable_attrs[0]) | |||
return variable_attrs[0].copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both dict(x)
and x.copy()
are shallow copies.
Is it a good idea for it to be shallow though? Seems a little scary to me if the attrs happens to include mutable objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I hear that, but IIUC that's what python generally does (I don't have a personally confident view though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge had a nice test that was relevant to this:
xarray/xarray/tests/test_merge.py
Line 203 in 8f5b4a1
def test_merge_attrs_override_copy(self): |
This seems to imply that you would want to be able to tweak the attrs without worrying about other attrs changing. But the test is only made with immutable objects however. Lets do a test with:
ds1 = xr.Dataset(attrs={"x": [0, 1]})
ds2 = xr.Dataset(attrs={"x": 1})
ds3 = xr.merge([ds1, ds2], combine_attrs="override")
ds3.attrs["x"][0] = 2
# assert ds1.x == [0, 1]
print(ds1.x) # [2, 1]
That seems like surprising behaviour to me, but it's indeed also how python does this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. Copy-on-Write would be ideal but that's not python's strength...
Some fun performance comparisons related to copying and initializing dicts: a = dict(a=2, b=3)
%timeit dict(a)
207 ns ± 3.41 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit a.copy()
82.6 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
import copy
%timeit copy.copy(a)
313 ns ± 3.59 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
from copy import copy
%timeit copy(a)
290 ns ± 3.63 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
from copy import deepcopy
%timeit deepcopy(a)
3.39 µs ± 55.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) Using |
That is an interesting result (even aside from the main result here, I'm not sure what python is doing such that But python is slow, and nanos are short — unless there's a noticeable impact on the overall performance, then prioritizing flexibility and compatibility are more consistent with the goals of the library. Does that make sense? |
I'm surprised about the deepcopy being so slow too, I thought it would be similar in speed in this case and just increase if dealing with mutable objects. But using .copy is 100% compatible with how I'm starting to lean towards mutablemapping because subclassing dict has been rather difficult compared to mutablemapping. And if we go with mutablemapping then we should use copy.copy. |
Is python/mypy#3004 still an issue? pre-commit suggests it's here: https://github.com/pydata/xarray/pull/5667/files#diff-3c0ce7941684cbac55c00ab890684f86acc1de1908ee2afa915dbcb7c944105aR100 — but I guess there's some reason we can't only accept a |
Yes, it is still an issue. I've cheated though and used
Does |
Here's further tests to check how fast different class checkers are: from typing import MutableMapping
class Test2(MutableMapping):
def __init__(self, *args, **kwargs):
self.data = dict(*args, **kwargs)
def __getitem__(self, key):
pass
def __setitem__(self, key, value):
pass
def __delitem__(self, key):
pass
def __iter__(self):
pass
def __len__(self):
pass
b = Test2()
%timeit issubclass(type(b), MutableMapping)
711 ns ± 5.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit isinstance(b, MutableMapping)
853 ns ± 6.29 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# If you want to get really fast you can check for one of the required attributes MutableMapping has
%timeit hasattr(b, "update")
82.6 ns ± 0.181 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
|
for more information, see https://pre-commit.ci
If that is still an open issue we could merge current main, try to fix the resulting typing problems. |
pre-commit run --all-files
whats-new.rst
api.rst