Skip to content

Allow .attrs to support any dict-likes #5667

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 58 commits into
base: main
Choose a base branch
from

Conversation

Illviljan
Copy link
Contributor

@Illviljan Illviljan commented Aug 3, 2021

  • Closes Allow .attrs to use dict-likes #5655
  • Tests added
  • Passes pre-commit run --all-files
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@pep8speaks
Copy link

pep8speaks commented Aug 3, 2021

Hello @Illviljan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-10-31 09:57:53 UTC

@github-actions
Copy link
Contributor

github-actions bot commented Aug 3, 2021

Unit Test Results

         6 files           6 suites   56m 18s ⏱️
16 290 tests 14 550 ✔️ 1 739 💤 1
90 936 runs  82 737 ✔️ 8 198 💤 1

For more details on these failures, see this check.

Results for commit 650ce22.

♻️ This comment has been updated with latest results.

@Illviljan Illviljan closed this Aug 3, 2021
@Illviljan Illviljan reopened this Aug 3, 2021
@Illviljan Illviljan changed the title .attrs supports dict like Allow .attrs to supports any dict-likes Aug 3, 2021
@Illviljan Illviljan changed the title Allow .attrs to supports any dict-likes Allow .attrs to support any dict-likes Aug 3, 2021
@@ -520,9 +520,9 @@ def merge_attrs(variable_attrs, combine_attrs, context=None):
elif combine_attrs == "drop":
return {}
elif combine_attrs == "override":
return dict(variable_attrs[0])
return variable_attrs[0].copy()
Copy link
Contributor Author

@Illviljan Illviljan Aug 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both dict(x) and x.copy() are shallow copies.
Is it a good idea for it to be shallow though? Seems a little scary to me if the attrs happens to include mutable objects.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I hear that, but IIUC that's what python generally does (I don't have a personally confident view though)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge had a nice test that was relevant to this:

def test_merge_attrs_override_copy(self):

This seems to imply that you would want to be able to tweak the attrs without worrying about other attrs changing. But the test is only made with immutable objects however. Lets do a test with:

        ds1 = xr.Dataset(attrs={"x": [0, 1]})
        ds2 = xr.Dataset(attrs={"x": 1})
        ds3 = xr.merge([ds1, ds2], combine_attrs="override")
        ds3.attrs["x"][0] = 2
        # assert ds1.x == [0, 1]
        print(ds1.x) # [2, 1]

That seems like surprising behaviour to me, but it's indeed also how python does this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. Copy-on-Write would be ideal but that's not python's strength...

@Illviljan
Copy link
Contributor Author

Illviljan commented Aug 5, 2021

Some fun performance comparisons related to copying and initializing dicts:

a = dict(a=2, b=3)

%timeit dict(a)
207 ns ± 3.41 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit a.copy()
82.6 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

import copy
%timeit copy.copy(a)
313 ns ± 3.59 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

from copy import copy
%timeit copy(a)
290 ns ± 3.63 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

from copy import deepcopy
%timeit deepcopy(a)
3.39 µs ± 55.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using a.copy() seems to be the way to go if you want to do a shallow copy of a dict.

@max-sixty
Copy link
Collaborator

Using a.copy() seems to be the way to go if you want to do a shallow copy of a dict.

That is an interesting result (even aside from the main result here, I'm not sure what python is doing such that copy.copy(a) has different performance from copy(a)!)

But python is slow, and nanos are short — unless there's a noticeable impact on the overall performance, then prioritizing flexibility and compatibility are more consistent with the goals of the library. Does that make sense?

@Illviljan
Copy link
Contributor Author

I'm surprised about the deepcopy being so slow too, I thought it would be similar in speed in this case and just increase if dealing with mutable objects.

But using .copy is 100% compatible with how attrs has behaved before. So we come back to the question what type attrs should be?
The options I see is going with dict or MutableMapping.

I'm starting to lean towards mutablemapping because subclassing dict has been rather difficult compared to mutablemapping.

And if we go with mutablemapping then we should use copy.copy.

@max-sixty
Copy link
Collaborator

Is python/mypy#3004 still an issue?

pre-commit suggests it's here: https://github.com/pydata/xarray/pull/5667/files#diff-3c0ce7941684cbac55c00ab890684f86acc1de1908ee2afa915dbcb7c944105aR100 — but I guess there's some reason we can't only accept a MutableMapping in that function?

@Illviljan
Copy link
Contributor Author

Is python/mypy#3004 still an issue?

pre-commit suggests it's here: https://github.com/pydata/xarray/pull/5667/files#diff-3c0ce7941684cbac55c00ab890684f86acc1de1908ee2afa915dbcb7c944105aR100 — but I guess there's some reason we can't only accept a MutableMapping in that function?

Yes, it is still an issue. I've cheated though and used type: ignore on a few places, that's why its been passing the checks.

Mappings (in the form of FrozenDict) are used to initialize .attrs in xr.open_dataset for example. So we can't unfortunately accept MutableMappings only.

Does pyright handle properties with setters and getters?

@Illviljan
Copy link
Contributor Author

Illviljan commented Aug 20, 2021

Here's further tests to check how fast different class checkers are:

from typing import MutableMapping


class Test2(MutableMapping):
    def __init__(self, *args, **kwargs):
        self.data = dict(*args, **kwargs)

    def __getitem__(self, key):
        pass

    def __setitem__(self, key, value):
        pass

    def __delitem__(self, key):
        pass

    def __iter__(self):
        pass

    def __len__(self):
        pass

b = Test2()

%timeit issubclass(type(b), MutableMapping)
711 ns ± 5.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit isinstance(b, MutableMapping)
853 ns ± 6.29 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

# If you want to get really fast you can check for one of the required attributes MutableMapping has 
%timeit hasattr(b, "update")
82.6 ns ± 0.181 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

isinstance is rather slow as can be seen. Considering just doing dict(b) takes about 200ns which is basically what the original implementation was it doesn't feel that good to add a check that adds 800ns of wait time.

@Illviljan Illviljan mentioned this pull request Jan 5, 2022
5 tasks
@headtr1ck
Copy link
Collaborator

If that is still an open issue we could merge current main, try to fix the resulting typing problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow .attrs to use dict-likes
4 participants