Skip to content

ChainMap.__contains__ and .get performance improvement. #118932

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dg-pb opened this issue May 10, 2024 · 10 comments
Closed

ChainMap.__contains__ and .get performance improvement. #118932

dg-pb opened this issue May 10, 2024 · 10 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@dg-pb
Copy link
Contributor

dg-pb commented May 10, 2024

Feature or enhancement

Proposal:

import collections as coll


class ChainMap2(coll.ChainMap):
    def __contains__(self, key):
        for mapping in self.maps:
            if key in mapping:
                return True
        return False

    def get(self, key, default=None):
        for mapping in self.maps:
            if key in mapping:
                return mapping[key]
        return default


maps = [dict(a=1), dict(a=2, b=2), dict(a=3, b=2, c=3)]
cm = coll.ChainMap(*maps)
cm2 = ChainMap2(*maps)


%timeit 'a' in cm               # 615 ns
%timeit 'c' in cm               # 752 ns
%timeit cm.get('a')             # 776 ns
%timeit cm.get('c')             # 1.46 µs

%timeit 'a' in cm2              # 140 ns
%timeit 'c' in cm2              # 198 ns
%timeit cm2.get('a')            # 147 ns
%timeit cm2.get('c')            # 208 ns

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/collections-chainmap-get-performance/41925

Linked PRs

@dg-pb dg-pb added the type-feature A feature request or enhancement label May 10, 2024
@pochmann3
Copy link
Contributor

I'm curious: how long does cm2['c'] take?

@dg-pb
Copy link
Contributor Author

dg-pb commented May 10, 2024

621 ns
Can't think of a way to improve it as it needs to be sensitive to __missing__ implementations.

@rhettinger rhettinger added performance Performance or resource usage and removed type-feature A feature request or enhancement labels May 11, 2024
@rhettinger rhettinger self-assigned this May 11, 2024
@rhettinger
Copy link
Contributor

The change to __contains__ is a good idea. It's a bummer that any() with a generator expression does not perform well.

The get method should be left as-is. The edit alters the semantics so that a override of __contains__ in a ChainMap subclass would be bypassed.

@dg-pb
Copy link
Contributor Author

dg-pb commented May 11, 2024

Makes sense, can't think of any way to keep that and not to iterate twice.

At least updating __contains__ will boost performance of get too.

Should I issue PR or leave it to you?

@nineteendo
Copy link
Contributor

How much slower is this? https://stackoverflow.com/a/44803103

any(True for m in self.maps if key in m)

@dg-pb
Copy link
Contributor Author

dg-pb commented May 11, 2024

How much slower is this? https://stackoverflow.com/a/44803103

any(True for m in self.maps if key in m)

Slightly faster than the original:

 %timeit 'a' in cm2    # 589 ns
%timeit 'c' in cm2   # 671 ns

@nineteendo
Copy link
Contributor

... but slower than an actual for loop.

@dg-pb
Copy link
Contributor Author

dg-pb commented May 11, 2024

%timeit any(map(opr.call, map(opr.attrgetter(' __contains__'), cm2.maps), itl.cycle([key])))
%timeit any(map(opr.methodcaller('__contains__', key), cm2.maps))

786 ns & 500 ns

@pochmann3
Copy link
Contributor

pochmann3 commented May 11, 2024

any(map(opr.call, map(opr.attrgetter(' __contains__'), cm2.maps), itl.cycle([key])))

Looks like a complicated version of:

any(map(opr.contains, cm2.maps, itl.repeat(key)))

@dg-pb
Copy link
Contributor Author

dg-pb commented May 11, 2024

Looks like a complicated version of:

any(map(opr.contains, cm2.maps, itl.repeat(key)))

Which is the best I have seen without loop:
In [16]: %timeit 'a' in cm2 # 360 ns
In [17]: %timeit 'c' in cm2 # 424 ns

bdraco added a commit to home-assistant/core that referenced this issue Aug 3, 2024
On slower systems, there is a noticable stall right after startup
when all the discoveries hit if there are a lot of discoveries.
While we have imporved this by avoding imports in the event loop,
this is still apparent on slower systems. The any expressions kept
coming up in the profile, but I had skipped over them, as I was
thinking they could not be optimized. After reading
python/cpython#118932, I realized
its the any expressions themselves that perform poorly.

This change unpacks the hot any expressions with the same
solution as python/cpython#118946
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir
Projects
None yet
Development

No branches or pull requests

5 participants