-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add new consider-using-generator
checker Issue #3165
#3309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1 similar comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thanks for the contribution.
The only important change is the one about the message description. The code is all good though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice new checker, I had to talk about this problem in a code review recently it would have saved some time.
@ikraduya can you rebase on the latest master and take Ashley's remark into account ? :) We're going to release 2.7 soon, now is the time if you want this feature to make it ;) |
Sure, thanks for reminding |
I have rebased on the latest master and edit the message description. Please review @Pierre-Sassoulas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have a two minor comments before merging, thank you for reacting so fast ;) !
@@ -684,18 +708,40 @@ def _check_consider_using_comprehension_constructor(self, node): | |||
message_name = "consider-using-set-comprehension" | |||
self.add_message(message_name, node=node) | |||
|
|||
def _check_consider_using_generator(self, node): | |||
checked_call = ["any", "all"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @ngie-eign comment was relevant here #3309 (comment)
What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure, I have tested those.
The result is slower with the generator.
list:
python -m timeit "some_list = list(range(1000));b = list([x % 7 == 0 for x in some_list]);"
# output: 10000 loops, best of 3: 79.4 usec per loop
python -m timeit "some_list = list(range(1000));b = list(x % 7 == 0 for x in some_list);"
# output: 10000 loops, best of 3: 99.4 usec per loop
set:
python -m timeit "some_list = list(range(1000));b = set([x % 7 == 0 for x in some_list]);"
# output: 10000 loops, best of 3: 89.1 usec per loop
python -m timeit "some_list = list(range(1000));b = set(x % 7 == 0 for x in some_list);"
# output: 10000 loops, best of 3: 104 usec per loop
tuple:
python -m timeit "some_list = list(range(1000));b = tuple([x % 7 == 0 for x in some_list]);"
# output: 10000 loops, best of 3: 80.3 usec per loop
python -m timeit "some_list = list(range(1000));b = tuple(x % 7 == 0 for x in some_list);"
# output: 10000 loops, best of 3: 99.6 usec per loop
compared to any
and all
any:
python -m timeit "some_list = list(range(1000));b = any([x % 7 == 0 for x in some_list]);"
# output: 10000 loops, best of 3: 74.2 usec per loop
python -m timeit "some_list = list(range(1000));b = any(x % 7 == 0 for x in some_list);"
# output: 100000 loops, best of 3: 17 usec per loop
all:
python -m timeit "some_list = list(range(1000));b = all([x % 7 == 0 for x in some_list]);"
# output: 10000 loops, best of 3: 84.6 usec per loop
python -m timeit "some_list = list(range(1000));b = all(x % 7 == 0 for x in some_list);"
# output: 100000 loops, best of 3: 16.5 usec per loop
When I checked the python docs, any
and all
take iterable
as the parameter. But list
, set
, and tuple
takes [iterable]
. And there isn't any other function that takes iterable
. So this 'consider-using-generator' still only relevant with any
and all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok it makes senses.
checked_call = ["any", "all"] | |
# We only check 'any' and 'all' because for list, set, and tuple a generator performs worse | |
# See https://github.com/PyCQA/pylint/pull/3309#discussion_r576683109 | |
checked_call = ["any", "all"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the timing concerns, but when it boils down to it, it's the unnecessary data copies that I'm worried about with my original comment. With particularly large sequences or iterables, creating unnecessary copies really adds up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ikraduya: out of curiosity, what version of python was your output above tested on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here we could create two separate messages. Actually for all
or any
there is nothing to 'consider', changing for a generator is a no-brainer, you can cut the execution tree and exit directly at the first element. Except in the worst possible case it's an easy performance win. So we could have a stronger use-generator-instead
message for this. Then for other container, maybe for really long lists or sets it would be worth it to use generator, but the question should be considered on a case by case basis. So the user should consider it and we could keep a consider-using-generator
for those. What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ngie-eign I am using python 3.6.9 version.
I have tested @ngie-eign codes using timeit
.
tuple(list(range(100)))
tuple(range(100))
set(list(range(100)))
set(range(100))
tuple(list(range(1000000)))
tuple(range(1000000))
set(list(range(1000000)))
set(range(1000000))
output:
1000000 loops, best of 3: 1.44 usec per loop
1000000 loops, best of 3: 0.927 usec per loop
100000 loops, best of 3: 2.72 usec per loop
1000000 loops, best of 3: 2.28 usec per loop
10 loops, best of 3: 45.9 msec per loop
10 loops, best of 3: 36.3 msec per loop
10 loops, best of 3: 93.9 msec per loop
10 loops, best of 3: 64.9 msec per loop
Clearly using the generator wins over the timing because the conversion is unnecessary. But the codes are not using list comprehension. I think we can create another case to check such unnecessary copies code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also profiled the memory for tuple
and set
when using list comprehension.
Codes:
a = set([x % 7 for x in range(100)])
a = set(x % 7 for x in range(100))
a = set([x % 7 for x in range(1000000)])
a = set(x % 7 for x in range(1000000))
a = tuple([x % 7 for x in range(100)])
a = tuple(x % 7 for x in range(100))
a = tuple([x % 7 for x in range(1000000)])
a = tuple(x % 7 for x in range(1000000))
Results:
Maximum resident set size (kbytes): 9584
Maximum resident set size (kbytes): 9556
Maximum resident set size (kbytes): 17020
Maximum resident set size (kbytes): 9556
Maximum resident set size (kbytes): 9584
Maximum resident set size (kbytes): 9572
Maximum resident set size (kbytes): 24936
Maximum resident set size (kbytes): 17816
It is clear that a very long list/set will get the benefit from using generator :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Pierre-Sassoulas : I think your suggestion about creating additional suggestions for using generators judiciously with other containers makes a lot of sense (seems like a good candidate for an I class message).
I also think removing the “consider” part in the checker makes a lot of sense too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checked_call = ["any", "all"] | |
checked_call = ["any", "all", "set", "list", "tuple"] |
Co-authored-by: Pierre Sassoulas <[email protected]>
Co-authored-by: Daniel Hahler <[email protected]>
Remove item from 2.5.rst Change `consider-using-generator` message ID
Co-authored-by: Pierre Sassoulas <[email protected]>
Co-authored-by: Daniel Hahler <[email protected]>
Remove item from 2.5.rst Change `consider-using-generator` message ID
There exist consider-using-set-comprehension check when the code is |
Thank you for implementing this! :) |
See my comment at pylint-dev/pylint#3309
Steps
doc/whatsnew/<current release.rst>
.Description
[Quoting from Issue #3165]
There are many cases of comprehensions inside of
any
orall
calls that are unnecessary and should be replaced by a generator.For example:
Instead, a generator would be less code, and way faster:
Speed comparisons:
any
andall
are functions that profit from that because the loop is breaking and can short cut; the remaining elements are not even looped over if the test returns a True in theany
function or a False in theall
function.This PR creates new
consider-using-generator
checker written in pylint/checkers/refactoring.py and add unit test consider_using_generator.py and consider_using_generator.txtType of Changes
Related Issue
Closes #3165