Add new `consider-using-generator` checker Issue #3165 #3309

ikraduya · 2019-12-17T14:24:20Z

Steps

Add yourself to CONTRIBUTORS if you are a new contributor.
Add a ChangeLog entry describing what your PR does.
If it's a new feature or an important bug fix, add a What's New entry in doc/whatsnew/<current release.rst>.
Write a good description on what the PR does.

Description

[Quoting from Issue #3165]
There are many cases of comprehensions inside of any or all calls that are unnecessary and should be replaced by a generator.
For example:

some_list = list(range(1000))
test_old = any([x % 7 == 0 for x in some_list])

Instead, a generator would be less code, and way faster:

test_new = any(x % 7 == 0 for x in some_list)

Speed comparisons:

64 µs ± 1.67 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) # old
447 ns ± 3.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) # new

any and all benefit from this because they short-circuit: the loop stops as soon as the test returns True for any or False for all, so the remaining elements are never evaluated.
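To make the short-circuit behaviour concrete, here is a minimal sketch that counts how many elements are actually tested in each form. The helper name `count_evaluations` is made up for illustration and is not part of pylint or the original issue. Note that in the quoted example the very first element, 0, already satisfies the test, so the generator form is measured in its best case.

```python
def count_evaluations(use_generator, values):
    """Return (result, number of elements tested) for any() over `values`.

    Hypothetical helper for illustration only.
    """
    tested = 0

    def is_multiple_of_seven(x):
        nonlocal tested
        tested += 1
        return x % 7 == 0

    if use_generator:
        # Generator expression: any() pulls items one by one and can stop early.
        result = any(is_multiple_of_seven(x) for x in values)
    else:
        # List comprehension: the whole list is built before any() sees it.
        result = any([is_multiple_of_seven(x) for x in values])
    return result, tested


some_list = list(range(1000))
print(count_evaluations(False, some_list))  # (True, 1000)
print(count_evaluations(True, some_list))   # (True, 1)
```

If no element matches (the worst case for any), both forms test every element and the generator no longer saves work.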

This PR adds a new consider-using-generator checker in pylint/checkers/refactoring.py, together with the unit test files consider_using_generator.py and consider_using_generator.txt.

Type of Changes

	Type
	🐛 Bug fix
✓	✨ New feature
	🔨 Refactoring
	📜 Docs

Related Issue

Closes #3165

coveralls · 2019-12-17T14:40:14Z

Coverage increased (+0.09%) to 89.812% when pulling 25a92f1 on ikraduya:Issue3165 into dc83a86 on PyCQA:master.

coveralls · 2019-12-17T14:40:15Z

Coverage increased (+0.007%) to 91.436% when pulling 522ca9b on ikraduya:Issue3165 into 570e655 on PyCQA:master.

AWhetter

This looks great! Thanks for the contribution.
The only important change is the one about the message description. The code is all good though.

pylint/checkers/refactoring.py

Pierre-Sassoulas

Nice new checker. I had to explain this problem in a code review recently; it would have saved some time.

pylint/checkers/refactoring.py

Pierre-Sassoulas · 2021-02-16T06:57:18Z

@ikraduya can you rebase on the latest master and take Ashley's remark into account ? :) We're going to release 2.7 soon, now is the time if you want this feature to make it ;)

ikraduya · 2021-02-16T07:03:21Z

> @ikraduya can you rebase on the latest master and take Ashley's remark into account ? :) We're going to release 2.7 soon, now is the time if you want this feature to make it ;)

Sure, thanks for reminding

ikraduya · 2021-02-16T07:34:39Z

I have rebased on the latest master and edited the message description. Please review, @Pierre-Sassoulas.
Thanks

Pierre-Sassoulas

I just have two minor comments before merging. Thank you for reacting so fast ;) !

pylint/checkers/refactoring/refactoring_checker.py

Pierre-Sassoulas · 2021-02-16T07:46:54Z

pylint/checkers/refactoring/refactoring_checker.py

@@ -684,18 +708,40 @@ def _check_consider_using_comprehension_constructor(self, node):
                message_name = "consider-using-set-comprehension"
                self.add_message(message_name, node=node)

+    def _check_consider_using_generator(self, node):
+        checked_call = ["any", "all"]


I think @ngie-eign comment was relevant here #3309 (comment)
What do you think ?

I am not sure. I tested those, and the result is slower with the generator.

list:

python -m timeit "some_list = list(range(1000));b = list([x % 7 == 0 for x in some_list]);"
# output: 10000 loops, best of 3: 79.4 usec per loop
python -m timeit "some_list = list(range(1000));b = list(x % 7 == 0 for x in some_list);"
# output: 10000 loops, best of 3: 99.4 usec per loop

set:

python -m timeit "some_list = list(range(1000));b = set([x % 7 == 0 for x in some_list]);"
# output: 10000 loops, best of 3: 89.1 usec per loop
python -m timeit "some_list = list(range(1000));b = set(x % 7 == 0 for x in some_list);"
# output: 10000 loops, best of 3: 104 usec per loop

tuple:

python -m timeit "some_list = list(range(1000));b = tuple([x % 7 == 0 for x in some_list]);"
# output: 10000 loops, best of 3: 80.3 usec per loop
python -m timeit "some_list = list(range(1000));b = tuple(x % 7 == 0 for x in some_list);"
# output: 10000 loops, best of 3: 99.6 usec per loop

Compared to any and all:

any:

python -m timeit "some_list = list(range(1000));b = any([x % 7 == 0 for x in some_list]);"
# output: 10000 loops, best of 3: 74.2 usec per loop
python -m timeit "some_list = list(range(1000));b = any(x % 7 == 0 for x in some_list);"
# output: 100000 loops, best of 3: 17 usec per loop

all:

python -m timeit "some_list = list(range(1000));b = all([x % 7 == 0 for x in some_list]);"
# output: 10000 loops, best of 3: 84.6 usec per loop
python -m timeit "some_list = list(range(1000));b = all(x % 7 == 0 for x in some_list);"
# output: 100000 loops, best of 3: 16.5 usec per loop

When I checked the Python docs, any and all take iterable as a required parameter, whereas list, set, and tuple take an optional [iterable]. And there isn't any other function that takes iterable. So 'consider-using-generator' is still only relevant for any and all.
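For anyone who wants to rerun this comparison on their own interpreter, the command lines above can be sketched with the standard `timeit` module. The function name `compare` and the loop count are arbitrary choices for this sketch, and the absolute numbers will differ by Python version and machine.

```python
import timeit

SETUP = "some_list = list(range(1000))"
CALLS = ("list", "set", "tuple", "any", "all")


def compare(call, number=1000):
    """Time `call([...])` against `call(...)`; return seconds per loop for each form."""
    with_list = f"b = {call}([x % 7 == 0 for x in some_list])"
    with_gen = f"b = {call}(x % 7 == 0 for x in some_list)"
    return {
        "list comprehension": timeit.timeit(with_list, setup=SETUP, number=number) / number,
        "generator": timeit.timeit(with_gen, setup=SETUP, number=number) / number,
    }


for call in CALLS:
    timings = compare(call)
    print(
        f"{call:5s} list comprehension: {timings['list comprehension'] * 1e6:8.2f} usec"
        f"   generator: {timings['generator'] * 1e6:8.2f} usec"
    )
```

The script only reports timings; it does not assume which form wins, since that is exactly what varies between the container constructors and any/all.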

OK, that makes sense.

Suggested change

-        checked_call = ["any", "all"]
+        # We only check 'any' and 'all' because for list, set, and tuple a generator performs worse
+        # See https://github.com/PyCQA/pylint/pull/3309#discussion_r576683109
+        checked_call = ["any", "all"]

I agree with the timing concerns, but when it boils down to it, it's the unnecessary data copies that I'm worried about with my original comment. With particularly large sequences or iterables, creating unnecessary copies really adds up.

@ikraduya: out of curiosity, what version of python was your output above tested on?

I think we could create two separate messages here. For all or any there is actually nothing to 'consider': switching to a generator is a no-brainer, because evaluation can stop at the first decisive element. Except in the worst possible case, it's an easy performance win, so we could have a stronger use-generator-instead message for this. For other containers, a generator may be worth it for really long lists or sets, but that question should be decided on a case-by-case basis. The user should consider it, so we could keep consider-using-generator for those. What do you think?
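As a rough illustration of the two-message idea, here is a sketch built on the standard-library `ast` module. It is not pylint's implementation (pylint works on astroid nodes inside the refactoring checker); the function name, the two sets, and the choice of which containers fall under the weaker message are assumptions made for this example, with the message names taken from this thread and the later commit.

```python
import ast

# Hypothetical grouping for illustration only; not pylint's actual code.
ALWAYS_BETTER = {"any", "all"}           # short-circuiting calls
CASE_BY_CASE = {"list", "tuple", "set"}  # containers: depends on size


def find_generator_candidates(source):
    """Yield (line, message) for calls whose only argument is a list comprehension."""
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        if len(node.args) != 1 or node.keywords:
            continue
        if not isinstance(node.args[0], ast.ListComp):
            continue
        if node.func.id in ALWAYS_BETTER:
            yield node.lineno, "use-a-generator"
        elif node.func.id in CASE_BY_CASE:
            yield node.lineno, "consider-using-generator"


code = (
    "a = any([x % 7 == 0 for x in data])\n"
    "b = tuple([x % 7 for x in data])\n"
    "c = any(x % 7 == 0 for x in data)\n"
)
for line, message in sorted(find_generator_candidates(code)):
    print(line, message)
```

The sketch matches on the call name only, so a user-defined function that shadows `any` or `list` would be flagged as well; a real checker needs inference to rule that out.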

@ngie-eign I am using Python 3.6.9.

I have tested @ngie-eign's code using timeit. Each statement is followed by its output:

tuple(list(range(100)))        # 1000000 loops, best of 3: 1.44 usec per loop
tuple(range(100))              # 1000000 loops, best of 3: 0.927 usec per loop
set(list(range(100)))          # 100000 loops, best of 3: 2.72 usec per loop
set(range(100))                # 1000000 loops, best of 3: 2.28 usec per loop
tuple(list(range(1000000)))    # 10 loops, best of 3: 45.9 msec per loop
tuple(range(1000000))          # 10 loops, best of 3: 36.3 msec per loop
set(list(range(1000000)))      # 10 loops, best of 3: 93.9 msec per loop
set(range(1000000))            # 10 loops, best of 3: 64.9 msec per loop

Clearly, skipping the intermediate list wins on timing because the conversion is unnecessary. But this code does not use a list comprehension. I think we could add a separate check for such unnecessary copies.

I also profiled the memory for tuple and set when using a list comprehension. Each statement is followed by its result:

a = set([x % 7 for x in range(100)])          # Maximum resident set size (kbytes): 9584
a = set(x % 7 for x in range(100))            # Maximum resident set size (kbytes): 9556
a = set([x % 7 for x in range(1000000)])      # Maximum resident set size (kbytes): 17020
a = set(x % 7 for x in range(1000000))        # Maximum resident set size (kbytes): 9556
a = tuple([x % 7 for x in range(100)])        # Maximum resident set size (kbytes): 9584
a = tuple(x % 7 for x in range(100))          # Maximum resident set size (kbytes): 9572
a = tuple([x % 7 for x in range(1000000)])    # Maximum resident set size (kbytes): 24936
a = tuple(x % 7 for x in range(1000000))      # Maximum resident set size (kbytes): 17816

It is clear that a very long list/set benefits from using a generator :).
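A similar memory comparison can be sketched from inside Python with the standard `tracemalloc` module. This is a different technique from the resident-set-size figures above: `tracemalloc` only counts allocations made through Python's allocator, so the numbers are not comparable to the kbytes values in this comment. The helper name `peak_bytes` and the value of `N` are choices made for this sketch.

```python
import tracemalloc


def peak_bytes(build):
    """Return the peak number of bytes tracemalloc records while calling `build`."""
    tracemalloc.start()
    try:
        build()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


N = 200_000  # smaller than the 1000000 used above to keep the traced run quick

with_list = peak_bytes(lambda: set([x % 7 for x in range(N)]))
with_gen = peak_bytes(lambda: set(x % 7 for x in range(N)))
print(f"set([...]) peak: {with_list} bytes")
print(f"set(...)   peak: {with_gen} bytes")
```

The set case is the clearest one because the result holds only seven distinct values, so nearly all of the peak in the first form is the throwaway list.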

@Pierre-Sassoulas : I think your suggestion about creating additional suggestions for using generators judiciously with other containers makes a lot of sense (seems like a good candidate for an I class message).

I also think removing the “consider” part in the checker makes a lot of sense too.

Suggested change

-        checked_call = ["any", "all"]
+        checked_call = ["any", "all", "set", "list", "tuple"]

pylint/checkers/refactoring/refactoring_checker.py

ikraduya · 2021-02-20T06:14:36Z

There is already a consider-using-set-comprehension check for code such as set([0 % 7 for i in range(10)]).
I think we shouldn't include set in consider-using-generator, right?
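For reference, the three spellings in question build the same set, which is why an existing set-comprehension message already covers the first one. This small sketch uses `i % 7` rather than the `0 % 7` quoted above, on the assumption that the loop variable was intended.

```python
values = range(10)

from_list = set([i % 7 for i in values])     # builds a throwaway list first
from_generator = set(i % 7 for i in values)  # no intermediate list
comprehension = {i % 7 for i in values}      # set comprehension, no call to set()

print(from_list == from_generator == comprehension)  # True
```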

Pierre-Sassoulas · 2021-02-20T14:52:12Z

Merging despite the change request review, because @AWhetter review was about the message content and it was fixed.

Thanks a lot for your work @ikraduya, this is a great checker that will make a lot of codebases faster! And congratulations on becoming a pylint contributor!

tweakimp · 2021-02-21T10:33:37Z

Thank you for implementing this! :)


AWhetter requested changes Jan 6, 2020

pylint/checkers/refactoring.py (outdated)

ngie-eign reviewed Jan 13, 2020

pylint/checkers/refactoring.py (outdated)

PCManticore added the Work in progress label Feb 5, 2020

Pierre-Sassoulas approved these changes Mar 27, 2020

pylint/checkers/refactoring.py (outdated)

ikraduya added 4 commits February 16, 2021 14:23

Add new consider-using-generator checker

75c1f8a

Add consider-using-generator unit test

771aa26

Add consider-using-generator in ChangeLog and whatsnew

20e47d7

Change consider-using-generator message description

8df2c20

ikraduya force-pushed the Issue3165 branch from 25a92f1 to 8df2c20 Compare February 16, 2021 07:31

Pierre-Sassoulas approved these changes Feb 16, 2021

Pierre-Sassoulas added this to the 2.7.0 milestone Feb 16, 2021

Pierre-Sassoulas added the Checkers and Enhancement ✨ labels and removed the Work in progress label Feb 16, 2021

Pierre-Sassoulas self-assigned this Feb 16, 2021

Pierre-Sassoulas and others added 2 commits February 16, 2021 10:47

Add design choices for 'consider-using-generator'

ea6c21c

See pylint-dev#3309 (comment)

Minor change in message description

8e8f599

Co-authored-by: Pierre Sassoulas

Pierre-Sassoulas requested a review from AWhetter February 16, 2021 10:21

blueyed reviewed Feb 16, 2021

ChangeLog (outdated)

ChangeLog (outdated)

pylint/checkers/refactoring/refactoring_checker.py (outdated)

pylint/checkers/refactoring/refactoring_checker.py (outdated)

Grammar correction at ChangeLog, 2.5.rst, and refactoring_checker.py

a942725

Co-authored-by: Daniel Hahler

Pierre-Sassoulas requested changes Feb 16, 2021

ChangeLog (outdated)

doc/whatsnew/2.5.rst (outdated)

pylint/checkers/refactoring/refactoring_checker.py (outdated)

pylint/checkers/refactoring/refactoring_checker.py (outdated)

Move ChangeLog item from 2.5.0 to 2.7.0

66381f4

Remove item from 2.5.rst Change `consider-using-generator` message ID

Pierre-Sassoulas approved these changes Feb 16, 2021

Merge branch 'master' into Issue3165

eb16eb2

ikraduya and others added 10 commits February 16, 2021 20:01

Add new consider-using-generator checker

5106349

Add consider-using-generator unit test

7f7ce47

Add consider-using-generator in ChangeLog and whatsnew

a33978a

Change consider-using-generator message description

e4cf934

Add design choices for 'consider-using-generator'

f928049

See pylint-dev#3309 (comment)

Minor change in message description

13da8d7

Co-authored-by: Pierre Sassoulas

Grammar correction at ChangeLog, 2.5.rst, and refactoring_checker.py

9cb01f2

Co-authored-by: Daniel Hahler

Move ChangeLog item from 2.5.0 to 2.7.0

4044347

Remove item from 2.5.rst Change `consider-using-generator` message ID

Add consider-using-generator in 2.7.rst

60fa295

Resolve ChangeLog consider-using-generator issue

eb3a407

Pierre-Sassoulas reviewed Feb 17, 2021

pylint/checkers/refactoring/refactoring_checker.py (outdated)

Pierre-Sassoulas reviewed Feb 17, 2021

pylint/checkers/refactoring/refactoring_checker.py (outdated)

ikraduya and others added 3 commits February 20, 2021 13:22

Creating additional 'use-a-generator' checker

750417d

Merge branch 'master' into Issue3165

ab3533f

Fix pre-commit checks

522ca9b

Pierre-Sassoulas merged commit 45c8245 into pylint-dev:master Feb 20, 2021

mscuthbert pushed a commit to cuthbertLab/music21 that referenced this pull request Feb 23, 2021

disable over-zealous pylint checks

3e102ed

See my comment at pylint-dev/pylint#3309

This was referenced Feb 23, 2021

regression in 2.7.0: false positives use-a-generator #4133

Closed

regression in 2.7.0: false positive consider-using-generator #4132

Closed

Pierre-Sassoulas mentioned this pull request Feb 24, 2021

Checker to replace 'any/all(not condition)' by 'not any/all(condition)' when 'not condition' is slower than 'condition' #4146

Open

lavaleri mentioned this pull request Feb 25, 2021

chore: Fix pylint R1729(use-a-generator) aws/aws-dynamodb-encryption-python#151

Merged

1 task

Pierre-Sassoulas mentioned this pull request Oct 16, 2021

use-a-generator hint in sum method #5166

Closed

Pierre-Sassoulas mentioned this pull request May 12, 2022

Raise use_a_generator for sum(), max(), min() #6595

Merged

Pierre-Sassoulas mentioned this pull request Jul 2, 2022

warn when redundant temporary lists are used when generators would work #2447

Closed
