Making Pylint faster #497

nickdrozd · 2018-02-18T21:30:11Z

I recently learned how to do Python profiling with yappi. pylint has
always seemed slow to me, so I decided to see if it could be sped up.

Here is the call graph from running pylint against
https://github.com/PyCQA/pycodestyle/blob/master/pycodestyle.py:

On these graphs, nodes represent function calls, and the brighter the
node, the more time spent in that function. Each node has three
numbers: 1) the total time spent in the function, including its
subcalls; 2) the total time spent in that function but not its
subcalls; and 3) the total number of times the function was called.

I was somewhat surprised to learn (although it seems obvious in
retrospect) that the pylint code itself is not especially slow and
that most time is being spent in astroid functions. Looking at that
graph, it's obvious that nodes_of_clas and get_children are
bottlenecks, and optimizing those functions could have a big impact.
(To explain the numbers a bit, nodes_of_class is taking up almost
60% of total CPU time, and more than a third of that time is being
spent in get_children.)

First, I timed three runs of pylint with astroid on master to get
a benchmark:

real	1m17.996s
user	0m31.987s
sys	0m45.806s

real	1m17.828s
user	0m31.704s
sys 	0m45.936s

real	1m16.928s
user	0m31.327s
sys 	0m45.416s

Next, I timed three runs with astroid on the commit Add type-specific get_children. This commit gives each (or almost each) node class its
own get_children method instead of having them all use the same
generic function. This sped things up:

real	1m11.266s
user	0m30.778s
sys	0m40.306s

real	1m11.503s
user	0m31.215s
sys	0m40.036s

real	1m11.761s
user	0m31.376s
sys	0m40.194s

Here is the call graph with that change:

As compared with the previous graph, nodes_by_class takes slightly
less total CPU time, but of the time it does take, less of it is spent
in its subcalls. This is because the type-specific get_children
calls go much faster.

Okay, so nodes_of_class is now the sole bottleneck. First, I applied
the commit Move nodes_of_class null check out of inner loop, which,
as the name suggests, just shuffles around the null check logic in
that function. This provided a modest speedup:

real	1m10.510s
user	0m30.712s
sys	0m39.624s

real	1m11.590s
user	0m31.072s
sys	0m40.328s

real	1m11.274s
user	0m31.055s
sys	0m40.049s

According to the call graph (which I won't bother to post), this
change saved about .2% of total CPU time, which is not bad for such a
small change.

Finally, I applied a much larger change, the commit Add type-specific nodes_of_class. This commit adds a few functions that are just like
nodes_of_class except that they apply only to specific node
searches. This eliminates the need to do expensive runtime type
checking. These replacing calls to nodes_of_class within astroid
with these new functions sped things up significantly:

real	0m58.005s
user	0m25.634s
sys	0m32.200s

real	0m57.874s
user	0m25.870s
sys	0m31.838s

real	0m58.177s
user	0m26.026s
sys	0m31.984s

Here is the call graph

It's a little hard to interpret, but by my count nodes_of_class went
from taking ~58% of total CPU time to ~48%.

Note that all of this data comes from running pylint, as I said
earlier, against just a single file, pycodestyle.py. Larger pylint
targets (projects with lots of subdirectories, for instance) have
different profiles, and different functions become more prominent
(get_children and nodes_of_class are slow in all circumstances,
however). I have ideas for more optimizations, which I will come back
with after these first changes are taken care of.

rogalski · 2018-02-18T21:38:37Z

Wow, an effort put to this is very impressive. Thank you!

brycepg · 2018-02-19T04:47:50Z

Sorry ignore my comments if you saw them, I remembered that we haven't removed 2.7 yet from tox

brycepg · 2018-02-20T18:26:00Z

@nickdrozd What do you think we can do to help prevent performance regressions? At first glance, some of this code would make me want to refactor it away because of DRY (but I understand the trade off)

PCManticore · 2018-02-21T09:53:51Z

astroid/node_classes.py

        """

+    def get_children(self):
+        for elt in self.elts:


This could use yield from as @brycepg mentioned. We still have some bits of Python 2 compatibility in astroid, but we are in the process of removing it from both pylint and astroid (so if your PR fails on Python 2 for now, feel free to ignore that)

PCManticore · 2018-02-21T09:53:58Z

astroid/node_classes.py

        super(AssignName, self).__init__(lineno, col_offset, parent)

+    def get_children(self):
+        return (_ for _ in ())


PCManticore · 2018-02-21T09:56:18Z

astroid/node_classes.py

        """
+
+        def get_children(self):
+            if self.args is not None:


This one would also benefit from yield from (like the entire PR). But I'd also like to have these grouped, as in:

yield from self.defaults yield from self.kwonlyargs yield from ...

The reason is that the blocks seem to be independent only in the nature of the value that gets yielded, but other than that, it's all the same.

PCManticore · 2018-02-21T10:25:28Z

astroid/node_classes.py

            for matching in child_node.nodes_of_class(klass, skip_klass):
                yield matching

+    def get_assign_nodes(self):


I would prefer these to be private. Some comments before them mentioning why they are like this would also help in the future in case someone wonders why these couldn't have been made more DRY

PCManticore · 2018-02-21T10:29:30Z

Thank you for doing this amazing work @nickdrozd ! There's not much to be commented about this PR, left a couple of comments with things that we can improve, but overall looks like a pretty good gain in performance. Also I noticed from your yippi output that there might be other places where there are tons of calls, where we should have less, such as in the transforms for instance.
Also I support Bryce's question. Before we start optimising further, we should think about measuring the performance first; I imagine it would be super easy to introduce regressions for it.

nickdrozd · 2018-02-23T00:06:11Z

Thanks for the kind words! I was somewhat worried that these changes
wouldn't be well-received, so I'm glad that everyone is on board 🐈

A few points:

`yield from`

I was going to say that I don't feel comfortable being the one to
cause the Python 2 tests to fail, but it looks like all that stuff is
already getting ripped out. I will update everything to use yield from.

Performance Regression Testing

I have no experience testing for performance regression, so I don't
have any clever ideas. This may be a naive approach, but the simplest
thing I can think to do would be to add a pass in the CI that runs
some fixed version of Pylint (say, 1.8) with the version of Astroid in
master or the PR being tested against some fixed file (say,
pycodestyle.py). If the approximate speed of that run in the test
environment can be determined, then the test will fail if the run of
Pylint is not within some reasonable bound of the target time.

That's a general (and again, possibly naive) approach to performance
regression testing. As far as preventing regression in the specific
areas affected by this PR, there are a few options. One possibility
would be to delete the generic NodeNG.get_children entirely and
require that each node type implements its own. A variation of this
idea would have NodeNG.get_children raise a NotImplementedError,
accompanied by a comment that says something like

# DON'T REFACTOR THIS, I KNOW IT LOOKS INELEGANT
# BUT IT'S LIKE THIS FOR PERFORMANCE REASONS

(NodeNG could be made into an actual abstract base class, but that
seems a little dramatic to me.)

Use of nodes_of_class within Astroid code is always bad, so that
should be prohibited. Astroid's pylintrc could probably be modified
to ensure this. In the worst case, a custom Pylint plugin could be
written to check for its use.

Making `get_assign_nodes`, etc, private

@PCManticore suggests that the type-specific nodes_of_class
replacements be made private. This is reasonable, as making them
"public" functions increases the API surface. On the other hand,
Pylint makes lots of calls to nodes_of_class, and these calls are,
of course, slow:

~/pylint $ grep -r nodes_of_class pylint
pylint/checkers/async.py:        for child in node.nodes_of_class(astroid.Yield):
pylint/checkers/base.py:    for import_node in try_block.nodes_of_class((astroid.ImportFrom, astroid.Import)):
pylint/checkers/base.py:        _node for _node in loop.nodes_of_class(loop_nodes,
pylint/checkers/base.py:        _node for _node in loop.nodes_of_class(astroid.Break,
pylint/checkers/base.py:        starred = list(node.targets[0].nodes_of_class(astroid.Starred))
pylint/checkers/base.py:        returns = node.nodes_of_class(astroid.Return,
pylint/checkers/base.py:            for child in node.nodes_of_class(astroid.Global)
pylint/checkers/base.py:        for node_name in node.nodes_of_class(astroid.Name):
pylint/checkers/base.py:            child.names for child in node.nodes_of_class(astroid.Nonlocal)
pylint/checkers/base.py:            child.names for child in node.nodes_of_class(astroid.Global)
pylint/checkers/base.py:        if any(expr.nodes_of_class(astroid.Call)):
pylint/checkers/classes.py:            for call in infer_method.nodes_of_class(astroid.Call):
pylint/checkers/classes.py:    for call in fundef_node.nodes_of_class(astroid.Call):
pylint/checkers/classes.py:        for stmt in node.nodes_of_class(astroid.Call):
pylint/checkers/imports.py:                any(node.nodes_of_class((astroid.Import, astroid.ImportFrom))):
pylint/checkers/imports.py:            if any(root.nodes_of_class((astroid.Import, astroid.ImportFrom))):
pylint/checkers/newstyle.py:        for stmt in node.nodes_of_class(astroid.Call):
pylint/checkers/refactoring.py:        for defined_argument in scope.args.nodes_of_class(astroid.AssignName):
pylint/checkers/refactoring.py:        for name in node.target.nodes_of_class(astroid.AssignName):
pylint/checkers/refactoring.py:            for name in names.nodes_of_class(astroid.AssignName):
pylint/checkers/refactoring.py:        return_nodes = node.nodes_of_class(astroid.Return)
pylint/checkers/refactoring.py:            for subscript in child.nodes_of_class(astroid.Subscript):
pylint/checkers/typecheck.py:    for name in statement.nodes_of_class(astroid.Name):
pylint/checkers/typecheck.py:        returns = list(function_node.nodes_of_class(astroid.Return,
pylint/checkers/utils.py:            for ass_node in _node.nodes_of_class(astroid.AssignName):
pylint/checkers/utils.py:            for ass_node in _node.target.nodes_of_class(astroid.AssignName):
pylint/checkers/utils.py:        for ass_node in _node.nodes_of_class(astroid.AssignName):
pylint/checkers/utils.py:        for imp_node in _node.nodes_of_class((astroid.ImportFrom, astroid.Import)):
pylint/checkers/utils.py:            for default_name_node in default_node.nodes_of_class(astroid.Name):
pylint/checkers/utils.py:        if node in base.nodes_of_class(astroid.Name):
pylint/checkers/variables.py:    imports = frame.nodes_of_class((astroid.Import, astroid.ImportFrom))
pylint/checkers/variables.py:    assign_stmts = name_node.scope().nodes_of_class(astroid.AssignName)
pylint/checkers/variables.py:        assigned_to = [var.name for var in node.target.nodes_of_class(astroid.AssignName)]
pylint/checkers/variables.py:        global_names = _flattened_scope_names(node.nodes_of_class(astroid.Global))
pylint/checkers/variables.py:        nonlocal_names = _flattened_scope_names(node.nodes_of_class(astroid.Nonlocal))
pylint/extensions/_check_docs_utils.py:            for ret in target.nodes_of_class(astroid.Return):
pylint/extensions/docparams.py:        return_nodes = node.nodes_of_class(astroid.Return)
pylint/extensions/redefined_variable_type.py:                          redef_parent in orig_parent.nodes_of_class(astroid.If)):

A whole slew of these functions could be added to the API:
get_assign_nodes, get_assignname_nodes, get_global_nodes, etc.
Using these instead of the generic nodes_of_class would make Pylint
faster still. Doing that is a lot more work, however, and would
require coordinating changes in the two repos.

As I write this, I've convinced myself that they should be made
"private" for now, but this decision should certainly be revisited in
the near future.

get_children is elegant and flexible and slow. def get_children(self): for field in self._astroid_fields: attr = getattr(self, field) if attr is None: continue if isinstance(attr, (list, tuple)): for elt in attr: yield elt else: yield attr It iterates over a list, dynamically accesses attributes, does null checks, and does type checking. This function gets called a lot, and all that extra work is a real drag on performance. In most cases there isn't any need to do any of these checks. Take an Assign node for instance: def get_children(self): for elt in self.targets: yield elt yield self.value It's known in advance that Assign nodes have a list of targets and a value, so just yield those without checking anything.

The check was being repeated unnecessarily in a tight loop.

nodes_of_class is a very flexible method, which is great for use in client code (e.g. Pylint). However, that flexibility requires a great deal of runtime type checking: def nodes_of_class(self, klass, skip_klass=None): if isinstance(self, klass): yield self if skip_klass is None: for child_node in self.get_children(): for matching in child_node.nodes_of_class(klass, skip_klass): yield matching return for child_node in self.get_children(): if isinstance(child_node, skip_klass): continue for matching in child_node.nodes_of_class(klass, skip_klass): yield matching First, the node has to check its own type to see whether it's of the desired class. Then the skip_klass flag has to be checked to see whether anything needs to be skipped. If so, the type of every yielded node has to be check to see if it should be skipped. This is fine for calling code whose arguments can't be known in advance ("Give me all the Assign and ClassDef nodes, but skip all the BinOps, YieldFroms, and Globals."), but in Astroid itself, every call to this function can be known in advance. There's no need to do any type checking if all the nodes know how to respond to certain requests. Take get_assign_nodes for example. The Assign nodes know that they should yield themselves and then yield their Assign children. Other nodes know in advance that they aren't Assign nodes, so they don't need to check their own type, just immediately yield their Assign children. Overly specific functions like get_yield_nodes_skip_lambdas certainly aren't very elegant, but the tradeoff is to take advantage of knowing how the library code works to improve speed.

nickdrozd · 2018-02-24T02:24:19Z

The PR has been updated to

add get_children to every node,
use yield from, and
make the nodes_of_class replacements "private".

NodeNG.get_children could be deleted at this point.

brycepg · 2018-02-24T05:21:25Z

Looks pretty good. I think it's a good idea to keep NodeNG.get_children for yet to be added nodes and as an interface

PCManticore · 2018-03-02T09:10:01Z

I agree with Bryce that NodeNG.get_children could still be here for newly added nodes. Regarding pylint's usage of nodes_of_class I would say to not add those here, but rather implement them right into pylint, astroid should only provide fast nodes_of_class alternatives for what it's already using, while pylint could have its own variants.

Nevertheless, this was a fun patch @nickdrozd ! Thank you so much for contributing this work.
Regarding the performance regression, that sounds good, but I wonder if we're going to have potential varying results due to, say, CI being too slow at a given time. I wonder if there is any service that out there that provides this information somehow and that's free for open source projects. Anyway, I think we can merge this PR as is and we can revisit later how to measure performance.

52e6765 introduced a new type of child iteration. Some classes started yielding children in the incorrect order, which was causing test failures in pylint. This change corrects the ordering to match the _astroid_fields of the class. Relates to a change from #497

PCManticore reviewed Feb 21, 2018

View reviewed changes

nickdrozd added 3 commits February 23, 2018 20:12

Move nodes_of_class null check out of inner loop

bee7dc5

The check was being repeated unnecessarily in a tight loop.

nickdrozd force-pushed the speed branch from 851e149 to a954bc7 Compare February 24, 2018 02:16

PCManticore merged commit 9b5aa97 into pylint-dev:master Mar 2, 2018

nickdrozd mentioned this pull request Mar 17, 2018

Making Pylint faster 2 #519

Merged

PCManticore mentioned this pull request Apr 5, 2018

Django Manager.from_queryset causing significant performance issues #532

Closed

nickdrozd mentioned this pull request Apr 15, 2018

Making pycodestyle faster PyCQA/pycodestyle#754

Open

PCManticore mentioned this pull request Apr 24, 2018

Stop specifying a default linter microsoft/vscode-python#974

Closed

PCManticore mentioned this pull request May 7, 2018

Major performance slowdown after updating pylint-dev/pylint#2057

Closed

nickdrozd mentioned this pull request May 13, 2018

Making Pylint faster 3 #545

Closed

PCManticore mentioned this pull request Jul 3, 2018

Pylint slow when run on script with pandas pylint-dev/pylint#2198

Closed

PCManticore mentioned this pull request Jul 16, 2018

Possible to cache astroid AST between runs? pylint-dev/pylint#2292

Closed

Uh oh!

Making Pylint faster #497

Making Pylint faster #497

Uh oh!

Conversation

nickdrozd commented Feb 18, 2018

Uh oh!

rogalski commented Feb 18, 2018

Uh oh!

brycepg commented Feb 19, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

brycepg commented Feb 20, 2018

Uh oh!

PCManticore Feb 21, 2018

Choose a reason for hiding this comment

Uh oh!

PCManticore Feb 21, 2018

Choose a reason for hiding this comment

Uh oh!

PCManticore Feb 21, 2018

Choose a reason for hiding this comment

Uh oh!

PCManticore Feb 21, 2018

Choose a reason for hiding this comment

Uh oh!

PCManticore commented Feb 21, 2018

Uh oh!

nickdrozd commented Feb 23, 2018

yield from

Performance Regression Testing

Making get_assign_nodes, etc, private

Uh oh!

nickdrozd commented Feb 24, 2018

Uh oh!

brycepg commented Feb 24, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

PCManticore commented Mar 2, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

brycepg commented Feb 19, 2018 •

edited

Loading

`yield from`

Making `get_assign_nodes`, etc, private

brycepg commented Feb 24, 2018 •

edited

Loading