Making Pylint faster #497
|
Wow, the effort put into this is very impressive. Thank you! |
|
Sorry ignore my comments if you saw them, I remembered that we haven't removed 2.7 yet from tox |
|
@nickdrozd What do you think we can do to help prevent performance regressions? At first glance, some of this code would make me want to refactor it away because of DRY (but I understand the trade off) |
astroid/node_classes.py
Outdated
| """ | ||
|
|
||
| def get_children(self): | ||
| for elt in self.elts: |
This could use yield from as @brycepg mentioned. We still have some bits of Python 2 compatibility in astroid, but we are in the process of removing it from both pylint and astroid (so if your PR fails on Python 2 for now, feel free to ignore that)
astroid/node_classes.py
Outdated
| super(AssignName, self).__init__(lineno, col_offset, parent) | ||
|
|
||
| def get_children(self): | ||
| return (_ for _ in ()) |
Ditto.
astroid/node_classes.py
Outdated
| """ | ||
|
|
||
| def get_children(self): | ||
| if self.args is not None: |
This one would also benefit from yield from (like the entire PR). But I'd also like to have these grouped, as in:
yield from self.defaults
yield from self.kwonlyargs
yield from ...
The reason is that the blocks seem to be independent only in the nature of the value that gets yielded, but other than that, it's all the same.
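To make the suggestion concrete, here is a minimal sketch of the two styles side by side. The `Arguments` class and its two fields are simplified, hypothetical stand-ins, not astroid's real node class, and `yield from` requires Python 3:

```python
class Arguments:
    """Hypothetical stand-in for a node with several child lists."""

    def __init__(self, defaults, kwonlyargs):
        self.defaults = defaults
        self.kwonlyargs = kwonlyargs

    def get_children_loops(self):
        # Python 2 compatible style: one explicit loop per field.
        for elt in self.defaults:
            yield elt
        for elt in self.kwonlyargs:
            yield elt

    def get_children_grouped(self):
        # Python 3 style: delegate to each iterable in turn.
        yield from self.defaults
        yield from self.kwonlyargs


node = Arguments(defaults=["a", "b"], kwonlyargs=["c"])
assert list(node.get_children_loops()) == list(node.get_children_grouped())
```

A field that may be `None` still needs its guard before the `yield from`, for example `yield from self.defaults or ()`.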
astroid/node_classes.py
Outdated
| for matching in child_node.nodes_of_class(klass, skip_klass): | ||
| yield matching | ||
|
|
||
| def get_assign_nodes(self): |
I would prefer these to be private. Some comments before them mentioning why they are like this would also help in the future in case someone wonders why these couldn't have been made more DRY
|
Thank you for doing this amazing work @nickdrozd ! There's not much to be commented about this PR, left a couple of comments with things that we can improve, but overall looks like a pretty good gain in performance. Also I noticed from your yappi output that there might be other places where there are tons of calls, where we should have less, such as in the transforms for instance. |
|
Thanks for the kind words! I was somewhat worried that these changes A few points:
|
get_children is elegant and flexible and slow.
    def get_children(self):
        for field in self._astroid_fields:
            attr = getattr(self, field)
            if attr is None:
                continue
            if isinstance(attr, (list, tuple)):
                for elt in attr:
                    yield elt
            else:
                yield attr
It iterates over a list, dynamically accesses attributes, does null
checks, and does type checking. This function gets called a lot, and
all that extra work is a real drag on performance.
In most cases there isn't any need to do any of these checks. Take an
Assign node for instance:
    def get_children(self):
        for elt in self.targets:
            yield elt
        yield self.value
It's known in advance that Assign nodes have a list of targets and a
value, so just yield those without checking anything.
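The difference can be tried out with a small self-contained sketch. The classes below are simplified, hypothetical stand-ins rather than astroid's real node classes, and the timing loop is only a rough way to compare the two versions on your own machine:

```python
import timeit


class Node:
    """Hypothetical base node using the generic, field-driven traversal."""

    _astroid_fields = ()

    def get_children(self):
        for field in self._astroid_fields:
            attr = getattr(self, field)
            if attr is None:
                continue
            if isinstance(attr, (list, tuple)):
                yield from attr
            else:
                yield attr


class GenericAssign(Node):
    """Relies on the inherited generic get_children."""

    _astroid_fields = ("targets", "value")

    def __init__(self, targets, value):
        self.targets = targets
        self.value = value


class SpecificAssign(GenericAssign):
    """Overrides get_children with knowledge of its own fields."""

    def get_children(self):
        yield from self.targets
        yield self.value


generic = GenericAssign(["x", "y"], "1")
specific = SpecificAssign(["x", "y"], "1")

# Both versions must produce the same children in the same order.
assert list(generic.get_children()) == list(specific.get_children())

for label, node in (("generic", generic), ("specific", specific)):
    seconds = timeit.timeit(lambda: list(node.get_children()), number=100_000)
    print(f"{label}: {seconds:.3f}s")
```

The specialized method skips the `getattr`, `None`, and `isinstance` work on every call, which is where any saving would come from.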
The check was being repeated unnecessarily in a tight loop.
nodes_of_class is a very flexible method, which is great for use in
client code (e.g. Pylint). However, that flexibility requires a great
deal of runtime type checking:
    def nodes_of_class(self, klass, skip_klass=None):
        if isinstance(self, klass):
            yield self

        if skip_klass is None:
            for child_node in self.get_children():
                for matching in child_node.nodes_of_class(klass, skip_klass):
                    yield matching
            return

        for child_node in self.get_children():
            if isinstance(child_node, skip_klass):
                continue
            for matching in child_node.nodes_of_class(klass, skip_klass):
                yield matching
First, the node has to check its own type to see whether it's of the
desired class. Then the skip_klass flag has to be checked to see
whether anything needs to be skipped. If so, the type of every yielded
node has to be checked to see if it should be skipped.
This is fine for calling code whose arguments can't be known in
advance ("Give me all the Assign and ClassDef nodes, but skip all the
BinOps, YieldFroms, and Globals."), but in Astroid itself, every call
to this function can be known in advance. There's no need to do any
type checking if all the nodes know how to respond to certain
requests. Take get_assign_nodes for example. The Assign nodes know
that they should yield themselves and then yield their Assign
children. Other nodes know in advance that they aren't Assign nodes,
so they don't need to check their own type, just immediately yield
their Assign children.
Overly specific functions like get_yield_nodes_skip_lambdas certainly
aren't very elegant, but the tradeoff is to take advantage of knowing
how the library code works to improve speed.
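As a rough illustration of the idea, the sketch below compares a generic `nodes_of_class` with a type-specific `get_assign_nodes`. The `Node`, `Assign`, and `Name` classes are toy, hypothetical stand-ins, and the actual methods in the PR may differ in detail:

```python
class Node:
    """Toy node: children are passed in directly (an assumption for brevity)."""

    def __init__(self, *children):
        self.children = children

    def get_children(self):
        yield from self.children

    def nodes_of_class(self, klass):
        # Generic search: every node checks its own type at runtime.
        if isinstance(self, klass):
            yield self
        for child in self.get_children():
            yield from child.nodes_of_class(klass)

    def get_assign_nodes(self):
        # Default: this node is not an Assign, so no self check is needed.
        for child in self.get_children():
            yield from child.get_assign_nodes()


class Assign(Node):
    def get_assign_nodes(self):
        # An Assign knows it matches: yield itself, then search its children.
        yield self
        for child in self.get_children():
            yield from child.get_assign_nodes()


class Name(Node):
    pass


tree = Node(Assign(Name(), Assign(Name())), Name())
generic = list(tree.nodes_of_class(Assign))
specific = list(tree.get_assign_nodes())
assert generic == specific
```

The type check has not disappeared; it has moved from a runtime `isinstance` call into method dispatch, which each class resolves for itself.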
|
The PR has been updated to
|
|
Looks pretty good. I think it's a good idea to keep |
|
I agree with Bryce that Nevertheless, this was a fun patch @nickdrozd ! Thank you so much for contributing this work. |
I recently learned how to do Python profiling with yappi. pylint has
always seemed slow to me, so I decided to see if it could be sped up.
Here is the call graph from running pylint against
https://github.com/PyCQA/pycodestyle/blob/master/pycodestyle.py:
On these graphs, nodes represent function calls, and the brighter the
node, the more time spent in that function. Each node has three
numbers: 1) the total time spent in the function, including its
subcalls; 2) the total time spent in that function but not its
subcalls; and 3) the total number of times the function was called.
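For readers who want to see the same three numbers without installing anything, here is a minimal sketch using the standard library's cProfile and pstats instead of yappi. It is a swapped-in tool, not the setup used for the graphs in this PR, and `leaf` and `parent` are hypothetical functions:

```python
import cProfile
import pstats


def leaf(n):
    return sum(range(n))


def parent():
    return [leaf(1000) for _ in range(50)]


profiler = cProfile.Profile()
profiler.enable()
parent()
profiler.disable()

# pstats keys each entry by (filename, lineno, function name) and stores
# (primitive calls, total calls, own time, cumulative time, callers).
stats = pstats.Stats(profiler)
rows = {}
for (_, _, name), (_, ncalls, tottime, cumtime, _) in stats.stats.items():
    rows[name] = (cumtime, tottime, ncalls)

for name in ("parent", "leaf"):
    cumtime, tottime, ncalls = rows[name]
    print(f"{name}: total={cumtime:.6f}s own={tottime:.6f}s calls={ncalls}")
```

Here `total` corresponds to number 1 (time including subcalls), `own` to number 2 (time excluding subcalls), and `calls` to number 3.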
I was somewhat surprised to learn (although it seems obvious in
retrospect) that the pylint code itself is not especially slow and
that most time is being spent in astroid functions. Looking at that
graph, it's obvious that nodes_of_class and get_children are
bottlenecks, and optimizing those functions could have a big impact.
(To explain the numbers a bit, nodes_of_class is taking up almost
60% of total CPU time, and more than a third of that time is being
spent in get_children.)
First, I timed three runs of pylint with astroid on master to get
a benchmark:
Next, I timed three runs with astroid on the commit "Add
type-specific get_children". This commit gives each (or almost each)
node class its own get_children method instead of having them all use
the same generic function. This sped things up:
Here is the call graph with that change:
As compared with the previous graph, nodes_of_class takes slightly
less total CPU time, but of the time it does take, less of it is spent
in its subcalls. This is because the type-specific get_children calls
go much faster.
Okay, so nodes_of_class is now the sole bottleneck. First, I applied
the commit "Move nodes_of_class null check out of inner loop", which,
as the name suggests, just shuffles around the null check logic in
that function. This provided a modest speedup:
According to the call graph (which I won't bother to post), this
change saved about .2% of total CPU time, which is not bad for such a
small change.
Finally, I applied a much larger change, the commit "Add
type-specific nodes_of_class". This commit adds a few functions that
are just like nodes_of_class except that they apply only to specific
node searches. This eliminates the need to do expensive runtime type
checking. Replacing calls to nodes_of_class within astroid with these
new functions sped things up significantly:
Here is the call graph:
It's a little hard to interpret, but by my count nodes_of_class went
from taking ~58% of total CPU time to ~48%.
Note that all of this data comes from running pylint, as I said
earlier, against just a single file, pycodestyle.py. Larger pylint
targets (projects with lots of subdirectories, for instance) have
different profiles, and different functions become more prominent
(get_children and nodes_of_class are slow in all circumstances,
however). I have ideas for more optimizations, which I will come back
with after these first changes are taken care of.