-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Rework (and heavily optimize!) mypy.ini per-module configuration #4894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Currently, we compute the options for each module by scanning through the full list of per-module configuration options and seeing if the pattern matches against the module name. This is incredibly slow for large configuration files. On the internal S repo, mypy spends 23 seconds (!!) just computing per-module options. To fix this, we instead precompute an Options object for each config section, and in `clone_for_module` do a search for the most specific configured Options (so for foo.bar, we try foo.bar, foo.bar.*, foo.*, in that order). This cuts down the processing time from 23s to about 80ms. The catch is that this is actually a backwards-incompatible semantics change, in two ways: * Patterns can be of the form `foo.bar` and `foo.bar.*`, but can no longer be general purpose globs. We produce an error message to detect this misusage. * Patterns are now always applied based on specificity and not on the order they appear in the file. This means that some poorly-formed configuration files that contained options that were always overridden may have their meaning changed. This is probably not too common, so we don't do anything to deal with it. We could emit a warning when specificity-order and file-order disagree, but it seems wrong to lock people into the old behavior, and it seems like a silly thing to make configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! This fixed the performance regression in the S repo for me.
Left a few minor comments. Also, the typeshed update seems unrelated.
docs/source/config_file.rst
Outdated
<https://docs.python.org/3.6/library/fnmatch.html>`_ | ||
separated by commas. These sections specify additional flags that | ||
only apply to *modules* whose name matches at least one of the patterns. | ||
present, where ``PATTERN1``, ``PATTERN2`` etc. are comma-separated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar nit: You are missing some commas around 'etc.'. Maybe: PATTERN2
, etc., are ....
xx.py:1: error: Function is missing a type annotation | ||
mypy.ini: [mypy-*x*]: Invalid pattern. Patterns must be 'module_name' or 'module_name.*' | ||
mypy.ini: [mypy-*y*]: Invalid pattern. Patterns must be 'module_name' or 'module_name.*' | ||
== Return code: 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the return code be non-zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but we don't currently error for most mypy.ini
so I've stayed consistent for now.
[file spam/eggs.py] | ||
[out] | ||
Warning: unused section(s) in mypy.ini: [mypy-bar], [mypy-baz.*], [mypy-emarg.*], [mypy-emarg.hatch] | ||
== Return code: 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again the 0 exit status is unexpected (but this was like this before this PR).
Typeshed update was a misfire, will fix |
Hi; just to clarify the change that's needed to existing config files: Does this mean that if you have a directory (say, "repositories"), which contains a bunch of subdirs, (call them "a" through "e"), all of which contain a "migrations" directory that should be ignored, we need to go from: [mypy-*migrations*]
ignore_errors = True to [mypy-repositories.a.migrations.*,repositories.b.migrations.*,repositories.c.migrations.*,repositories.d.migrations.*,repositories.e.migrations.*]
ignore_errors = True Or is there a more concise option, or one that would be robust to the hypothetical future addition of "f/migrations", "g/migrations", etc? Thanks! |
Yeah, that is correct. |
With version 0.600 the way mypy uses per-module configuration has changed (see python/mypy#4894) and the glob pattern 'test_*' could not be used to exclude tests. The solution was to add a '__init__.py' file to identify 'tests' as a proper package and exclude it completely from mypy checks.
Currently, we compute the options for each module by scanning through
the full list of per-module configuration options and seeing if the
pattern matches against the module name.
This is incredibly slow for large configuration files. On the internal S repo,
mypy spends 23 seconds (!!) just computing per-module options.
To fix this, we instead precompute an Options object for each config
section, and in
clone_for_module
do a search for the most specificconfigured Options (so for foo.bar, we try foo.bar, foo.bar.,
foo., in that order).
This cuts down the processing time from 23s to about 80ms.
The catch is that this is actually a backwards-incompatible semantics
change, in two ways:
foo.bar
andfoo.bar.*
, but can no longer begeneral purpose globs. We produce an error message to detect this misusage.
appear in the file. This means that some poorly-formed configuration files that
contained options that were always overridden may have their meaning changed.
This is probably not too common, so we don't do anything to deal with it.
We could emit a warning when specificity-order and file-order disagree, but
it seems wrong to lock people into the old behavior, and it seems like a silly
thing to make configurable.
I had a really pretty trie-based solution that I wrote before realizing I could do it this way and it would probably be good enough. The trie-based solution is faster, but not enough faster to justify the extra complication :(