Rework (and heavily optimize!) mypy.ini per-module configuration #4894

msullivan · 2018-04-12T20:14:59Z

Currently, we compute the options for each module by scanning through
the full list of per-module configuration options and seeing if the
pattern matches against the module name.
This is incredibly slow for large configuration files. On the internal S repo,
mypy spends 23 seconds (!!) just computing per-module options.
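
For illustration, the old behavior described above can be sketched roughly as follows. This is not mypy's actual code: the function name `options_by_linear_scan` and the `(pattern, overrides)` list are hypothetical, and the use of `fnmatch` with later sections overriding earlier ones is an assumption based on the old documentation's reference to fnmatch patterns.

```python
from fnmatch import fnmatch
from typing import Dict, List, Tuple

Section = Tuple[str, Dict[str, object]]


def options_by_linear_scan(module: str, sections: List[Section]) -> Dict[str, object]:
    """Hypothetical sketch: test every config section against one module.

    Assumes sections are applied in file order, so a later matching
    section overrides an earlier one. The cost is one glob match per
    section per module, which is what makes large config files slow.
    """
    merged: Dict[str, object] = {}
    for pattern, overrides in sections:
        if fnmatch(module, pattern):
            merged.update(overrides)
    return merged


sections = [
    ("foo.*", {"ignore_errors": True}),
    ("foo.bar", {"ignore_errors": False}),
]
result = options_by_linear_scan("foo.bar", sections)
```

With M modules and S sections this does M × S pattern matches, which is consistent with the slowdown reported for large configuration files.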

To fix this, we instead precompute an Options object for each config
section, and in `clone_for_module` do a search for the most specific
configured Options (so for `foo.bar`, we try `foo.bar`, `foo.bar.*`,
`foo.*`, in that order).
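
A minimal sketch of that lookup order, assuming the precomputed options live in a plain dict keyed by pattern. The names `candidate_keys` and `find_options` are hypothetical and are not taken from the mypy source; only the search order comes from the description above.

```python
from typing import Dict, Iterator, Optional


def candidate_keys(module: str) -> Iterator[str]:
    """Yield the config keys to try for `module`, most specific first."""
    # The exact module name is the most specific match.
    yield module
    # Then each prefix with a trailing ".*", from longest to shortest.
    parts = module.split(".")
    for i in range(len(parts), 0, -1):
        yield ".".join(parts[:i]) + ".*"


def find_options(module: str, per_module: Dict[str, dict]) -> Optional[dict]:
    """Return the options of the most specific configured section, if any."""
    for key in candidate_keys(module):
        if key in per_module:
            return per_module[key]
    return None


keys = list(candidate_keys("foo.bar"))
```

Each lookup costs a handful of dict probes proportional to the depth of the module name, independent of how many sections the config file has.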

This cuts down the processing time from 23s to about 80ms.

The catch is that this is actually a backwards-incompatible semantics
change, in two ways:

* Patterns can be of the form `foo.bar` and `foo.bar.*`, but can no longer be
  general-purpose globs. We produce an error message to detect this misusage.
* Patterns are now always applied based on specificity and not on the order they
  appear in the file. This means that some poorly-formed configuration files that
  contained options that were always overridden may have their meaning changed.
  This is probably not too common, so we don't do anything to deal with it.
  We could emit a warning when specificity-order and file-order disagree, but
  it seems wrong to lock people into the old behavior, and it seems like a silly
  thing to make configurable.
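
To make the first restriction concrete, here is a rough sketch of a check that accepts only `module_name` and `module_name.*`. The function `is_valid_pattern` is hypothetical and the exact rule mypy applies may differ in detail; the sketch only assumes that a `*` is allowed solely as a trailing `.*` component.

```python
def is_valid_pattern(pattern: str) -> bool:
    """Hypothetical check: allow 'module_name' or 'module_name.*' only."""
    # Strip one trailing ".*" if present; what remains must be a plain name.
    base = pattern[:-2] if pattern.endswith(".*") else pattern
    return base != "" and "*" not in base


checks = {
    "foo.bar": is_valid_pattern("foo.bar"),
    "foo.bar.*": is_valid_pattern("foo.bar.*"),
    "*x*": is_valid_pattern("*x*"),
    "project.apps.*.migrations.*": is_valid_pattern("project.apps.*.migrations.*"),
}
```

Under this rule, sections such as `[mypy-*x*]` are rejected, which matches the kind of pattern the new error message is meant to flag.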

I had a really pretty trie-based solution that I wrote before realizing I could do it this way and it would probably be good enough. The trie-based solution is faster, but not enough faster to justify the extra complication :(

JukkaL

Looks good! This fixed the performance regression in the S repo for me.

Left a few minor comments. Also, the typeshed update seems unrelated.

JukkaL · 2018-04-13T12:30:51Z

docs/source/config_file.rst

-  <https://docs.python.org/3.6/library/fnmatch.html>`_
-  separated by commas.  These sections specify additional flags that
-  only apply to *modules* whose name matches at least one of the patterns.
+  present, where ``PATTERN1``, ``PATTERN2`` etc. are comma-separated


Grammar nit: You are missing some commas around 'etc.'. Maybe: PATTERN2, etc., are ....

JukkaL · 2018-04-13T12:40:41Z

test-data/unit/cmdline.test

-xx.py:1: error: Function is missing a type annotation
+mypy.ini: [mypy-*x*]: Invalid pattern. Patterns must be 'module_name' or 'module_name.*'
+mypy.ini: [mypy-*y*]: Invalid pattern. Patterns must be 'module_name' or 'module_name.*'
+== Return code: 0


Should the return code be non-zero?

Yes, but we don't currently exit with an error for most mypy.ini problems, so I've stayed consistent for now.

JukkaL · 2018-04-13T12:43:54Z

test-data/unit/cmdline.test

+[file spam/eggs.py]
+[out]
+Warning: unused section(s) in mypy.ini: [mypy-bar], [mypy-baz.*], [mypy-emarg.*], [mypy-emarg.hatch]
+== Return code: 0


Again the 0 exit status is unexpected (but this was like this before this PR).

msullivan · 2018-04-13T15:46:18Z

Typeshed update was a misfire, will fix

felixc · 2018-05-03T14:15:12Z

Hi; just to clarify the change that's needed to existing config files: Does this mean that if you have a directory (say, "repositories"), which contains a bunch of subdirs, (call them "a" through "e"), all of which contain a "migrations" directory that should be ignored, we need to go from:

[mypy-*migrations*]
ignore_errors = True

to

[mypy-repositories.a.migrations.*,repositories.b.migrations.*,repositories.c.migrations.*,repositories.d.migrations.*,repositories.e.migrations.*]
ignore_errors = True

Or is there a more concise option, or one that would be robust to the hypothetical future addition of "f/migrations", "g/migrations", etc? Thanks!

msullivan · 2018-05-03T15:32:48Z

Yeah, that is correct.
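
Since the expanded section header has to list every subpackage explicitly, one way to keep it manageable is to generate it. The following is only a sketch: `migrations_section` is a hypothetical helper, not part of mypy, and it assumes the package layout from the question above.

```python
from typing import Iterable


def migrations_section(root: str, subpackages: Iterable[str]) -> str:
    """Build a mypy.ini section that ignores errors in each migrations package."""
    # One 'root.name.migrations.*' pattern per subpackage, comma-separated.
    patterns = ",".join(f"{root}.{name}.migrations.*" for name in subpackages)
    return f"[mypy-{patterns}]\nignore_errors = True\n"


section = migrations_section("repositories", ["a", "b", "c", "d", "e"])
print(section)
```

The generated text would still need to be regenerated and pasted into mypy.ini whenever a new subpackage with migrations is added.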

With version 0.600, the way mypy uses per-module configuration has changed (see python/mypy#4894), and the glob pattern 'test_*' can no longer be used to exclude tests. The solution was to add an '__init__.py' file to identify 'tests' as a proper package and exclude it completely from mypy checks.

msullivan requested review from JukkaL, gvanrossum and ilevkivskyi April 12, 2018 20:15

msullivan added 2 commits April 12, 2018 13:22

fix a test messup

78e1738

fix lint

3500618

JukkaL approved these changes Apr 13, 2018

msullivan added 2 commits April 13, 2018 09:39

Merge branch 'master' into options-optimiz2

a7748da

add some commas

7691a2c

msullivan merged commit 0d61fd0 into master Apr 13, 2018

msullivan deleted the options-optimiz2 branch April 13, 2018 17:22

emmatyping mentioned this pull request May 8, 2018

Version 0.600 disallows per module configuration globs like mypy-project.apps.*.migrations.* #5014

Closed
