Skip to content

Rework (and heavily optimize!) mypy.ini per-module configuration #4894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 13, 2018

Conversation

msullivan
Copy link
Collaborator

Currently, we compute the options for each module by scanning through
the full list of per-module configuration options and seeing if the
pattern matches against the module name.
This is incredibly slow for large configuration files. On the internal S repo,
mypy spends 23 seconds (!!) just computing per-module options.

To fix this, we instead precompute an Options object for each config
section, and in clone_for_module do a search for the most specific
configured Options (so for foo.bar, we try foo.bar, foo.bar.,
foo.
, in that order).

This cuts down the processing time from 23s to about 80ms.

The catch is that this is actually a backwards-incompatible semantics
change, in two ways:

  • Patterns can be of the form foo.bar and foo.bar.*, but can no longer be
    general purpose globs. We produce an error message to detect this misusage.
  • Patterns are now always applied based on specificity and not on the order they
    appear in the file. This means that some poorly-formed configuration files that
    contained options that were always overridden may have their meaning changed.
    This is probably not too common, so we don't do anything to deal with it.
    We could emit a warning when specificity-order and file-order disagree, but
    it seems wrong to lock people into the old behavior, and it seems like a silly
    thing to make configurable.

I had a really pretty trie-based solution that I wrote before realizing I could do it this way and it would probably be good enough. The trie-based solution is faster, but not enough faster to justify the extra complication :(

Currently, we compute the options for each module by scanning through
the full list of per-module configuration options and seeing if the
pattern matches against the module name.
This is incredibly slow for large configuration files. On the internal S repo,
mypy spends 23 seconds (!!) just computing per-module options.

To fix this, we instead precompute an Options object for each config
section, and in `clone_for_module` do a search for the most specific
configured Options (so for foo.bar, we try foo.bar, foo.bar.*,
foo.*, in that order).

This cuts down the processing time from 23s to about 80ms.

The catch is that this is actually a backwards-incompatible semantics
change, in two ways:
 * Patterns can be of the form `foo.bar` and `foo.bar.*`, but can no longer be
   general purpose globs. We produce an error message to detect this misusage.
 * Patterns are now always applied based on specificity and not on the order they
   appear in the file. This means that some poorly-formed configuration files that
   contained options that were always overridden may have their meaning changed.
   This is probably not too common, so we don't do anything to deal with it.
   We could emit a warning when specificity-order and file-order disagree, but
   it seems wrong to lock people into the old behavior, and it seems like a silly
   thing to make configurable.
Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! This fixed the performance regression in the S repo for me.

Left a few minor comments. Also, the typeshed update seems unrelated.

<https://docs.python.org/3.6/library/fnmatch.html>`_
separated by commas. These sections specify additional flags that
only apply to *modules* whose name matches at least one of the patterns.
present, where ``PATTERN1``, ``PATTERN2`` etc. are comma-separated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar nit: You are missing some commas around 'etc.'. Maybe: PATTERN2, etc., are ....

xx.py:1: error: Function is missing a type annotation
mypy.ini: [mypy-*x*]: Invalid pattern. Patterns must be 'module_name' or 'module_name.*'
mypy.ini: [mypy-*y*]: Invalid pattern. Patterns must be 'module_name' or 'module_name.*'
== Return code: 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the return code be non-zero?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we don't currently error for most mypy.ini so I've stayed consistent for now.

[file spam/eggs.py]
[out]
Warning: unused section(s) in mypy.ini: [mypy-bar], [mypy-baz.*], [mypy-emarg.*], [mypy-emarg.hatch]
== Return code: 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again the 0 exit status is unexpected (but this was like this before this PR).

@msullivan
Copy link
Collaborator Author

Typeshed update was a misfire, will fix

@msullivan msullivan merged commit 0d61fd0 into master Apr 13, 2018
@msullivan msullivan deleted the options-optimiz2 branch April 13, 2018 17:22
@felixc
Copy link

felixc commented May 3, 2018

Hi; just to clarify the change that's needed to existing config files: Does this mean that if you have a directory (say, "repositories"), which contains a bunch of subdirs, (call them "a" through "e"), all of which contain a "migrations" directory that should be ignored, we need to go from:

[mypy-*migrations*]
ignore_errors = True

to

[mypy-repositories.a.migrations.*,repositories.b.migrations.*,repositories.c.migrations.*,repositories.d.migrations.*,repositories.e.migrations.*]
ignore_errors = True

Or is there a more concise option, or one that would be robust to the hypothetical future addition of "f/migrations", "g/migrations", etc? Thanks!

@msullivan
Copy link
Collaborator Author

Yeah, that is correct.

enrico-ghidoni added a commit to enrico-ghidoni/pyarmasync that referenced this pull request Sep 25, 2018
With version 0.600 the way mypy uses per-module configuration has changed (see python/mypy#4894) and the glob pattern 'test_*' could not be used to exclude tests. The solution was to add a '__init__.py' file to identify 'tests' as a proper package and exclude it completely from mypy checks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants