Skip to content

E0401 (import-error) checks perform many isinstance calls when searching for zipimporters #9607

@correctmost

Description

@correctmost

Bug description

In astroid, there is a _get_zipimporters function that scans sys.path_importer_cache for zipimporter finders:

def _get_zipimporters() -> Iterator[tuple[str, zipimport.zipimporter]]:
    for filepath, importer in sys.path_importer_cache.items():
        if isinstance(importer, zipimport.zipimporter):
            yield filepath, importer

The isinstance call gets executed ~35 million times when running pylint on the yt-dlp repo. ~32 million of these invocations can be avoided by checking for None first:

if importer is not None and isinstance(importer, zipimport.zipimporter)

A further optimization would be to reduce the number of times path_importer_cache is scanned, but that seems trickier to accomplish.

Configuration

[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=E0401

[REPORTS]
reports=no
score=no

Command used

Steps to reproduce

git clone https://github.com/yt-dlp/yt-dlp.git
cd yt-dlp
git checkout 5904853ae5788509fdc4892cb7ecdfa9ae7f78e6

cat << EOF > ./profile_pylint.py
import cProfile
import pstats
import sys

sys.argv = ['pylint', '--recursive=y', '.']
cProfile.run('from pylint import __main__', filename='stats')

with open('profiler_stats', 'w', encoding='utf-8') as file:
    stats = pstats.Stats('stats', stream=file)
    stats.sort_stats('tottime')
    stats.print_stats()
EOF

cat << EOF > .pylintrc
[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=E0401

[REPORTS]
reports=no
score=no
EOF

python ./profile_pylint.py

Analysis

_get_zipimporters calls isinstance ~35 million times

import pstats

stats = pstats.Stats('stats')
stats.print_callers('isinstance')

  ncalls  tottime  cumtime
35188734    4.255    4.255  astroid/interpreter/_import/spec.py:342(_get_zipimporters)

Checking for None first seems twice as fast:

$ python -m timeit -s 'import zipimport; x=None' -n 1000000 'isinstance(x, zipimport.zipimporter)' 
1000000 loops, best of 5: 48.9 nsec per loop

$ python -m timeit -s 'import zipimport; x=None' -n 1000000 'x is not None and isinstance(x, zipimport.zipimporter)' 
1000000 loops, best of 5: 24.3 nsec per loop

Pylint output

There may be some import errors depending on your (virtual) environment, but the output is less important than the performance numbers.

Expected behavior

Improved performance via reduced isinstance calls

Pylint version

astroid @ git+https://github.com/pylint-dev/astroid.git@a4a9fcc44ae0d71773dc3bff6baa78fc571ecb7d
pylint @ git+https://github.com/pylint-dev/pylint.git@500774ae5a4e49e2aa0c8d3f2b64613e21aa676e
Python 3.12.3

OS / Environment

Arch Linux

Additional dependencies

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    AstroidRelated to astroidEnhancement ✨Improvement to a componentNeeds PRThis issue is accepted, sufficiently specified and now needs an implementationperformance

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions