Replace module level mutable containers with immutable containers #139003

@eendebakpt

Many modules define module-level containers that hold constants. Replacing these mutable containers with immutable variants (e.g. a set with a frozenset, or a list with a tuple) improves performance (especially in the free-threaded build, see for example #138429) and prevents accidental modification of these containers.
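As a minimal sketch of the "avoid accidental modification" point, the snippet below uses two hypothetical module-level constants (`DAY_NAMES` and `EMPTY_TAGS` are illustrative names, not taken from the standard library) and shows that lookups keep working while mutation attempts fail:

```python
# Hypothetical module-level constants, named only for illustration.
DAY_NAMES = ("Mon", "Tue", "Wed")                 # tuple instead of list
EMPTY_TAGS = frozenset({"br", "hr", "img"})       # frozenset instead of set


def try_mutations():
    """Attempt to modify both constants and collect the error type names."""
    errors = []
    try:
        DAY_NAMES[0] = "Monday"      # tuples do not support item assignment
    except TypeError as exc:
        errors.append(type(exc).__name__)
    try:
        EMPTY_TAGS.add("input")      # frozenset has no add() method
    except AttributeError as exc:
        errors.append(type(exc).__name__)
    return errors


# Read-only use is unchanged: indexing and membership tests still work.
print(DAY_NAMES[1], "br" in EMPTY_TAGS)
print(try_mutations())
```

Code that only reads the constant (indexing, iteration, membership tests) is unaffected by such a change; only code that mutates the container would break.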

The number of module level lists, dicts and sets on current main is:

number of mutable module level containers by type:
<class 'dict'>: 266
<class 'list'>: 150
<class 'set'>: 63
<class 'collections.defaultdict'>: 2
<class '_strptime.TimeRE'>: 1
<class 'email._encoded_words._QByteMap'>: 1
Script to list all the module level mutable containers
import sys
import importlib
import pkgutil
from collections import Counter

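# Trailing commas make these one-element tuples; without them they are plain
# strings, and `in` would perform a substring test instead of a membership test.
blacklist = ('idlelib.idle',)
excluded_submodule_names = ('__main__',)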
search_submodules = 2
mutable_containers = (list, dict,  set)

def list_container_types(module, mcc):
    print_module = False
    for a in dir(module):
        if a in ('__all__', '__path__', '__builtins__', '__annotations__', '__conditional_annotations__'):
            # why is __all__ a list and not a tuple?
            continue
        # if not a.startswith('_'):
        #    continue
        attr = getattr(module, a)
        tp = type(attr)
        if issubclass(tp, mutable_containers):
            if not print_module:
                print(f'{module}:')
                print_module = True
            print(f'  {a}: {tp}')
            mcc.update([tp])


def search_modules(module_names, search_submodules: int, mcc):
    for name in module_names:
        if name in blacklist:
            continue
        try:
            module = importlib.import_module(name)
        except (Exception, SystemExit):
            # Some modules fail to import (or exit) on this platform or build; skip them.
            print(f'{name}: error on import')
            continue
        list_container_types(module, mcc)

        if search_submodules:
            try:
                sub_names = [z.name for z in pkgutil.iter_modules(module.__path__)]
            except Exception:
                # Not a package (no __path__) or its submodules cannot be listed.
                sub_names = []
            mm = [name + '.' + sub_name for sub_name in sub_names if sub_name not in excluded_submodule_names]
            search_modules(mm, search_submodules - 1, mcc)

mcc = Counter()
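# sys.stdlib_module_names also contains the built-in modules, so take the union
# to avoid scanning (and counting) those modules twice.
module_names = sorted(set(sys.builtin_module_names) | set(sys.stdlib_module_names))
search_modules(module_names, search_submodules=search_submodules, mcc=mcc)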

print()
print('number of mutable module level containers by type:')
for key, value in mcc.items():
    print(f'{key}: {value}')

Not all mutable containers can be replaced by immutable ones. Some need to remain mutable (e.g. copyreg.dispatch_table), and some are part of the public API, where we might not want to change the type for performance reasons alone.
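To illustrate why a container like copyreg.dispatch_table has to stay mutable, here is a small sketch: `copyreg.pickle()` registers a reduction function by writing into that dict at runtime. The `Point` class and `reduce_point` function are hypothetical and exist only for this example.

```python
import copyreg


class Point:
    """Hypothetical class used only to demonstrate registration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def reduce_point(point):
    # Reduction function: return a callable and the arguments to rebuild the object.
    return Point, (point.x, point.y)


# Registration mutates the module-level dict copyreg.dispatch_table.
copyreg.pickle(Point, reduce_point)
print(copyreg.dispatch_table[Point] is reduce_point)
```

A container that is written to as part of the module's documented behavior, as here, is not a candidate for replacement.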

Example candidates: _pydatetime._DAYNAMES (would improve performance of date.ctime), token.EXACT_TOKEN_TYPES, xml.etree.ElementTree.HTML_EMPTY
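One way to inspect such candidates is sketched below. The helpers `frozen_copy` and `describe_candidate` are hypothetical names introduced here, and only the list-to-tuple and set-to-frozenset replacements named in this issue are covered. The reported type depends on the Python version the snippet runs on.

```python
import importlib

# Pairs of (mutable type, immutable counterpart); kept as a tuple on purpose.
_COUNTERPARTS = ((list, tuple), (set, frozenset))


def frozen_copy(value):
    """Return an immutable copy of a plain list or set, else None."""
    for mutable_type, immutable_type in _COUNTERPARTS:
        if type(value) is mutable_type:
            return immutable_type(value)
    return None


def describe_candidate(dotted_name):
    """Report the current type of a module attribute and a possible replacement."""
    module_name, _, attr = dotted_name.rpartition(".")
    value = getattr(importlib.import_module(module_name), attr)
    frozen = frozen_copy(value)
    if frozen is not None:
        suggestion = type(frozen).__name__
    elif isinstance(value, (tuple, frozenset)):
        suggestion = "already immutable"
    else:
        suggestion = "no direct counterpart in this sketch"
    return f"{dotted_name}: {type(value).__name__} -> {suggestion}"


for name in ("token.EXACT_TOKEN_TYPES", "xml.etree.ElementTree.HTML_EMPTY"):
    print(describe_candidate(name))
```

Dicts are deliberately reported as having no direct counterpart here, since the issue text only names frozenset and tuple as replacements.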

Labels: performance, stdlib, topic-free-threading
