Remove lru_cache from get_message_definitions #5673

Closed
wants to merge 1 commit from lru-cache-2

Conversation

DanielNoord
Collaborator


Type of Changes

Type: βœ“ πŸ”¨ Refactoring

Description

Ref #5670, #5605

This cache isn't actually all that necessary. As long as message_id_store.get_active_msgids is reasonably fast (or cached), this function is also quite fast.

I tested a pattern similar to the one in #5672, but actually setting values on a dict instance attribute and retrieving them takes longer than just doing the list comprehension over the return values of get_active_msgids. I think this is because when get_active_msgids returns just one msgid, the comprehension is a single dict lookup, whereas a primitive cache would pay that same dict lookup plus a dict assignment.
Anyway, looking at the changes in these run times, a cache seems over-engineered here as long as get_active_msgids is cached. A sketch of the variants follows below.
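To make the comparison concrete, here is a minimal sketch of the three variants discussed above. It is illustrative only: the class is heavily simplified and the variant method names are made up (the real get_message_definitions lives in message_definition_store.py).

import functools


class MessageDefinitionStore:
    def __init__(self, message_id_store):
        self._messages_definitions = {}  # msgid -> MessageDefinition
        self._definitions_cache = {}  # backing dict for the "primitive" cache
        self.message_id_store = message_id_store

    # Variant 1: the lru_cache removed by this PR. The cache keys on
    # (self, msgid_or_symbol), so it also keeps the store instance alive.
    @functools.lru_cache(maxsize=None)
    def get_message_definitions_lru(self, msgid_or_symbol):
        return [
            self._messages_definitions[m]
            for m in self.message_id_store.get_active_msgids(msgid_or_symbol)
        ]

    # Variant 2: the "primitive" cache on a dict instance attribute. Every
    # miss pays the normal lookup plus a dict assignment, which is why it
    # measured slower than no cache at all.
    def get_message_definitions_dict(self, msgid_or_symbol):
        if msgid_or_symbol not in self._definitions_cache:
            self._definitions_cache[msgid_or_symbol] = [
                self._messages_definitions[m]
                for m in self.message_id_store.get_active_msgids(msgid_or_symbol)
            ]
        return self._definitions_cache[msgid_or_symbol]

    # Variant 3: no cache at all. If get_active_msgids returns a single
    # msgid, this is essentially one dict lookup.
    def get_message_definitions(self, msgid_or_symbol):
        return [
            self._messages_definitions[m]
            for m in self.message_id_store.get_active_msgids(msgid_or_symbol)
        ]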

Tested on pylint over pylint.
With lru_cache:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
105    0.000    0.000    0.001    0.000 message_definition_store.py:50(get_message_definitions)

With new cache:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
5069    0.008    0.000    0.011    0.000 message_definition_store.py:51(get_message_definitions)

Without new cache:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
5069    0.005    0.000    0.008    0.000 message_definition_store.py:49(get_message_definitions)

Problem introduced in #5605

@DanielNoord DanielNoord added the performance and Maintenance labels Jan 13, 2022
@DanielNoord DanielNoord added this to the 2.13.0 milestone Jan 13, 2022
@coveralls

Pull Request Test Coverage Report for Build 1692882877

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.0008%) to 93.828%

Totals (Coverage Status)
Change from base Build 1692435585: -0.0008%
Covered Lines: 14640
Relevant Lines: 15603

πŸ’› - Coveralls

@Pierre-Sassoulas
Member

Are you sure that caching 'self' is an issue? Message stores are singletons. 0.01s for each pylint startup is significant imo. (Well, it should be, and it will be once astroid's import is faster.)

@DanielNoord
Collaborator Author

get_message_definitions itself only takes 0.005s with this change (n=1 obviously, so take exact values with a grain of salt).

I'm not sure whether this creates any real trouble immediately, but I think it would be good to enforce lru-cache-decorating-method. I really have no idea how garbage collection works with multiprocessing, but I can also imagine that other tools spawn multiple instances of PyLinter to run it in parallel themselves; see the sketch below.
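To illustrate the problem lru-cache-decorating-method guards against, here is a runnable toy example (the Linter class is a hypothetical stand-in, not PyLinter): the cache of an lru_cache-decorated method keys on self, so every instance that ever calls the method stays reachable through the cache and is never garbage collected.

import gc
from functools import lru_cache


class Linter:
    @lru_cache(maxsize=None)
    def get_definitions(self, msgid):
        # The cache entry for this call holds a strong reference to `self`.
        return [msgid]


for _ in range(3):
    Linter().get_definitions("E0001")  # three throwaway instances

gc.collect()
# All three instances are still alive inside the method's cache:
print(sum(isinstance(obj, Linter) for obj in gc.get_objects()))  # 3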

I did a quick test running pylint over an empty file with this change.
Sorted with cumtime get_message_definitions is the 685th function.
Sorted with tottime get_message_definitions is the 375th function.

Same test with pylint over pylint.
Sorted with cumtime get_message_definitions is the 1060th function.
Sorted with tottime get_message_definitions is the 685th function.

I don't think this will be the first function that needs performance optimisation 😄
Personally I'm fine with taking the 0.005-0.01s hit for the assurance that we're not creating any hard-to-debug memory leaks here.

@Pierre-Sassoulas
Member

Pierre-Sassoulas commented Jan 13, 2022

We never change the content of the cache, and the values for a given pylint version are always the same. Also, there is always only one instance. As we said in the original MR, the RAM used is negligible and deterministic. I don't see how it could become a problem.

For multiprocessing the cache is not shared but regenerated in each worker process; see this comment.
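As a toy demonstration of that behaviour (the function name and body here are hypothetical), each worker process builds its own copy of the cache, so a cached function is computed once per process rather than once overall:

import os
from functools import lru_cache
from multiprocessing import Pool


@lru_cache(maxsize=None)
def get_active_msgids(msgid):
    # Printed once per worker process: the lru_cache is per-process.
    print(f"computing {msgid} in pid {os.getpid()}")
    return [msgid]


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        pool.map(get_active_msgids, ["E0001"] * 8)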

(It makes me want to generate the message store code into an optimized data structure. It would remove the consistency checks at each initialization: we could do them once for each release instead of once for each time pylint is used. This is #5607.)

@DanielNoord
Collaborator Author

We never change the content of the cache, and the values for a given pylint version are always the same.

Don't plugin messages get registered on the MessageDefinitionStore as well?

Also, there is always only one instance. As we said in the original MR, the RAM used is negligible and deterministic. I don't see how it could become a problem.

You're right, but I think I'd prefer to follow our new message here rather than add a disable for it. The impact of this optimisation would be minimal:
For a run over pylint, cumtime is 0.009s on a total runtime of 54.281s.
For a run over an empty file it is 0.000s (😅) on a total of 1.034s.
So for the pylint run we're discussing a 54.281 / 54.272 - 1 β‰ˆ 0.000166, or roughly 0.017%, improvement here.

For me that warrants choosing readability/maintainability over performance. Not adding the disable removes the need to explain why we're disabling here. Like I said, your argument is true, but I don't think this small improvement warrants the "extra complexity" of special-casing this method.

@Pierre-Sassoulas
Member

Don't plugin messages get registered on the MessageDefinitionStore as well?

Yes, it's discussed in #5607; we can't just remove the code doing the check entirely.

roughly 0.017% improvement here

I agree there are more pressing optimizations to do, and this function scales with run time (contrary to other optimizations that have a fixed cost we pay each time pylint is launched). But pylint is also downloaded 250,000 times a day and used maybe 10 times per download; if we take into account an average run time of 10s, that adds up to more than an hour saved each day (2,500,000 runs × 10s × 0.000166 β‰ˆ 4,150s) if we just disable this one false positive that we know will lower performance ever so slightly. We also have worse complexity to care about; the disable + short explanation won't be what makes pylint unmaintainable imo 😄

@DanielNoord
Collaborator Author

DanielNoord commented Jan 13, 2022

I agree there are more pressing optimizations to do, and this function scales with run time (contrary to other optimizations that have a fixed cost we pay each time pylint is launched). But pylint is also downloaded 250,000 times a day and used maybe 10 times per download; if we take into account an average run time of 10s, that adds up to more than an hour saved each day (2,500,000 runs × 10s × 0.000166 β‰ˆ 4,150s) if we just disable this one false positive that we know will lower performance ever so slightly. We also have worse complexity to care about; the disable + short explanation won't be what makes pylint unmaintainable imo 😄

I'll give in. I'll add a disable in #5674 and close this. πŸ˜„

Do you have any good ideas on how to allow this behaviour without adding a disable? Perhaps we should create it before committing the checker to main?

@DanielNoord DanielNoord deleted the lru-cache-2 branch January 13, 2022 19:38
@Pierre-Sassoulas
Member

Do you have any good ideas on how to allow this behaviour without adding a disable?

I think singletons should be exempted, but I don't think there is a consensus on a pattern for indicating that an instance is a singleton in Python; see https://stackoverflow.com/questions/6760685/creating-a-singleton-in-python. One common idiom is sketched below.
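For reference, one of the competing idioms from the linked Stack Overflow question, shown here on a stand-in class (pylint's MessageDefinitionStore is not actually written this way): nothing on the class's surface marks it as a singleton, which is exactly why a checker has no reliable signal to exempt it.

class SingletonStore:
    _instance = None

    def __new__(cls):
        # Every instantiation returns the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


assert SingletonStore() is SingletonStore()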
