
Simplify async timeouts and allowing timeout=None in PubSub.get_message() to wait forever #2295

Merged

dvora-h merged 7 commits into redis:master from simplify-async-timeouts on Sep 29, 2022

Conversation

kristjanvalur
Contributor

@kristjanvalur kristjanvalur commented Jul 20, 2022

Pull Request check-list

Please make sure to review and check all of these items:

  • Does $ tox pass with this change (including linting)?
  • Do the CI tests pass with this change (enable it first in your forked repo and wait for the github action build to finish)?
  • Is the new or changed code fully tested?
  • Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?
  • Is there an example added to the examples folder (if applicable)?
  • Was the change added to CHANGES file?

NOTE: these things are not required to open a PR and can be done
afterwards / while the PR is open.

Description of change

Async PubSub no longer does a separate can_read() check for every message read. Instead, async timeout mechanisms are used at the highest level to time out operations.

The use of can_read() is a holdover from the non-async code, where the timeout had to be applied at the lowest level, at the actual blocking socket call. With asyncio, timeouts happen higher up in the stack, which allows for simplifications: there is no longer any need for timeout code lower down in the stack.
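
A minimal sketch of the pattern (illustrative only, not the actual redis-py code): the whole read is wrapped in an event-loop timeout instead of checking can_read() against a socket-level deadline first. The read_response callable here is a hypothetical stand-in for the coroutine that parses one message from the connection.

    import asyncio

    async def read_with_timeout(read_response, timeout):
        # The event loop enforces the deadline; timeout=None simply waits
        # forever, so the timed and untimed cases share one code path.
        try:
            return await asyncio.wait_for(read_response(), timeout=timeout)
        except asyncio.TimeoutError:
            return None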

can_read() is still kept around, but its only use now is for assertions in the ConnectionPool. This could be simplified further, since keeping can_read() around requires additional code paths for data that has been read but not yet consumed.

Obsolete code pertaining to blocking exceptions was also removed.

Additionally, get_message() now accepts a timeout=None argument to wait indefinitely for a message.
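
A minimal usage sketch of the new behaviour (assumes a Redis server on localhost and the post-merge API; the channel name is arbitrary):

    import asyncio
    import redis.asyncio as redis

    async def main():
        client = redis.Redis()
        async with client.pubsub() as pubsub:
            await pubsub.subscribe("news")
            # timeout=None now blocks until a message actually arrives,
            # rather than returning None after a polling interval.
            message = await pubsub.get_message(
                ignore_subscribe_messages=True, timeout=None
            )
            print(message)
        await client.close()

    asyncio.run(main())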

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from e01e4dc to e58b783 Compare July 20, 2022 19:34
@codecov-commenter

codecov-commenter commented Jul 21, 2022

Codecov Report

Base: 92.16% // Head: 92.20% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (38351fe) compared to base (cdbc662).
Patch coverage: 97.61% of modified lines in pull request are covered.

❗ Current head 38351fe differs from pull request most recent head 305f848. Consider uploading reports for the commit 305f848 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2295      +/-   ##
==========================================
+ Coverage   92.16%   92.20%   +0.03%     
==========================================
  Files         110      110              
  Lines       28925    28899      -26     
==========================================
- Hits        26659    26645      -14     
+ Misses       2266     2254      -12     
Impacted Files Coverage Δ
redis/asyncio/connection.py 87.13% <97.22%> (+0.98%) ⬆️
redis/asyncio/client.py 92.06% <100.00%> (-0.27%) ⬇️
redis/client.py 89.06% <100.00%> (ø)
tests/test_asyncio/test_pubsub.py 99.37% <100.00%> (ø)
tests/test_asyncio/test_search.py 98.75% <0.00%> (+0.31%) ⬆️

☔ View full report at Codecov.

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from 117045a to d244ab0 Compare July 21, 2022 11:14
@kristjanvalur kristjanvalur marked this pull request as ready for review July 21, 2022 11:37
@kristjanvalur
Contributor Author

I'm submitting this for review, even though there are one or two weird sporadic test failures that don't seem to have anything to do with this change. (Note: the added code for killing containers seems to have fixed test stability when container ports were in use at the start of a tox run, probably due to the test runner being re-used.)

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch 2 times, most recently from 2bd56fe to e29731d Compare July 21, 2022 14:18
disable_decoding=disable_decoding
)
else:
async with lock_ctxt, async_timeout.timeout(read_timeout):
Contributor

Wouldn't an if/else on with_lock be more performant than a null context? Same for async_timeout. Even with a 0 timeout, async_timeout adds significant latency.

Contributor Author

@kristjanvalur kristjanvalur Jul 24, 2022

That is a matter of opinion. Pub/sub is primarily IO bound: an application spends its time waiting for a message. Once a message arrives, two extra calls to __aexit__() make no difference in the grand scheme of things.

The with_lock=False case is a special case, used only in cluster mode.
The async_timeout is usually needed, and always for timeout=0. It could be skipped for timeout=None, but that blocking case is, again, an uncommon special case.

There is an argument for reducing code duplication and simplifying program flow to ensure correctness, even if the cost is an extra __aexit__() call in special circumstances. The alternative is to have four different control paths, for with_lock being True/False and timeout being None or not None. In my opinion, that is not very Pythonic.
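
For illustration, a rough sketch of the single-code-path approach described here (not the actual connection code; assumes Python 3.10+, where contextlib.nullcontext also works with async with):

    import asyncio
    from contextlib import nullcontext

    async def read_message(read_coro, lock, with_lock, timeout):
        # One path instead of four (with/without lock x finite/None timeout):
        # a null context stands in for the lock, and wait_for(timeout=None)
        # means "wait forever".
        lock_ctxt = lock if with_lock else nullcontext()
        async with lock_ctxt:
            return await asyncio.wait_for(read_coro(), timeout=timeout)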

Contributor

@utkarshgupta137 utkarshgupta137 Jul 24, 2022

Normally I would always argue against code duplication, but the foremost feature of Redis is its speed. If we have to make small ergonomic sacrifices so that the code is faster, then I think it should be considered.
Also, async calls add significantly more overhead than sync calls, which is why I even bothered commenting. Out of the two, we should at least have an if/else for the timeout block, as it adds a lot of latency.

And if you are keen on removing the if/else for the _without_lock function, can you at least call this new function directly instead of adding a level of indirection in the cluster code?

Contributor Author

I'll oblige with the changes you suggest, of course, because getting fixes into async redis-py has been a very arduous process indeed.

But as a Python developer for 20 years, allow me to suggest that the changes will not amount to any measurable difference. Remember, this is Python we're talking about. Redis may be fast, but all of that is dwarfed by the fact that we are running inside Python. We already have layers and layers of buffers, abstractions, and classes, even lower than this function of which you speak. An extra null function call, in an uncommon special case, just does not matter.

My suggestion would be, instead, to simplify the code as much as possible, removing interdependencies and special cases, so that in further steps we can remove those intermediate buffering layers. The async code in redis-py suffers too much from being a verbatim copy of the blocking code, with the odd async keyword thrown in.

Contributor Author

I didn't add the direct call to the without_lock function in cluster since I didn't want the changes to spread too far, nor did I want to change the API of connection.py in a backwards-incompatible way. But removing the special _without_lock function seems very reasonable.

(Why the without-lock variant is required in the cluster client is also not clear to me; it isn't documented anywhere. Maybe you can explain why there is no need for locking in the cluster client? I've never used cluster.)

Contributor Author

The connection pool can be simplified. It is already not thread-safe (since asyncio and threading don't mix), and having it fork-safe is also dubious.

Contributor Author

I also suggest that the fork-protection be removed from asyncio connection pools. When forking while using asyncio, all bets are off. Threading and forking are fundamentally incompatible with asyncio. I'll make a separate PR for that.

Contributor

I would say a separate PR would probably be better, since it may need other changes and/or improved tests.

I agree: threading should neither be supported nor expected.
Regarding removing multiprocessing support, I'm definitely in favour of it. But @chayim @dvora-h would have to make the call on the release plan.

Contributor

I prefer tight, surgical PRs. It allows us both to review code and to comment effectively, and frankly, it helps us not miss things. This includes surgical tests.

Feature parity (where possible) belongs in all of our usage models:

  1. Sync
  2. Async
  3. Sync Cluster
  4. Async Cluster

I'm generally against removing features that people are using. However, I'm pro pre-announced deprecation (see the Python announcement in the README).

Contributor

@chayim Since we're already dropping 3.6 support in the next release, can we announce that support for multiprocessing will also be dropped, or perhaps only allowed through a flag, in the same breaking release?

raise ConnectionError(f"Error while reading from socket: {ex.args}")
return True

async def read_from_socket(self):
Contributor

@dvora-h can you dig into this function?

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from b6e6137 to 27addb5 Compare July 27, 2022 18:57
@kristjanvalur kristjanvalur mentioned this pull request Aug 22, 2022
@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch 3 times, most recently from 4d0ea9d to b7bbcc8 Compare August 22, 2022 14:11
@kristjanvalur
Contributor Author

A separate PR, #2308, proposes to remove this read lock entirely, and would simplify this PR further.

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from cba774f to 409aedc Compare September 20, 2022 06:28
@dvora-h
Collaborator

dvora-h commented Sep 28, 2022

@kristjanvalur Can you merge master in so I can see if tests pass and merge this PR?

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from 409aedc to 5ea981b Compare September 28, 2022 19:17
@chayim chayim added the maintenance Maintenance (CI, Releases, etc) label Sep 29, 2022
@chayim
Contributor

chayim commented Sep 29, 2022

@kristjanvalur I think a better title for this may be your comment in the CHANGES file - which completely changes how I think about this PR. IMHO this becomes a feature. For 4.4.0-rc3. WDYT?

@kristjanvalur
Contributor Author

kristjanvalur commented Sep 29, 2022

Well, the timeout=None was actually just something I threw in there as a bonus. My primary purpose with this PR was to remove some of the cruft in the async code that exists because it is a direct copy of the non-async code. Timeouts in blocking code need to be handled at the IO level, while in async code they are generally free, implemented via the event loop.
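
A simplified contrast of the two models (an illustrative sketch, not redis-py code):

    import asyncio
    import socket

    def blocking_read(sock: socket.socket, timeout: float) -> bytes:
        # Blocking client: the deadline must be pushed down to the socket call.
        sock.settimeout(timeout)
        return sock.recv(4096)

    async def async_read(reader: asyncio.StreamReader, timeout: float) -> bytes:
        # Asyncio client: the event loop enforces the deadline, so the timeout
        # can wrap the whole operation near the top of the call stack.
        return await asyncio.wait_for(reader.read(4096), timeout=timeout)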

I had actually been thinking about removing this "feature" and adding it separately, post-hoc, to make the PR more "clean".

But whatever you think is best, I'm happy with. I think it is important to straighten out and simplify the async connection code so that we can more easily work on simplifications closer to the metal, for example removing double buffering.

Would you like me to change the title?

@dvora-h
Collaborator

dvora-h commented Sep 29, 2022

@kristjanvalur Same here, conflicts...

@chayim chayim changed the title Simplify async timeouts Simplify async timeouts and allowing timeout=None in PubSub.get_message() to wait forever Sep 29, 2022
@chayim
Contributor

chayim commented Sep 29, 2022

Given that we'd like to get this into 4.4.0 rc2... I'm voting to take it as is. GitHub really needs a "100%" reaction. I think this is great :)

@chayim chayim added the feature New feature label Sep 29, 2022
@dvora-h
Collaborator

dvora-h commented Sep 29, 2022

@kristjanvalur last one...

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from 5ea981b to 38351fe Compare September 29, 2022 12:59
@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from 38351fe to 305f848 Compare September 29, 2022 13:00
@kristjanvalur
Contributor Author

OK, this one was a bit trickier; lots of related commits went in... let's see how it does in CI.

@dvora-h dvora-h merged commit b0883b7 into redis:master Sep 29, 2022
@kristjanvalur
Contributor Author

Thanks. That was a bunch of related PRs that all sort of complemented each other. With this in place, I hope to be able to do further work, particularly on the hiredis parser, to reduce overhead.

Labels: feature (New feature), maintenance (Maintenance: CI, Releases, etc)
5 participants