
Simplify async timeouts and allowing timeout=None in PubSub.get_message() to wait forever #2295

Merged

dvora-h merged 7 commits into redis:master from simplify-async-timeouts on Sep 29, 2022

Conversation

kristjanvalur
Contributor

@kristjanvalur kristjanvalur commented Jul 20, 2022

Pull Request check-list

Please make sure to review and check all of these items:

  • Does $ tox pass with this change (including linting)?
  • Do the CI tests pass with this change (enable it first in your forked repo and wait for the github action build to finish)?
  • Is the new or changed code fully tested?
  • Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?
  • Is there an example added to the examples folder (if applicable)?
  • Was the change added to CHANGES file?

NOTE: these things are not required to open a PR and can be done
afterwards / while the PR is open.

Description of change

Async PubSub no longer does a separate can_read() check for every message read. Instead, async timeout mechanisms are used at the highest level to time out operations.

The use of can_read() is a holdover from the non-async code, where the timeout had to be applied at the lowest level, at the actual blocking socket call. With asyncio, timeouts happen higher up in the stack, which allows for simplifications: there is no longer any need for timeout code lower down in the stack.
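
A minimal sketch of the pattern (illustrative only, not the actual redis-py code): the whole read is wrapped in an event-loop timeout instead of checking can_read() against a socket-level deadline first. The read_response callable here is a hypothetical stand-in for the coroutine that parses one message from the connection.

    import asyncio

    async def read_with_timeout(read_response, timeout):
        # The event loop enforces the deadline; timeout=None simply waits
        # forever, so the timed and untimed cases share one code path.
        try:
            return await asyncio.wait_for(read_response(), timeout=timeout)
        except asyncio.TimeoutError:
            return None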

can_read() is still kept around, but its only use now is for assertions in the ConnectionPool. This could be simplified further, since keeping can_read() around requires additional code paths for data that has been read but not yet consumed.

Obsolete code pertaining to blocking exceptions was also removed.

Additionally, get_message() now accepts a timeout=None argument to wait indefinitely for a message.
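
A minimal usage sketch of the new behaviour (assumes a Redis server on localhost and the post-merge API; the channel name is arbitrary):

    import asyncio
    import redis.asyncio as redis

    async def main():
        client = redis.Redis()
        async with client.pubsub() as pubsub:
            await pubsub.subscribe("news")
            # timeout=None now blocks until a message actually arrives,
            # rather than returning None after a polling interval.
            message = await pubsub.get_message(
                ignore_subscribe_messages=True, timeout=None
            )
            print(message)
        await client.close()

    asyncio.run(main())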

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from e01e4dc to e58b783 Compare July 20, 2022 19:34
@codecov-commenter

codecov-commenter commented Jul 21, 2022

Codecov Report

Base: 92.16% // Head: 92.20% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (38351fe) compared to base (cdbc662).
Patch coverage: 97.61% of modified lines in pull request are covered.

❗ Current head 38351fe differs from pull request most recent head 305f848. Consider uploading reports for the commit 305f848 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2295      +/-   ##
==========================================
+ Coverage   92.16%   92.20%   +0.03%     
==========================================
  Files         110      110              
  Lines       28925    28899      -26     
==========================================
- Hits        26659    26645      -14     
+ Misses       2266     2254      -12     
Impacted Files Coverage Δ
redis/asyncio/connection.py 87.13% <97.22%> (+0.98%) ⬆️
redis/asyncio/client.py 92.06% <100.00%> (-0.27%) ⬇️
redis/client.py 89.06% <100.00%> (ø)
tests/test_asyncio/test_pubsub.py 99.37% <100.00%> (ø)
tests/test_asyncio/test_search.py 98.75% <0.00%> (+0.31%) ⬆️

☔ View full report at Codecov.

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from 117045a to d244ab0 Compare July 21, 2022 11:14
@kristjanvalur kristjanvalur marked this pull request as ready for review July 21, 2022 11:37
@kristjanvalur
Contributor Author

I'm submitting this for review, even though there are one or two weird sporadic test failures that don't seem to have anything to do with this change. (Note: the added code for killing containers seems to have fixed test stability when container ports were in use at the start of a tox run, probably due to the test runner being re-used.)

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch 2 times, most recently from 2bd56fe to e29731d Compare July 21, 2022 14:18
disable_decoding=disable_decoding
)
else:
async with lock_ctxt, async_timeout.timeout(read_timeout):
Contributor

Wouldn't an if/else on with_lock be more performant than a null context? Same for async_timeout. Even with a 0 timeout, async_timeout adds significant latency.

Contributor Author

@kristjanvalur kristjanvalur Jul 24, 2022

That is a matter of opinion. Pub/sub is primarily IO bound: an application spends its time waiting for a message. Once a message arrives, two extra calls to __aexit__() make no difference in the grand scheme of things.

The with_lock=False case is a special case, used only in cluster mode.
The async_timeout is usually needed, and always for timeout=0. It could be skipped for timeout=None, but that blocking case is, again, an uncommon special case.

There is an argument for reducing code duplication and simplifying program flow to ensure correctness, even if the cost is an extra __aexit__() call in special circumstances. The alternative is to have four different control paths, for with_lock being True/False and timeout being None or not None. In my opinion, that is not very Pythonic.
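
For illustration, a rough sketch of the single-code-path approach described here (not the actual connection code; assumes Python 3.10+, where contextlib.nullcontext also works with async with):

    import asyncio
    from contextlib import nullcontext

    async def read_message(read_coro, lock, with_lock, timeout):
        # One path instead of four (with/without lock x finite/None timeout):
        # a null context stands in for the lock, and wait_for(timeout=None)
        # means "wait forever".
        lock_ctxt = lock if with_lock else nullcontext()
        async with lock_ctxt:
            return await asyncio.wait_for(read_coro(), timeout=timeout)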

Contributor

@utkarshgupta137 utkarshgupta137 Jul 24, 2022

Normally I would always argue against code duplication, but the foremost feature of Redis is its speed. If we have to make small ergonomic sacrifices so that the code is faster, then I think it should be considered.
Also, async calls add significantly more overhead than sync calls, which is why I even bothered commenting. Out of the two, we should at least have an if/else for the timeout block, as it adds a lot of latency.

And if you are keen on removing the if/else for the _without_lock function, can you at least call this new function directly instead of adding a level of indirection in the cluster code?

Contributor Author

I'll oblige with the changes you suggest, of course, because getting fixes into async redis-py has been a very arduous process indeed.

But as a Python developer for 20 years, allow me to suggest that the changes will not amount to any measurable difference. Remember, this is Python we're talking about. Redis may be fast, but all of that is dwarfed by the fact that we are running inside Python. We already have layers and layers of buffers, abstractions, and classes, even lower than this function of which you speak. An extra null function call, in an uncommon special case, just does not matter.

My suggestion would be, instead, to simplify the code as much as possible, removing interdependencies and special cases, so that in further steps we can remove those intermediate buffering layers. The async code in redis-py suffers too much from being a verbatim copy of the blocking code, with the odd async keyword thrown in.

Contributor Author

I didn't add the direct call to the without_lock function in cluster since I didn't want the changes to spread too far, nor did I want to change the API of connection.py in a backwards-incompatible way. But removing the special _without_lock function seems very reasonable.

(Why the without-lock variant is required in the cluster client is also not clear to me; it isn't documented anywhere. Maybe you can explain why there is no need for locking in the cluster client? I've never used cluster.)

Contributor Author

The connection pool can be simplified. It is already not thread-safe (since asyncio and threading don't mix), and having it fork-safe is also dubious.

Contributor Author

I also suggest that the fork-protection be removed from asyncio connection pools. When forking while using asyncio, all bets are off. Threading and forking are fundamentally incompatible with asyncio. I'll make a separate PR for that.

Contributor

I would say a separate PR would probably be better, since it may need other changes and/or improved tests.

I agree: threading should neither be supported nor expected.
Regarding removing multiprocessing support, I'm definitely in favour of it. But @chayim @dvora-h would have to make the call on the release plan.

Contributor

I prefer tight, surgical PRs. It allows us both to review code and to comment effectively, and frankly, it helps us not miss things. This includes surgical tests.

Feature parity (where possible) belongs in all of our usage models:

  1. Sync
  2. Async
  3. Sync Cluster
  4. Async Cluster

I'm generally against removing features that people are using. However, I'm pro pre-announced deprecation (see the Python announcement in the README).

Contributor

@chayim Since we're already dropping 3.6 support in the next release, can we announce that support for multiprocessing will also be dropped, or perhaps only allowed through a flag, in the same breaking release?

raise ConnectionError(f"Error while reading from socket: {ex.args}")
return True

async def read_from_socket(self):
Contributor

@dvora-h can you dig into this function?

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from b6e6137 to 27addb5 Compare July 27, 2022 18:57
@kristjanvalur kristjanvalur mentioned this pull request Aug 22, 2022
@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch 3 times, most recently from 4d0ea9d to b7bbcc8 Compare August 22, 2022 14:11
@kristjanvalur
Contributor Author

A separate PR, #2308, proposes to remove this read lock entirely, and would simplify this PR further.

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from cba774f to 409aedc Compare September 20, 2022 06:28
@dvora-h
Collaborator

dvora-h commented Sep 28, 2022

@kristjanvalur Can you merge master in so I can see if tests pass and merge this PR?

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from 409aedc to 5ea981b Compare September 28, 2022 19:17
@chayim chayim added the maintenance Maintenance (CI, Releases, etc) label Sep 29, 2022
@chayim
Contributor

chayim commented Sep 29, 2022

@kristjanvalur I think a better title for this may be your comment in the CHANGES file - which completely changes how I think about this PR. IMHO this becomes a feature. For 4.4.0-rc3. WDYT?

@kristjanvalur
Contributor Author

kristjanvalur commented Sep 29, 2022

Well, the timeout=None was actually just something I threw in there as a bonus. My primary purpose with this PR was to remove some of the cruft in the async code that exists because it is a direct copy of the non-async code. Timeouts in blocking code need to be handled at the IO level, while in async code they are generally free, implemented via the event loop.
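
A simplified contrast of the two models (an illustrative sketch, not redis-py code):

    import asyncio
    import socket

    def blocking_read(sock: socket.socket, timeout: float) -> bytes:
        # Blocking client: the deadline must be pushed down to the socket call.
        sock.settimeout(timeout)
        return sock.recv(4096)

    async def async_read(reader: asyncio.StreamReader, timeout: float) -> bytes:
        # Asyncio client: the event loop enforces the deadline, so the timeout
        # can wrap the whole operation near the top of the call stack.
        return await asyncio.wait_for(reader.read(4096), timeout=timeout)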

I had actually been thinking about removing this "feature" and adding it separately, post-hoc, to make the PR more "clean".

But whatever you think is best, I'm happy with. I think it is important to straighten out and simplify the async connection code so that we can more easily work on simplifications closer to the metal, for example removing double buffering.

Would you like me to change the title?

@dvora-h
Collaborator

dvora-h commented Sep 29, 2022

@kristjanvalur Same here, conflicts...

@chayim chayim changed the title Simplify async timeouts Simplify async timeouts and allowing timeout=None in PubSub.get_message() to wait forever Sep 29, 2022
@chayim
Contributor

chayim commented Sep 29, 2022

Given that we'd like to get this into 4.4.0 rc2... I'm voting to take it as is. GitHub really needs a "100%" reaction. I think this is great :)

@chayim chayim added the feature New feature label Sep 29, 2022
@dvora-h
Collaborator

dvora-h commented Sep 29, 2022

@kristjanvalur last one...

@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from 5ea981b to 38351fe Compare September 29, 2022 12:59
@kristjanvalur kristjanvalur force-pushed the simplify-async-timeouts branch from 38351fe to 305f848 Compare September 29, 2022 13:00
@kristjanvalur
Contributor Author

OK, this one was a bit trickier; lots of related commits went in... let's see how it does in CI.

@dvora-h dvora-h merged commit b0883b7 into redis:master Sep 29, 2022
@kristjanvalur
Contributor Author

Thanks. That was a bunch of related PRs that all sort of complemented each other. With this in place, I hope to be able to do further work, particularly on the hiredis parser, to reduce overhead.

Labels: feature (New feature), maintenance (Maintenance: CI, Releases, etc)
5 participants