Simplify async timeouts and allow `timeout=None` in `PubSub.get_message()` to wait forever #2295
Conversation
Codecov Report: Base: 92.16% // Head: 92.20% // Increases project coverage by +0.03%.

```
@@            Coverage Diff             @@
##           master    #2295      +/-   ##
==========================================
+ Coverage   92.16%   92.20%    +0.03%
==========================================
  Files         110      110
  Lines       28925    28899       -26
==========================================
- Hits        26659    26645       -14
+ Misses       2266     2254       -12
==========================================
```

View full report at Codecov.
I'm submitting this for review, even though there are one or two weird sporadic test failures that don't seem to have anything to do with this. (Note: the added code for killing containers seems to have fixed test stability when container ports were in use at the start of a tox run, probably due to the test runner being re-used.)
redis/asyncio/connection.py (outdated):

```python
        disable_decoding=disable_decoding
    )
else:
    async with lock_ctxt, async_timeout.timeout(read_timeout):
```
Wouldn't an if/else on `with_lock` directly be more performant than a null context? Same for `async_timeout`. Even with a 0 timeout, `async_timeout` adds significant latency.
That is a matter of opinion. Pub-sub is primarily IO bound: an application spends its time waiting for a message. Once a message arrives, two extra calls to `__aexit__()` make no difference in the grand scheme of things.

The `not with_lock` case is special, used only in cluster mode. The `async_timeout` is usually needed, and always for `timeout=0`. It could be skipped for `timeout=None`, but this blocking case is, again, an uncommon special case.

There is an argument for reducing code duplication and simplifying program flow to ensure correctness, even if the cost is an extra `__aexit__()` call in special circumstances. The alternative is to have four different control paths, for `with_lock == True/False` and `timeout is None or not None`. In my opinion, that is not very pythonic.
Normally I would always argue against code duplication, but the foremost feature of Redis is its speed. If we have to make small ergonomic sacrifices so that the code is faster, then I think it should be considered.

Also, async calls add significantly more overhead than sync calls, which is why I even bothered commenting. Of the two, we should at least have an if/else for the timeout block, as it adds a lot of latency.

And if you are keen on removing the if/else for the `_without_lock` function, can you at least call this new function directly instead of adding a level of indirection in cluster?
I'll oblige with the changes you suggest, of course, because getting fixes into async redis-py has been a very arduous process indeed.

But as a Python developer of 20 years, allow me to suggest that the changes will not amount to any measurable difference. Remember, this is Python we're talking about. Redis may be fast, but all of that is dwarfed by the fact that we are running inside Python. We already have layers and layers of buffers, abstractions, and classes, even lower than this function of which you speak. An extra null function call, in some minority special case, just does not matter.

My suggestion would be, instead, to simplify the code as much as possible, removing interdependencies and special cases, so that in further steps we can remove those intermediate buffering layers. The async code in redis-py suffers too much from being a verbatim copy of the blocking code, with the odd `async` keyword thrown in.
I didn't add the direct call to the without-lock function in cluster since I didn't want the changes to spread too far, nor change the API of connection.py in a backwards-incompatible way. But removing the special `_without_lock` function seems very reasonable.

(Why the without-lock variant is required in the cluster client is also not clear to me; it's not documented anywhere. Maybe you can explain why there is no need for locking in the cluster client? I've never used cluster.)
The connection pool can be simplified. It is already not thread-safe (since async IO and threading don't mix), and having it fork-safe is also dubious.
I also suggest that the fork-protection be removed from asyncio connection pools. When forking while using asyncio, all bets are off; threading and forking are fundamentally incompatible with asyncio. I'll make a separate PR for that.
I would say a separate PR would probably be better, since it may need other changes and/or improved tests.

I agree: threading should neither be supported nor expected.

Regarding removing multiprocessing support, I'm definitely in favour of it. But @chayim @dvora-h would have to make the call on a release plan.
I prefer tight, surgical PRs. They allow us to review code and comment effectively, and frankly, they help us not miss things. This includes surgical tests.

Feature parity (where possible) belongs in all of our usage models:
- Sync
- Async
- Sync Cluster
- Async Cluster

I'm generally against removing features that people are using. However, I'm pro pre-announced deprecation (see the Python announcement in the README).
@chayim Since we're already dropping 3.6 support in the next release, can we also announce that support for multiprocessing will be dropped, or perhaps allowed through a flag, in the same breaking release?
redis/asyncio/connection.py:

```python
        raise ConnectionError(f"Error while reading from socket: {ex.args}")
    return True

async def read_from_socket(self):
```
@dvora-h can you dig into this function?
A separate PR, #2308, proposes to remove this read lock entirely, and would simplify this PR further.
@kristjanvalur Can you merge master in so I can see if tests pass and merge this PR?
@kristjanvalur I think a better title for this may be your comment in the CHANGES file, which completely changes how I think about this PR. IMHO this becomes a feature. For 4.4.0-rc3. WDYT?

Well, I had actually been thinking about removing this "feature" and adding it separately, post-hoc, to make the PR more "clean". But whatever you think is best, I'm happy with. I think it is important to straighten out and simplify the async connection code so that we can more easily work on simplifications closer to the metal, for example removing double buffering. Would you like me to change the title?

@kristjanvalur Same here, conflicts...
Given that we'd like to get this into 4.4.0 rc2... I'm voting to take it as is. GitHub really needs a "100%" reaction. I think this is great :)
@kristjanvalur last one...
OK, this one was a bit trickier, lots of related commits went in... let's see how it does in CI.
Thanks. That was a bunch of related PRs that all sort of complemented each other. I hope with this in place to be able to do further work, particularly on the hiredis parser, to reduce overhead.
Pull Request check-list

Please make sure to review and check all of these items:

- Does `$ tox` pass with this change (including linting)?

NOTE: these things are not required to open a PR and can be done afterwards / while the PR is open.
Description of change

The async `PubSub` no longer does a separate `can_read` for every message read. Instead, async timeout mechanisms are used at the highest level to time out operations.

The use of `can_read` is a holdover from the non-async times, when the timeout had to happen at the lowest level, at the actual blocking socket call. With asyncio, timeouts happen higher up in the stack. This allows for simplifications, because there is no longer any need to have timeout code lower down in the stack. `can_read()` is still kept around, but its only use now is assertions in the `ConnectionPool`. This could be simplified further, since having `can_read()` around requires additional code paths for read, but not consumed, data. Obsolete code pertaining to blocking exceptions was also removed.

Additionally, `get_message()` now accepts a `timeout=None` argument to wait indefinitely for a message.
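To make the new `timeout=None` semantics concrete, here is a small self-contained sketch using an `asyncio.Queue` as a stand-in for the pub/sub message stream. This is illustrative only, not redis-py's implementation; the function name mirrors the API but the body is my own.

```python
import asyncio
from typing import Optional


async def get_message(queue: asyncio.Queue, timeout: Optional[float] = 0):
    # Stand-in for the new PubSub.get_message() semantics:
    #   timeout=None  -> block until a message arrives (the new behavior)
    #   finite value  -> wait that long, returning None on expiry
    if timeout is None:
        return await queue.get()
    try:
        return await asyncio.wait_for(queue.get(), timeout)
    except asyncio.TimeoutError:
        return None
```

With `timeout=None` the call simply awaits the next message with no timeout machinery at all, which is the uncommon blocking case discussed in the review above.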