-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Proposed fix for GC rooting + extensive notes on how/why are in TimerToken #2413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposed fix for GC rooting + extensive notes on how/why are in TimerToken #2413
Conversation
|
Test failure is expecting |
fix: actually increment _backlogCurrentEnqueued (oops)
Co-authored-by: Nick Craver <[email protected]>
…ctAndReconnectThrowsConnectionExceptionSync - fix tyop
|
one test still failing (
@NickCraver you know more about this test than me; care to opine? You mentioned that this was one of the brittle ones; we seem to have accidentally made it non-brittle by making it reliably fail; should we handle both messages? or is this wrong? |
| // we can still process it to avoid making things worse/more complex, | ||
| // but: we can't reliably assume this works, so: shout now! | ||
| next.Cancel(); | ||
| next.Complete(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A return might be missing here so that the cancelled message is not enqueued.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that; another version had throw. I'm not too concerned about writing it though - although nobody can ever see the reply. But if we were to do anything, I think it would need to be return false; and not write if we didn't enqueue. However, in reality, this should.never happen; the muxer should still be alive simply by the nature of being here. Another way we could fix it would be to pass the muxer upwards when doing this, and GC.KeepAlive before exiting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And to pre-empt the obvious "then why is this code here?": If I can't be absolutely sure we can't get into this scenario, I'd rather be paranoid; without the heartbeat, there are scenarios where messages might not be terminated. Emphasis: we should have a muxer: that's literally what the rest of the PR is meant to guarantee. Just if I'm wrong, I'd rather give people cancellation faults than give them nothing at all.
| ## Unreleased | ||
|
|
||
| No pending changes. | ||
| - Fix [#2392](https://github.com/StackExchange/StackExchange.Redis/issues/2392): critical (but rare) GC bug that can lead to async tasks never completing if the multiplexer is not held by the consumer ([#2413 by mgravell](https://github.com/StackExchange/StackExchange.Redis/pull/2413)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that #2392 is a different issue, the issue fixed by this PR was discovered by @NickCraver while working on PR #2408.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 - we'll refine release notes here, want to test with all combined manually
|
@mgravell +1 on test changes - I think that's fine, the goal of that specific test set isn't the type thrown (that's checked very explicitly in the exception tests), so being more lenient there is perfectly fine by me. |
|
Status: my only concern here (hopefully unfounded) is lock perf - we're going to run some tests on that front and then should be good to merge in - I'll update release docs and all that goodness here but not imperative in this PR since it's going into the other. |
|
Oh, right; forgot I wasn't merging to main! My PR number will be confusing in the release notes; we should fix that in the final outer PR |
…fixes for outstanding scenarios (#2408) This combination PR is both fixing a GC issue (see below, and #2413 for details) and improves timeout exception. Basically if a timeout happens for a message that was in the backlog but was never sent, the user now gets a much more informative message like this: > Exception: The message timed out in the backlog attempting to send because no connection became available - Last Connection Exception: InternalFailure on 127.0.0.1:6379/Interactive, Initializing/NotStarted, last: GET, origin: ConnectedAsync, outstanding: 0, last-read: 0s ago, last-write: 0s ago, keep-alive: 500s, state: Connecting, mgr: 10 of 10 available, last-heartbeat: never, last-mbeat: 0s ago, global: 0s ago, v: 2.6.99.22667, command=PING, inst: 0, qu: 0, qs: 0, aw: False, bw: CheckingForTimeout, last-in: 0, cur-in: 0, sync-ops: 1, async-ops: 1, serverEndpoint: 127.0.0.1:6379, conn-sec: n/a, aoc: 0, mc: 1/1/0, mgr: 10 of 10 available, clientName: NAMISTOU-3(SE.Redis-v2.6.99.22667), IOCP: (Busy=0,Free=1000,Min=32,Max=1000), WORKER: (Busy=2,Free=32765,Min=32,Max=32767), POOL: (Threads=18,QueuedItems=0,CompletedItems=65), v: 2.6.99.22667 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts) Today, this isn't intuitive especially for connections with `AbortOnConnectFail` set to `false`. What happens is a multiplexer _never_ connects successfully, but the user just gets generic timeouts. This makes the error more specific and includes the inner exception (also as `.InnerException`) for more details, informing the user of a config/auth/whatever error underneath as to why things are never successfully sending. Also adds `aoc: (0|1)` to the exception message for easier advice in issues (reflecting what `AbortOnConnectFail` is set to). Co-authored-by: Nick Craver <[email protected]> Co-authored-by: Marc Gravell <[email protected]>
Adds automatic root/unroot; please read notes in the
TimerTokenclass (inConnectionMultiplexer.cs)