Blazor reconnection tests are broken #12788

rynowak · 2019-08-01T02:57:29Z

These tests are currently broken in release/3.0

@javiercn made changes this morning that implement circuit cleanup on graceful disconnect. However, our reconnection tests trigger a graceful reconnect - which cleans up the circuit and prevents reconnection.

We have no reason to believe anything is wrong in the product at this point. When you fix this, look for skipped tests with a link to this issue.

javiercn · 2019-08-01T10:46:27Z

Yeah, I'm trying to work out how to simulate a non-graceful disconnection on the tests. If we don't find a way we might need to test the feature in some other way.

javiercn · 2019-08-01T13:47:33Z

I've been trying to use an alternative approach to enable these tests. I've replaced the IHttpUpgrade feature so that I can provide my own test stream that allows me to introduce errors in the application.

I've been trying different approaches:

Throwing exceptions while read/write.
Modifying the payload on the results of read/write.
Avoid returning any data on read/write.

I haven't managed to create a condition where the connection appears to go bad from the SignalR perspective (manage websocket will always throw an exception saying connection closed prematurely and we will catch and silently ignore it, disconnecting gracefully).

This makes me put in perspective several things about the feature:

Is the event of an ungraceful disconnection common?
Is it not the case that in most cases we will go through this path where the close frame was not sent and we will abort the connection and close "gracefully"?
If that is the case, should we revert the "graceful" disconnection feature that we added, as it might cause more harm than good by prematurely disposing circuits to which otherwise we would have ben able to successfully reconnect.

I'm currently trying to correctly read the WebSocket header payload but cause an error after that to see if that causes an issue.

I also plan to recruit help from someone on the SignalR team to see if we can get to something reasonable, but at this point I'm concerned that ungraceful disconnections get confused with "graceful ones" (closing the browser, refreshing the page, navigating away) and that we dispose circuits to aggressively as a result.

SteveSandersonMS · 2019-08-01T14:48:39Z

Is the event of an ungraceful disconnection common?

I'd go further and check we can prove it's even possible in the real world, not just tests.

If you deploy an app to Azure, connect through wifi (e.g., on a phone), then walk out of range of the wifi signal, does that cause a graceful or a disgraceful disconnect?
Which one happens if you switch the phone into flight mode?
Which one happens if you're on a laptop and close the lid?

We ought to be sure that disgraceful disconnects are the common case (except when genuinely closing the browser/tab), otherwise we've invested a lot into a feature that will never be used.

SteveSandersonMS · 2019-08-01T14:51:26Z

There is a different technical solution here, which is to only do a "graceful disconnect" when we receive the beacon from an unload handler. That bypasses all the uncertainty about whether the network traffic is right about whether the user really intended to disconnect.

Drawbacks to using beacon:

Doesn't cover the case where the browser crashes (we'd keep the circuit open, even though there's actually no hope of reconnecting)
Doesn't work if you lack client-server affinity, since the beacon could go to any server
Makes it much easier to fill the server with abandoned circuits, since you can just choose not to send the beacon. The existing design is harder to trick because you need to leave a half-open connection, which involves dropping down to a very low level in the network stack, beyond what most networking APIs allow.

rynowak · 2019-08-01T15:59:11Z

I went over and spoke to @anurse and @BrennanConroy about this

One way we can trigger a bad disconnect would be to call abort on the context on the server. They are not sure if there's another way to do it that would work in tests.

javiercn · 2019-08-01T16:12:36Z

Doesn't work if you lack client-server affinity, since the beacon could go to any server

Reconnection wouldn't work either.

Makes it much easier to fill the server with abandoned circuits, since you can just choose not to send the beacon. The existing design is harder to trick because you need to leave a half-open connection, which involves dropping down to a very low level in the network stack, beyond what most networking APIs allow.

Re-connection is best effort and we can always remove the circuits in a FIFO fashion.

Doesn't cover the case where the browser crashes (we'd keep the circuit open, even though there's actually no hope of reconnecting)

If the browser crashes I think it would be treated as a graceful disconnect in the current scenario.

I'd go further and check we can prove it's even possible in the real world, not just tests.

If you deploy an app to Azure, connect through wifi (e.g., on a phone), then walk out of range of the wifi signal, does that cause a graceful or a disgraceful disconnect?

Which one happens if you switch the phone into flight mode?

Which one happens if you're on a laptop and close the lid?

We ought to be sure that disgraceful disconnects are the common case (except when genuinely closing the browser/tab), otherwise we've invested a lot into a feature that will never be used.

I still think there are ways we can cause it to be ungracefully terminated, the main thing is that it won't be treated as ungraceful unless you are in the middle of receiving a message.

For example, if we throw while waiting for a message to arrive, it won't be considered ungraceful, and that might happen either due to a network loss or the browser being closed.
If we throw after we've read the header but not the entire payload, that can cause the server to barf too.

I plan to physically try some of these things:

Turn of phone wifi.
Turn on airplane mode.
Go away from wifi range.
Kill browser process.

Update:
I managed to make this work in test scenarios. It involves a bit of trickery, but it works. We provide our own IHttpWebSocketsFeature, it does the same thing that the WebSocketsMiddleware one does.

We wrap the websocket it creates with a test socket of our own we can use to trigger exceptions whenever we want.
We do this conditionally on tests based on the presence of a cookie.
We trigger the crash with a request from javascript to a special controller.

Although we can make sure the feature works, we need to better understand the scenarios E2E to make sure there is no situation in which it can have issues where a temporary disconnect gets interpreted as a graceful one.

SteveSandersonMS · 2019-08-02T09:24:55Z

Doesn't cover the case where the browser crashes (we'd keep the circuit open, even though there's actually no hope of reconnecting)
If the browser crashes I think it would be treated as a graceful disconnect in the current scenario.

Yes that's my point exactly. On browser crash, the current implementation would discard the circuit, whereas a beacon-based one wouldn't.

SteveSandersonMS · 2019-08-02T09:37:45Z

Moving the discussion about whether reconnection actually works to #12828

Let's now use this issue only to discuss the state of the E2E tests, not the actual product code.

rynowak added bug This issue describes a behavior which is not expected - a bug. area-blazor Includes: Blazor, Razor Components labels Aug 1, 2019

rynowak added this to the 3.0.0-preview9 milestone Aug 1, 2019

rynowak assigned javiercn Aug 1, 2019

javiercn added the Working label Aug 1, 2019

This was referenced Aug 2, 2019

Understand when disconnections are graceful or not #12828

Closed

[Blazor][Fixes #12788] Fixes blazor reconnection tests #12813

Closed

javiercn added Done This issue has been fixed and removed Working labels Aug 7, 2019

javiercn closed this as completed Aug 7, 2019

ghost locked as resolved and limited conversation to collaborators Dec 2, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Blazor reconnection tests are broken #12788

Blazor reconnection tests are broken #12788

rynowak commented Aug 1, 2019

javiercn commented Aug 1, 2019

Uh oh!

javiercn commented Aug 1, 2019

Uh oh!

SteveSandersonMS commented Aug 1, 2019 •

edited

Loading

Uh oh!

SteveSandersonMS commented Aug 1, 2019 •

edited

Loading

Uh oh!

rynowak commented Aug 1, 2019

Uh oh!

javiercn commented Aug 1, 2019

Uh oh!

SteveSandersonMS commented Aug 2, 2019

Uh oh!

SteveSandersonMS commented Aug 2, 2019

Uh oh!

Blazor reconnection tests are broken #12788

Blazor reconnection tests are broken #12788

Comments

rynowak commented Aug 1, 2019

javiercn commented Aug 1, 2019

Uh oh!

javiercn commented Aug 1, 2019

Uh oh!

SteveSandersonMS commented Aug 1, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SteveSandersonMS commented Aug 1, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rynowak commented Aug 1, 2019

Uh oh!

javiercn commented Aug 1, 2019

Uh oh!

SteveSandersonMS commented Aug 2, 2019

Uh oh!

SteveSandersonMS commented Aug 2, 2019

Uh oh!

SteveSandersonMS commented Aug 1, 2019 •

edited

Loading

SteveSandersonMS commented Aug 1, 2019 •

edited

Loading