Skip to content

Blazor reconnection tests are broken #12788

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rynowak opened this issue Aug 1, 2019 · 8 comments
Closed

Blazor reconnection tests are broken #12788

rynowak opened this issue Aug 1, 2019 · 8 comments
Assignees
Labels
area-blazor Includes: Blazor, Razor Components bug This issue describes a behavior which is not expected - a bug. Done This issue has been fixed

Comments

@rynowak
Copy link
Member

rynowak commented Aug 1, 2019

These tests are currently broken in release/3.0

@javiercn made changes this morning that implement circuit cleanup on graceful disconnect. However, our reconnection tests trigger a graceful reconnect - which cleans up the circuit and prevents reconnection.

We have no reason to believe anything is wrong in the product at this point. When you fix this, look for skipped tests with a link to this issue.

@rynowak rynowak added bug This issue describes a behavior which is not expected - a bug. area-blazor Includes: Blazor, Razor Components labels Aug 1, 2019
@rynowak rynowak added this to the 3.0.0-preview9 milestone Aug 1, 2019
@javiercn
Copy link
Member

javiercn commented Aug 1, 2019

Yeah, I'm trying to work out how to simulate a non-graceful disconnection on the tests. If we don't find a way we might need to test the feature in some other way.

@javiercn
Copy link
Member

javiercn commented Aug 1, 2019

I've been trying to use an alternative approach to enable these tests. I've replaced the IHttpUpgrade feature so that I can provide my own test stream that allows me to introduce errors in the application.

I've been trying different approaches:

  • Throwing exceptions while read/write.
  • Modifying the payload on the results of read/write.
  • Avoid returning any data on read/write.

I haven't managed to create a condition where the connection appears to go bad from the SignalR perspective (manage websocket will always throw an exception saying connection closed prematurely and we will catch and silently ignore it, disconnecting gracefully).

This makes me put in perspective several things about the feature:

  • Is the event of an ungraceful disconnection common?
  • Is it not the case that in most cases we will go through this path where the close frame was not sent and we will abort the connection and close "gracefully"?
  • If that is the case, should we revert the "graceful" disconnection feature that we added, as it might cause more harm than good by prematurely disposing circuits to which otherwise we would have ben able to successfully reconnect.

I'm currently trying to correctly read the WebSocket header payload but cause an error after that to see if that causes an issue.

I also plan to recruit help from someone on the SignalR team to see if we can get to something reasonable, but at this point I'm concerned that ungraceful disconnections get confused with "graceful ones" (closing the browser, refreshing the page, navigating away) and that we dispose circuits to aggressively as a result.

@SteveSandersonMS
Copy link
Member

SteveSandersonMS commented Aug 1, 2019

Is the event of an ungraceful disconnection common?

I'd go further and check we can prove it's even possible in the real world, not just tests.

  • If you deploy an app to Azure, connect through wifi (e.g., on a phone), then walk out of range of the wifi signal, does that cause a graceful or a disgraceful disconnect?
  • Which one happens if you switch the phone into flight mode?
  • Which one happens if you're on a laptop and close the lid?

We ought to be sure that disgraceful disconnects are the common case (except when genuinely closing the browser/tab), otherwise we've invested a lot into a feature that will never be used.

@SteveSandersonMS
Copy link
Member

SteveSandersonMS commented Aug 1, 2019

There is a different technical solution here, which is to only do a "graceful disconnect" when we receive the beacon from an unload handler. That bypasses all the uncertainty about whether the network traffic is right about whether the user really intended to disconnect.

Drawbacks to using beacon:

  • Doesn't cover the case where the browser crashes (we'd keep the circuit open, even though there's actually no hope of reconnecting)
  • Doesn't work if you lack client-server affinity, since the beacon could go to any server
  • Makes it much easier to fill the server with abandoned circuits, since you can just choose not to send the beacon. The existing design is harder to trick because you need to leave a half-open connection, which involves dropping down to a very low level in the network stack, beyond what most networking APIs allow.

@rynowak
Copy link
Member Author

rynowak commented Aug 1, 2019

I went over and spoke to @anurse and @BrennanConroy about this

One way we can trigger a bad disconnect would be to call abort on the context on the server. They are not sure if there's another way to do it that would work in tests.

@javiercn
Copy link
Member

javiercn commented Aug 1, 2019

  • Doesn't work if you lack client-server affinity, since the beacon could go to any server

Reconnection wouldn't work either.

  • Makes it much easier to fill the server with abandoned circuits, since you can just choose not to send the beacon. The existing design is harder to trick because you need to leave a half-open connection, which involves dropping down to a very low level in the network stack, beyond what most networking APIs allow.

Re-connection is best effort and we can always remove the circuits in a FIFO fashion.

  • Doesn't cover the case where the browser crashes (we'd keep the circuit open, even though there's actually no hope of reconnecting)

If the browser crashes I think it would be treated as a graceful disconnect in the current scenario.

I'd go further and check we can prove it's even possible in the real world, not just tests.

  • If you deploy an app to Azure, connect through wifi (e.g., on a phone), then walk out of range of the wifi signal, does that cause a graceful or a disgraceful disconnect?
  • Which one happens if you switch the phone into flight mode?
  • Which one happens if you're on a laptop and close the lid?

We ought to be sure that disgraceful disconnects are the common case (except when genuinely closing the browser/tab), otherwise we've invested a lot into a feature that will never be used.

I still think there are ways we can cause it to be ungracefully terminated, the main thing is that it won't be treated as ungraceful unless you are in the middle of receiving a message.

  • For example, if we throw while waiting for a message to arrive, it won't be considered ungraceful, and that might happen either due to a network loss or the browser being closed.
  • If we throw after we've read the header but not the entire payload, that can cause the server to barf too.

I plan to physically try some of these things:

  • Turn of phone wifi.
  • Turn on airplane mode.
  • Go away from wifi range.
  • Kill browser process.

Update:
I managed to make this work in test scenarios. It involves a bit of trickery, but it works. We provide our own IHttpWebSocketsFeature, it does the same thing that the WebSocketsMiddleware one does.

  • We wrap the websocket it creates with a test socket of our own we can use to trigger exceptions whenever we want.
  • We do this conditionally on tests based on the presence of a cookie.
  • We trigger the crash with a request from javascript to a special controller.

Although we can make sure the feature works, we need to better understand the scenarios E2E to make sure there is no situation in which it can have issues where a temporary disconnect gets interpreted as a graceful one.

@SteveSandersonMS
Copy link
Member

Doesn't cover the case where the browser crashes (we'd keep the circuit open, even though there's actually no hope of reconnecting)
If the browser crashes I think it would be treated as a graceful disconnect in the current scenario.

Yes that's my point exactly. On browser crash, the current implementation would discard the circuit, whereas a beacon-based one wouldn't.

@SteveSandersonMS
Copy link
Member

Moving the discussion about whether reconnection actually works to #12828

Let's now use this issue only to discuss the state of the E2E tests, not the actual product code.

@javiercn javiercn added Done This issue has been fixed and removed Working labels Aug 7, 2019
@javiercn javiercn closed this as completed Aug 7, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Dec 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-blazor Includes: Blazor, Razor Components bug This issue describes a behavior which is not expected - a bug. Done This issue has been fixed
Projects
None yet
Development

No branches or pull requests

3 participants