Skip to content

SignalR Core Abort() leads to undefined state #21422

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Tommigun1980 opened this issue May 2, 2020 · 9 comments
Closed

SignalR Core Abort() leads to undefined state #21422

Tommigun1980 opened this issue May 2, 2020 · 9 comments
Assignees
Labels
area-signalr Includes: SignalR clients and servers investigate
Milestone

Comments

@Tommigun1980
Copy link

Tommigun1980 commented May 2, 2020

Describe the bug

Calling Abort() in a hub leads to the client's connection status being set to Reconnecting, but only sometimes is a reconnection attempt made. Client's Closed callback is not invoked when this happens. Stopping the connection on the client, while in this state, throws an exception (this part is correct I guess).

Expected behaviour:
When Abort() is called in a hub on server:
Client's Closed callback should get invoked, and client's status should be set to Disconnected. Client should not try to reconnect. It is problematic that it ends up in a Reconnecting state and sometimes tries to reconnect, while completely skipping the closed callback.

To Reproduce

In server's hub, call the following. (Please note that I am actually calling Abort() from outside the hub, by storing the context of the hub in a ConcurrentDictionary in a singleton service as per #5333, until there is a real way to disconnect clients outside the hub. I would assume the bug is exposed even when just calling Abort() inside a hub such as:).

this.Context.Abort();

On client:

        public async Task Connect()
        {
            // close existing connection to allow for a new token
            if (this.connection != null)
            {
                if (this.connection.State != HubConnectionState.Disconnected)
                    await this.connection.StopAsync();
                await this.connection.DisposeAsync(); // TODO: is this needed??
                this.connection = null;
            }

            this.connection = new HubConnectionBuilder()
                .WithUrl("someUrl", (options) =>
                    .options.AccessTokenProvider = () => Task.FromResult("SomeToken");
                )
                .WithAutomaticReconnect()
                .Build();

            this.connection.Closed -= this.OnConnectionClosed;
            this.connection.Closed += this.OnConnectionClosed;
            await this.connection.StartAsync();
        }

        private Task OnConnectionClosed(Exception exception)
        {
            Console.WriteLine($"Connection closed: {exception}");
            if (exception is HubException)
            {
                // cleanup here
            }
            return Task.CompletedTask;
        }

Further technical details

  • ASP.NET Core version:
    3.1.2

  • Include the output of dotnet --info:
    .NET Core SDK (reflecting any global.json):
    Version: 3.1.200
    Commit: c5123d973b

Runtime Environment:
OS Name: Mac OS X
OS Version: 10.15
OS Platform: Darwin
RID: osx.10.15-x64
Base Path: /usr/local/share/dotnet/sdk/3.1.200/

Host (useful for support):
Version: 3.1.2
Commit: 916b5cba26

.NET Core SDKs installed:
3.1.200 [/usr/local/share/dotnet/sdk]

.NET Core runtimes installed:
Microsoft.AspNetCore.App 3.1.2 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 2.1.16 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 2.1.17 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 3.1.2 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]

  • The IDE (VS / VS Code/ VS4Mac) you're running on, and it's version:
    VS For Mac 8.6 Preview (build 4387)
@mkArtakMSFT mkArtakMSFT added the area-signalr Includes: SignalR clients and servers label May 2, 2020
@Tommigun1980
Copy link
Author

Tommigun1980 commented May 2, 2020

I edited my post to say that the client does actually try to reconnect sometimes. For the first five or so tries the client didn't reconnect, but today it sometimes tries to reconnect. This is even worse as it's sporadic. It is extremely difficult to come up with code that works reliably with how this works atm and I'd be thrilled if this could get fixed. It is absolutely mandatory that the server can close a connection reliably due to privacy reasons (a user logging out should not receive any data after that point). Please classify this as a high impact bug.

Thanks.

@davidfowl
Copy link
Member

@Tommigun1980 can you provide a sample? It will help use reproduce the problem. Those code snippets are helpful but don't really show how the application is aborting the socket. A few comments:

  • There's an inherent race between the client disconnecting and you calling Abort. If we're in the process of tearing down a connection and you call Abort then it may or may not reconnect.
  • To aid us in debugging to this:
    • Turn up the logging verbosity on the client and server side. It'll be a bit spammy but if you can reproduce the problem it'll might help to show what the server did and what the client expects.
    • I might be able to give you some logic to print out the raw client messages (I'll look into this).
  • When the server closes the connection gracefully, it sends a close message. This close message has an AllowReconnect boolean that the client uses to determines if it should reconnect after a close. The Abort API on the server side should be setting this boolean to false so that it forces the client to stop reconnecting. That's something that you can use to help determine if this is indeed random or if it makes sense (maybe the close message didn't have it for other reasons).

See this document that explains how to get logs https://docs.microsoft.com/en-us/aspnet/core/signalr/diagnostics?view=aspnetcore-3.1

@halter73 maybe a bug in the reconnect logic.

@BrennanConroy BrennanConroy added the Needs: Author Feedback The author of this issue needs to respond in order for us to continue investigating this issue. label May 4, 2020
@halter73
Copy link
Member

halter73 commented May 4, 2020

When the server closes the connection gracefully, it sends a close message. This close message has an AllowReconnect boolean that the client uses to determines if it should reconnect after a close. The Abort API on the server side should be setting this boolean to false so that it forces the client to stop reconnecting. That's something that you can use to help determine if this is indeed random or if it makes sense (maybe the close message didn't have it for other reasons).

I've noticed that HubConnectionContext.Abort() usually doesn't result in the close message (which contains AllowReconnect = false in the Abort() case) actually being sent, because we abort the underlying connection immediately after the CloseMessage is buffered. This is a race though, so if the client does get the message, it will not try to reconnect. Of course, when the client doesn't get the close message, the disconnect is indistinguishable from other connection issues, and the client will try to reconnect.

Since we do want to abort the underlying the connection when Abort() is called, ensuring the client gets the close message is a little tricky. It would probably involve flushing the message, and then waiting for the ConnectionClosed token to fire with a timeout that aborts the underlying connection if the client doesn't close the connection itself quickly enough in response to the close message.

Another option is to always allow the client to reconnect after HubConnectionContext.Abort() is called, so at least the behavior is consistent whether or not the close message is received.

Reconnecting, but only sometimes is a reconnection attempt made.

This shouldn't happen. Client logs and/or a sample would help a lot.

Stopping the connection on the client, while in this state, throws an exception (this part is correct I guess)

Stopping the connection on the client while in the Reconnecting state should work fine. Client logs and/or a sample would likely help diagnose what's going on here too.

@ghost
Copy link

ghost commented May 8, 2020

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. If it is closed, feel free to comment when you are able to provide the additional information and we will re-investigate.

See our Issue Management Policies for more information.

@davidfowl
Copy link
Member

I've noticed that HubConnectionContext.Abort() usually doesn't result in the close message (which contains AllowReconnect = false in the Abort() case) actually being sent, because we abort the underlying connection immediately after the CloseMessage is buffered. This is a race though, so if the client does get the message, it will not try to reconnect. Of course, when the client doesn't get the close message, the disconnect is indistinguishable from other connection issues, and the client will try to reconnect.

That seems like something we should fix.

@ghost
Copy link

ghost commented May 12, 2020

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. If it is closed, feel free to comment when you are able to provide the additional information and we will re-investigate.

See our Issue Management Policies for more information.

@BrennanConroy BrennanConroy added investigate and removed Needs: Author Feedback The author of this issue needs to respond in order for us to continue investigating this issue. labels May 12, 2020
@BrennanConroy BrennanConroy added this to the Next sprint planning milestone May 13, 2020
@BrennanConroy BrennanConroy self-assigned this Aug 26, 2020
@Tommigun1980
Copy link
Author

Thanks for looking into this! I see it's tagged into 5.0.0 so is it safe to assume this will be fixed by the time 5.0.0 rolls out?

Thanks so much.

@halter73
Copy link
Member

halter73 commented Sep 2, 2020

We still need to make the fix, but that's the plan.

@BrennanConroy
Copy link
Member

Closed via #25693

@ghost ghost locked as resolved and limited conversation to collaborators Oct 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-signalr Includes: SignalR clients and servers investigate
Projects
None yet
Development

No branches or pull requests

5 participants