Skip to content

Customizing HTTP/2 abort response #10886

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
JamesNK opened this issue Jun 5, 2019 · 43 comments
Closed

Customizing HTTP/2 abort response #10886

JamesNK opened this issue Jun 5, 2019 · 43 comments
Assignees
Labels
accepted This issue has completed "acceptance" testing (including accessibility) area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions Done This issue has been fixed feature-kestrel HTTP2

Comments

@JamesNK
Copy link
Member

JamesNK commented Jun 5, 2019

In gRPC there is the concept of a deadline. The deadline is sent by the client to the server with a gRPC call in a header. It is the client saying to the server "you have X seconds to finish this call, after-which give up".

  • If the server finished the call in less than X sends then the response completes as usual.
  • If the server exceeds the X seconds then it immediately finishes the HTTP/2 response, sending a RST_STREAM. The user's gRPC method code will likely still be executing after this. If they are good then it will be using a cancellation token, and they stop whatever they're doing quickly. But they might not, and continue doing stuff after the response is complete. This is fine. It is also fine if an error eventually gets thrown because they tried to do an HTTP thing.

Today the gRPC server is finishing the connection when the deadline is hit by calling HttpContext.Abort(). This immediately sends RST_STREAM with an INTERNAL_ERROR error code. The request delegate with the user's code is still running. Using Abort is not ideal, but its response seems to be acceptable to the various gRPC clients of the world.

Today that response on Abort looks very hard-coded in Http2Stream:

protected override void ApplicationAbort()
{
    var abortReason = new ConnectionAbortedException(CoreStrings.ConnectionAbortedByApplication);
    ResetAndAbort(abortReason, Http2ErrorCode.INTERNAL_ERROR);
}

I think the current abort behavior is fine for gRPC to use with deadline, the one thing that could be improved is the abort response over HTTP/2. We're sending INTERNAL_ERROR, and no status trailers. Another gRPC implementation I looked at immediately ends the response, but it sends RST_STREAM (NO_ERROR), along with some HTTP/2 trailing headers.

It would be nice to have a hook in ASP.NET Core to modify the response when an HTTP/2 abort occurs, such as changing the Http2ErrorCode used with RST_STREAM, or appending trailers.

Thoughts around the area?
Or alternatives?

@Tratcher
Copy link
Member

Tratcher commented Jun 5, 2019

it sends RST_STREAM (NO_ERROR), along with some HTTP/2 trailing headers.

How does it do that? It should only work if they send the trailers first as trailers after RST would be invalid.

It would be nice to have a hook in ASP.NET Core to modify the response when an HTTP/2 abort occurs, such as changing the Http2ErrorCode used with RST_STREAM, or appending trailers.

I could image having an HTTP Feature and response extension for providing a Reset method with a custom error code (and another one for GoAway). Also including trailers would be complicated.

You still have a multi-threading race condition here though. Only one thread is supposed to touch HttpContext, so calling Reset from a timer thread would not be safe.

@JamesNK
Copy link
Member Author

JamesNK commented Jun 5, 2019

How does it do that? It should only work if they send the trailers first as trailers after RST would be invalid.

HEADER frame first, then RST_STREAM

image

Only one thread is supposed to touch HttpContext

Does that include HttpContext.Abort? I don't mind if regular ASP.NET Core thread throws an error because abort has been called. How would you suggest implementing deadline functionality that I've described without another thread?

@Tratcher
Copy link
Member

Tratcher commented Jun 5, 2019

Does that include HttpContext.Abort? I don't mind if regular ASP.NET Core thread throws an error because abort has been called. How would you suggest implementing deadline functionality that I've described without another thread?

Yes, right now it does. Let's discuss the thread safety issue over on #9239 (comment).

@halter73
Copy link
Member

halter73 commented Jun 5, 2019

It would be nice to have a hook in ASP.NET Core to modify the response when an HTTP/2 abort occurs, such as changing the Http2ErrorCode used with RST_STREAM, or appending trailers.

Could we just add a new HTTP/2-specific feature that allows you to both abort the request and configure RST_STREAM all at once? Basically it would be a specially HTTP/2 version of support with extra parameters. Since you already control the timer callback that calls IHttpRequestLifetimeFeature.Abort(), I don't see the need for a callback-based API.

@JamesNK
Copy link
Member Author

JamesNK commented Jun 5, 2019

IHttp2AbortThingyFeature.Abort(Http2ErrorCode) could be enough. I want to double check whether returning trailers with a gRPC deadline is a nice to have or not.

@Tratcher
Copy link
Member

Tratcher commented Jun 5, 2019

RE: Trailers, sending them would be a protocol violation if the response had a content-length and you hadn't already finished the body.

@JamesNK
Copy link
Member Author

JamesNK commented Jun 6, 2019

Ok, I've refactored gRPC deadline logic so that it no longer uses a Timer plus HttpContext.Abort

PR here - https://github.com/grpc/grpc-dotnet/pull/301/files

There is now a separate path the server uses when there is a deadline. It produces a better response than HttpContext.Abort, but it allocates a lot more.

  • Before there was just a Timer. Allocations were 520B.
  • With refactor there is a CTS, Task.Delay, Task.WhenAny. Allocations are 1.09KB

Deadline related code here:
https://github.com/grpc/grpc-dotnet/blob/e2391970cce9b8428a1dc30ea8e612d617bd4e02/src/Grpc.AspNetCore.Server/Internal/CallHandlers/ServerCallHandlerBase.cs#L99-L145

I do not know whether this a good idea. It would be great to have input from experts in this area.

// @davidfowl @Tratcher @halter73

@JamesNK
Copy link
Member Author

JamesNK commented Jun 7, 2019

I don't think my refactor in gRPC will work. It results in holding onto the HttpContext/Http2Stream after the request pipeline has finished. Calling HttpContext.Abort is not a solution because trailers need to be written and they aren't when the request is aborted.

I think the solution is some changes in Kestrel and this issue's original request is the best way forward I can think of:

  • Add an option to customize the Http2ErrorCode that is sent on Abort
  • Add an option to write trailers on Abort (or maybe always writing trailers on abort makes sense in HTTP/2)

Good idea? Bad idea?
Simple to do? Hard?

@JamesNK
Copy link
Member Author

JamesNK commented Jun 7, 2019

Alternatively, an option could be to use Response.BodyWriter.Complete() to send the trailers and RST_STREAM

#7370

@halter73
Copy link
Member

halter73 commented Jun 7, 2019

If BodyWriter.Complete() was used to send the trailers, that would result in an a frame with an end-of-stream flag. I'm not sure it's valid to send an RST stream after that.

Add an option to customize the Http2ErrorCode that is sent on Abort

This seems necessary.

Add an option to write trailers on Abort (or maybe always writing trailers on abort makes sense in HTTP/2)

@Tratcher and I were discussing this yesterday. This seems weird to us. What's the other gRPC implementation that does this?

@JamesNK
Copy link
Member Author

JamesNK commented Jun 7, 2019

If BodyWriter.Complete() was used to send the trailers, that would result in an a frame with an end-of-stream flag. I'm not sure it's valid to send an RST stream after that.

It is valid.

An HTTP response is complete after the server sends — or the client receives — a frame with the END_STREAM flag set (including any CONTINUATION frames needed to complete a header block). A server can send a complete response prior to the client sending an entire request if the response does not depend on any portion of the request that has not been sent and received. When this is true, a server MAY request that the client abort transmission of a request without error by sending a RST_STREAM with an error code of NO_ERROR after sending a complete response (i.e., a frame with the END_STREAM flag). Clients MUST NOT discard responses as a result of receiving such a RST_STREAM, though clients can always discard responses at their discretion for other reasons.

https://httpwg.org/specs/rfc7540.html#rfc.section.8.1

What's the other gRPC implementation that does this?

The primary gRPC implementation (C-core) does this. In this Wireshark trace of the deadline response from C-core you can see trailers followed by a RST_STREAM

image

image

@JamesNK
Copy link
Member Author

JamesNK commented Jun 8, 2019

Here are 3 scenarios around deadline, and what C-core does in each:

  1. Server exceeds deadline without writing headers. Initial state is nothing has been written. On deadline C-core will send trailers and headers back together in one HEADERS frame. The frame has END_STREAM=true. RST_STREAM NO_ERROR frame is then sent.

  2. Server exceeds deadline after writing headers. Initial state is HEADERS has been written. On deadline C-core will send trailers in a new HEADERS frame. The frame has END_STREAM=true. RST_STREAM NO_ERROR frame is then sent.

  3. Server exceeds deadline after writing headers+message. Initial state is HEADERS and a DATA with the message have been written. On deadline C-core will send trailers in a new HEADERS frame. The frame has END_STREAM=true. RST_STREAM NO_ERROR frame is then sent.

Note that gRPC never uses content-length. content-length is not a problem for us but I understand that it something you will need to validate for other users.

@JamesNK
Copy link
Member Author

JamesNK commented Jun 8, 2019

Proposal (high-level, name and shape TBD):

  • CompleteAsync() - Like FlushAsync() but will force trailers to be sent. It will also cause the last frame to have END_STREAM=true set on it. The current Response.BodyWriter.Complete is not enough because we need to wait for it to finish. After CompleteAsync is finished then abort would be called to send RST_STREAM.
  • ResetHttp2Stream(Http2ErrorCode) - Like HttpContext.Abort but lets you specify the error code sent with RST_STREAM frame.

How gRPC would use these in the scenarios mentioned above:

  1. gRPC would add the status trailers to the headers collection, call CompleteAsync (this would cause the HEADERS frame to have END_STREAM=true set on it), then call abort. User code would be awaited.

  2. gRPC would add the status trailers to the trailers collection, call CompleteAsync (this would cause the HEADERS frame to have END_STREAM=true set on it), then call abort. User code would be awaited.

  3. Same as 2.

private Timer DeadlineTimer = new System.Threading.Timer(DeadlineExceeded, DeadlineDuration);

public async Task DeadlineExceeded()
{
    lock (DeadlineLock) // SemaphoreSlim?
    {
        // Set DeadlineExpired status
        HttpContext.Response.ConsolidateTrailers(this);

        // Send remaining content, END_STREAM=true. No further content can be written
        // Returns once content and trailers have been sent
        await HttpContext.Feature.Get<ICompleteThingFeature>().CompleteAsync();

        // Send RST_STREAM (NO_ERROR)
        HttpContext.Feature.Get<IResetThingFeature>().ResetHttp2Stream(Http2ErrorCode.NoError);    
    }
}

@Tratcher
Copy link
Member

Tratcher commented Jun 8, 2019

Something else that came up when James and I discussed this is that the GRPC handler will manage the synchronization such that the timeout code path and the normal response code path are never trying to write headers, trailers or body at the same time. These are all discrete operations that happen within the handler and not out in user code.

@Tratcher
Copy link
Member

Tratcher commented Jun 10, 2019

Note: CompleteAsync isn't HTTP/2 specific, it could equally apply for HTTP/1.1 chunked responses with or without trailers.

  • For Content-Length responses it would throw if the exact content-length was not already sent. (HTTP/1 and 2)
  • Assuming we provide an HttpResponse extension method wrapper for this feature, what should it do if the server does not implement CompleteAsync? FlushAsync makes a poor fallback. Throw NotSupportedException?

What do the GRPC components do in general if run on a server other than Kestrel?

@davidfowl
Copy link
Member

Assuming we provide an HttpResponse extension method wrapper for this feature, what should it do if the server does not implement CompleteAsync? FlushAsync makes a poor fallback. Throw NotSupportedException?

That question makes it unfit to go onto HttpResponse.

@Tratcher
Copy link
Member

That question makes it unfit to go onto HttpResponse.

Perhaps, it depends on the answer.

@JamesNK
Copy link
Member Author

JamesNK commented Jun 10, 2019

What do the GRPC components do in general if run on a server other than Kestrel?

Functional tests run in TestServer. When gRPC uses an API that isn't properly supported in TestServer, e.g. IResponseTrailersFeature, then I improve TestServer. I'll do the same with the new feature that comes out of this issue.

@halter73
Copy link
Member

Assuming we provide an HttpResponse extension method wrapper for this feature, what should it do if the server does not implement CompleteAsync? FlushAsync makes a poor fallback. Throw NotSupportedException?

That question makes it unfit to go onto HttpResponse.

You could fall back to an HttpResponse.CompleteAsync() that just calls FlushAsync() and then Complete() on the response body PipeWriter. It would still result in unnecessary empty END_STREAM frames, but it's not exactly wrong.

But what if instead we added an API similar to the proposed AbortHttp2Stream, but instead of aborting the stream immediately, it buffered the RST frame after the response body and trailers if any?

@Tratcher
Copy link
Member

@JamesNK I was referring more to IIS or HttpSys. Does gRPC fail or fall back gracefully to other mechanisms?

@JamesNK
Copy link
Member Author

JamesNK commented Jun 10, 2019

Do they support HTTP/2 and trailers today? We haven't tested much outside of Kestrel/TestServer. I don't know.

For this feature, what I could do is check to see whether the feature is in HttpContext.Features. If it is then CompleteAsync/AbortHttp2Stream could be used. If it is not then the gRPC would fallback to HttpContext.Abort.

@Tratcher
Copy link
Member

@JamesNK they do support HTTP/2 but not trailers.

@JamesNK
Copy link
Member Author

JamesNK commented Jun 10, 2019

gRPC would fail on IIS when it calls HttpResponse.AppendTrailer (unless IIS has a dummy IResponseTrailersFeature that no-ops). That's fine. gRPC doesn't work without trailers.

@JamesNK
Copy link
Member Author

JamesNK commented Jun 11, 2019

What is the next step here? I can organize a meeting to come up with a design, or you could guys could informally talk about it and come up with one that I double check.

Up to you 🤷‍♂

@davidfowl
Copy link
Member

@JamesNK setup a design meeting so we can flesh this out this week?

@Tratcher Tratcher self-assigned this Jun 12, 2019
@halter73
Copy link
Member

IHttp2ResetFeature.Reset(int errorCode)

What the int errorCode parameter means is specified by the HTTP/2 spec. I much prefer keeping "Http2" in the feature name.

@Tratcher
Copy link
Member

@halter73 except it's an open set, you could send custom error codes, or quick error codes. It's only HTTP/2 specific if we provide an enum with the spec values.

@halter73
Copy link
Member

I guess the question is then do we expect to use this feature for any other protocol that happens to support integer error codes in some sort of reset frame/message. I don't find this super likely, and I like the somewhat self-documenting nature of including "Http2" in the feature interface name.

Just removing it from the HTTP/1 feature collection doesn't really help. It doesn't affect the discoverability of the feature at all unless you happen to be looking at the feature collection in a debugger, so I think having a name indicating this is a feature for HTTP/2 will help avoid confusion.

@Tratcher
Copy link
Member

CompleteAsync is turning out much easier than I was expecting. I should have a PR by the end of the day. One side effect that we did not discuss: RequestAborted will no longer fire after calling CompleteAsync.

This logic was already in place for things like ContentLength responses so you wouldn't get aborted for graceful disconnects after you've sent the full response. CompleteAsync is another way of saying you've sent the whole response.

This conflicts with the gRPC scenario that wants to complete the response and then abort. Abort will still send a reset to the client, but RequestAborted does not fire. Any ongoing request body reads fail with OperationCancelledException.

Option A) Do nothing in Kestrel. gRPC should add their own chained token to notify the application when gRPC is aborting the response. This would integrate reasonably well with the timeout.
Option B) Keep the cancellation token alive until the end of the Request AND Response bodies. This will be complicated and still might not be enough for gRPC in some modes.

@halter73
Copy link
Member

Option A) Do nothing in Kestrel. gRPC should add their own chained token to notify the application when gRPC is aborting the response. This would integrate reasonably well with the timeout.
Option B) Keep the cancellation token alive until the end of the Request AND Response bodies. This will be complicated and still might not be enough for gRPC in some modes.

I also think I prefer option A. After all, from Kestrel's perspective, if you call CompleteAsync() was the request really aborted? I agree this shouldn't be treated differently than writing out the full content-length response.

@JamesNK
Copy link
Member Author

JamesNK commented Jun 13, 2019

With option A there will be a performance penalty in gRPC because we'll have to create our own CTS rather than reusing Kestrel's.

I don't think this is a big deal. It is pay-for-play: it will only affect perf when using a deadline.

@Tratcher
Copy link
Member

You already have the overhead of a timer, you could replace that with the CTS and likely not add any additional overhead.

@JamesNK
Copy link
Member Author

JamesNK commented Jun 13, 2019

Maybe. We need to be able to distinguish between the request being aborted and deadline being exceeded. Might be able to do that with a single CTS - I'll investigate when switching to use CompleteAsync.

@Tratcher
Copy link
Member

Picking up from #10886 (comment)

Looking closer at QUIC, it defines the RST_STREAM frame, but not the individual error code values.

the management of application error codes are left to application protocols.

The actual error codes are defined in the HTTP/3 spec.
https://tools.ietf.org/html/draft-ietf-quic-http-20#section-8.1

In this case a more generic reset makes sense, the values are disconnected from the functionality.

@halter73
Copy link
Member

If you think it's generic enough, I'm fine with IHttpResetFeature instead of IHttp2ResetFeature

@analogrelay
Copy link
Contributor

Is this done now that #11193 and #11300 are merged?

@Tratcher
Copy link
Member

I want to add the same features to TestServer quick.

@Tratcher
Copy link
Member

Closing this out. The TestServer work is going to be more involved than I'd thought.

@Tratcher Tratcher added the Done This issue has been fixed label Jun 19, 2019
@JamesNK
Copy link
Member Author

JamesNK commented Jun 19, 2019

Is there an issue for test server implementation?

@Tratcher
Copy link
Member

Tratcher commented Jul 1, 2019

@JamesNK #11598

I'm marking this as Accepted based on gRPC's consumption.

@Tratcher Tratcher added the accepted This issue has completed "acceptance" testing (including accessibility) label Jul 1, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Dec 3, 2019
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
accepted This issue has completed "acceptance" testing (including accessibility) area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions Done This issue has been fixed feature-kestrel HTTP2
Projects
None yet
Development

No branches or pull requests

6 participants