Start pooling Http2Streams #18601
Conversation
As part of this, can you share a profile of allocations in this Http2 scenario, before and after? I just want to make sure we're focusing on the largest objects.
```diff
@@ -67,6 +67,13 @@ internal partial class Http2Connection : IHttp2StreamLifetimeHandler, IHttpHeadersHandler
     private int _gracefulCloseInitiator;
     private int _isClosed;

+    private readonly Http2Stream[] _streamPool;
```
Why not use a Queue or Stack so you don't have to do your own bookkeeping? It could even be a concurrent one, which would have better locking.
ConcurrentQueueSegment<T> is the ideal one, removing everything from it that's not needed for TryDequeue and TryEnqueue. Size must be a power of two; however, that's ideal for object pooling. Alas, it was never turned into a BoundedConcurrentQueue API.
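For illustration, here is a minimal bounded pool along those lines. It uses the public ConcurrentQueue<T> with an approximate size cap, since ConcurrentQueueSegment<T> is internal to the BCL; the type and member names below are hypothetical, not the ones used in this PR.

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Sketch of a bounded pool: rent with TryDequeue, return with TryReturn,
// and drop the object (letting the GC collect it) once the pool is full.
internal sealed class BoundedPool<T> where T : class
{
    private readonly ConcurrentQueue<T> _items = new ConcurrentQueue<T>();
    private readonly int _maxSize;
    private int _count;

    public BoundedPool(int maxSize) => _maxSize = maxSize;

    public bool TryRent(out T item)
    {
        if (_items.TryDequeue(out item))
        {
            Interlocked.Decrement(ref _count);
            return true;
        }
        return false;
    }

    public bool TryReturn(T item)
    {
        // Reserve a slot first; back out if that would exceed the cap.
        if (Interlocked.Increment(ref _count) > _maxSize)
        {
            Interlocked.Decrement(ref _count);
            return false;
        }
        _items.Enqueue(item);
        return true;
    }
}
```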
Can you clarify this more? Are you asking for a microbenchmark of a single request and the allocations before and after? Are you asking for a profile specifically on the benchmark server (I'm not sure how to get a profile there)? It seems fairly clear to me that the Http2Streams take up a large portion of allocations. By pooling them, we are saving around 30% on allocations per second (2,760,162,909 -> 1,862,939,990).
Each Http2Stream allocates 768 bytes itself and retains 5352 bytes. The rest of the ~4500 retained bytes comes from Http2OutputProducer, KestrelServerOptions, HttpRequestHeaders, HttpResponseHeaders, and the RequestBodyPipe. Other areas where we could see improvement with stream allocations include pooling Http2OutputProducers. HttpRequestHeaders/HttpResponseHeaders are resettable, so we shouldn't need to do anything there. KestrelServerOptions isn't a concern. RequestBodyPipe could be reset instead of allocated each request as well?
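As a rough illustration of the "reset instead of reallocate" idea for the request body pipe, here is a sketch using the public System.IO.Pipelines API; it is not the PR's actual code, just the shape of the pattern.

```csharp
using System.IO.Pipelines;

// One Pipe instance per stream, reused across requests instead of being
// allocated per request. Pipe.Reset() requires both the reader and the
// writer to have been completed first.
var bodyPipe = new Pipe(new PipeOptions(useSynchronizationContext: false));

// ... after the request body has been fully written and consumed:
await bodyPipe.Writer.CompleteAsync();
await bodyPipe.Reader.CompleteAsync();

bodyPipe.Reset(); // the same instance can now back the next request's body
```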
I just realized I had a mistake that was causing HttpRequestHeaders not to be pooled. Now they are; updated numbers:
Continuing to play around with other things that can be pooled.
That's a lot of test failures. Any chance they're related?
There are two that are definitely related that I can repro locally; I doubt the Jasmine ones are. I'm still researching which things we should be caching. I'll eventually update this PR with feedback.
```diff
 {
     _context = context;

     ServerOptions = ServiceContext.ServerOptions;
-    HttpRequestHeaders = new HttpRequestHeaders(reuseHeaderValues: !ServerOptions.DisableStringReuse);
+    HttpRequestHeaders.ReuseHeaderValues = !ServerOptions.DisableStringReuse;
```
ServerOptions.DisableStringReuse can't change, right?
I wasn't certain, which is why I did this. It comes from KestrelServerOptions, which I believe can't change.
KestrelServerOptions can change; users have access to it. We just never told anybody what would happen if they changed it.
Can update Endpoints and Certificates? Spicy 😉
You can try 😁. Let us know how that works out for you.
Incidentally, HttpSys does support live endpoint and cert changes. Kestrel could too but it would be a lot of work.
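As a side note on the diff above, a plausible reason for moving the assignment out of the constructor is that a pooled stream outlives a single request, so per-request initialization has to re-apply the option instead of relying on constructor-time state. A minimal sketch with purely illustrative names (these are not Kestrel's internal types):

```csharp
// Illustrative only: shows why a pooled object re-reads configuration on
// each reuse rather than capturing it once in its constructor.
internal sealed class FakeRequestHeaders
{
    public bool ReuseHeaderValues { get; set; }
    public void Reset() { /* clear values parsed for the previous request */ }
}

internal sealed class FakeServerOptions
{
    public bool DisableStringReuse { get; set; }
}

internal sealed class FakePooledStream
{
    private readonly FakeRequestHeaders _requestHeaders = new FakeRequestHeaders();

    // Called every time the stream is taken from the pool for a new request.
    public void Initialize(FakeServerOptions options)
    {
        _requestHeaders.ReuseHeaderValues = !options.DisableStringReuse;
        _requestHeaders.Reset();
    }
}
```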
@aspnet-hello benchmark http2
Starting 'http2' pipelined plaintext benchmark with session ID 'a5af887470b54a629133311f92d90016'. This could take up to 30 minutes...
Baseline and PR result tables omitted.
I think I regressed something with my latest change. Investigating locally 😄. I will continue to use this though!
It's also certainly possible that we need to tune the parameters or increase the VM size for the benchmark machines. The relevant benchmark changes are in aspnet/Benchmarks@555dae1.
@aspnet-hello benchmark http2
3 similar comments
@halter73 did you hardcode your name? 😄
I'm still seeing good perf numbers (result tables omitted).
Starting 'grpc' pipelined plaintext benchmark with session ID '9b3eec27e6ef462eae7072dbf3bb0722'. This could take up to 30 minutes...
Baseline and PR result tables omitted.
Holding this PR for a bit. For some reason, I'm seeing a ton of exceptions when running the grpc benchmarks.
Force-pushed from 1ee4da5 to ecfcc16.
There are a couple of gRPC benchmarks that would be useful to run:
Streaming benchmarks shouldn't see a difference. They start a connection stream and leave it open for the entire benchmark. There are also some gRPC benchmarks that send 1MB messages, but the relatively small stream allocation savings wouldn't change those.
gRPC features to test still work:
@JamesNK could you help me run these benchmarks on CI? I don't know what parameters I should be passing in for everything.
I don't know what is involved with running benchmarks on CI. They are configured here in aspnet/Benchmarks: https://github.com/aspnet/Benchmarks/blob/7cf6b39a6472df1108a95429834b25cfd5937e8e/build/scenarios-grpc.sh
Scenarios:
Change h2c to h2 for TLS. Does that help?
Perf results from gRPC (before/after result tables omitted): GrpcUnary-h2load, GrpcServerStreaming-GrpcNetClient, and Ping pong.
Overall, not a lot of change, but allocations/second dropped significantly in GrpcUnary-h2load and Ping pong, as expected.
@aspnet-hello benchmark http2
Starting 'http2' pipelined plaintext benchmark with session ID 'acef10e275b0431c92cf668ce6b6755f'. This could take up to 30 minutes...
Baseline and PR result tables omitted.
Overall, it seems like the grpc scenarios either stay the same or don't improve significantly, but the http2 scenario improves a good amount. Is this good to merge?
I wouldn't have expected ping pong to change. It is using the same stream for the whole benchmark. Perhaps allocations/s is slightly lower just because RPS is slightly lower in this run. I think you posted the same before and after for server streaming; the results are exactly the same.
Updated.
Fixes #6192; however, I'd like to discuss the tradeoffs of this change before adding tests and doing some cleanup work.
The perf generally seems to have improved a slight amount. I've run it multiple times and here are the results I'm seeing (before/after result tables omitted).
As you can see, the Allocation Rate has dropped by around 30%. However, there is a lot more idle memory per connection. The GC generation counts aren't exactly indicative of the amount of memory, since GCs can happen at any time, but there is a lot more memory in Gen 1 and Gen 2, presumably because the pooled streams aren't being GC'd. cc @anurse @davidfowl @Tratcher @halter73 for your thoughts 😄
I'm curious to see what other people think about pooling strategies to reduce idle memory. A few things I have thought of:
There are also some other wins we could get by pooling other objects; I'll explore them as well.
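For illustration, one strategy that bounds idle memory is to cap the pool size and periodically trim entries that have not been reused within some window (for example, from the connection heartbeat). A minimal, single-threaded sketch with hypothetical names, not code from this PR:

```csharp
using System;
using System.Collections.Generic;

// Sketch: a per-connection pool that caps its size and releases entries that
// have sat idle too long, so quiet long-lived connections don't pin a full
// pool of streams forever. Not thread-safe; assumes single-threaded access
// (e.g. from the connection's processing loop).
internal sealed class ExpiringPool<T> where T : class
{
    private readonly Queue<(T Item, DateTime ReturnedAt)> _items = new();
    private readonly int _maxSize;
    private readonly TimeSpan _maxIdle;

    public ExpiringPool(int maxSize, TimeSpan maxIdle)
    {
        _maxSize = maxSize;
        _maxIdle = maxIdle;
    }

    public bool TryRent(out T item)
    {
        if (_items.TryDequeue(out var entry))
        {
            item = entry.Item;
            return true;
        }
        item = null;
        return false;
    }

    public void Return(T item)
    {
        if (_items.Count < _maxSize)
        {
            _items.Enqueue((item, DateTime.UtcNow));
        }
        // else: drop it and let the GC reclaim the memory.
    }

    // Called periodically to shed idle entries; the oldest sit at the head.
    public void TrimExpired()
    {
        while (_items.TryPeek(out var entry) &&
               DateTime.UtcNow - entry.ReturnedAt > _maxIdle)
        {
            _items.Dequeue();
        }
    }
}
```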