
Merge HTTP/2 and HTTP/3 request cookies on Kestrel #41591


Merged


Daniel-Genkin-MS-2
Contributor

Merge HTTP/2 and HTTP/3 request cookies on Kestrel

Description

Added a check to squash request cookies into a single cookie string delimited by "; " (a semicolon followed by a space), as per the HTTP/2 and HTTP/3 specs. I also added unit tests to make sure we don't regress this in the future.

Fixes: #26461
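
For readers unfamiliar with the rule, here is a minimal sketch of the merge described above. `MergeCookies` is a hypothetical helper written for illustration and is not the code in this PR; it only assumes that each cookie crumb arrives as a separate header value and that the result is a single value joined with "; ".

```csharp
using System;

// Hypothetical helper (not Kestrel's implementation): joins cookie crumbs
// that arrived as separate header fields into one Cookie header value.
string MergeCookies(string[] crumbs)
{
    // The two-character delimiter is a semicolon followed by a space.
    return string.Join("; ", crumbs);
}

string merged = MergeCookies(new[] { "a=b", "c=d", "e=f" });
Console.WriteLine(merged); // prints "a=b; c=d; e=f"
```

A single crumb passes through unchanged, since `string.Join` only inserts the delimiter between elements.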

Member

@Tratcher Tratcher left a comment


Nice. Just some comments and style cleanup now.

@Tratcher Tratcher requested a review from davidfowl May 9, 2022 18:42
Member

@davidfowl davidfowl left a comment


Needs a performance run before merge

@Daniel-Genkin-MS-2
Contributor Author

Daniel-Genkin-MS-2 commented May 10, 2022

Hi all, following @halter73's suggestion, I unified the two implementations by re-implementing the logic in the HttpRequestHeaders.Generated.cs file. However, Brennan brought up a good point that string.Join is potentially inefficient as it might do a ton of redundant allocations in the backend. Is this OK to merge, or is there a better way to do this? I looked into StringBuilder, as it lets you set the capacity up front and avoid resizing its buffer, but that would involve looping over the cookies and summing their lengths. Is this a good performance tradeoff?

@BrennanConroy
Member

> string.Join is potentially inefficient as it might do a ton of redundant allocations in the backend

Looking at the impl, they look like they're doing the smart thing and allocating a single string and copying the values into that new string.
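
To make the tradeoff concrete, here is a rough sketch of the StringBuilder alternative that was being considered, next to the plain `string.Join` call. `JoinWithBuilder` is a hypothetical name used only for this illustration, and nothing here measures which option is faster.

```csharp
using System;
using System.Text;

// Hypothetical alternative: pre-size a StringBuilder by summing the lengths,
// so its internal buffer never has to grow while appending.
string JoinWithBuilder(string[] values)
{
    if (values.Length == 0)
    {
        return string.Empty;
    }

    int capacity = 2 * (values.Length - 1); // room for each "; " delimiter
    foreach (string value in values)
    {
        capacity += value.Length;
    }

    var builder = new StringBuilder(capacity);
    builder.Append(values[0]);
    for (int i = 1; i < values.Length; i++)
    {
        builder.Append("; ").Append(values[i]);
    }
    return builder.ToString();
}

string[] cookies = { "a=b", "c=d", "e=f" };
// Both approaches produce the same text; they differ only in how they allocate.
Console.WriteLine(JoinWithBuilder(cookies) == string.Join("; ", cookies)); // prints "True"
```

Note that the builder version still allocates the builder and its buffer before producing the final string, so it is unlikely to beat a `string.Join` that already writes into a single pre-sized string, as the comment above describes.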

@Daniel-Genkin-MS-2
Contributor Author

> Needs a performance run before merge

Command run:

crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/grpc.benchmarks.yml --scenario grpcaspnetcoreserver-h2loadclient --profile intel-lin-app --profile intel-load-load --variable streams=70 --variable connections=1 --variable threads=1 --variable protocol=h2c --variable body=AAAAAAcKBVdvcmxk --variable path=/grpc.testing.BenchmarkService/UnaryCall --application.framework net7.0 --application.options.collectCounters true --application.aspNetCoreVersion 7.0.0-preview.5.22259.9 --application.runtimeVersion 7.0.0-preview.5.22259.4 --application.sdkVersion 7.0.100-preview.5.22258.1

Before my changes:

application
CPU Usage (%) 53
Cores usage (%) 1,497
Working Set (MB) 569
Private Memory (MB) 940
Build Time (ms) 12,546
Start Time (ms) 256
Published Size (KB) 91,469
.NET Core SDK Version 7.0.100-preview.5.22258.1
ASP.NET Core Version 7.0.0-preview.5.22259.9+52c11c3
.NET Runtime Version 7.0.0-preview.5.22259.4+0533792
Max CPU Usage (%) 53
Max Working Set (MB) 596
Max GC Heap Size (MB) 284
Size of committed memory by the GC (MB) 438
Max Number of Gen 0 GCs / sec 2.00
Max Number of Gen 1 GCs / sec 1.00
Max Number of Gen 2 GCs / sec 0.00
Max Time in GC (%) 0.00
Max Gen 0 Size (B) 528
Max Gen 1 Size (B) 1,455,200
Max Gen 2 Size (B) 1,398,624
Max LOH Size (B) 92,640
Max POH Size (B) 397,936
Max Allocation Rate (B/sec) 367,480,696
Max GC Heap Fragmentation 1
# of Assemblies Loaded 111
Max Exceptions (#/s) 0
Max Lock Contention (#/s) 164
Max ThreadPool Threads Count 29
Max ThreadPool Queue Length 2
Max ThreadPool Items (#/s) 481,125
Max Active Timers 1
IL Jitted (B) 227,763
Methods Jitted 2,702
load
CPU Usage (%) 3
Cores usage (%) 85
Working Set (MB) 46
Private Memory (MB) 129
Build Time (ms) 9,155
Start Time (ms) 87
Published Size (KB) 74,013
.NET Core SDK Version 5.0.408
ASP.NET Core Version 5.0.17+02c6de4
.NET Runtime Version 5.0.17+6a98414
Requests 2,632,894
Bad responses 0
Socket errors 0
Mean latency (ms) 0.39
Max latency (ms) 6.73
Max RPS 175,526

After my changes

application
CPU Usage (%) 47
Cores usage (%) 1,329
Working Set (MB) 556
Private Memory (MB) 929
Build Time (ms) 6,312
Start Time (ms) 268
Published Size (KB) 91,469
.NET Core SDK Version 7.0.100-preview.5.22258.1
ASP.NET Core Version 7.0.0-preview.5.22259.9+52c11c3
.NET Runtime Version 7.0.0-preview.5.22259.4+0533792
Max CPU Usage (%) 47
Max Working Set (MB) 583
Max GC Heap Size (MB) 267
Size of committed memory by the GC (MB) 417
Max Number of Gen 0 GCs / sec 2.00
Max Number of Gen 1 GCs / sec 1.00
Max Number of Gen 2 GCs / sec 0.00
Max Time in GC (%) 0.00
Max Gen 0 Size (B) 528
Max Gen 1 Size (B) 1,402,768
Max Gen 2 Size (B) 1,351,496
Max LOH Size (B) 191,024
Max POH Size (B) 397,936
Max Allocation Rate (B/sec) 355,749,088
Max GC Heap Fragmentation 1
# of Assemblies Loaded 111
Max Exceptions (#/s) 0
Max Lock Contention (#/s) 452
Max ThreadPool Threads Count 29
Max ThreadPool Queue Length 58
Max ThreadPool Items (#/s) 466,486
Max Active Timers 1
IL Jitted (B) 302,416
Methods Jitted 3,814
load
CPU Usage (%) 3
Cores usage (%) 84
Working Set (MB) 46
Private Memory (MB) 129
Start Time (ms) 83
Requests 2,563,147
Bad responses 0
Socket errors 0
Mean latency (ms) 0.40
Max latency (ms) 1.82
Max RPS 170,876

I did some crank testing (results are above). There is definitely a slowdown, so I will try the _bits idea that Stephen brought up.

@Daniel-Genkin-MS-2 Daniel-Genkin-MS-2 requested review from a team, dougbu, wtgodbe and Pilchie as code owners May 10, 2022 21:59
@Daniel-Genkin-MS-2
Contributor Author

Sorry, messed up the rebase

@Daniel-Genkin-MS-2 Daniel-Genkin-MS-2 removed the request for review from wtgodbe May 10, 2022 22:08
@Daniel-Genkin-MS-2 Daniel-Genkin-MS-2 force-pushed the t-dagenkin/Squash-Cookies branch from e377ffb to 0b6b561 Compare May 10, 2022 22:15
@Daniel-Genkin-MS-2 Daniel-Genkin-MS-2 removed request for a team, dougbu and Pilchie May 10, 2022 22:16
@Daniel-Genkin-MS-2
Contributor Author

Good news! Looks like crank was just showing noise (see below). Does anybody have any other suggestions before I merge this?

Running the Http2ConnectionBenchmark tests with my change

| Method | ResponseDataLength | numCookies | Mean | Error | StdDev | Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| MakeRequest | 0 | 0 | 6.068 us | 0.0598 us | 0.0559 us | 164,812.1 | - | - | - | 409 B |
| MakeRequest | 0 | 1 | 6.256 us | 0.0703 us | 0.0658 us | 159,856.7 | - | - | - | 409 B |
| MakeRequest | 0 | 3 | 6.086 us | 0.0758 us | 0.0592 us | 164,323.2 | - | - | - | 408 B |

Before the change

| Method | ResponseDataLength | numCookies | Mean | Error | StdDev | Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| MakeRequest | 0 | 0 | 6.105 us | 0.0553 us | 0.0462 us | 163,806.8 | - | - | - | 409 B |
| MakeRequest | 0 | 1 | 6.264 us | 0.1229 us | 0.2248 us | 159,652.3 | - | - | - | 408 B |
| MakeRequest | 0 | 3 | 6.026 us | 0.1144 us | 0.1014 us | 165,952.6 | - | - | - | 408 B |

System config

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19043.1706 (21H1/May2021Update)
11th Gen Intel Core i9-11950H 2.60GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100-preview.5.22258.1
[Host] : .NET 7.0.0 (7.0.22.25907), X64 RyuJIT
Job-VVBVWH : .NET 7.0.0 (7.0.22.25515), X64 RyuJIT

Server=True Toolchain=.NET Core 7.0 RunStrategy=Throughput

@Daniel-Genkin-MS-2
Contributor Author

Daniel-Genkin-MS-2 commented May 12, 2022

While running these Http2ConnectionBenchmark tests, I hit an NRE and discovered that the benchmark was broken, so Stephen helped me fix it. I just pushed the commits so these issues don't affect other people.

EDIT: Oh, and I added params to test cookies.

@halter73
Member

halter73 commented May 12, 2022

How is "Allocated" the same before and after the change in the 3-cookie case at 408 bytes per operation? After the change, it has to allocate the extra array and string per request, right? I don't expect the header value caching would work in this case.

…citly convert to StringValues rather than via the cast
@Daniel-Genkin-MS-2
Contributor Author

> How is "Allocated" the same before and after the change in the 3-cookie case at 408 bytes per operation? After the change, it has to allocate the extra array and string per request, right? I don't expect the header value caching would work in this case.

I agree, this is quite strange. Is there a chance that this statistic is being rounded? Or, perhaps it is not measuring all allocations? I am currently re-running the benchmark to double check the output.

@Daniel-Genkin-MS-2
Contributor Author

I did some testing and found some interesting results (see below). I think this boils down to inaccuracies in the test: when I had the computer do a lot of other heavy work (installing Windows in VirtualBox), it seems that C# allocated more memory than during tests where the computer was not doing anything besides the benchmark. As you can see, in the test with my changes, the delta between 1 and 2 cookies is non-existent (and the value is 409B) but the delta between 2 and 3 cookies is huge. So, I think the reason is probably just that the GC and other performance optimizations that I did not touch are creating noise. @halter73 what do you make of this? I ran a few other tests that were more fair but they seemed to have the same issue. I think these 2 just show the issue more nicely.

Without my changes (while running a heavy load simultaneously)

| Method | ResponseDataLength | NumCookies | Mean | Error | StdDev | Median | Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MakeRequest | 0 | 0 | 8.795 us | 0.3810 us | 1.1054 us | 9.026 us | 113,702.2 | - | - | - | 416 B |
| MakeRequest | 0 | 1 | 9.803 us | 0.1959 us | 0.5683 us | 9.839 us | 102,007.2 | - | - | - | 411 B |
| MakeRequest | 0 | 3 | 10.190 us | 0.4419 us | 1.2537 us | 10.532 us | 98,139.8 | - | - | - | 596 B |

With my changes (with computer not doing anything else)

| Method | ResponseDataLength | NumCookies | Mean | Error | StdDev | Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|
| MakeRequest | 0 | 0 | 6.375 us | 0.0860 us | 0.0805 us | 156,857.1 | - | - | - | 409 B |
| MakeRequest | 0 | 1 | 6.500 us | 0.0727 us | 0.0680 us | 153,855.2 | - | - | - | 409 B |
| MakeRequest | 0 | 3 | 6.826 us | 0.0982 us | 0.0870 us | 146,503.7 | - | - | - | 641 B |

@Daniel-Genkin-MS-2 Daniel-Genkin-MS-2 force-pushed the t-dagenkin/Squash-Cookies branch from d85deaa to 28e767d Compare May 12, 2022 20:32
@Daniel-Genkin-MS-2 Daniel-Genkin-MS-2 enabled auto-merge (squash) May 12, 2022 20:54
@halter73
Member

> I agree, this is quite strange. Is there a chance that this statistic is being rounded? Or, perhaps it is not measuring all allocations? I am currently re-running the benchmark to double check the output.

The allocation measurements are usually very consistent and accurate in my experience. I think the issue with the previous benchmark was that calling the LINQ Append method wasn't mutating the StringValues, so all the tests were really identical. All of them were sending a single cookie. Once you updated the setup to assign the appended value back to cookies, the tests really started sending multiple cookies.

> I did some testing and found some interesting results (see below). I think this boils down to inaccuracies in the test: when I had the computer do a lot of other heavy work (installing Windows in VirtualBox), it seems that C# allocated more memory than during tests where the computer was not doing anything besides the benchmark. As you can see, in the test with my changes, the delta between 1 and 2 cookies is non-existent (and the value is 409B) but the delta between 2 and 3 cookies is huge. So, I think the reason is probably just that the GC and other performance optimizations that I did not touch are creating noise. @halter73 what do you make of this?

In most benchmarks, this should not happen. You'll see identical allocations per operation every single run no matter what's going on with the rest of the environment. This benchmark is a little unusual in that it allocates some memory on background threads, so the environment causing a small variation in allocations per operation is explainable. The difference between 409 bytes and 411 bytes or even 409 bytes and 416 bytes isn't huge.

The difference between 596 B and 641 B is more significant but completely expected since we're now allocating an additional array and string in this case. This is what I wanted to see before but wasn't because the GlobalSetup() wasn't properly adding additional cookie headers.

We really should try to keep the environment as similar as possible between runs though. It makes apples-to-apples comparisons easier.
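
As a rough illustration of the extra per-request allocation discussed above, the sketch below measures the bytes allocated by one `string.Join` call on the current thread. This is only a stand-in for the BenchmarkDotNet "Allocated" column used in this thread, and the cookie values are made up for the example.

```csharp
using System;

string[] cookies = { "a=b", "c=d", "e=f" };

// Warm up once so one-time costs do not land inside the measured window.
string.Join("; ", cookies);

long before = GC.GetAllocatedBytesForCurrentThread();
string merged = string.Join("; ", cookies);
long after = GC.GetAllocatedBytesForCurrentThread();

// The merged string is a new heap object, so the delta should be non-zero.
Console.WriteLine($"{merged.Length} chars, {after - before} bytes allocated");
```

The exact byte count depends on the runtime and platform, which is one more reason to keep the environment the same when comparing runs.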

@Daniel-Genkin-MS-2
Contributor Author

Daniel-Genkin-MS-2 commented May 12, 2022

> Append method wasn't mutating the StringValues, so all the tests were really identical. All of them were sending a single cookie. Once you updated the setup to assign the appended value back to cookies, the tests really started sending multiple cookies.

Ohhh, OK, then everything makes sense. I didn't know that Append creates a new object instead of mutating the old one. The fluctuations I saw during the other tests were probably caused by some mistake I made while building or running them rather than by the benchmark itself. I'll make sure to keep this in mind for the future.
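
The pitfall is easy to reproduce. The sketch below uses a plain `string[]` as a stand-in for `StringValues` (an assumption made to keep the example free of package references); the behavior shown comes from the LINQ `Append` extension method itself, which returns a new sequence and leaves its source untouched.

```csharp
using System;
using System.Linq;

// string[] stands in for StringValues here; Enumerable.Append behaves the
// same way for any IEnumerable<string> source.
string[] cookies = { "a=b" };

// Bug: the returned sequence is discarded, so cookies still has one entry.
cookies.Append("c=d");
Console.WriteLine(cookies.Length); // prints "1"

// Fix: assign the appended sequence back to the variable.
cookies = cookies.Append("c=d").ToArray();
Console.WriteLine(cookies.Length); // prints "2"
```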

@halter73
Member

I've learned the hard way to add a Thread.Sleep(10) in the critical part of the changed code when I start benchmarking. This helps me make sure the infrastructure is all set up correctly and I don't make mistakes like benchmarking the old version of the code when I meant to test the new version. If a Thread.Sleep(10) was added inside the if (HasCookie && _headers._Cookie.Count > 1) condition and the 3 NumCookies variation wasn't horrendously slow, you'd know something was wrong with the benchmark environment or setup.
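
A minimal sketch of that sanity check follows. `Merge` and `TimeMs` are hypothetical stand-ins for the changed code path and the benchmark harness, not code from this PR; the point is only that the multi-cookie case should become dramatically slower once the sleep is in place.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-in for the changed code path (not Kestrel's code).
string Merge(string[] cookies, bool canary)
{
    if (cookies.Length > 1)
    {
        if (canary)
        {
            Thread.Sleep(10); // temporary canary; remove before real measurements
        }
        return string.Join("; ", cookies);
    }
    return cookies.Length == 1 ? cookies[0] : string.Empty;
}

// Times a fixed number of calls and returns the elapsed milliseconds.
long TimeMs(string[] cookies, bool canary, int iterations)
{
    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        Merge(cookies, canary);
    }
    return stopwatch.ElapsedMilliseconds;
}

string[] three = { "a=b", "c=d", "e=f" };
Console.WriteLine($"without canary: {TimeMs(three, false, 20)} ms");
Console.WriteLine($"with canary:    {TimeMs(three, true, 20)} ms");
```

If the two timings look alike, the run is probably exercising an old build or never reaching the new branch.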

@Daniel-Genkin-MS-2 Daniel-Genkin-MS-2 merged commit afa4860 into dotnet:main May 14, 2022
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 6, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Dec 8, 2023
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions
Development

Successfully merging this pull request may close these issues.

Kestrel should merge HTTP/2 request Cookie headers
8 participants