Proxy scenario perf regression #9404

Closed
sebastienros opened this issue Apr 15, 2019 · 29 comments · Fixed by dotnet/coreclr#24099

@sebastienros (Member) commented Apr 15, 2019

This is actually in CoreFX, but I'm opening it in aspnet until I get more information.

[benchmark results graph]

Updated

3.0.0-preview-27408-1 -> 3.0.0-preview-27409-1

Microsoft.NetCore.App / Core FX
dotnet/corefx@c2bbb6a...e1c0e1d

Microsoft.NetCore.App / Core CLR
dotnet/coreclr@a51af08...4e5df11

Command lines:

dotnet run -- --server http://10.10.10.2:5001 --client http://10.10.10.4:5002 `
-j ..\Benchmarks\benchmarks.httpclient.json -n HttpClientFactory --sdk Latest --self-contained --aspnetcoreversion 3.0.0-preview3-19116-01 --runtimeversion 3.0.0-preview-27408-1

dotnet run -- --server http://10.10.10.2:5001 --client http://10.10.10.4:5002 `
-j ..\Benchmarks\benchmarks.httpclient.json -n HttpClientFactory --sdk Latest --self-contained --aspnetcoreversion 3.0.0-preview3-19116-01 --runtimeversion 3.0.0-preview-27409-1
@benaadams (Member)

Looks like two drops?

[benchmark graph showing two drops]

@sebastienros (Member, Author)

The first drop corresponds to when I changed the benchmarks to actually flow the headers.

@sebastienros (Member, Author)

@stephentoub this benchmark is mostly about HttpClient under load, no SSL. Do you see anything in the changes I linked to that could explain the regression?

@stephentoub (Member)

> Do you see anything in the changes I linked to that could explain the regression?

Nothing jumps out at me.

@sebastienros (Member, Author)

I will investigate then.

@sebastienros (Member, Author)

After investigating, it looks like a regression between

3.0.0-preview-27408-1 -> 3.0.0-preview-27408-3

Microsoft.NetCore.App / Core FX
dotnet/corefx@c2bbb6a...ede577c

Microsoft.NetCore.App / Core CLR
dotnet/coreclr@a51af08...4e5df11

Then I fetched both runtimes and tried to find which changed assemblies might be causing this; I can recover the previous performance by using System.Globalization.Native.so from the previous runtime.

Looking at the diff, this file has changed:

src/corefx/System.Globalization.Native/pal_collation.c

I should also mention that once this runtime version is taken into account, only this benchmark is affected; everything else seems stable.

Here is the code that the benchmark is running:

https://github.com/aspnet/Benchmarks/blob/711f2b81cbd5dc1c2022cae0606e76a7eb79edc1/src/Proxy/Program.cs#L134-L144

@sebastienros (Member, Author) commented Apr 18, 2019

/cc @filipnavara @janvorli as it seems to come from this commit: https://github.com/dotnet/coreclr/pull/22390/files

I'd say that it's not because of PR dotnet/coreclr@34d50b0, though the two are pretty close together, because the diff starts on Feb 04. To be sure, I could run a benchmark if someone could give me a System.Globalization.Native.so that only reverts this commit.

@filipnavara (Member) commented Apr 18, 2019

Are you absolutely positive about those commit lists? I'd expect dotnet/coreclr#22378 to be the primary suspect if System.Globalization.Native is involved. It's just one commit prior to the lists you linked. The change in https://github.com/dotnet/coreclr/pull/22390/files should be harmless unless the glibc C runtime is significantly broken.

I'll try to reproduce the results over the weekend, but if you find anything out, please share it here.

@filipnavara (Member)

I'm going through possible regressions from dotnet/coreclr#22378.

This line is missing the customRules->size == 0 condition. Some of the calloc calls could be replaced with malloc to avoid zeroing memory. There's potential for a performance regression in some of the *SortHandle functions, but I'm not sure it's big enough to account for this. I will submit a patch to address the obvious things.
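
To illustrate the calloc-versus-malloc point, here is a minimal sketch with a hypothetical SortHandle layout (not the actual pal_collation.c code): when every field that matters is initialized explicitly right after allocation, calloc's zeroing of the whole block is redundant and malloc avoids that work.

#include <stdlib.h>
#include <pthread.h>

/* Hypothetical stand-in for the native sort handle; the real type in
   pal_collation.c has different members. */
typedef struct SortHandle
{
    pthread_mutex_t lock;
    void* collator;              /* created lazily, must start as NULL */
} SortHandle;

/* calloc variant: zeroes the entire block, including the mutex storage
   that pthread_mutex_init overwrites immediately afterwards. */
static SortHandle* CreateSortHandleCalloc(void)
{
    SortHandle* handle = (SortHandle*)calloc(1, sizeof(SortHandle));
    if (handle != NULL)
        pthread_mutex_init(&handle->lock, NULL);
    return handle;
}

/* malloc variant: skips the redundant zeroing and initializes only the
   field that genuinely has to start as NULL. */
static SortHandle* CreateSortHandleMalloc(void)
{
    SortHandle* handle = (SortHandle*)malloc(sizeof(SortHandle));
    if (handle != NULL)
    {
        pthread_mutex_init(&handle->lock, NULL);
        handle->collator = NULL;
    }
    return handle;
}

int main(void)
{
    SortHandle* a = CreateSortHandleCalloc();
    SortHandle* b = CreateSortHandleMalloc();
    if (a) { pthread_mutex_destroy(&a->lock); free(a); }
    if (b) { pthread_mutex_destroy(&b->lock); free(b); }
    return 0;
}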

@janvorli (Member)

I have applied a delta to the current master that undoes the change made in dotnet/coreclr#22390 and shared the System.Globalization.Native.so with @sebastienros. He has verified that it fixes the perf regression.

@sebastienros (Member, Author)

@filipnavara if you are not sure whether a change is beneficial or not, just build many versions and I can run the numbers for each build. Put them on //scratch2.

@filipnavara (Member)

@janvorli Wow, that's very surprising. I'll look at it again. The change itself doesn't fix any bug, and it should generally have improved performance, which makes it surprising that quite the opposite happened. If I don't come up with a good explanation, I'll submit a PR to revert it.

@sebastienros (Member, Author)

I am quite confident in the way I ran it, but there is always room for human error (mine). To be really sure, feel free to give me versions with obvious, known regressions, even based on the reverted commit, so we can check that I get the expected results; that would also make me feel more confident.

I also want to stress that I patch only this one file, nothing else, and I am not sure whether one version or the other could trigger some behavior coming from a separate module. I can also share the full standalone application if you want to check which bits are measured, but you will see that only this file has changed in the end.

@filipnavara (Member)

I don't doubt your findings. It must be some weird thing with the way the tsearch function is used. I'll try to produce a more localized test to see if I can reproduce the behaviour. I'd actually be happy if it turns out to be caused by this small change, since the one preceding it was huge and much harder to check for regressions.

@filipnavara (Member)

I'm starting to suspect that the problem is actually the changed structure layout in that commit, which causes the pthread mutex to be aligned differently. However, that's rather hard to measure.

@janvorli Would it be possible for you to make a build that swaps these two lines: https://github.com/dotnet/coreclr/blob/42b5509b1b8f3be12876579aa23a3cb3ed2ca7a4/src/corefx/System.Globalization.Native/pal_collation.c#L45-L46 ? My only Linux build machines are in the cloud, and it would take me a while to set one up properly.
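
For reference, here is a tiny standalone sketch (hypothetical field names, not the actual pal_collation.c struct) of what that experiment tests: swapping the declaration order of two adjacent fields moves the mutex to a different offset inside the struct, which can change how it lands relative to cache-line boundaries.

#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical layouts only; the real SortHandle has different members. */
typedef struct SortHandleA
{
    void* collatorsRoot;             /* declared first  */
    pthread_mutex_t collatorsLock;   /* declared second */
} SortHandleA;

typedef struct SortHandleB
{
    pthread_mutex_t collatorsLock;   /* declared first  */
    void* collatorsRoot;             /* declared second */
} SortHandleB;

int main(void)
{
    printf("A: lock at offset %zu, struct size %zu\n",
           offsetof(SortHandleA, collatorsLock), sizeof(SortHandleA));
    printf("B: lock at offset %zu, struct size %zu\n",
           offsetof(SortHandleB, collatorsLock), sizeof(SortHandleB));
    return 0;
}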

@janvorli (Member)

Sure, I will do that.

@janvorli (Member)

@filipnavara that change had no effect.

@filipnavara (Member)

Thanks for testing it. I will submit a PR to revert the change and continue investigating on my Linux machine to see if it's anything I can reproduce.

@filipnavara (Member)

I reduced it to the following test case in plain C: https://gist.github.com/filipnavara/31791b216078706e0b465f7a103ff01f
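
The gist itself isn't reproduced here, but a rough single-file sketch of the same kind of comparison (made-up key count and iteration count) looks like the following; building it twice, with and without -DUSE_TSEARCH, and timing both binaries compares the two calls on a tree where every lookup hits an existing key.

#include <search.h>
#include <stdio.h>
#include <stdlib.h>

#define KEYS       64
#define ITERATIONS 20000000L

/* Total-order comparison on long keys. */
static int compare(const void* a, const void* b)
{
    long x = *(const long*)a;
    long y = *(const long*)b;
    return (x > y) - (x < y);
}

int main(void)
{
    static long keys[KEYS];
    void* root = NULL;
    long found = 0;

    /* Populate the tree once, the way the collator cache fills on first use. */
    for (long i = 0; i < KEYS; i++)
    {
        keys[i] = i * 7;  /* sparse keys */
        tsearch(&keys[i], &root, compare);
    }

    /* Hot path: every key already exists, so tfind and tsearch return the
       same node and nothing is ever inserted; only the call cost differs. */
    for (long n = 0; n < ITERATIONS; n++)
    {
        long key = keys[n % KEYS];
#ifdef USE_TSEARCH
        if (tsearch(&key, &root, compare) != NULL)
#else
        if (tfind(&key, &root, compare) != NULL)
#endif
            found++;
    }

    printf("%ld\n", found % 100);  /* keep the loop from being optimized away */
    return 0;
}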

On macOS, using tsearch is a notch faster than tfind, but the difference is close to measurement error for a sparse distribution. On Ubuntu (Linux mono-linux-build 4.15.0-1019-azure #19-Ubuntu SMP Thu Jul 26 19:54:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux) running in Azure, tsearch is consistently around 15%-18% slower.

@sebastienros (Member, Author)

How do you measure the differences? I ran the code, but it simply outputs 12.

@filipnavara (Member) commented Apr 18, 2019

@sebastienros by running it under the time command.

gcc tsearch.c -o tsearch
time ./tsearch
gcc tfind.c -o tfind
time ./tfind

@sebastienros (Member, Author)

on my physical machines I get this:

Cores      tsearch   tfind
6/12 HT    0.900s    0.750s
14/28 HT   1.113s    0.927s

@benaadams (Member)

A side question: if the regression is in globalization, how is HttpClient using globalization? I wouldn't have thought it would be involved, though I may be wrong.

i.e. isn't it all mostly ASCII formats and fairly invariant?

@stephentoub (Member) commented Apr 19, 2019

It's not just HttpClient involved here... isn't this an ASP.NET site that in turn makes HttpClient requests (or HttpClient requests to an ASP.NET site)?

@benaadams (Member)

Yeah, but it doesn't do anything particularly interesting, except this line...

new Uri(UriHelper.BuildAbsolute(_scheme, _host, _pathBase, context.Request.Path, context.Request.QueryString.Add(_appendQuery)))

Checking UriHelper, it's doing a bunch of string.IndexOf(string) calls, which default to a CurrentCulture comparison (and on Linux that goes through the native globalization/ICU code), so it might be that...

@benaadams (Member)

Made a PR for that class #9537

@benaadams (Member)

Looks like it's back to normal

[benchmark graph showing throughput back to normal]

@filipnavara (Member) commented Apr 21, 2019

Thanks for checking (and fixing the incorrect usage of CurrentCulture)!

@sebastienros (Member, Author)

Fixed, thanks @filipnavara @benaadams @janvorli

As a side note, we are now measuring on Windows with the same hardware, and it's more than 200K RPS where Linux is at 80K. For comparison, on the same proxy scenario HAProxy is at 110K RPS on Linux under the same conditions, and we keep a 99th percentile latency of 6 ms.

@NickCraver maybe something interesting for you here? Provided my numbers are correct; there is always room for error.

@sebastienros added this to the 3.0.0-preview5 milestone on Apr 22, 2019
@ghost locked as resolved and limited conversation to collaborators on Dec 3, 2019