Description
In #35330 and #35800 we changed a crucial part of the Sockets implementation on Linux, which gave us some really nice gains in the vast majority of benchmarks and hardware configurations.
Today I looked into the AMD numbers (the AMD machine was off while we were working on the recent changes). Overall, 5.0 is much faster than 3.1, but the recent changes have slightly decreased performance:
I've quickly profiled it by running the following command:
dotnet run -- --server http://$secret1 --client $secret2 --connections 512 --jobs ..\BenchmarksApps\Kestrel\PlatformBenchmarks\benchmarks.json.json --scenario JsonPlatform --sdk 5.0.100-preview.5.20264.2 --runtime 5.0.0-preview.6.20262.14 --collect-trace
15% of the total exclusive CPU time is spent in two ConcurrentQueue methods:
This machine has 46 cores. Even if I set the number of epoll threads to the old value, we never get 100% CPU utilization. In fact, we never did, even before our recent changes. I've also run the Netty benchmarks, and Netty struggles as well: it consumes only 39% of CPU.
On an Intel machine with 56 cores we spend a similar amount of time in these two methods:
But despite that, the recent changes have almost doubled the throughput on this machine.
On a 28-core Intel machine it's less than 3% in total:
I believe that this phenomenon requires further investigation.
Perhaps we should give a single-producer-multiple-consumer concurrent queue a try (a suggestion from @stephentoub from a while ago)?
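To make the suggestion concrete, here is a minimal sketch of what a bounded single-producer-multiple-consumer queue could look like. It is only an illustration, not the design used in dotnet/runtime: the type name `SpmcRingBuffer<T>`, its members, and the power-of-two capacity requirement are all assumptions made for this example. The idea is that the single producer publishes with a plain store followed by a volatile write of the tail, so only consumers pay for an interlocked operation.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: bounded SPMC ring buffer (names are illustrative only).
public sealed class SpmcRingBuffer<T> where T : class
{
    private readonly T[] _slots;
    private readonly int _mask;
    private long _head; // next index to dequeue; advanced by consumers via CAS
    private long _tail; // next index to enqueue; written only by the producer

    public SpmcRingBuffer(int capacity)
    {
        // Assumption: capacity is a power of two so that indexing can use a mask.
        if (capacity <= 0 || (capacity & (capacity - 1)) != 0)
            throw new ArgumentException("Capacity must be a power of two.", nameof(capacity));
        _slots = new T[capacity];
        _mask = capacity - 1;
    }

    // Must be called from a single producer thread only.
    public bool TryEnqueue(T item)
    {
        long tail = _tail; // safe plain read: only the producer writes _tail
        long head = Volatile.Read(ref _head);
        if (tail - head >= _slots.Length)
            return false; // full

        _slots[tail & _mask] = item;
        Volatile.Write(ref _tail, tail + 1); // publish the slot to consumers
        return true;
    }

    // May be called concurrently from any number of consumer threads.
    public bool TryDequeue(out T item)
    {
        while (true)
        {
            long head = Volatile.Read(ref _head);
            long tail = Volatile.Read(ref _tail);
            if (head >= tail)
            {
                item = null;
                return false; // empty
            }

            // Read the candidate first, then claim it. If another consumer won the
            // race (and the producer possibly reused the slot), the CAS fails and
            // the candidate is discarded.
            T candidate = Volatile.Read(ref _slots[head & _mask]);
            if (Interlocked.CompareExchange(ref _head, head + 1, head) == head)
            {
                item = candidate;
                return true;
            }
        }
    }
}
```

Two trade-offs of this sketch are worth noting: it is bounded, so the producer has to handle a full buffer, and dequeued slots are not cleared, so a reference stays reachable until its slot is overwritten. Whether something along these lines would actually reduce the time spent in the two ConcurrentQueue methods would have to be measured on the machines above.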



