
ConcurrentQueue is HOT in TechEmpower profiles for machines with MANY cores #36447

@adamsitnik

In #35330 and #35800 we changed a crucial part of the Sockets implementation on Linux. This gave us some really nice gains in the vast majority of benchmarks and hardware configurations.

Today I looked into the AMD numbers. The AMD machine was offline while we were working on the recent changes. Overall, 5.0 is much faster than 3.1, but the recent changes have slightly decreased performance:

[image: AMD benchmark results]

I quickly profiled it by running the following command:

```
dotnet run -- --server http://$secret1 --client $secret2 --connections 512 --jobs ..\BenchmarksApps\Kestrel\PlatformBenchmarks\benchmarks.json.json --scenario JsonPlatform --sdk 5.0.100-preview.5.20264.2 --runtime 5.0.0-preview.6.20262.14 --collect-trace
```

15% of the total exclusive CPU time is spent in two ConcurrentQueue methods:

[image: CPU profile showing the two ConcurrentQueue methods]

This machine has 46 cores. Even if I set the number of epoll threads back to the old value, we never reach 100% CPU utilization. In fact, we never did, even before the recent changes. I also ran the Netty benchmarks, and Netty struggles as well: it consumes only 39% of the CPU.

On a 56-core Intel machine we spend a similar amount of time in these two methods:

[image: CPU profile from the 56-core Intel machine]

Despite that, the recent changes almost doubled the throughput on this machine.

On a 28-core Intel machine it's less than 3% in total:

[image: CPU profile from the 28-core Intel machine]

I believe that this phenomenon requires further investigation.

Perhaps we should try a single-producer/multiple-consumer (SPMC) concurrent queue, which @stephentoub suggested a while ago?
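For illustration only, here is a minimal sketch of what a bounded SPMC queue could look like. It is written in C++, not .NET, and the class and method names are hypothetical. It assumes a fixed power-of-two capacity, trivially copyable items, and exactly one producer thread, for example one epoll thread handing work to thread-pool threads (this is an assumption about how it would be used). The point of the design is that the producer never performs an atomic read-modify-write. It writes a slot and publishes it with a plain release store of `tail_`, so only the consumers contend, on a single compare-and-swap of `head_`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <type_traits>

// Hypothetical bounded single-producer/multiple-consumer ring buffer (sketch).
template <typename T>
class SpmcQueue {
    static_assert(std::is_trivially_copyable_v<T>, "slots are std::atomic<T>");

public:
    explicit SpmcQueue(std::size_t capacity)
        : mask_(capacity - 1), slots_(new std::atomic<T>[capacity]) {
        // Capacity must be a power of two so that `index & mask_` wraps correctly.
        assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
    }

    // Producer side: must only ever be called from ONE thread.
    bool TryEnqueue(T value) {
        const std::uint64_t tail = tail_.load(std::memory_order_relaxed);
        const std::uint64_t head = head_.load(std::memory_order_acquire);
        if (tail - head > mask_) return false;  // full: capacity items queued
        slots_[tail & mask_].store(value, std::memory_order_relaxed);
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side: safe to call from many threads concurrently.
    std::optional<T> TryDequeue() {
        std::uint64_t head = head_.load(std::memory_order_acquire);
        for (;;) {
            const std::uint64_t tail = tail_.load(std::memory_order_acquire);
            if (head == tail) return std::nullopt;  // empty
            // Read before claiming; the value is discarded if the CAS fails.
            T value = slots_[head & mask_].load(std::memory_order_relaxed);
            // Claim the slot. On failure, `head` is reloaded with the current value.
            if (head_.compare_exchange_weak(head, head + 1,
                                            std::memory_order_acq_rel,
                                            std::memory_order_acquire)) {
                return value;
            }
        }
    }

private:
    const std::uint64_t mask_;
    std::unique_ptr<std::atomic<T>[]> slots_;
    // Keep head and tail on separate cache lines (64-byte lines assumed).
    alignas(64) std::atomic<std::uint64_t> head_{0};
    alignas(64) std::atomic<std::uint64_t> tail_{0};
};
```

Whether something like this would actually reduce the time seen in the profiles is an open question. The consumers still contend on `head_`, so the main saving is that the producer no longer competes with them for the same counter.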

cc @stephentoub @kouvel @tannergooding @tmds @benaadams
