Description
I've noticed that when a ConcurrentQueue instance has many enqueuers/dequeuers, a lot of extra time is spent in SpinWait.SpinOnce. This seems to be because the SpinWait.SpinOnce call passes the optional parameter sleep1Threshold: -1, which disables the Thread.Sleep call that a thread would otherwise eventually make after spinning for too long.
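To make the two modes concrete, here is a minimal sketch of a wait loop built on SpinWait.SpinOnce(int sleep1Threshold). It is not the ConcurrentQueueSegment code: the SpinSketch class, the WaitUntil helper, the 50 ms producer delay, and the threshold value 20 are all made up for illustration. Thread.OptimalMaxSpinWaitsPerSpinIteration is internal to the runtime, so a plain integer stands in for it here.

```csharp
using System;
using System.Threading;

static class SpinSketch
{
    // Hypothetical helper: polls `ready` and backs off with SpinWait between polls.
    //   sleep1Threshold = -1 -> Thread.Sleep(1) is never used.
    //   sleep1Threshold >= 0 -> Thread.Sleep(1) may be used once the spin count
    //                           reaches the threshold.
    // Returns how many times SpinOnce was called before `ready` returned true.
    public static int WaitUntil(Func<bool> ready, int sleep1Threshold)
    {
        SpinWait spinner = default;
        while (!ready())
        {
            spinner.SpinOnce(sleep1Threshold);
        }
        return spinner.Count;
    }

    public static void Main()
    {
        // 20 is an arbitrary illustrative threshold, not a recommended value.
        foreach (int threshold in new[] { -1, 20 })
        {
            bool flag = false;
            var setter = new Thread(() =>
            {
                Thread.Sleep(50);               // simulate a slow producer
                Volatile.Write(ref flag, true);
            });
            setter.Start();

            int spins = WaitUntil(() => Volatile.Read(ref flag), threshold);
            setter.Join();
            Console.WriteLine($"sleep1Threshold={threshold}: SpinOnce called {spins} times");
        }
    }
}
```

With -1 the waiting thread keeps spinning and yielding for the whole delay, so I would expect a much larger SpinOnce count than with a non-negative threshold, where the thread can sleep instead. The exact counts depend on the machine.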
When I change the parameter from sleep1Threshold: -1 to sleep1Threshold: Thread.OptimalMaxSpinWaitsPerSpinIteration, I see a significant performance increase on some of my machines in certain microbenchmark cases. I ran the benchmarks against local builds from the release/5.0-rc2 branch, with and without the change to the sleeping behavior. The microbenchmarks are from the dotnet/performance repository and can be reproduced with:
sudo python3 ./script/benchmarks_ci.py -c Release -f netcoreapp5.0 --filter '*ConcurrentQueue*' --corerun $CORERUN_PATH --bdn-artifacts $BDN_ARTIFACTS_DIR
I've also included BenchmarkDotNet results from the dotnet/performance microbenchmarks below to show the effect of the change.
Configuration
Each machine is an x64 machine running Ubuntu 20.04. More information in the BenchmarkDotNet results.
Regression?
It looks like this was changed to improve performance back in .NET 3.0 based on this merge.
Data
Skylake
Base (SpinOnce(sleep1Threshold: -1))
------------------------------------
BenchmarkDotNet=v0.12.1.1405-nightly, OS=ubuntu 20.04
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=6.0.100-alpha.1.20528.4
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
Job-EQUTZA : .NET Core 5.0 (CoreCLR 42.42.42.42424, CoreFX 42.42.42.42424), X64 RyuJIT
Job-ISAALG : .NET Core 5.0 (CoreCLR 42.42.42.42424, CoreFX 42.42.42.42424), X64 RyuJIT
PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:DebugType=portable Toolchain=CoreRun
InvocationCount=1 IterationCount=100 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15
| Namespace | Type | Method | Job | MaxWarmupIterationCount | MinWarmupIterationCount | UnrollFactor | WarmupCount | Count | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------ |-------------------------------------- |---------------- |----------- |------------------------ |------------------------ |------------- |------------ |------ |-------- |------------------:|-----------------:|------------------:|------------------:|------------------:|------------------:|-------:|-------:|------:|----------:|
| System.Collections | CtorDefaultSize<Int32> | ConcurrentQueue | Job-EQUTZA | Default | Default | 16 | 1 | ? | ? | 72.31 ns | 0.056 ns | 0.164 ns | 72.24 ns | 72.08 ns | 72.75 ns | 0.1376 | - | - | 576 B |
| System.Collections | CtorDefaultSize<String> | ConcurrentQueue | Job-EQUTZA | Default | Default | 16 | 1 | ? | ? | 85.67 ns | 0.249 ns | 0.723 ns | 86.14 ns | 84.69 ns | 86.65 ns | 0.1988 | - | - | 832 B |
| System.Collections.Tests | Add_Remove_SteadyState<Int32> | ConcurrentQueue | Job-EQUTZA | Default | Default | 16 | 1 | 512 | ? | 20.71 ns | 0.001 ns | 0.002 ns | 20.71 ns | 20.71 ns | 20.72 ns | - | - | - | - |
| System.Collections.Tests | Add_Remove_SteadyState<String> | ConcurrentQueue | Job-EQUTZA | Default | Default | 16 | 1 | 512 | ? | 21.20 ns | 0.001 ns | 0.003 ns | 21.20 ns | 21.19 ns | 21.21 ns | - | - | - | - |
| System.Collections | CtorFromCollection<Int32> | ConcurrentQueue | Job-EQUTZA | Default | Default | 16 | 1 | ? | 512 | 7,447.65 ns | 1.621 ns | 4.755 ns | 7,448.25 ns | 7,437.28 ns | 7,460.69 ns | 1.0432 | - | - | 4448 B |
| System.Collections | CtorFromCollection<String> | ConcurrentQueue | Job-EQUTZA | Default | Default | 16 | 1 | ? | 512 | 8,138.52 ns | 1.695 ns | 4.809 ns | 8,139.33 ns | 8,124.25 ns | 8,148.08 ns | 2.0214 | 0.0978 | - | 8544 B |
| System.Collections | IterateForEach<Int32> | ConcurrentQueue | Job-EQUTZA | Default | Default | 16 | 1 | ? | 512 | 4,358.00 ns | 4.003 ns | 11.290 ns | 4,358.73 ns | 4,331.23 ns | 4,383.26 ns | - | - | - | 72 B |
| System.Collections | IterateForEach<String> | ConcurrentQueue | Job-EQUTZA | Default | Default | 16 | 1 | ? | 512 | 5,604.88 ns | 2.521 ns | 7.272 ns | 5,603.75 ns | 5,586.93 ns | 5,625.49 ns | - | - | - | 72 B |
| System.Collections | CreateAddAndClear<Int32> | ConcurrentQueue | Job-EQUTZA | Default | Default | 16 | 1 | ? | 512 | 7,811.29 ns | 1.145 ns | 3.190 ns | 7,811.23 ns | 7,799.31 ns | 7,819.26 ns | 2.3137 | - | - | 9792 B |
| System.Collections | CreateAddAndClear<String> | ConcurrentQueue | Job-EQUTZA | Default | Default | 16 | 1 | ? | 512 | 8,553.62 ns | 2.325 ns | 6.166 ns | 8,555.10 ns | 8,537.96 ns | 8,569.89 ns | 4.2855 | - | - | 17984 B |
| System.Collections.Concurrent | AddRemoveFromDifferentThreads<Int32> | ConcurrentQueue | Job-ISAALG | 10 | 6 | 1 | -1 | ? | 2000000 | 31,107,778.63 ns | 2,041,343.278 ns | 5,690,449.134 ns | 28,001,835.00 ns | 26,040,926.50 ns | 49,191,892.50 ns | - | - | - | 9168 B |
| System.Collections.Concurrent | AddRemoveFromDifferentThreads<String> | ConcurrentQueue | Job-ISAALG | 10 | 6 | 1 | -1 | ? | 2000000 | 30,865,184.17 ns | 1,633,527.591 ns | 4,444,132.284 ns | 28,404,088.50 ns | 27,454,580.00 ns | 44,963,492.00 ns | - | - | - | 526032 B |
| System.Collections.Concurrent | AddRemoveFromSameThreads<Int32> | ConcurrentQueue | Job-ISAALG | 10 | 6 | 1 | -1 | ? | 2000000 | 187,964,095.59 ns | 4,103,621.184 ns | 11,970,455.607 ns | 189,390,999.50 ns | 144,958,019.50 ns | 212,779,102.50 ns | - | - | - | 1192 B |
| System.Collections.Concurrent | AddRemoveFromSameThreads<String> | ConcurrentQueue | Job-ISAALG | 10 | 6 | 1 | -1 | ? | 2000000 | 180,518,506.05 ns | 4,279,085.954 ns | 12,616,981.480 ns | 181,038,632.00 ns | 141,594,659.00 ns | 205,936,025.00 ns | - | - | - | 4008 B |
Diff (SpinOnce(sleep1Threshold: Thread.OptimalMaxSpinWaitsPerSpinIteration))
----------------------------------------------------------------------------
BenchmarkDotNet=v0.12.1.1405-nightly, OS=ubuntu 20.04
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=6.0.100-alpha.1.20528.4
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
Job-MIAVZY : .NET Core 5.0 (CoreCLR 42.42.42.42424, CoreFX 42.42.42.42424), X64 RyuJIT
Job-SIKWHO : .NET Core 5.0 (CoreCLR 42.42.42.42424, CoreFX 42.42.42.42424), X64 RyuJIT
PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:DebugType=portable Toolchain=CoreRun
InvocationCount=1 IterationCount=100 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15
| Namespace | Type | Method | Job | MaxWarmupIterationCount | MinWarmupIterationCount | UnrollFactor | WarmupCount | Count | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------ |-------------------------------------- |---------------- |----------- |------------------------ |------------------------ |------------- |------------ |------ |-------- |-----------------:|-----------------:|-----------------:|-----------------:|-----------------:|-----------------:|-------:|-------:|------:|----------:|
| System.Collections | CtorDefaultSize<Int32> | ConcurrentQueue | Job-MIAVZY | Default | Default | 16 | 1 | ? | ? | 87.06 ns | 0.049 ns | 0.143 ns | 87.12 ns | 86.76 ns | 87.38 ns | 0.1395 | - | - | 584 B |
| System.Collections | CtorDefaultSize<String> | ConcurrentQueue | Job-MIAVZY | Default | Default | 16 | 1 | ? | ? | 101.25 ns | 0.017 ns | 0.043 ns | 101.25 ns | 101.16 ns | 101.42 ns | 0.2007 | - | - | 840 B |
| System.Collections.Tests | Add_Remove_SteadyState<Int32> | ConcurrentQueue | Job-MIAVZY | Default | Default | 16 | 1 | 512 | ? | 21.28 ns | 0.001 ns | 0.004 ns | 21.28 ns | 21.27 ns | 21.29 ns | - | - | - | - |
| System.Collections.Tests | Add_Remove_SteadyState<String> | ConcurrentQueue | Job-MIAVZY | Default | Default | 16 | 1 | 512 | ? | 21.27 ns | 0.001 ns | 0.002 ns | 21.27 ns | 21.26 ns | 21.27 ns | - | - | - | - |
| System.Collections | CtorFromCollection<Int32> | ConcurrentQueue | Job-MIAVZY | Default | Default | 16 | 1 | ? | 512 | 7,717.92 ns | 2.011 ns | 5.834 ns | 7,717.93 ns | 7,704.51 ns | 7,730.39 ns | 1.0483 | - | - | 4456 B |
| System.Collections | CtorFromCollection<String> | ConcurrentQueue | Job-MIAVZY | Default | Default | 16 | 1 | ? | 512 | 8,386.39 ns | 1.475 ns | 4.185 ns | 8,386.13 ns | 8,378.25 ns | 8,397.01 ns | 2.0161 | 0.1008 | - | 8552 B |
| System.Collections | IterateForEach<Int32> | ConcurrentQueue | Job-MIAVZY | Default | Default | 16 | 1 | ? | 512 | 4,867.26 ns | 1.113 ns | 3.229 ns | 4,867.12 ns | 4,862.70 ns | 4,876.69 ns | - | - | - | 72 B |
| System.Collections | IterateForEach<String> | ConcurrentQueue | Job-MIAVZY | Default | Default | 16 | 1 | ? | 512 | 5,658.29 ns | 2.533 ns | 7.469 ns | 5,656.20 ns | 5,647.76 ns | 5,677.85 ns | - | - | - | 72 B |
| System.Collections | CreateAddAndClear<Int32> | ConcurrentQueue | Job-MIAVZY | Default | Default | 16 | 1 | ? | 512 | 8,204.22 ns | 1.247 ns | 3.578 ns | 8,204.07 ns | 8,195.18 ns | 8,212.79 ns | 2.3306 | - | - | 9840 B |
| System.Collections | CreateAddAndClear<String> | ConcurrentQueue | Job-MIAVZY | Default | Default | 16 | 1 | ? | 512 | 8,603.39 ns | 2.441 ns | 6.641 ns | 8,601.63 ns | 8,596.32 ns | 8,626.47 ns | 4.3044 | - | - | 18032 B |
| System.Collections.Concurrent | AddRemoveFromDifferentThreads<Int32> | ConcurrentQueue | Job-SIKWHO | 10 | 6 | 1 | -1 | ? | 2000000 | 31,377,565.45 ns | 1,724,624.611 ns | 4,920,451.059 ns | 29,505,311.50 ns | 26,032,254.00 ns | 44,129,165.00 ns | - | - | - | 526880 B |
| System.Collections.Concurrent | AddRemoveFromDifferentThreads<String> | ConcurrentQueue | Job-SIKWHO | 10 | 6 | 1 | -1 | ? | 2000000 | 31,061,566.93 ns | 1,654,339.740 ns | 4,638,945.997 ns | 28,762,799.00 ns | 27,510,170.00 ns | 45,084,255.00 ns | - | - | - | 1050656 B |
| System.Collections.Concurrent | AddRemoveFromSameThreads<Int32> | ConcurrentQueue | Job-SIKWHO | 10 | 6 | 1 | -1 | ? | 2000000 | 83,496,964.09 ns | 415,818.700 ns | 1,166,000.215 ns | 83,076,257.00 ns | 81,850,532.00 ns | 86,766,334.00 ns | - | - | - | 424 B |
| System.Collections.Concurrent | AddRemoveFromSameThreads<String> | ConcurrentQueue | Job-SIKWHO | 10 | 6 | 1 | -1 | ? | 2000000 | 88,636,063.77 ns | 350,925.444 ns | 1,006,870.882 ns | 88,184,104.00 ns | 87,145,891.00 ns | 91,002,926.00 ns | - | - | - | 424 B |
Comparison
----------
summary:
better: 2, geomean: 2.148
worse: 3, geomean: 1.039
total diff: 5
| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| ------------------------------------------------------------------------ | ---------:| ----------------:| ----------------:| --------:|
| System.Collections.CreateAddAndClear<Int32>.ConcurrentQueue(Size: 512) | 1.05 | 7804.02 | 8202.04 | |
| System.Collections.CtorFromCollection<Int32>.ConcurrentQueue(Size: 512) | 1.04 | 7442.15 | 7713.63 | |
| System.Collections.CtorFromCollection<String>.ConcurrentQueue(Size: 512) | 1.03 | 8136.48 | 8383.06 | |
| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.Collections.Concurrent.AddRemoveFromSameThreads<Int32>.ConcurrentQueue(Si | 2.24 | 187936495.50 | 83897866.50 | |
| System.Collections.Concurrent.AddRemoveFromSameThreads<String>.ConcurrentQueue(S | 2.06 | 182276745.00 | 88472010.00 | |
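For anyone reading the summary lines, the geomean values look like the geometric mean of the per-benchmark median ratios. That is my assumption, since the comparison tool's exact method isn't shown here. Here is a small sketch that recomputes them from the Skylake medians in the tables above; GeoMeanSketch and GeoMean are names made up for this example.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class GeoMeanSketch
{
    // Geometric mean of a set of ratios, computed as exp(mean(ln r)).
    public static double GeoMean(params double[] ratios) =>
        Math.Exp(ratios.Average(r => Math.Log(r)));

    public static void Main()
    {
        // Faster rows: base median / diff median (ns), copied from the table above.
        double better = GeoMean(187936495.50 / 83897866.50,
                                182276745.00 / 88472010.00);

        // Slower rows: diff median / base median (ns), copied from the table above.
        double worse = GeoMean(8202.04 / 7804.02,
                               7713.63 / 7442.15,
                               8383.06 / 8136.48);

        Console.WriteLine("better geomean: " + better.ToString("F3", CultureInfo.InvariantCulture));
        Console.WriteLine("worse geomean:  " + worse.ToString("F3", CultureInfo.InvariantCulture));
    }
}
```

Under that assumption the two values round to the reported 2.148 and 1.039. So the regressions on the single-threaded constructor benchmarks are a few percent each, while the contended AddRemoveFromSameThreads cases are roughly twice as fast.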
Ryzen
Base (SpinOnce(sleep1Threshold: -1))
------------------------------------
BenchmarkDotNet=v0.12.1.1405-nightly, OS=ubuntu 20.04
AMD Ryzen 5 3600, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=6.0.100-alpha.1.20528.4
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
Job-YGFIWA : .NET Core 5.0 (CoreCLR 42.42.42.42424, CoreFX 42.42.42.42424), X64 RyuJIT
Job-JNBMSA : .NET Core 5.0 (CoreCLR 42.42.42.42424, CoreFX 42.42.42.42424), X64 RyuJIT
PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:DebugType=portable Toolchain=CoreRun
InvocationCount=1 IterationCount=100 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15
| Namespace | Type | Method | Job | MaxWarmupIterationCount | MinWarmupIterationCount | UnrollFactor | WarmupCount | Count | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------ |-------------------------------------- |---------------- |----------- |------------------------ |------------------------ |------------- |------------ |------ |-------- |------------------:|------------------:|-------------------:|------------------:|------------------:|------------------:|-------:|-------:|------:|----------:|
| System.Collections | CtorDefaultSize<Int32> | ConcurrentQueue | Job-YGFIWA | Default | Default | 16 | 1 | ? | ? | 85.04 ns | 0.543 ns | 1.577 ns | 85.18 ns | 82.04 ns | 88.88 ns | 0.0342 | - | - | 576 B |
| System.Collections | CtorDefaultSize<String> | ConcurrentQueue | Job-YGFIWA | Default | Default | 16 | 1 | ? | ? | 95.75 ns | 0.059 ns | 0.165 ns | 95.74 ns | 95.22 ns | 96.24 ns | 0.0496 | - | - | 832 B |
| System.Collections.Tests | Add_Remove_SteadyState<Int32> | ConcurrentQueue | Job-YGFIWA | Default | Default | 16 | 1 | 512 | ? | 12.31 ns | 0.003 ns | 0.007 ns | 12.31 ns | 12.30 ns | 12.33 ns | - | - | - | - |
| System.Collections.Tests | Add_Remove_SteadyState<String> | ConcurrentQueue | Job-YGFIWA | Default | Default | 16 | 1 | 512 | ? | 13.45 ns | 0.079 ns | 0.234 ns | 13.55 ns | 12.95 ns | 13.87 ns | - | - | - | - |
| System.Collections | CtorFromCollection<Int32> | ConcurrentQueue | Job-YGFIWA | Default | Default | 16 | 1 | ? | 512 | 5,151.24 ns | 1.689 ns | 4.764 ns | 5,149.44 ns | 5,144.51 ns | 5,165.32 ns | 0.2479 | - | - | 4448 B |
| System.Collections | CtorFromCollection<String> | ConcurrentQueue | Job-YGFIWA | Default | Default | 16 | 1 | ? | 512 | 5,992.60 ns | 6.465 ns | 18.340 ns | 5,986.52 ns | 5,972.87 ns | 6,039.16 ns | 0.5035 | 0.0240 | - | 8544 B |
| System.Collections | IterateForEach<Int32> | ConcurrentQueue | Job-YGFIWA | Default | Default | 16 | 1 | ? | 512 | 4,392.32 ns | 0.578 ns | 1.611 ns | 4,392.13 ns | 4,389.04 ns | 4,397.64 ns | - | - | - | 72 B |
| System.Collections | IterateForEach<String> | ConcurrentQueue | Job-YGFIWA | Default | Default | 16 | 1 | ? | 512 | 5,370.14 ns | 0.403 ns | 1.163 ns | 5,369.99 ns | 5,367.94 ns | 5,373.05 ns | - | - | - | 72 B |
| System.Collections | CreateAddAndClear<Int32> | ConcurrentQueue | Job-YGFIWA | Default | Default | 16 | 1 | ? | 512 | 5,042.72 ns | 2.979 ns | 8.401 ns | 5,040.23 ns | 5,028.93 ns | 5,066.91 ns | 0.5662 | - | - | 9792 B |
| System.Collections | CreateAddAndClear<String> | ConcurrentQueue | Job-YGFIWA | Default | Default | 16 | 1 | ? | 512 | 5,811.73 ns | 4.256 ns | 12.142 ns | 5,808.91 ns | 5,789.65 ns | 5,846.74 ns | 1.0692 | 0.0697 | - | 17984 B |
| System.Collections.Concurrent | AddRemoveFromDifferentThreads<Int32> | ConcurrentQueue | Job-JNBMSA | 10 | 6 | 1 | -1 | ? | 2000000 | 19,289,620.33 ns | 1,373,898.353 ns | 3,807,065.866 ns | 19,122,269.00 ns | 14,069,073.00 ns | 30,219,178.00 ns | - | - | - | 2100176 B |
| System.Collections.Concurrent | AddRemoveFromDifferentThreads<String> | ConcurrentQueue | Job-JNBMSA | 10 | 6 | 1 | -1 | ? | 2000000 | 23,454,660.90 ns | 781,060.146 ns | 2,071,260.706 ns | 23,092,573.50 ns | 18,199,132.00 ns | 30,285,417.00 ns | - | - | - | 33776 B |
| System.Collections.Concurrent | AddRemoveFromSameThreads<Int32> | ConcurrentQueue | Job-JNBMSA | 10 | 6 | 1 | -1 | ? | 2000000 | 567,072,512.67 ns | 64,097,635.471 ns | 188,993,324.379 ns | 659,905,224.50 ns | 63,575,588.00 ns | 755,952,744.00 ns | - | - | - | 9128 B |
| System.Collections.Concurrent | AddRemoveFromSameThreads<String> | ConcurrentQueue | Job-JNBMSA | 10 | 6 | 1 | -1 | ? | 2000000 | 595,239,541.73 ns | 54,895,214.567 ns | 161,859,778.713 ns | 685,948,586.00 ns | 134,833,576.50 ns | 783,141,323.50 ns | - | - | - | 16808 B |
Diff (SpinOnce(sleep1Threshold: Thread.OptimalMaxSpinWaitsPerSpinIteration))
----------------------------------------------------------------------------
BenchmarkDotNet=v0.12.1.1405-nightly, OS=ubuntu 20.04
AMD Ryzen 5 3600, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=6.0.100-alpha.1.20528.4
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.20.47505, CoreFX 5.0.20.47505), X64 RyuJIT
Job-AHLXGP : .NET Core 5.0 (CoreCLR 42.42.42.42424, CoreFX 42.42.42.42424), X64 RyuJIT
Job-OCAWOY : .NET Core 5.0 (CoreCLR 42.42.42.42424, CoreFX 42.42.42.42424), X64 RyuJIT
PowerPlanMode=00000000-0000-0000-0000-000000000000 Arguments=/p:DebugType=portable Toolchain=CoreRun
InvocationCount=1 IterationCount=100 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15
| Namespace | Type | Method | Job | MaxWarmupIterationCount | MinWarmupIterationCount | UnrollFactor | WarmupCount | Count | Size | Mean | Error | StdDev | Median | Min | Max | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------ |-------------------------------------- |---------------- |----------- |------------------------ |------------------------ |------------- |------------ |------ |-------- |------------------:|------------------:|------------------:|------------------:|-----------------:|------------------:|-------:|-------:|------:|----------:|
| System.Collections | CtorDefaultSize<Int32> | ConcurrentQueue | Job-AHLXGP | Default | Default | 16 | 1 | ? | ? | 81.89 ns | 0.501 ns | 1.445 ns | 81.75 ns | 78.85 ns | 85.34 ns | 0.0347 | - | - | 584 B |
| System.Collections | CtorDefaultSize<String> | ConcurrentQueue | Job-AHLXGP | Default | Default | 16 | 1 | ? | ? | 98.97 ns | 0.164 ns | 0.470 ns | 98.98 ns | 98.20 ns | 100.32 ns | 0.0500 | - | - | 840 B |
| System.Collections.Tests | Add_Remove_SteadyState<Int32> | ConcurrentQueue | Job-AHLXGP | Default | Default | 16 | 1 | 512 | ? | 12.36 ns | 0.011 ns | 0.032 ns | 12.35 ns | 12.32 ns | 12.42 ns | - | - | - | - |
| System.Collections.Tests | Add_Remove_SteadyState<String> | ConcurrentQueue | Job-AHLXGP | Default | Default | 16 | 1 | 512 | ? | 13.69 ns | 0.035 ns | 0.102 ns | 13.69 ns | 13.38 ns | 13.95 ns | - | - | - | - |
| System.Collections | CtorFromCollection<Int32> | ConcurrentQueue | Job-AHLXGP | Default | Default | 16 | 1 | ? | 512 | 5,160.01 ns | 1.392 ns | 3.764 ns | 5,159.72 ns | 5,149.85 ns | 5,169.32 ns | 0.2483 | - | - | 4456 B |
| System.Collections | CtorFromCollection<String> | ConcurrentQueue | Job-AHLXGP | Default | Default | 16 | 1 | ? | 512 | 6,018.71 ns | 10.216 ns | 29.638 ns | 6,007.39 ns | 5,985.17 ns | 6,104.02 ns | 0.5048 | 0.0240 | - | 8552 B |
| System.Collections | IterateForEach<Int32> | ConcurrentQueue | Job-AHLXGP | Default | Default | 16 | 1 | ? | 512 | 4,390.67 ns | 0.744 ns | 2.097 ns | 4,389.89 ns | 4,387.79 ns | 4,397.29 ns | - | - | - | 72 B |
| System.Collections | IterateForEach<String> | ConcurrentQueue | Job-AHLXGP | Default | Default | 16 | 1 | ? | 512 | 5,369.40 ns | 0.349 ns | 0.979 ns | 5,369.37 ns | 5,367.52 ns | 5,372.03 ns | - | - | - | 72 B |
| System.Collections | CreateAddAndClear<Int32> | ConcurrentQueue | Job-AHLXGP | Default | Default | 16 | 1 | ? | 512 | 5,069.14 ns | 2.806 ns | 7.869 ns | 5,069.77 ns | 5,053.56 ns | 5,086.62 ns | 0.5684 | - | - | 9840 B |
| System.Collections | CreateAddAndClear<String> | ConcurrentQueue | Job-AHLXGP | Default | Default | 16 | 1 | ? | 512 | 5,833.39 ns | 4.384 ns | 12.508 ns | 5,833.01 ns | 5,786.71 ns | 5,865.83 ns | 1.0740 | 0.0700 | - | 18032 B |
| System.Collections.Concurrent | AddRemoveFromDifferentThreads<Int32> | ConcurrentQueue | Job-OCAWOY | 10 | 6 | 1 | -1 | ? | 2000000 | 18,958,972.16 ns | 1,357,806.277 ns | 3,851,870.408 ns | 18,124,957.50 ns | 14,039,125.50 ns | 30,775,269.50 ns | - | - | - | 527168 B |
| System.Collections.Concurrent | AddRemoveFromDifferentThreads<String> | ConcurrentQueue | Job-OCAWOY | 10 | 6 | 1 | -1 | ? | 2000000 | 22,155,658.25 ns | 955,012.487 ns | 2,646,335.104 ns | 22,400,435.00 ns | 14,281,531.00 ns | 28,474,005.00 ns | - | - | - | 33528 B |
| System.Collections.Concurrent | AddRemoveFromSameThreads<Int32> | ConcurrentQueue | Job-OCAWOY | 10 | 6 | 1 | -1 | ? | 2000000 | 109,724,591.48 ns | 10,191,909.446 ns | 29,568,577.342 ns | 108,815,782.50 ns | 51,159,775.50 ns | 180,713,226.50 ns | - | - | - | 2488 B |
| System.Collections.Concurrent | AddRemoveFromSameThreads<String> | ConcurrentQueue | Job-OCAWOY | 10 | 6 | 1 | -1 | ? | 2000000 | 97,721,116.74 ns | 8,854,378.736 ns | 25,404,872.403 ns | 98,165,676.50 ns | 52,096,356.50 ns | 150,054,499.50 ns | - | - | - | 8384 B |
Comparison
-------------
summary:
better: 3, geomean: 3.522
total diff: 3
No Slower results for the provided threshold = 1% and noise filter = 50ns.
| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.Collections.Concurrent.AddRemoveFromSameThreads<String>.ConcurrentQueue(S | 6.99 | 685948586.00 | 98165676.50 | |
| System.Collections.Concurrent.AddRemoveFromSameThreads<Int32>.ConcurrentQueue(Si | 6.06 | 659905224.50 | 108815782.50 | |
| System.Collections.Concurrent.AddRemoveFromDifferentThreads<String>.ConcurrentQu | 1.03 | 23092573.50 | 22400435.00 | |
Analysis
Here is the change I made on my fork. This branch is based off master, so BenchmarkDotNet may complain about versioning if you just clone this branch alone. I can push a branch off of release/5.0-rc2 if that would be convenient.
Basically, changing the threshold value used in ConcurrentQueueSegment from -1 to a value that allows threads to sleep seems to help threads spend less time spin-waiting. I tried a few values and found that Thread.OptimalMaxSpinWaitsPerSpinIteration gave me the best result, but this was trial and error and may not be optimal. Removing the parameter entirely, so that SpinWait.SpinOnce() uses its default behavior, also improved performance, but not as much as using the Thread.OptimalMaxSpinWaitsPerSpinIteration value.
I'm wondering if there is a case where -1 is still optimal, or could this be changed?
Please let me know if I can include any other information or clarify anything above.
Edit: Needed to remove EPYC results, but this problem does appear on EPYC with similar results to Ryzen. Please see internal email thread for those numbers.