
Reduce time in .NET ThreadPool's WorkStealingQueue.TrySteal and ThreadPoolWorkQueue.Dequeue methods #10752

@vancem

Description


A number of different users have noted a large amount of time in the .NET ThreadPool WorkStealingQueue.TrySteal method (being called from the ThreadPoolWorkQueue.Dequeue method).

From what we can tell, the scenario that causes this is bursty workloads. For bursty workloads our guidance is to set MinWorkerThreads high enough that there are threads available to handle the burst. For high-scale machines (e.g. 16 procs), it is not uncommon to set this minimum in the 160-320 thread range.

When a burst arrives (let's say it needs 100 threads), those 100 threads do the work and then call Dequeue to get the next work item. By then the burst is over, so none of them finds any work, and each goes through a loop in the ThreadPoolWorkQueue.Dequeue method looking for work to steal from other threads' queues (which will fail).

Thus you have 100 threads each spinning through 160-320 worker thread queues looking for more work, requiring 16K to 32K checks. These threads 'fight' over the memory they read to check that the queues are empty, so even though each check is short, it consumes a non-trivial amount of CPU time. If these bursts come frequently (e.g. every 10-100 msec), the CPU cost adds up.
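As a back-of-the-envelope check of those numbers, the scan cost per burst is simply (spinning threads) x (queues each one scans). A tiny sketch (the inputs are the numbers from the scenario above; the class and method names are made up for illustration):

```java
// Back-of-the-envelope model of the O(threads x queues) scan cost.
// All numbers are illustrative, taken from the scenario described above.
public class StealScanCost {
    // Each awakened thread walks every worker thread's queue once.
    static int checksPerBurst(int spinningThreads, int totalQueues) {
        return spinningThreads * totalQueues;
    }

    public static void main(String[] args) {
        System.out.println(checksPerBurst(100, 160)); // 16000
        System.out.println(checksPerBurst(100, 320)); // 32000
    }
}
```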

Here is where we see the CPU time spent (this is on the Desktop framework, but the code is very similar for .NET Core). Here is the code in Dequeue:

                   if (null == callback)
                   {
100.0 |                WorkStealingQueue[] otherQueues = allThreadQueues.Current;
  1.9K|                int i = tl.random.Next(otherQueues.Length);
                       int c = otherQueues.Length;
                       while (c > 0)
                       {
 49.1K|                    WorkStealingQueue otherQueue = Volatile.Read(ref otherQueues[i % otherQueues.Length]);
141.2K|                    if (otherQueue != null &&
                               otherQueue != wsq &&
                               otherQueue.TrySteal(out callback, ref missedSteal))
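The loop above starts at a random queue and walks all of them circularly until a steal succeeds or every queue has been checked. The same scan shape, sketched in Java for illustration (not the actual BCL code; the boolean array stands in for "queue i had stealable work"):

```java
import java.util.Random;

// Illustrative sketch of Dequeue's steal scan: start at a random queue
// and walk all of them circularly until a steal succeeds.
public class StealLoopSketch {
    // queues[i] == true stands in for "queue i had work we could steal".
    static int findVictim(boolean[] queues, Random random) {
        int i = random.nextInt(queues.length);
        for (int c = queues.length; c > 0; c--) {
            if (queues[i % queues.length])
                return i % queues.length; // found a queue to steal from
            i++;
        }
        return -1; // every queue was empty: the O(n) failure case
    }
}
```

Note that in the bursty scenario, the `-1` path is the common one: each thread pays the full O(n) walk only to come up empty.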

And here we see the hot code in TrySteal (that 141K, broken down):

                   private bool TrySteal(out IThreadPoolWorkItem obj, ref bool missedSteal, int millisecondsTimeout)
                   {
 23.8K|                obj = null;
       
                       while (true)
                       {
 92.5K|                    if (m_headIndex >= m_tailIndex)
  8.1K|                        return false;

In .NET Core the code is a bit different because we have created a helper called 'CanSteal' that does the m_headIndex >= m_tailIndex check, and we call this helper in Dequeue before calling TrySteal. This helps cut the cost per iteration, but it does not change the fact that we are doing an O(n) operation, and we still have to 'fight' over the memory representing the m_headIndex and m_tailIndex variables. Thus on .NET Core the problem will not show up in TrySteal and should be less severe, but it is probably still problematic.
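A minimal sketch of that pre-check, in Java rather than C#, with AtomicInteger standing in for the volatile head/tail fields (the class and method names are illustrative, not the actual BCL code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the cheap CanSteal-style pre-check: thieves read
// head/tail and skip the expensive TrySteal path when the queue is empty.
public class SketchWorkStealingQueue {
    private final AtomicInteger headIndex = new AtomicInteger(0);
    private final AtomicInteger tailIndex = new AtomicInteger(0);

    // Read-only emptiness check, analogous to .NET Core's CanSteal.
    // Cheap per call, but under contention many thieves reading these
    // fields still fight over the same cache lines.
    public boolean canSteal() {
        return headIndex.get() < tailIndex.get();
    }

    // Owner pushes a work item (only the index bookkeeping is modeled here).
    public void localPush() {
        tailIndex.incrementAndGet();
    }
}
```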

To really fix the problem we need to be less aggressive about checking for stealing. Ideally we still do some checking, but we should be much less aggressive when we know that other threads will shortly come along and do a more aggressive check themselves. This avoids the O(N) behavior, which is the fundamental problem.

The solution probably looks like checking only a subset of the other queues for work to steal unless we have been asked to be 'aggressive', and becoming aggressive only after a certain amount of time has passed without finding work.
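One possible shape of that throttling, sketched in Java; the subset size, the time threshold, and the method names are made-up parameters for illustration, not a proposed design:

```java
import java.util.function.IntPredicate;

// Sketch of a throttled steal scan: check only a small subset of queues
// unless enough time has passed without finding work, then scan them all.
public class ThrottledStealScan {
    static final int SUBSET_SIZE = 4;                     // cheap partial scan
    static final long AGGRESSIVE_AFTER_NANOS = 1_000_000; // escalate after 1 ms

    // tryStealFrom.test(i) returns true if stealing from queue i succeeded.
    static boolean scan(int queueCount, long nanosSinceLastWork,
                        IntPredicate tryStealFrom) {
        boolean aggressive = nanosSinceLastWork >= AGGRESSIVE_AFTER_NANOS;
        int limit = aggressive ? queueCount : Math.min(SUBSET_SIZE, queueCount);
        for (int i = 0; i < limit; i++) {
            if (tryStealFrom.test(i))
                return true;
        }
        return false; // no work found in the queues we were willing to check
    }
}
```

In the bursty case, most threads take the cheap partial scan and go to sleep quickly, while the occasional 'aggressive' pass still guarantees that stranded work eventually gets stolen.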

@kouvel @stephentoub
