
Commit 63c5484

workqueue: Add multiple affinity scopes and interface to select them
Add three more affinity scopes - WQ_AFFN_CPU, SMT and CACHE - and make CACHE the default. The code changes to actually add the additional scopes are trivial.

Also add module parameter "workqueue.default_affinity_scope" to override the default scope and "affinity_scope" sysfs file to configure it per workqueue. wq_dump.py and documentation are updated accordingly.

This enables significant flexibility in configuring how unbound workqueues behave. If affinity scope is set to "cpu", it'll behave close to a per-cpu workqueue. On the other hand, "system" removes all locality boundaries.

Many modern machines have multiple L3 caches while often being mostly uniform in terms of memory access. Thus, workqueue's previous behavior of spreading work items in each NUMA node had negative performance implications from unnecessarily crossing L3 boundaries between issue and execution. However, picking a finer grained affinity scope also has a downside in that an issuer in one group can't utilize CPUs in other groups.

While dependent on the specifics of workload, there's usually a noticeable penalty in crossing L3 boundaries, so let's default to CACHE. This issue will be further addressed and documented with examples in future patches.

Signed-off-by: Tejun Heo <[email protected]>
1 parent 025e168 commit 63c5484

5 files changed: +193 -12 lines changed

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 12 additions & 0 deletions
@@ -7007,6 +7007,18 @@
 			The default value of this parameter is determined by
 			the config option CONFIG_WQ_POWER_EFFICIENT_DEFAULT.
 
+	workqueue.default_affinity_scope=
+			Select the default affinity scope to use for unbound
+			workqueues. Can be one of "cpu", "smt", "cache",
+			"numa" and "system". Default is "cache". For more
+			information, see the Affinity Scopes section in
+			Documentation/core-api/workqueue.rst.
+
+			This can be updated after boot through the matching
+			file under /sys/module/workqueue/parameters.
+			However, the changed default will only apply to
+			unbound workqueues created afterwards.
+
 	workqueue.debug_force_rr_cpu
 			Workqueue used to implicitly guarantee that work
 			items queued without explicit CPU specified are put

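The runtime update described in the parameter text can be sketched from userspace as follows. This is a minimal sketch, not part of the commit: the file path is assembled from the documented directory and the parameter name, the helper names `read_scope` and `write_scope` are hypothetical, and writing to the real file is assumed to require root because the parameter is registered with mode 0644.

```python
from pathlib import Path

# Scope names listed in the parameter documentation.
VALID_SCOPES = ("cpu", "smt", "cache", "numa", "system")

# Documented directory plus the parameter name registered by the patch.
DEFAULT_SCOPE_PARAM = Path("/sys/module/workqueue/parameters/default_affinity_scope")


def read_scope(path=DEFAULT_SCOPE_PARAM):
    """Return the scope name stored in the file, without the trailing newline."""
    return Path(path).read_text().strip()


def write_scope(scope, path=DEFAULT_SCOPE_PARAM):
    """Check the scope name locally, then write it to the file (hypothetical helper)."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown affinity scope: {scope!r}")
    Path(path).write_text(scope + "\n")
```

As the parameter text notes, a new default only applies to unbound workqueues created after the write. The same two helpers could be pointed at a per-workqueue `affinity_scope` file by passing a different `path`.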
Documentation/core-api/workqueue.rst

Lines changed: 63 additions & 0 deletions
@@ -347,6 +347,51 @@ Guidelines
   level of locality in wq operations and work item execution.
 
 
+Affinity Scopes
+===============
+
+An unbound workqueue groups CPUs according to its affinity scope to improve
+cache locality. For example, if a workqueue is using the default affinity
+scope of "cache", it will group CPUs according to last level cache
+boundaries. A work item queued on the workqueue will be processed by a
+worker running on one of the CPUs which share the last level cache with the
+issuing CPU.
+
+Workqueue currently supports the following five affinity scopes.
+
+``cpu``
+  CPUs are not grouped. A work item issued on one CPU is processed by a
+  worker on the same CPU. This makes unbound workqueues behave as per-cpu
+  workqueues without concurrency management.
+
+``smt``
+  CPUs are grouped according to SMT boundaries. This usually means that the
+  logical threads of each physical CPU core are grouped together.
+
+``cache``
+  CPUs are grouped according to cache boundaries. Which specific cache
+  boundary is used is determined by the arch code. L3 is used in a lot of
+  cases. This is the default affinity scope.
+
+``numa``
+  CPUs are grouped according to NUMA boundaries.
+
+``system``
+  All CPUs are put in the same group. Workqueue makes no effort to process a
+  work item on a CPU close to the issuing CPU.
+
+The default affinity scope can be changed with the module parameter
+``workqueue.default_affinity_scope`` and a specific workqueue's affinity
+scope can be changed using ``apply_workqueue_attrs()``.
+
+If ``WQ_SYSFS`` is set, the workqueue will have the following affinity scope
+related interface files under its ``/sys/devices/virtual/WQ_NAME/``
+directory.
+
+``affinity_scope``
+  Read to see the current affinity scope. Write to change.
+
+
 Examining Configuration
 =======================
 
@@ -358,6 +403,24 @@ configuration, worker pools and how workqueues map to the pools: ::
   ===============
   wq_unbound_cpumask=0000000f
 
+  CPU
+  nr_pods 4
+  pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008
+  pod_node [0]=0 [1]=0 [2]=1 [3]=1
+  cpu_pod [0]=0 [1]=1 [2]=2 [3]=3
+
+  SMT
+  nr_pods 4
+  pod_cpus [0]=00000001 [1]=00000002 [2]=00000004 [3]=00000008
+  pod_node [0]=0 [1]=0 [2]=1 [3]=1
+  cpu_pod [0]=0 [1]=1 [2]=2 [3]=3
+
+  CACHE (default)
+  nr_pods 2
+  pod_cpus [0]=00000003 [1]=0000000c
+  pod_node [0]=0 [1]=1
+  cpu_pod [0]=0 [1]=0 [2]=1 [3]=1
+
   NUMA
   nr_pods 2
   pod_cpus [0]=00000003 [1]=0000000c

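The way each scope partitions CPUs into pods can be modelled in a few lines of Python. This is a simplified sketch and not the kernel's `init_pod_type()`: the 4-CPU topology table is hypothetical, chosen to reproduce the example dump (no SMT siblings, CPUs 0-1 and 2-3 each sharing a last level cache and a NUMA node), and the `system` scope is expressed as an always-true predicate purely for illustration.

```python
# Hypothetical 4-CPU machine: no SMT siblings, two LLC domains, two NUMA nodes.
TOPOLOGY = {
    0: {"core": 0, "llc": 0, "node": 0},
    1: {"core": 1, "llc": 0, "node": 0},
    2: {"core": 2, "llc": 1, "node": 1},
    3: {"core": 3, "llc": 1, "node": 1},
}

# One "do these two distinct CPUs belong to the same pod?" predicate per scope.
SHARE = {
    "cpu": lambda a, b: False,
    "smt": lambda a, b: TOPOLOGY[a]["core"] == TOPOLOGY[b]["core"],
    "cache": lambda a, b: TOPOLOGY[a]["llc"] == TOPOLOGY[b]["llc"],
    "numa": lambda a, b: TOPOLOGY[a]["node"] == TOPOLOGY[b]["node"],
    "system": lambda a, b: True,
}


def build_pods(scope):
    """Group CPUs into pods; return (list of pods, cpu -> pod index map)."""
    share = SHARE[scope]
    pods, cpu_pod = [], {}
    for cpu in sorted(TOPOLOGY):
        for idx, pod in enumerate(pods):
            # Join the first existing pod whose first CPU shares with this one.
            if share(cpu, pod[0]):
                pod.append(cpu)
                cpu_pod[cpu] = idx
                break
        else:
            # No matching pod: this CPU starts a new one.
            cpu_pod[cpu] = len(pods)
            pods.append([cpu])
    return pods, cpu_pod


def pod_mask(pod):
    """Format a pod's CPUs as an 8-digit hex bitmask, like pod_cpus in the dump."""
    return f"{sum(1 << cpu for cpu in pod):08x}"


for scope in SHARE:
    pods, cpu_pod = build_pods(scope)
    print(scope.upper(), "nr_pods", len(pods), [pod_mask(p) for p in pods])
```

In this model a work item issued on CPU 0 may run on CPUs 0-1 under `cache` but only on CPU 0 under `cpu`, which is the trade-off the commit message describes: finer scopes improve locality but keep an issuer from using CPUs in other pods.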
include/linux/workqueue.h

Lines changed: 4 additions & 1 deletion
@@ -126,12 +126,15 @@ struct rcu_work {
 };
 
 enum wq_affn_scope {
+	WQ_AFFN_CPU,	/* one pod per CPU */
+	WQ_AFFN_SMT,	/* one pod per SMT */
+	WQ_AFFN_CACHE,	/* one pod per LLC */
 	WQ_AFFN_NUMA,	/* one pod per NUMA node */
 	WQ_AFFN_SYSTEM,	/* one pod across the whole system */
 
 	WQ_AFFN_NR_TYPES,
 
-	WQ_AFFN_DFL = WQ_AFFN_NUMA,
+	WQ_AFFN_DFL = WQ_AFFN_CACHE,
 };
 
 /**

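Because C enumerators without explicit values are numbered sequentially from zero, inserting three scopes ahead of `WQ_AFFN_NUMA` renumbers the existing ones. The resulting values can be mirrored in Python as a rough illustration; the class name `WqAffnScope` and the module-level `NR_TYPES` and `DFL` names are hypothetical stand-ins for the C enumerators.

```python
from enum import IntEnum


class WqAffnScope(IntEnum):
    """Mirror of enum wq_affn_scope after this patch (implicit C numbering)."""
    CPU = 0      # one pod per CPU
    SMT = 1      # one pod per SMT
    CACHE = 2    # one pod per LLC
    NUMA = 3     # was 0 before the patch
    SYSTEM = 4   # was 1 before the patch


NR_TYPES = len(WqAffnScope)   # corresponds to WQ_AFFN_NR_TYPES
DFL = WqAffnScope.CACHE       # corresponds to WQ_AFFN_DFL
```

Code that looks the enumerators up by name, as wq_dump.py does through `prog[...]`, is unaffected by the renumbering, whereas anything that hard-coded the old numeric values would not be.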
kernel/workqueue.c

Lines changed: 105 additions & 5 deletions
@@ -338,6 +338,15 @@ struct wq_pod_type {
 };
 
 static struct wq_pod_type wq_pod_types[WQ_AFFN_NR_TYPES];
+static enum wq_affn_scope wq_affn_dfl = WQ_AFFN_DFL;
+
+static const char *wq_affn_names[WQ_AFFN_NR_TYPES] = {
+	[WQ_AFFN_CPU] = "cpu",
+	[WQ_AFFN_SMT] = "smt",
+	[WQ_AFFN_CACHE] = "cache",
+	[WQ_AFFN_NUMA] = "numa",
+	[WQ_AFFN_SYSTEM] = "system",
+};
 
 /*
  * Per-cpu work items which run for longer than the following threshold are
@@ -3664,7 +3673,7 @@ struct workqueue_attrs *alloc_workqueue_attrs(void)
 		goto fail;
 
 	cpumask_copy(attrs->cpumask, cpu_possible_mask);
-	attrs->affn_scope = WQ_AFFN_DFL;
+	attrs->affn_scope = wq_affn_dfl;
 	return attrs;
 fail:
 	free_workqueue_attrs(attrs);
@@ -5777,19 +5786,55 @@ int workqueue_set_unbound_cpumask(cpumask_var_t cpumask)
 	return ret;
 }
 
+static int parse_affn_scope(const char *val)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(wq_affn_names); i++) {
+		if (!strncasecmp(val, wq_affn_names[i], strlen(wq_affn_names[i])))
+			return i;
+	}
+	return -EINVAL;
+}
+
+static int wq_affn_dfl_set(const char *val, const struct kernel_param *kp)
+{
+	int affn;
+
+	affn = parse_affn_scope(val);
+	if (affn < 0)
+		return affn;
+
+	wq_affn_dfl = affn;
+	return 0;
+}
+
+static int wq_affn_dfl_get(char *buffer, const struct kernel_param *kp)
+{
+	return scnprintf(buffer, PAGE_SIZE, "%s\n", wq_affn_names[wq_affn_dfl]);
+}
+
+static const struct kernel_param_ops wq_affn_dfl_ops = {
+	.set = wq_affn_dfl_set,
+	.get = wq_affn_dfl_get,
+};
+
+module_param_cb(default_affinity_scope, &wq_affn_dfl_ops, NULL, 0644);
+
 #ifdef CONFIG_SYSFS
 /*
  * Workqueues with WQ_SYSFS flag set is visible to userland via
  * /sys/bus/workqueue/devices/WQ_NAME. All visible workqueues have the
  * following attributes.
  *
- *  per_cpu	RO bool	: whether the workqueue is per-cpu or unbound
- *  max_active	RW int	: maximum number of in-flight work items
+ *  per_cpu		RO bool	: whether the workqueue is per-cpu or unbound
+ *  max_active		RW int	: maximum number of in-flight work items
  *
  * Unbound workqueues have the following extra attributes.
  *
- *  nice	RW int	: nice value of the workers
- *  cpumask	RW mask	: bitmask of allowed CPUs for the workers
+ *  nice		RW int	: nice value of the workers
+ *  cpumask		RW mask	: bitmask of allowed CPUs for the workers
+ *  affinity_scope	RW str	: worker CPU affinity scope (cpu, smt, cache, numa, system)
  */
 struct wq_device {
 	struct workqueue_struct *wq;
@@ -5932,9 +5977,47 @@ static ssize_t wq_cpumask_store(struct device *dev,
 	return ret ?: count;
 }
 
+static ssize_t wq_affn_scope_show(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	struct workqueue_struct *wq = dev_to_wq(dev);
+	int written;
+
+	mutex_lock(&wq->mutex);
+	written = scnprintf(buf, PAGE_SIZE, "%s\n",
+			    wq_affn_names[wq->unbound_attrs->affn_scope]);
+	mutex_unlock(&wq->mutex);
+
+	return written;
+}
+
+static ssize_t wq_affn_scope_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t count)
+{
+	struct workqueue_struct *wq = dev_to_wq(dev);
+	struct workqueue_attrs *attrs;
+	int affn, ret = -ENOMEM;
+
+	affn = parse_affn_scope(buf);
+	if (affn < 0)
+		return affn;
+
+	apply_wqattrs_lock();
+	attrs = wq_sysfs_prep_attrs(wq);
+	if (attrs) {
+		attrs->affn_scope = affn;
+		ret = apply_workqueue_attrs_locked(wq, attrs);
+	}
+	apply_wqattrs_unlock();
+	free_workqueue_attrs(attrs);
+	return ret ?: count;
+}
+
 static struct device_attribute wq_sysfs_unbound_attrs[] = {
 	__ATTR(nice, 0644, wq_nice_show, wq_nice_store),
 	__ATTR(cpumask, 0644, wq_cpumask_show, wq_cpumask_store),
+	__ATTR(affinity_scope, 0644, wq_affn_scope_show, wq_affn_scope_store),
 	__ATTR_NULL,
 };
 
@@ -6537,6 +6620,20 @@ static void __init init_pod_type(struct wq_pod_type *pt,
 	}
 }
 
+static bool __init cpus_dont_share(int cpu0, int cpu1)
+{
+	return false;
+}
+
+static bool __init cpus_share_smt(int cpu0, int cpu1)
+{
+#ifdef CONFIG_SCHED_SMT
+	return cpumask_test_cpu(cpu0, cpu_smt_mask(cpu1));
+#else
+	return false;
+#endif
+}
+
 static bool __init cpus_share_numa(int cpu0, int cpu1)
 {
 	return cpu_to_node(cpu0) == cpu_to_node(cpu1);
@@ -6554,6 +6651,9 @@ void __init workqueue_init_topology(void)
 	struct workqueue_struct *wq;
 	int cpu;
 
+	init_pod_type(&wq_pod_types[WQ_AFFN_CPU], cpus_dont_share);
+	init_pod_type(&wq_pod_types[WQ_AFFN_SMT], cpus_share_smt);
+	init_pod_type(&wq_pod_types[WQ_AFFN_CACHE], cpus_share_cache);
 	init_pod_type(&wq_pod_types[WQ_AFFN_NUMA], cpus_share_numa);
 
 	mutex_lock(&wq_pool_mutex);

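The matching rule in `parse_affn_scope()` is worth spelling out: `strncasecmp()` compares only the first `strlen(name)` characters, so the input is matched case-insensitively as a prefix. A userspace model of that rule, assuming ASCII input, might look like the following; it is an illustration of the comparison, not kernel code.

```python
import errno

# Same order as wq_affn_names[] in the patch.
WQ_AFFN_NAMES = ["cpu", "smt", "cache", "numa", "system"]


def parse_affn_scope(val):
    """Return the index of the first scope name that prefixes val, else -EINVAL."""
    for i, name in enumerate(WQ_AFFN_NAMES):
        # Model of !strncasecmp(val, name, strlen(name)): only the leading
        # len(name) characters of val are compared, ignoring case.
        if val[:len(name)].lower() == name:
            return i
    return -errno.EINVAL


print(parse_affn_scope("cache\n"))   # trailing newline from a sysfs write is ignored
print(parse_affn_scope("NUMA"))      # case-insensitive
print(parse_affn_scope("none"))      # not a scope name
```

One consequence of prefix matching is that trailing text is accepted: in this model `"cpuset"` parses as `cpu`. Inputs shorter than a scope name, such as `"c"`, are rejected.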
tools/workqueue/wq_dump.py

Lines changed: 9 additions & 6 deletions
@@ -78,11 +78,16 @@ def cpumask_str(cpumask):
 workqueues = prog['workqueues']
 wq_unbound_cpumask = prog['wq_unbound_cpumask']
 wq_pod_types = prog['wq_pod_types']
+wq_affn_dfl = prog['wq_affn_dfl']
+wq_affn_names = prog['wq_affn_names']
 
 WQ_UNBOUND = prog['WQ_UNBOUND']
 WQ_ORDERED = prog['__WQ_ORDERED']
 WQ_MEM_RECLAIM = prog['WQ_MEM_RECLAIM']
 
+WQ_AFFN_CPU = prog['WQ_AFFN_CPU']
+WQ_AFFN_SMT = prog['WQ_AFFN_SMT']
+WQ_AFFN_CACHE = prog['WQ_AFFN_CACHE']
 WQ_AFFN_NUMA = prog['WQ_AFFN_NUMA']
 WQ_AFFN_SYSTEM = prog['WQ_AFFN_SYSTEM']
 
@@ -109,12 +114,10 @@ def print_pod_type(pt):
         print(f' [{cpu}]={pt.cpu_pod[cpu].value_()}', end='')
     print('')
 
-print('')
-print('NUMA')
-print_pod_type(wq_pod_types[WQ_AFFN_NUMA])
-print('')
-print('SYSTEM')
-print_pod_type(wq_pod_types[WQ_AFFN_SYSTEM])
+for affn in [WQ_AFFN_CPU, WQ_AFFN_SMT, WQ_AFFN_CACHE, WQ_AFFN_NUMA, WQ_AFFN_SYSTEM]:
+    print('')
+    print(f'{wq_affn_names[affn].string_().decode().upper()}{" (default)" if affn == wq_affn_dfl else ""}')
+    print_pod_type(wq_pod_types[affn])
 
 print('')
 print('Worker Pools')

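The header line that the new loop prints for each scope can be reproduced without drgn. In this sketch, plain Python strings and integers stand in for the drgn objects that wq_dump.py reads from the kernel, and `scope_header` is a hypothetical helper name.

```python
# Plain-Python stand-ins for wq_affn_names[] and wq_affn_dfl.
WQ_AFFN_NAMES = ["cpu", "smt", "cache", "numa", "system"]
WQ_AFFN_DFL = 2  # index of "cache", the default after this patch


def scope_header(affn, default_affn=WQ_AFFN_DFL):
    """Upper-case scope name, tagged with ' (default)' for the default scope."""
    suffix = " (default)" if affn == default_affn else ""
    return f"{WQ_AFFN_NAMES[affn].upper()}{suffix}"


headers = [scope_header(affn) for affn in range(len(WQ_AFFN_NAMES))]
print(headers)
```

Because the script compares against the live `wq_affn_dfl` value, the `(default)` tag follows whatever scope was selected through the module parameter rather than the compile-time default.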