
Commit 5e24432

jeff authored and committed
Implement several enhancements to NUMA policies.
Add a new "interleave" allocation policy which stripes pages across domains with a stride or width, keeping contiguity within a multi-page region.

Move the kernel to the dedicated numbered cpuset #2, making it possible to assign kernel threads and memory policy separately from userspace. This also eliminates the need for the complicated interrupt binding code.

Add a sysctl API for viewing and manipulating domainsets.

Refactor some of the cpuset_t manipulation code using the generic bitset type so that it can be used for both cpusets and domainsets. This probably belongs in a dedicated subr file.

Attempt to improve the include situation.

Reviewed by:	kib
Discussed with:	jhb (cpuset parts)
Tested by:	pho (before review feedback)
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14839
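The interleave policy described above maps each page of an object to a domain according to its offset. The following C code is a hypothetical userspace model, not the kernel's implementation: it assumes the allowed domains form a 64-bit mask, that a stripe is a fixed number of pages (`stripe_pages`), and that successive stripes rotate through the allowed domains in ascending order. The names `interleave_domain`, `mask_count`, and `mask_nth` are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Count the domains present in the mask (one bit per domain). */
static int
mask_count(uint64_t mask)
{
	int n = 0;

	while (mask != 0) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return (n);
}

/* Return the domain number of the nth (0-based) set bit, or -1. */
static int
mask_nth(uint64_t mask, int nth)
{
	for (int d = 0; d < 64; d++) {
		if ((mask & ((uint64_t)1 << d)) != 0 && nth-- == 0)
			return (d);
	}
	return (-1);
}

/*
 * Hypothetical interleave mapping: each run of stripe_pages consecutive
 * pages stays on one domain, and successive runs rotate through the
 * allowed domains in ascending order.
 */
int
interleave_domain(uint64_t pindex, uint64_t stripe_pages, uint64_t mask)
{
	uint64_t stripe;

	assert(stripe_pages > 0 && mask != 0);
	stripe = pindex / stripe_pages;
	return (mask_nth(mask, (int)(stripe % (uint64_t)mask_count(mask))));
}
```

For example, with domains 0 and 2 allowed and a 512-page stripe (2MB of 4KB pages), pages 0 through 511 land on domain 0, pages 512 through 1023 on domain 2, and page 1024 wraps back to domain 0. Keeping whole stripes on one domain is what preserves contiguity within a multi-page region.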
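The refactoring mentioned above relies on one generic bitset implementation serving sets of different sizes. This is a simplified, hypothetical sketch of that idea: the `DEMO_*` macros, `demo_bit_count`, and the 256-CPU and 8-domain sizes are invented for illustration and do not reproduce the real `<sys/bitset.h>` macros or the actual `cpuset_t` and `domainset_t` definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Words needed to hold 'bits' bits. */
#define DEMO_WORD_BITS		(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_WORDS(bits)	(((bits) + DEMO_WORD_BITS - 1) / DEMO_WORD_BITS)

/* Declare a bitset type of any size; the same macros below work on all. */
#define DEMO_BITSET_DEFINE(name, bits) \
	struct name { unsigned long bits_[DEMO_WORDS(bits)]; }

#define DEMO_BIT_WORD(n)	((n) / DEMO_WORD_BITS)
#define DEMO_BIT_MASK(n)	(1UL << ((n) % DEMO_WORD_BITS))
#define DEMO_BIT_SET(n, p)	((p)->bits_[DEMO_BIT_WORD(n)] |= DEMO_BIT_MASK(n))
#define DEMO_BIT_CLR(n, p)	((p)->bits_[DEMO_BIT_WORD(n)] &= ~DEMO_BIT_MASK(n))
#define DEMO_BIT_ISSET(n, p) \
	(((p)->bits_[DEMO_BIT_WORD(n)] & DEMO_BIT_MASK(n)) != 0)
#define DEMO_BIT_COUNT(p) \
	demo_bit_count((p)->bits_, sizeof((p)->bits_) / sizeof((p)->bits_[0]))

/* Count set bits across all words of a set. */
static size_t
demo_bit_count(const unsigned long *words, size_t nwords)
{
	size_t count = 0;

	assert(words != NULL || nwords == 0);
	for (size_t i = 0; i < nwords; i++)
		for (unsigned long w = words[i]; w != 0; w &= w - 1)
			count++;
	return (count);
}

/* Two distinct set types built from the same macros (sizes arbitrary). */
DEMO_BITSET_DEFINE(demo_cpuset, 256);
DEMO_BITSET_DEFINE(demo_domainset, 8);
```

Because the size is a macro parameter, the set, clear, test, and count code is written once and shared by both types instead of being duplicated per type.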
1 parent 9d420f4 commit 5e24432


14 files changed, +432 -177 lines changed


share/man/man9/Makefile

Lines changed: 1 addition & 0 deletions
@@ -118,6 +118,7 @@ MAN= accept_filter.9 \
 	disk.9 \
 	dnv.9 \
 	domain.9 \
+	domainset.9 \
 	dpcpu.9 \
 	drbr.9 \
 	driver.9 \

share/man/man9/domainset.9

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
.\" Copyright (c) 2018 Jeffrey Roberson <[email protected]>
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS''
.\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE
.\" LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd March 24, 2018
.Dt DOMAINSET 9
.Os
.Sh NAME
.Nm domainset ,
.Nm domainset_create ,
.Nm sysctl_handle_domainset
.Nd domainset functions and operation
.Sh SYNOPSIS
.In sys/_domainset.h
.In sys/domainset.h
.\"
.Bd -literal -offset indent
struct domainset {
	domainset_t	ds_mask;
	uint16_t	ds_policy;
	domainid_t	ds_prefer;
	...
};
.Ed
.Pp
.Ft struct domainset *
.Fn domainset_create "const struct domainset *key"
.Ft int
.Fn sysctl_handle_domainset "SYSCTL_HANDLER_ARGS"
.Sh DESCRIPTION
The
.Nm
API provides memory domain allocation policies for NUMA machines.
Each
.Vt domainset
contains a bitmask of allowed domains, an integer policy, and an optional
preferred domain.
Together, these specify a search order for memory allocations as well as
the ability to restrict threads and objects to a subset of available
memory domains for system partitioning and resource management.
.Pp
Every thread in the system and, optionally, every
.Vt vm_object_t ,
which represents files and other memory sources,
holds a reference to a
.Vt struct domainset .
When memory is allocated for an object, the object's domainset is
consulted first; if the object has none, the system falls back to the
thread's policy.
.Pp
The allocation policy has the following possible values:
.Bl -tag -width "foo"
.It Dv DOMAINSET_POLICY_ROUNDROBIN
Memory is allocated from each domain in the mask in a round-robin fashion.
This distributes bandwidth evenly among available domains.
A mask containing a single domain gives a fixed allocation from that domain.
.It Dv DOMAINSET_POLICY_FIRSTTOUCH
Memory is allocated from the domain of the CPU that first accesses it.
Allocation falls back to round-robin if the current domain is not in the
allowed set or is out of memory.
This policy optimizes for locality but may give pessimal results if the
memory is accessed from many CPUs that are not in the local domain.
.It Dv DOMAINSET_POLICY_PREFER
Memory is allocated from the domain in the
.Va ds_prefer
member.
The preferred domain must be set in the allowed mask.
If the preferred domain is out of memory, the allocation falls back to
round-robin among the allowed domains.
.It Dv DOMAINSET_POLICY_INTERLEAVE
Memory is allocated in a striped fashion, with multiple consecutive pages
allocated to each domain in the set according to their offset within
the object.
The stripe width is object-dependent and may be as large as a
superpage (2MB on amd64).
This distributes memory well among domains while keeping pages contiguous
within each stripe, and is preferable to round-robin for general use.
.El
.Pp
The
.Fn domainset_create
function takes a partially filled-in domainset as a key and returns a
valid domainset or
.Dv NULL .
It is critical that consumers not use domainsets that have not been
returned by this function.
A
.Vt domainset
is an immutable type that is shared among all matching keys and must
not be modified after it is returned.
.Pp
The
.Fn sysctl_handle_domainset
function is provided as a convenience for modifying or viewing domainsets
that are not accessible via
.Xr cpuset 2 .
It is intended for use with
.Xr sysctl 9 .
.Sh SEE ALSO
.Xr cpuset 1 ,
.Xr cpuset 2 ,
.Xr cpuset_setdomain 2 ,
.Xr bitset 9
.Sh HISTORY
.In sys/domainset.h
first appeared in
.Fx 12.0 .
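The search order described by the policy list in domainset.9 can be modeled in a few lines of C. This is a hypothetical userspace sketch, not the kernel's allocator: the `demo_*` names are invented, domains are bits in a 64-bit mask, a second mask `avail` stands in for "not out of memory", and the round-robin fallback is assumed to skip domains that are out of memory. The interleave policy is omitted because it depends on the object offset rather than on a cursor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy values mirroring the man page's names. */
enum demo_policy { DEMO_ROUNDROBIN, DEMO_FIRSTTOUCH, DEMO_PREFER };

struct demo_domainset {
	uint64_t		mask;	/* allowed domains, bit d = domain d */
	enum demo_policy	policy;
	int			prefer;	/* used only by DEMO_PREFER */
};

#define DEMO_MAXDOMAIN	64
#define DEMO_BIT(d)	((uint64_t)1 << (d))

/* The object's domainset, if any, takes precedence over the thread's. */
const struct demo_domainset *
demo_effective_domainset(const struct demo_domainset *obj,
    const struct demo_domainset *td)
{
	return (obj != NULL ? obj : td);
}

/*
 * Round-robin: starting after *cursor, pick the next allowed domain that
 * still has memory ('avail' bit set) and remember it in *cursor.
 */
static int
demo_roundrobin(const struct demo_domainset *ds, uint64_t avail, int *cursor)
{
	int d, i;

	for (i = 1; i <= DEMO_MAXDOMAIN; i++) {
		d = (*cursor + i) % DEMO_MAXDOMAIN;
		if ((ds->mask & avail & DEMO_BIT(d)) != 0) {
			*cursor = d;
			return (d);
		}
	}
	return (-1);		/* every allowed domain is out of memory */
}

/* Select a domain for one allocation; curdomain is the local CPU's domain. */
int
demo_select_domain(const struct demo_domainset *ds, int curdomain,
    uint64_t avail, int *cursor)
{
	assert(ds->mask != 0 && *cursor >= -1 && *cursor < DEMO_MAXDOMAIN);
	switch (ds->policy) {
	case DEMO_FIRSTTOUCH:
		/* Use the local domain if it is allowed and has memory. */
		if (curdomain >= 0 && curdomain < DEMO_MAXDOMAIN &&
		    (ds->mask & avail & DEMO_BIT(curdomain)) != 0)
			return (curdomain);
		break;
	case DEMO_PREFER:
		/* The preferred domain must be in the allowed mask. */
		assert(ds->prefer >= 0 && ds->prefer < DEMO_MAXDOMAIN &&
		    (ds->mask & DEMO_BIT(ds->prefer)) != 0);
		if ((avail & DEMO_BIT(ds->prefer)) != 0)
			return (ds->prefer);
		break;
	case DEMO_ROUNDROBIN:
		break;
	}
	/* Every policy falls back to round-robin over the allowed domains. */
	return (demo_roundrobin(ds, avail, cursor));
}
```

Initialize the cursor to -1 so the first round-robin pick starts at domain 0. Keeping one cursor per thread or per object is one plausible way to spread successive allocations; the real kernel may track iteration state differently.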
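The man page describes `domainset_create` as returning a shared, immutable set for each distinct key. One way to model that contract is interning (hash-consing): look the key up, return the existing entry if found, and otherwise validate and store a copy. The sketch below is hypothetical; `demo_domainset_create`, its linear table, and its validation rules (nonempty mask, known policy, preferred domain inside the mask) are assumptions, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical policy values; not the kernel's DOMAINSET_POLICY_* constants. */
enum demo_policy {
	DEMO_ROUNDROBIN,
	DEMO_FIRSTTOUCH,
	DEMO_PREFER,
	DEMO_INTERLEAVE
};

/* Hypothetical key type loosely following struct domainset. */
struct demo_domainset {
	uint64_t	mask;		/* allowed domains, bit d = domain d */
	uint16_t	policy;		/* one of enum demo_policy */
	int		prefer;		/* used only by DEMO_PREFER */
};

#define DEMO_TABLE_SIZE	32

/* Interned sets; each entry is shared by every caller with a matching key. */
static const struct demo_domainset *demo_table[DEMO_TABLE_SIZE];
static size_t demo_nsets;

static int
demo_key_equal(const struct demo_domainset *a, const struct demo_domainset *b)
{
	return (a->mask == b->mask && a->policy == b->policy &&
	    a->prefer == b->prefer);
}

/* Return the shared set matching *key, or NULL if invalid or table full. */
const struct demo_domainset *
demo_domainset_create(const struct demo_domainset *key)
{
	struct demo_domainset norm;
	struct demo_domainset *ds;
	size_t i;

	assert(key != NULL);
	norm = *key;
	if (norm.policy != DEMO_PREFER)
		norm.prefer = -1;	/* unused field; normalize for matching */

	/* Assumed validation rules. */
	if (norm.mask == 0 || norm.policy > DEMO_INTERLEAVE)
		return (NULL);
	if (norm.policy == DEMO_PREFER &&
	    (norm.prefer < 0 || norm.prefer >= 64 ||
	    (norm.mask & ((uint64_t)1 << norm.prefer)) == 0))
		return (NULL);

	/* Return the existing shared set if one matches. */
	for (i = 0; i < demo_nsets; i++)
		if (demo_key_equal(demo_table[i], &norm))
			return (demo_table[i]);

	if (demo_nsets == DEMO_TABLE_SIZE)
		return (NULL);
	ds = malloc(sizeof(*ds));
	if (ds == NULL)
		return (NULL);
	*ds = norm;
	demo_table[demo_nsets++] = ds;
	return (ds);
}
```

Returning a `const` pointer lets the compiler enforce the "must not be modified" rule, and normalizing `prefer` for non-prefer policies keeps keys that differ only in an unused field from producing duplicate sets.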
