Commit 90b83ef

Merge tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Fix and improve BTF deduplication of identical BTF types (Alan Maguire and Andrii Nakryiko)
 - Support up to 12 arguments in BPF trampoline on arm64 (Xu Kuohai and Alexis Lothoré)
 - Support load-acquire and store-release instructions in BPF JIT on riscv64 (Andrea Parri)
 - Fix uninitialized values in BPF_{CORE,PROBE}_READ macros (Anton Protopopov)
 - Streamline allowed helpers across program types (Feng Yang)
 - Support atomic update for hashtab of BPF maps (Hou Tao)
 - Implement json output for BPF helpers (Ihor Solodrai)
 - Several s390 JIT fixes (Ilya Leoshkevich)
 - Various sockmap fixes (Jiayuan Chen)
 - Support mmap of vmlinux BTF data (Lorenz Bauer)
 - Support BPF rbtree traversal and list peeking (Martin KaFai Lau)
 - Tests for sockmap/sockhash redirection (Michal Luczaj)
 - Introduce kfuncs for memory reads into dynptrs (Mykyta Yatsenko)
 - Add support for dma-buf iterators in BPF (T.J. Mercier)
 - The verifier support for __bpf_trap() (Yonghong Song)

* tag 'bpf-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (135 commits)
  bpf, arm64: Remove unused-but-set function and variable.
  selftests/bpf: Add tests with stack ptr register in conditional jmp
  bpf: Do not include stack ptr register in precision backtracking bookkeeping
  selftests/bpf: enable many-args tests for arm64
  bpf, arm64: Support up to 12 function arguments
  bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu_elem()
  bpf: Avoid __bpf_prog_ret0_warn when jit fails
  bpftool: Add support for custom BTF path in prog load/loadall
  selftests/bpf: Add unit tests with __bpf_trap() kfunc
  bpf: Warn with __bpf_trap() kfunc maybe due to uninitialized variable
  bpf: Remove special_kfunc_set from verifier
  selftests/bpf: Add test for open coded dmabuf_iter
  selftests/bpf: Add test for dmabuf_iter
  bpf: Add open coded dmabuf iterator
  bpf: Add dmabuf iterator
  dma-buf: Rename debugfs symbols
  bpf: Fix error return value in bpf_copy_from_user_dynptr
  libbpf: Use mmap to parse vmlinux BTF from sysfs
  selftests: bpf: Add a test for mmapable vmlinux BTF
  btf: Allow mmap of vmlinux btf
  ...
2 parents 1b98f35 + c5cebb2 commit 90b83ef

108 files changed, 5801 insertions(+), 1713 deletions(-)

Documentation/bpf/bpf_iterators.rst

Lines changed: 112 additions & 5 deletions
@@ -2,10 +2,117 @@
 BPF Iterators
 =============
 
+--------
+Overview
+--------
+
+BPF supports two separate entities collectively known as "BPF iterators": BPF
+iterator *program type* and *open-coded* BPF iterators. The former is
+a stand-alone BPF program type which, when attached and activated by the user,
+will be called once for each entity (task_struct, cgroup, etc.) that is being
+iterated. The latter is a set of BPF-side APIs implementing iterator
+functionality and available across multiple BPF program types. Open-coded
+iterators provide similar functionality to BPF iterator programs, but give
+more flexibility and control to all other BPF program types. BPF iterator
+programs, on the other hand, can be used to implement anonymous or BPF
+FS-mounted special files, whose contents are generated by the attached BPF
+iterator program, backed by seq_file functionality. Both are useful depending
+on specific needs.
+
+When adding a new BPF iterator program, it is expected that similar
+functionality will be added as an open-coded iterator for maximum flexibility.
+It's also expected that iteration logic and code will be maximally shared and
+reused between the two iterator API surfaces.
 
-----------
-Motivation
-----------
+------------------------
+Open-coded BPF Iterators
+------------------------
+
+Open-coded BPF iterators are implemented as tightly-coupled trios of kfuncs
+(constructor, next element fetch, destructor) and iterator-specific type
+describing on-the-stack iterator state, which is guaranteed by the BPF
+verifier to not be tampered with outside of the corresponding
+constructor/destructor/next APIs.
+
+Each kind of open-coded BPF iterator has its own associated
+struct bpf_iter_<type>, where <type> denotes a specific type of iterator.
+bpf_iter_<type> state needs to live on the BPF program stack, so make sure it's
+small enough to fit on the BPF stack. For performance reasons it's best to avoid
+dynamic memory allocation for iterator state and size the state struct big
+enough to fit everything necessary. But if necessary, dynamic memory
+allocation is a way to bypass BPF stack limitations. Note that the state struct
+size is part of the iterator's user-visible API; changing it will break
+backwards compatibility, so be deliberate about designing it.
+
+All kfuncs (constructor, next, destructor) have to be named consistently as
+bpf_iter_<type>_{new,next,destroy}(), respectively. <type> represents iterator
+type, and iterator state should be represented as a matching
+`struct bpf_iter_<type>` state type. Also, all iter kfuncs should have
+a pointer to this `struct bpf_iter_<type>` as the very first argument.
+
+Additionally:
+  - Constructor, i.e., `bpf_iter_<type>_new()`, can have an arbitrary number
+    of extra arguments. Return type is not enforced either.
+  - Next method, i.e., `bpf_iter_<type>_next()`, has to return a pointer
+    type and should have exactly one argument: `struct bpf_iter_<type> *`
+    (const/volatile/restrict and typedefs are ignored).
+  - Destructor, i.e., `bpf_iter_<type>_destroy()`, should return void and
+    should have exactly one argument, similar to the next method.
+  - `struct bpf_iter_<type>` size is enforced to be positive and
+    a multiple of 8 bytes (to fit stack slots correctly).
+
+Such strictness and consistency make it possible to build generic helpers
+that abstract away important, but boilerplate, details so that open-coded
+iterators can be used effectively and ergonomically (see libbpf's
+bpf_for_each() macro). This is enforced by the kernel at kfunc registration.
+
+Constructor/next/destructor implementation contract is as follows:
+  - constructor, `bpf_iter_<type>_new()`, always initializes iterator state on
+    the stack. If any of the input arguments are invalid, constructor should
+    make sure to still initialize it such that subsequent next() calls will
+    return NULL. I.e., on error, *return error and construct empty iterator*.
+    Constructor kfunc is marked with KF_ITER_NEW flag.
+
+  - next method, `bpf_iter_<type>_next()`, accepts pointer to iterator state
+    and produces an element. Next method should always return a pointer. The
+    contract with the BPF verifier is that next method *guarantees* that it
+    will eventually return NULL when elements are exhausted. Once NULL is
+    returned, subsequent next calls *should keep returning NULL*. Next method
+    is marked with KF_ITER_NEXT (and should also have KF_RET_NULL as
+    NULL-returning kfunc, of course).
+
+  - destructor, `bpf_iter_<type>_destroy()`, is always called once, even if
+    the constructor failed or next returned nothing. Destructor frees up any
+    resources and marks stack space used by `struct bpf_iter_<type>` as usable
+    for something else. Destructor is marked with KF_ITER_DESTROY flag.
+
+Any open-coded BPF iterator implementation has to implement at least these
+three methods. It is enforced that for any given type of iterator only
+applicable constructor/destructor/next are callable. I.e., verifier ensures
+you can't pass number iterator state into, say, cgroup iterator's next method.
+
+From a 10,000-foot BPF verification point of view, next methods are the points
+of forking a verification state, which are conceptually similar to what
+the verifier does when validating conditional jumps. The verifier branches out
+at the `call bpf_iter_<type>_next` instruction and simulates two outcomes: NULL
+(iteration is done) and non-NULL (new element is returned). NULL is simulated
+first and is supposed to reach exit without looping. After that, the non-NULL
+case is validated and it either reaches exit (for trivial examples with no real
+loop), or reaches another `call bpf_iter_<type>_next` instruction with a
+state equivalent to an already (partially) validated one. State equivalency at
+that point means we technically are going to be looping forever without
+"breaking out" of the established "state envelope" (i.e., subsequent
+iterations don't add any new knowledge or constraints to the verifier state,
+so running 1, 2, 10, or a million of them doesn't matter). But taking into
+account the contract stating that iterator next method *has to* return NULL
+eventually, we can conclude that the loop body is safe and will eventually
+terminate. Given that we validated logic outside of the loop (NULL case), and
+concluded that the loop body is safe (though potentially looping many times),
+the verifier can claim safety of the overall program logic.
+
+------------------------
+BPF Iterators Motivation
+------------------------
 
 There are a few existing ways to dump kernel data into user space. The most
 popular one is the ``/proc`` system. For example, ``cat /proc/net/tcp6`` dumps
@@ -323,8 +430,8 @@ Now, in the userspace program, pass the pointer of struct to the
 
 ::
 
-  link = bpf_program__attach_iter(prog, &opts); iter_fd =
-  bpf_iter_create(bpf_link__fd(link));
+  link = bpf_program__attach_iter(prog, &opts);
+  iter_fd = bpf_iter_create(bpf_link__fd(link));
 
 If both *tid* and *pid* are zero, an iterator created from this struct
 ``bpf_iter_attach_opts`` will include every opened file of every task in the

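The open-coded iterator contract documented in this diff can be exercised outside the kernel. What follows is a minimal user-space C sketch, not kernel code and not libbpf: the `range` iterator type, `for_each_iter()` and `sum_range()` are hypothetical names that merely follow the documented `bpf_iter_<type>_{new,next,destroy}()` convention. It shows a constructor that still builds an empty iterator on invalid input, a next method whose NULL result is sticky, a destructor that always runs, and a simplified loop macro in the spirit of libbpf's bpf_for_each(), which the consistent naming makes possible. The macro assumes the GCC/Clang `cleanup` attribute; kfunc registration, the KF_ITER_* flags and verifier checks are not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical iterator type "range"; its state lives on the caller's stack. */
struct bpf_iter_range {
	int value; /* storage for the element handed out by next() */
	int cur;   /* next value to produce */
	int end;   /* exclusive upper bound */
	int done;  /* sticky "exhausted" flag */
};

/* Documented rule: state size is positive and a multiple of 8 bytes. */
static_assert(sizeof(struct bpf_iter_range) > 0 &&
	      sizeof(struct bpf_iter_range) % 8 == 0,
	      "iterator state must be a positive multiple of 8 bytes");

/* Constructor: always initializes the state. On invalid input it returns an
 * error *and* leaves an empty iterator behind, so next() yields NULL. */
int bpf_iter_range_new(struct bpf_iter_range *it, int start, int end)
{
	it->value = 0;
	if (start > end) {
		it->cur = it->end = 0;
		it->done = 1;
		return -EINVAL;
	}
	it->cur = start;
	it->end = end;
	it->done = 0;
	return 0;
}

/* Next: takes only the state pointer and returns a pointer or NULL.
 * Once NULL has been returned, every later call returns NULL too. */
int *bpf_iter_range_next(struct bpf_iter_range *it)
{
	if (it->done || it->cur >= it->end) {
		it->done = 1;
		return NULL;
	}
	it->value = it->cur++;
	return &it->value;
}

/* Destructor: called exactly once, even if the constructor failed. Nothing
 * was allocated here, so it only marks the state as dead. */
void bpf_iter_range_destroy(struct bpf_iter_range *it)
{
	it->done = 1;
}

/* Simplified user-space take on the pattern behind libbpf's bpf_for_each():
 * the consistent naming lets one macro drive any iterator type. The cleanup
 * attribute runs the destructor whenever the loop scope is left. */
#define for_each_iter(type, cur, ...)                                         \
	for (struct bpf_iter_##type ___it                                     \
		     __attribute__((cleanup(bpf_iter_##type##_destroy))),     \
	     *___p __attribute__((unused)) =                                  \
		     (bpf_iter_##type##_new(&___it, __VA_ARGS__), (void *)0); \
	     ((cur) = bpf_iter_##type##_next(&___it));)

/* Example consumer: sums [start, end), or returns 0 for an invalid range. */
long sum_range(int start, int end)
{
	long sum = 0;
	int *v;

	for_each_iter(range, v, start, end)
		sum += *v;
	return sum;
}
```

For example, `sum_range(0, 5)` visits 0 through 4 and returns 10, while `sum_range(5, 0)` gets -EINVAL from the constructor internally and, because the iterator was still initialized as empty, returns 0 without entering the loop body.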
Documentation/bpf/kfuncs.rst

Lines changed: 17 additions & 0 deletions
@@ -160,6 +160,23 @@ Or::
                 ...
         }
 
+2.2.6 __prog Annotation
+---------------------------
+This annotation is used to indicate that the argument needs to be fixed up to
+point to the bpf_prog_aux of the calling BPF program. Any value passed into
+this argument is ignored and rewritten by the verifier.
+
+An example is given below::
+
+        __bpf_kfunc int bpf_wq_set_callback_impl(struct bpf_wq *wq,
+                                                 int (callback_fn)(void *map, int *key, void *value),
+                                                 unsigned int flags,
+                                                 void *aux__prog)
+        {
+                struct bpf_prog_aux *aux = aux__prog;
+                ...
+        }
+
 .. _BPF_kfunc_nodef:
 
 2.3 Using an existing kernel function

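The __prog rule that any passed value is ignored and rewritten can be illustrated with a toy user-space model. This is not the verifier's implementation: `struct bpf_prog_aux` is cut down to a hypothetical `name` field, `demo_kfunc()` and `fixup_prog_args()` are illustrative names, and the model assumes the annotation is detected from the `__prog` suffix of the parameter name, as the `aux__prog` parameter in the example suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in: the real struct bpf_prog_aux is kernel-internal and much
 * larger; "name" is a hypothetical field used only for this demo. */
struct bpf_prog_aux {
	const char *name;
};

/* kfunc-shaped callee: it trusts aux__prog to point at the caller's aux,
 * as if the verifier had already rewritten the argument. */
const char *demo_kfunc(int flags, void *aux__prog)
{
	struct bpf_prog_aux *aux = aux__prog;

	(void)flags;
	return aux->name;
}

/* Hypothetical model of the fixup: every argument whose parameter name ends
 * in "__prog" is overwritten with the caller's aux pointer, whatever value
 * the program originally passed. */
void fixup_prog_args(const char *names[], void *args[], size_t nargs,
		     struct bpf_prog_aux *caller_aux)
{
	static const char suffix[] = "__prog";
	const size_t slen = sizeof(suffix) - 1;

	assert(caller_aux != NULL);
	for (size_t i = 0; i < nargs; i++) {
		size_t len = strlen(names[i]);

		if (len >= slen && strcmp(names[i] + len - slen, suffix) == 0)
			args[i] = caller_aux;
	}
}
```

In this model the caller can pass a garbage pointer in the `aux__prog` slot, and only the rewritten pointer to the caller's aux ever reaches `demo_kfunc()`.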