Commit 0ca27d7

Merge: bpf: update to 6.4
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3043

Rebase bpf to 6.4.

Bugzilla: https://bugzilla.redhat.com/2221599

Signed-off-by: Artem Savkov <[email protected]>
Approved-by: Viktor Malik <[email protected]>
Approved-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Scott Weaver <[email protected]>

2 parents 0f2587e + 8a3ba5c, commit 0ca27d7

466 files changed, +41428 -21223 lines changed

Documentation/bpf/bpf_design_QA.rst

Lines changed: 2 additions & 2 deletions
@@ -314,7 +314,7 @@ Q: What is the compatibility story for special BPF types in map values?
 Q: Users are allowed to embed bpf_spin_lock, bpf_timer fields in their BPF map
 values (when using BTF support for BPF maps). This allows to use helpers for
 such objects on these fields inside map values. Users are also allowed to embed
-pointers to some kernel types (with __kptr and __kptr_ref BTF tags). Will the
+pointers to some kernel types (with __kptr_untrusted and __kptr BTF tags). Will the
 kernel preserve backwards compatibility for these features?
 
 A: It depends. For bpf_spin_lock, bpf_timer: YES, for kptr and everything else:
@@ -324,7 +324,7 @@ For struct types that have been added already, like bpf_spin_lock and bpf_timer,
 the kernel will preserve backwards compatibility, as they are part of UAPI.
 
 For kptrs, they are also part of UAPI, but only with respect to the kptr
-mechanism. The types that you can use with a __kptr and __kptr_ref tagged
+mechanism. The types that you can use with a __kptr_untrusted and __kptr tagged
 pointer in your struct are NOT part of the UAPI contract. The supported types can
 and will change across kernel releases. However, operations like accessing kptr
 fields and bpf_kptr_xchg() helper will continue to be supported across kernel

Documentation/bpf/bpf_iterators.rst

Lines changed: 2 additions & 5 deletions
@@ -238,11 +238,8 @@ The following is the breakdown for each field in struct ``bpf_iter_reg``.
        that the kernel function cond_resched() is called to avoid other kernel
        subsystem (e.g., rcu) misbehaving.
    * - seq_info
-     - Specifies certain action requests in the kernel BPF iterator
-       infrastructure. Currently, only BPF_ITER_RESCHED is supported. This means
-       that the kernel function cond_resched() is called to avoid other kernel
-       subsystem (e.g., rcu) misbehaving.
-
+     - Specifies the set of seq operations for the BPF iterator and helpers to
+       initialize/free the private data for the corresponding ``seq_file``.
 
 `Click here
 <https://lore.kernel.org/bpf/[email protected]/>`_

Documentation/bpf/clang-notes.rst

Lines changed: 6 additions & 0 deletions
@@ -20,6 +20,12 @@ Arithmetic instructions
 For CPU versions prior to 3, Clang v7.0 and later can enable ``BPF_ALU`` support with
 ``-Xclang -target-feature -Xclang +alu32``. In CPU version 3, support is automatically included.
 
+Jump instructions
+=================
+
+If ``-O0`` is used, Clang will generate the ``BPF_CALL | BPF_X | BPF_JMP`` (0x8d)
+instruction, which is not supported by the Linux kernel verifier.
+
 Atomic operations
 =================
 
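The 0x8d value quoted in the new note can be checked by OR-ing together an opcode's operation, source, and class fields. The sketch below is a minimal illustration of that check. It assumes the constant values from the Linux UAPI headers (linux/bpf_common.h and linux/bpf.h), which it redefines locally so that it compiles on its own. make_jmp_opcode() and is_indirect_call() are hypothetical helper names, not kernel or libbpf APIs.

```c
#include <stdint.h>

/* Values as in the Linux UAPI headers, redefined here so the sketch
 * compiles on its own. */
#define BPF_JMP  0x05 /* class: 64-bit jumps (low 3 bits) */
#define BPF_K    0x00 /* source: 32-bit immediate */
#define BPF_X    0x08 /* source: register */
#define BPF_CALL 0x80 /* operation: function call (high 4 bits) */

/* Hypothetical helper: a jump-class opcode byte is operation | source | class. */
static uint8_t make_jmp_opcode(uint8_t op, uint8_t src)
{
    return (uint8_t)(op | src | BPF_JMP);
}

/* Hypothetical check for the register-source call (0x8d) that, per the
 * note above, clang emits at -O0 and the kernel verifier does not accept. */
static int is_indirect_call(uint8_t opcode)
{
    return opcode == (BPF_CALL | BPF_X | BPF_JMP);
}
```

For comparison, make_jmp_opcode(BPF_CALL, BPF_K) yields 0x85. That is the immediate-source form of the call instruction.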

Documentation/bpf/cpumasks.rst

Lines changed: 12 additions & 22 deletions
@@ -51,7 +51,7 @@ For example:
 .. code-block:: c
 
         struct cpumask_map_value {
-                struct bpf_cpumask __kptr_ref * cpumask;
+                struct bpf_cpumask __kptr * cpumask;
         };
 
         struct array_map {
@@ -117,18 +117,13 @@ For example:
 As mentioned and illustrated above, these ``struct bpf_cpumask *`` objects can
 also be stored in a map and used as kptrs. If a ``struct bpf_cpumask *`` is in
 a map, the reference can be removed from the map with bpf_kptr_xchg(), or
-opportunistically acquired with bpf_cpumask_kptr_get():
-
-.. kernel-doc:: kernel/bpf/cpumask.c
-   :identifiers: bpf_cpumask_kptr_get
-
-Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map:
+opportunistically acquired using RCU:
 
 .. code-block:: c
 
         /* struct containing the struct bpf_cpumask kptr which is stored in the map. */
         struct cpumasks_kfunc_map_value {
-                struct bpf_cpumask __kptr_ref * bpf_cpumask;
+                struct bpf_cpumask __kptr * bpf_cpumask;
         };
 
         /* The map containing struct cpumasks_kfunc_map_value entries. */
@@ -144,7 +139,7 @@ Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map:
         /**
          * A simple example tracepoint program showing how a
          * struct bpf_cpumask * kptr that is stored in a map can
-         * be acquired using the bpf_cpumask_kptr_get() kfunc.
+         * be passed to kfuncs using RCU protection.
          */
         SEC("tp_btf/cgroup_mkdir")
         int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
@@ -158,26 +153,21 @@ Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map:
                 if (!v)
                         return -ENOENT;
 
+                bpf_rcu_read_lock();
                 /* Acquire a reference to the bpf_cpumask * kptr that's already stored in the map. */
-                kptr = bpf_cpumask_kptr_get(&v->cpumask);
-                if (!kptr)
+                kptr = v->cpumask;
+                if (!kptr) {
                         /* If no bpf_cpumask was present in the map, it's because
                          * we're racing with another CPU that removed it with
                          * bpf_kptr_xchg() between the bpf_map_lookup_elem()
-                         * above, and our call to bpf_cpumask_kptr_get().
-                         * bpf_cpumask_kptr_get() internally safely handles this
-                         * race, and will return NULL if the cpumask is no longer
-                         * present in the map by the time we invoke the kfunc.
+                         * above, and our load of the pointer from the map.
                          */
+                        bpf_rcu_read_unlock();
                         return -EBUSY;
+                }
 
-                /* Free the reference we just took above. Note that the
-                 * original struct bpf_cpumask * kptr is still in the map. It will
-                 * be freed either at a later time if another context deletes
-                 * it from the map, or automatically by the BPF subsystem if
-                 * it's still present when the map is destroyed.
-                 */
-                bpf_cpumask_release(kptr);
+                bpf_cpumask_setall(kptr);
+                bpf_rcu_read_unlock();
 
                 return 0;
         }

Documentation/bpf/instruction-set.rst

Lines changed: 128 additions & 41 deletions
@@ -11,7 +11,8 @@ Documentation conventions
 =========================
 
 For brevity, this document uses the type notion "u64", "u32", etc.
-to mean an unsigned integer whose width is the specified number of bits.
+to mean an unsigned integer whose width is the specified number of bits,
+and "s32", etc. to mean a signed integer of the specified number of bits.
 
 Registers and calling convention
 ================================
@@ -38,14 +39,11 @@ eBPF has two instruction encodings:
 * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
   constant) value after the basic instruction for a total of 128 bits.
 
-The basic instruction encoding is as follows, where MSB and LSB mean the most significant
-bits and least significant bits, respectively:
+The fields conforming an encoded basic instruction are stored in the
+following order::
 
-============= ======= ======= ======= ============
-32 bits (MSB) 16 bits 4 bits  4 bits  8 bits (LSB)
-============= ======= ======= ======= ============
-imm           offset  src_reg dst_reg opcode
-============= ======= ======= ======= ============
+  opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF.
+  opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF.
 
 **imm**
   signed integer immediate value
@@ -63,6 +61,18 @@ imm           offset  src_reg dst_reg opcode
 **opcode**
   operation to perform
 
+Note that the contents of multi-byte fields ('imm' and 'offset') are
+stored using big-endian byte ordering in big-endian BPF and
+little-endian byte ordering in little-endian BPF.
+
+For example::
+
+  opcode                  offset imm          assembly
+         src_reg dst_reg
+  07     0       1        00 00  44 33 22 11  r1 += 0x11223344 // little
+         dst_reg src_reg
+  07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big
+
 Note that most instructions do not use all of the fields.
 Unused fields shall be cleared to zero.
 
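This is a minimal C sketch that serializes one basic instruction into its 8 bytes. It shows the two field orders and the byte-ordering note in concrete form. encode_basic_insn() is a hypothetical helper written only for illustration. It assumes, consistently with the example bytes, that the first register listed in each field order occupies the high 4 bits of the register byte. With that assumption it reproduces both byte sequences for r1 += 0x11223344 (opcode 0x07).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: write one basic instruction to out[0..7].
 * big_endian selects both the nibble order of the register byte and
 * the byte order of the multi-byte 'offset' and 'imm' fields. */
static void encode_basic_insn(uint8_t out[8], uint8_t opcode, uint8_t dst_reg,
                              uint8_t src_reg, int16_t offset, int32_t imm,
                              bool big_endian)
{
    uint16_t off = (uint16_t)offset; /* two's-complement bit pattern */
    uint32_t im = (uint32_t)imm;

    out[0] = opcode;
    if (big_endian) {
        /* dst_reg in the high nibble, src_reg in the low nibble */
        out[1] = (uint8_t)(((dst_reg & 0x0f) << 4) | (src_reg & 0x0f));
        out[2] = (uint8_t)(off >> 8);
        out[3] = (uint8_t)off;
        out[4] = (uint8_t)(im >> 24);
        out[5] = (uint8_t)(im >> 16);
        out[6] = (uint8_t)(im >> 8);
        out[7] = (uint8_t)im;
    } else {
        /* src_reg in the high nibble, dst_reg in the low nibble */
        out[1] = (uint8_t)(((src_reg & 0x0f) << 4) | (dst_reg & 0x0f));
        out[2] = (uint8_t)off;
        out[3] = (uint8_t)(off >> 8);
        out[4] = (uint8_t)im;
        out[5] = (uint8_t)(im >> 8);
        out[6] = (uint8_t)(im >> 16);
        out[7] = (uint8_t)(im >> 24);
    }
}
```

The register byte is the only place where the two 4-bit fields swap positions. The 'offset' and 'imm' fields simply follow the byte order of the BPF flavour in use.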

@@ -72,18 +82,23 @@ The 64 bits following the basic instruction contain a pseudo instruction
 using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
 and imm containing the high 32 bits of the immediate value.
 
-================= ==================
-64 bits (MSB)     64 bits (LSB)
-================= ==================
-basic instruction pseudo instruction
-================= ==================
+This is depicted in the following figure::
+
+        basic_instruction
+  .----------------------------.
+  |                            |
+  code:8 regs:8 offset:16 imm:32 unused:32 imm:32
+                                 |              |
+                                 '--------------'
+                                pseudo instruction
 
 Thus the 64-bit immediate value is constructed as follows:
 
   imm64 = (next_imm << 32) | imm
 
 where 'next_imm' refers to the imm value of the pseudo instruction
-following the basic instruction.
+following the basic instruction. The unused bytes in the pseudo
+instruction are reserved and shall be cleared to zero.
 
 Instruction classes
 -------------------
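Written in C, the formula imm64 = (next_imm << 32) | imm has one pitfall. Both imm fields are signed 32-bit values, so imm must be reinterpreted as unsigned before the OR. Otherwise a negative imm is sign-extended and overwrites the upper half. The sketch below shows the safe form. build_imm64() is a hypothetical helper name used only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild the 64-bit immediate of a wide instruction
 * from the imm of the basic instruction and the imm of the pseudo
 * instruction that follows it: imm64 = (next_imm << 32) | imm.
 * Each signed 32-bit field is first reinterpreted as u32 so that a
 * negative imm cannot sign-extend into the upper 32 bits. */
static uint64_t build_imm64(int32_t imm, int32_t next_imm)
{
    uint64_t hi = (uint64_t)(uint32_t)next_imm << 32;
    uint64_t lo = (uint64_t)(uint32_t)imm;

    return hi | lo;
}
```

Compare a naive (uint64_t)imm applied to imm = -1. It produces 0xffffffffffffffff, which silently discards next_imm. The version above yields 0x00000000ffffffff.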
@@ -228,28 +243,58 @@ Jump instructions
 otherwise identical operations.
 The 'code' field encodes the operation as below:
 
-======== ===== ========================= ============
-code     value description               notes
-======== ===== ========================= ============
-BPF_JA   0x00  PC += off                 BPF_JMP only
-BPF_JEQ  0x10  PC += off if dst == src
-BPF_JGT  0x20  PC += off if dst > src    unsigned
-BPF_JGE  0x30  PC += off if dst >= src   unsigned
-BPF_JSET 0x40  PC += off if dst & src
-BPF_JNE  0x50  PC += off if dst != src
-BPF_JSGT 0x60  PC += off if dst > src    signed
-BPF_JSGE 0x70  PC += off if dst >= src   signed
-BPF_CALL 0x80  function call
-BPF_EXIT 0x90  function / program return BPF_JMP only
-BPF_JLT  0xa0  PC += off if dst < src    unsigned
-BPF_JLE  0xb0  PC += off if dst <= src   unsigned
-BPF_JSLT 0xc0  PC += off if dst < src    signed
-BPF_JSLE 0xd0  PC += off if dst <= src   signed
-======== ===== ========================= ============
+======== ===== === =========================================== =========================================
+code     value src description                                 notes
+======== ===== === =========================================== =========================================
+BPF_JA   0x0   0x0 PC += offset                                BPF_JMP only
+BPF_JEQ  0x1   any PC += offset if dst == src
+BPF_JGT  0x2   any PC += offset if dst > src                   unsigned
+BPF_JGE  0x3   any PC += offset if dst >= src                  unsigned
+BPF_JSET 0x4   any PC += offset if dst & src
+BPF_JNE  0x5   any PC += offset if dst != src
+BPF_JSGT 0x6   any PC += offset if dst > src                   signed
+BPF_JSGE 0x7   any PC += offset if dst >= src                  signed
+BPF_CALL 0x8   0x0 call helper function by address             see `Helper functions`_
+BPF_CALL 0x8   0x1 call PC += offset                           see `Program-local functions`_
+BPF_CALL 0x8   0x2 call helper function by BTF ID              see `Helper functions`_
+BPF_EXIT 0x9   0x0 return                                      BPF_JMP only
+BPF_JLT  0xa   any PC += offset if dst < src                   unsigned
+BPF_JLE  0xb   any PC += offset if dst <= src                  unsigned
+BPF_JSLT 0xc   any PC += offset if dst < src                   signed
+BPF_JSLE 0xd   any PC += offset if dst <= src                  signed
+======== ===== === =========================================== =========================================
 
 The eBPF program needs to store the return value into register R0 before doing a
-BPF_EXIT.
+``BPF_EXIT``.
+
+Example:
+
+``BPF_JSGE | BPF_X | BPF_JMP32`` (0x7e) means::
+
+  if (s32)dst s>= (s32)src goto +offset
+
+where 's>=' indicates a signed '>=' comparison.
 
+Helper functions
+~~~~~~~~~~~~~~~~
+
+Helper functions are a concept whereby BPF programs can call into a
+set of function calls exposed by the underlying platform.
+
+Historically, each helper function was identified by an address
+encoded in the imm field. The available helper functions may differ
+for each program type, but address values are unique across all program types.
+
+Platforms that support the BPF Type Format (BTF) support identifying
+a helper function by a BTF ID encoded in the imm field, where the BTF ID
+identifies the helper name and type.
+
+Program-local functions
+~~~~~~~~~~~~~~~~~~~~~~~
+Program-local functions are functions exposed by the same BPF program as the
+caller, and are referenced by offset from the call instruction, similar to
+``BPF_JA``. A ``BPF_EXIT`` within the program-local function will return to
+the caller.
 
 Load and store instructions
 ===========================
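The 0x7e example above differs from its 64-bit counterpart (0x7d). The difference shows most clearly on a register whose low 32 bits are all ones. The sketch below evaluates only the BPF_JSGE condition. jsge_taken() is a hypothetical helper. The constants are redefined locally from the Linux UAPI values. The sketch also assumes the usual two's-complement conversion from unsigned to signed integers, as gcc and clang implement it.

```c
#include <stdbool.h>
#include <stdint.h>

#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_OP(code)    ((code) & 0xf0)

#define BPF_JMP   0x05
#define BPF_JMP32 0x06
#define BPF_X     0x08
#define BPF_JSGE  0x70 /* code 0x7 in the upper 4 bits */

/* Hypothetical helper: is a BPF_JSGE branch taken for these operand values?
 * BPF_JMP32 compares the low 32 bits as s32; any other class is treated as
 * BPF_JMP and compares all 64 bits as s64. Returns false if the operation
 * field is not BPF_JSGE. */
static bool jsge_taken(uint8_t opcode, uint64_t dst, uint64_t src)
{
    if (BPF_OP(opcode) != BPF_JSGE)
        return false;
    if (BPF_CLASS(opcode) == BPF_JMP32)
        return (int32_t)(uint32_t)dst >= (int32_t)(uint32_t)src;
    return (int64_t)dst >= (int64_t)src;
}
```

Take dst = 0xffffffff and src = 0. The 32-bit form compares -1 s>= 0, so the branch is not taken. The 64-bit form compares 4294967295 s>= 0, so the branch is taken.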
@@ -371,14 +416,56 @@ and loaded back to ``R0``.
 -----------------------------
 
 Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
-encoding for an extra imm64 value.
-
-There is currently only one such instruction.
-
-``BPF_LD | BPF_DW | BPF_IMM`` means::
-
-  dst = imm64
-
+encoding defined in `Instruction encoding`_, and use the 'src' field of the
+basic instruction to hold an opcode subtype.
+
+The following table defines a set of ``BPF_IMM | BPF_DW | BPF_LD`` instructions
+with opcode subtypes in the 'src' field, using new terms such as "map"
+defined further below:
+
+========================= ====== === ========================================= =========== ==============
+opcode construction       opcode src pseudocode                                imm type    dst type
+========================= ====== === ========================================= =========== ==============
+BPF_IMM | BPF_DW | BPF_LD 0x18   0x0 dst = imm64                               integer     integer
+BPF_IMM | BPF_DW | BPF_LD 0x18   0x1 dst = map_by_fd(imm)                      map fd      map
+BPF_IMM | BPF_DW | BPF_LD 0x18   0x2 dst = map_val(map_by_fd(imm)) + next_imm  map fd      data pointer
+BPF_IMM | BPF_DW | BPF_LD 0x18   0x3 dst = var_addr(imm)                       variable id data pointer
+BPF_IMM | BPF_DW | BPF_LD 0x18   0x4 dst = code_addr(imm)                      integer     code pointer
+BPF_IMM | BPF_DW | BPF_LD 0x18   0x5 dst = map_by_idx(imm)                     map index   map
+BPF_IMM | BPF_DW | BPF_LD 0x18   0x6 dst = map_val(map_by_idx(imm)) + next_imm map index   data pointer
+========================= ====== === ========================================= =========== ==============
+
+where
+
+* map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_)
+* map_by_idx(imm) means to convert a 32-bit index into an address of a map
+* map_val(map) gets the address of the first value in a given map
+* var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id
+* code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions
+* the 'imm type' can be used by disassemblers for display
+* the 'dst type' can be used for verification and JIT compilation purposes
+
+Maps
+~~~~
+
+Maps are shared memory regions accessible by eBPF programs on some platforms.
+A map can have various semantics as defined in a separate document, and may or
+may not have a single contiguous memory region, but the 'map_val(map)' is
+currently only defined for maps that do have a single contiguous memory region.
+
+Each map can have a file descriptor (fd) if supported by the platform, where
+'map_by_fd(imm)' means to get the map with the specified file descriptor. Each
+BPF program can also be defined to use a set of maps associated with the
+program at load time, and 'map_by_idx(imm)' means to get the map with the given
+index in the set associated with the BPF program containing the instruction.
+
+Platform Variables
+~~~~~~~~~~~~~~~~~~
+
+Platform variables are memory regions, identified by integer ids, exposed by
+the runtime and accessible by BPF programs on some platforms. The
+'var_addr(imm)' operation means to get the address of the memory region
+identified by the given id.
 
 Legacy BPF Packet access instructions
 -------------------------------------
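The table notes that the 'imm type' and 'dst type' columns can drive disassembly and verification. One way a tool might encode those columns is a lookup indexed by the 'src' subtype. This is a hypothetical sketch: the enum, struct, and function names are illustrative and do not come from any kernel or libbpf header. It covers exactly the seven subtypes listed and reports any other opcode or subtype as invalid.

```c
#include <stdint.h>

enum imm_kind { IMM_INTEGER, IMM_MAP_FD, IMM_VARIABLE_ID, IMM_MAP_INDEX, IMM_INVALID };
enum dst_kind { DST_INTEGER, DST_MAP, DST_DATA_POINTER, DST_CODE_POINTER, DST_INVALID };

struct ld_imm64_info {
    enum imm_kind imm; /* how a disassembler might display 'imm' */
    enum dst_kind dst; /* what a verifier might assume about 'dst' */
};

/* Hypothetical helper: classify a wide load by its 'src' subtype.
 * 0x18 is BPF_IMM (0x00) | BPF_DW (0x18) | BPF_LD (0x00). */
static struct ld_imm64_info ld_imm64_classify(uint8_t opcode, uint8_t src)
{
    static const struct ld_imm64_info table[] = {
        [0x0] = { IMM_INTEGER,     DST_INTEGER },      /* dst = imm64 */
        [0x1] = { IMM_MAP_FD,      DST_MAP },          /* map_by_fd(imm) */
        [0x2] = { IMM_MAP_FD,      DST_DATA_POINTER }, /* map_val(map_by_fd(imm)) + next_imm */
        [0x3] = { IMM_VARIABLE_ID, DST_DATA_POINTER }, /* var_addr(imm) */
        [0x4] = { IMM_INTEGER,     DST_CODE_POINTER }, /* code_addr(imm) */
        [0x5] = { IMM_MAP_INDEX,   DST_MAP },          /* map_by_idx(imm) */
        [0x6] = { IMM_MAP_INDEX,   DST_DATA_POINTER }, /* map_val(map_by_idx(imm)) + next_imm */
    };
    const struct ld_imm64_info invalid = { IMM_INVALID, DST_INVALID };

    if (opcode != 0x18 || src >= sizeof(table) / sizeof(table[0]))
        return invalid;
    return table[src];
}
```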
