@@ -11,7 +11,8 @@ Documentation conventions
1111=========================
1212
1313For brevity, this document uses the type notation "u64", "u32", etc.
14- to mean an unsigned integer whose width is the specified number of bits.
14+ to mean an unsigned integer whose width is the specified number of bits,
15+ and "s32", etc. to mean a signed integer of the specified number of bits.
1516
1617Registers and calling convention
1718================================
@@ -38,14 +39,11 @@ eBPF has two instruction encodings:
3839* the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
3940 constant) value after the basic instruction for a total of 128 bits.
4041
41- The basic instruction encoding is as follows, where MSB and LSB mean the most significant
42- bits and least significant bits, respectively :
42+ The fields that make up an encoded basic instruction are stored in the
43+ following order::
4344
44- ============= ======= ======= ======= ============
45- 32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB)
46- ============= ======= ======= ======= ============
47- imm offset src_reg dst_reg opcode
48- ============= ======= ======= ======= ============
45+ opcode:8 src_reg:4 dst_reg:4 offset:16 imm:32 // In little-endian BPF.
46+ opcode:8 dst_reg:4 src_reg:4 offset:16 imm:32 // In big-endian BPF.
4947
5048**imm**
5149 signed integer immediate value
@@ -63,6 +61,18 @@ imm offset src_reg dst_reg opcode
6361**opcode**
6462 operation to perform
6563
64+ Note that the contents of multi-byte fields ('imm' and 'offset') are
65+ stored using big-endian byte ordering in big-endian BPF and
66+ little-endian byte ordering in little-endian BPF.
67+
68+ For example::
69+
70+   opcode                  offset imm          assembly
71+          src_reg dst_reg
72+   07     0       1        00 00  44 33 22 11  r1 += 0x11223344 // little
73+          dst_reg src_reg
74+   07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big
75+
6676Note that most instructions do not use all of the fields.
6777Unused fields shall be cleared to zero.
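
The field order and byte ordering described above can be illustrated with a small decoder. This is only a sketch, not part of the specification: the names ``Insn`` and ``decode_basic`` are hypothetical, and it assumes that the register listed first in each layout occupies the high four bits of the second byte, as the worked example suggests.

```python
import struct
from typing import NamedTuple


class Insn(NamedTuple):
    """Hypothetical container for the fields of one basic instruction."""
    opcode: int
    dst_reg: int
    src_reg: int
    offset: int
    imm: int


def decode_basic(raw: bytes, big_endian: bool = False) -> Insn:
    """Decode one 8-byte basic instruction (illustrative sketch only)."""
    if len(raw) != 8:
        raise ValueError("a basic instruction is exactly 8 bytes")
    # opcode:8, regs:8, signed offset:16, signed imm:32, no padding.
    fmt = ">BBhi" if big_endian else "<BBhi"
    opcode, regs, offset, imm = struct.unpack(fmt, raw)
    if big_endian:
        # Assumed layout: dst_reg in the high nibble, src_reg in the low nibble.
        dst_reg, src_reg = regs >> 4, regs & 0x0F
    else:
        # Assumed layout: src_reg in the high nibble, dst_reg in the low nibble.
        src_reg, dst_reg = regs >> 4, regs & 0x0F
    return Insn(opcode, dst_reg, src_reg, offset, imm)


# The two byte sequences from the example above decode to the same fields.
little = decode_basic(bytes.fromhex("0701000044332211"))
big = decode_basic(bytes.fromhex("0710000011223344"), big_endian=True)
```

Both calls describe ``r1 += 0x11223344``: only the nibble order of the register byte and the byte order of 'offset' and 'imm' differ between the two encodings.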
6878
@@ -72,18 +82,23 @@ The 64 bits following the basic instruction contain a pseudo instruction
7282using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
7383and imm containing the high 32 bits of the immediate value.
7484
75- ================= ==================
76- 64 bits (MSB) 64 bits (LSB)
77- ================= ==================
78- basic instruction pseudo instruction
79- ================= ==================
85+ This is depicted in the following figure::
86+
87+   basic_instruction
88+   .-----------------------------.
89+   |                             |
90+   code:8 regs:8 offset:16 imm:32 unused:32 imm:32
91+                                  |              |
92+                                  '--------------'
93+                                 pseudo instruction
8094
8195Thus the 64-bit immediate value is constructed as follows:
8296
8397 imm64 = (next_imm << 32) | imm
8498
8599where 'next_imm' refers to the imm value of the pseudo instruction
86- following the basic instruction.
100+ following the basic instruction. The unused bytes in the pseudo
101+ instruction are reserved and shall be cleared to zero.
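
As a sketch of the construction above, the two 32-bit halves can be combined as follows. The function name ``make_imm64`` is hypothetical, and the masking reflects an assumption that both 'imm' fields are treated as raw 32-bit patterns (so a negative low half does not sign-extend into the high half).

```python
def make_imm64(imm: int, next_imm: int) -> int:
    """Combine the imm fields of a wide instruction into one 64-bit value.

    imm is the field of the basic instruction (low 32 bits); next_imm is
    the field of the pseudo instruction that follows it (high 32 bits).
    """
    low = imm & 0xFFFFFFFF        # assumption: raw bit pattern, no sign extension
    high = next_imm & 0xFFFFFFFF
    return (high << 32) | low


value = make_imm64(0x55667788, 0x11223344)
```

Here ``value`` is 0x1122334455667788: the pseudo instruction supplies the upper half and the basic instruction supplies the lower half.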
87102
88103Instruction classes
89104-------------------
@@ -228,28 +243,58 @@ Jump instructions
228243otherwise identical operations.
229244The 'code' field encodes the operation as below:
230245
231- ======== ===== ========================= ============
232- code value description notes
233- ======== ===== ========================= ============
234- BPF_JA 0x00 PC += off BPF_JMP only
235- BPF_JEQ 0x10 PC += off if dst == src
236- BPF_JGT 0x20 PC += off if dst > src unsigned
237- BPF_JGE 0x30 PC += off if dst >= src unsigned
238- BPF_JSET 0x40 PC += off if dst & src
239- BPF_JNE 0x50 PC += off if dst != src
240- BPF_JSGT 0x60 PC += off if dst > src signed
241- BPF_JSGE 0x70 PC += off if dst >= src signed
242- BPF_CALL 0x80 function call
243- BPF_EXIT 0x90 function / program return BPF_JMP only
244- BPF_JLT 0xa0 PC += off if dst < src unsigned
245- BPF_JLE 0xb0 PC += off if dst <= src unsigned
246- BPF_JSLT 0xc0 PC += off if dst < src signed
247- BPF_JSLE 0xd0 PC += off if dst <= src signed
248- ======== ===== ========================= ============
246+ ========  =====  ===  ===========================================  =========================================
247+ code      value  src  description                                  notes
248+ ========  =====  ===  ===========================================  =========================================
249+ BPF_JA    0x0    0x0  PC += offset                                 BPF_JMP only
250+ BPF_JEQ   0x1    any  PC += offset if dst == src
251+ BPF_JGT   0x2    any  PC += offset if dst > src                    unsigned
252+ BPF_JGE   0x3    any  PC += offset if dst >= src                   unsigned
253+ BPF_JSET  0x4    any  PC += offset if dst & src
254+ BPF_JNE   0x5    any  PC += offset if dst != src
255+ BPF_JSGT  0x6    any  PC += offset if dst > src                    signed
256+ BPF_JSGE  0x7    any  PC += offset if dst >= src                   signed
257+ BPF_CALL  0x8    0x0  call helper function by address              see `Helper functions`_
258+ BPF_CALL  0x8    0x1  call PC += offset                            see `Program-local functions`_
259+ BPF_CALL  0x8    0x2  call helper function by BTF ID               see `Helper functions`_
260+ BPF_EXIT  0x9    0x0  return                                       BPF_JMP only
261+ BPF_JLT   0xa    any  PC += offset if dst < src                    unsigned
262+ BPF_JLE   0xb    any  PC += offset if dst <= src                   unsigned
263+ BPF_JSLT  0xc    any  PC += offset if dst < src                    signed
264+ BPF_JSLE  0xd    any  PC += offset if dst <= src                   signed
265+ ========  =====  ===  ===========================================  =========================================
249266
250267The eBPF program needs to store the return value into register R0 before doing a
251- BPF_EXIT.
268+ ``BPF_EXIT``.
269+
270+ Example:
271+
272+ ``BPF_JSGE | BPF_X | BPF_JMP32`` (0x7e) means::
273+
274+ if (s32)dst s>= (s32)src goto +offset
275+
276+ where 's>=' indicates a signed '>=' comparison.
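
The opcode value and the comparison in this example can be checked with a short sketch. The constant values below (0x70 for the ``BPF_JSGE`` code placed in the upper four bits, 0x08 for ``BPF_X``, 0x06 for ``BPF_JMP32``) are assumptions chosen to be consistent with the 0x7e shown above, and the helper names ``to_s32`` and ``jsge32_taken`` are hypothetical.

```python
BPF_JMP32 = 0x06   # instruction class (assumed value)
BPF_X = 0x08       # source operand is a register (assumed value)
BPF_JSGE = 0x70    # code 0x7 placed in the upper four bits of the opcode

opcode = BPF_JSGE | BPF_X | BPF_JMP32


def to_s32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def jsge32_taken(dst: int, src: int) -> bool:
    """Return True if 'if (s32)dst s>= (s32)src goto +offset' would jump."""
    return to_s32(dst) >= to_s32(src)
```

For instance, ``jsge32_taken(0xFFFFFFFF, 0)`` is false because the low 32 bits of dst are interpreted as -1, whereas an unsigned comparison of the same values would succeed.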
252277
278+ Helper functions
279+ ~~~~~~~~~~~~~~~~
280+
281+ Helper functions are a concept whereby BPF programs can call into a
282+ set of functions exposed by the underlying platform.
283+
284+ Historically, each helper function was identified by an address
285+ encoded in the imm field. The available helper functions may differ
286+ for each program type, but address values are unique across all program types.
287+
288+ Platforms that support the BPF Type Format (BTF) support identifying
289+ a helper function by a BTF ID encoded in the imm field, where the BTF ID
290+ identifies the helper name and type.
291+
292+ Program-local functions
293+ ~~~~~~~~~~~~~~~~~~~~~~~
294+ Program-local functions are functions exposed by the same BPF program as the
295+ caller, and are referenced by offset from the call instruction, similar to
296+ ``BPF_JA``. A ``BPF_EXIT`` within the program-local function will return to
297+ the caller.
253298
254299Load and store instructions
255300===========================
@@ -371,14 +416,56 @@ and loaded back to ``R0``.
371416-----------------------------
372417
373418Instructions with the ``BPF_IMM`` 'mode' modifier use the wide instruction
374- encoding for an extra imm64 value.
375-
376- There is currently only one such instruction.
377-
378- ``BPF_LD | BPF_DW | BPF_IMM`` means::
379-
380- dst = imm64
381-
419+ encoding defined in `Instruction encoding`_, and use the 'src' field of the
420+ basic instruction to hold an opcode subtype.
421+
422+ The following table defines a set of ``BPF_IMM | BPF_DW | BPF_LD`` instructions
423+ with opcode subtypes in the 'src' field, using new terms such as "map"
424+ defined further below:
425+
426+ =========================  ======  ===  =========================================  ===========  ==============
427+ opcode construction        opcode  src  pseudocode                                 imm type     dst type
428+ =========================  ======  ===  =========================================  ===========  ==============
429+ BPF_IMM | BPF_DW | BPF_LD  0x18    0x0  dst = imm64                                integer      integer
430+ BPF_IMM | BPF_DW | BPF_LD  0x18    0x1  dst = map_by_fd(imm)                       map fd       map
431+ BPF_IMM | BPF_DW | BPF_LD  0x18    0x2  dst = map_val(map_by_fd(imm)) + next_imm   map fd       data pointer
432+ BPF_IMM | BPF_DW | BPF_LD  0x18    0x3  dst = var_addr(imm)                        variable id  data pointer
433+ BPF_IMM | BPF_DW | BPF_LD  0x18    0x4  dst = code_addr(imm)                       integer      code pointer
434+ BPF_IMM | BPF_DW | BPF_LD  0x18    0x5  dst = map_by_idx(imm)                      map index    map
435+ BPF_IMM | BPF_DW | BPF_LD  0x18    0x6  dst = map_val(map_by_idx(imm)) + next_imm  map index    data pointer
436+ =========================  ======  ===  =========================================  ===========  ==============
437+
438+ where
439+
440+ * map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_)
441+ * map_by_idx(imm) means to convert a 32-bit index into an address of a map
442+ * map_val(map) gets the address of the first value in a given map
443+ * var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id
444+ * code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions
445+ * the 'imm type' can be used by disassemblers for display
446+ * the 'dst type' can be used for verification and JIT compilation purposes
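
A disassembler might use the 'src' subtype roughly as sketched below. The mapping is copied from the table above; the names ``LD_IMM64_SUBTYPES`` and ``describe_ld_imm64`` are hypothetical, and rejecting unknown subtypes with an exception is an assumption rather than behavior defined here.

```python
# src subtype -> (pseudocode, imm type, dst type), copied from the table above.
LD_IMM64_SUBTYPES = {
    0x0: ("dst = imm64", "integer", "integer"),
    0x1: ("dst = map_by_fd(imm)", "map fd", "map"),
    0x2: ("dst = map_val(map_by_fd(imm)) + next_imm", "map fd", "data pointer"),
    0x3: ("dst = var_addr(imm)", "variable id", "data pointer"),
    0x4: ("dst = code_addr(imm)", "integer", "code pointer"),
    0x5: ("dst = map_by_idx(imm)", "map index", "map"),
    0x6: ("dst = map_val(map_by_idx(imm)) + next_imm", "map index", "data pointer"),
}


def describe_ld_imm64(src: int) -> str:
    """Return a display string for a 0x18 instruction with the given src."""
    try:
        pseudocode, imm_type, dst_type = LD_IMM64_SUBTYPES[src]
    except KeyError:
        # Assumption: subtypes outside the table are treated as invalid.
        raise ValueError(f"unknown src subtype {src:#x}") from None
    return f"{pseudocode}  (imm: {imm_type}, dst: {dst_type})"


text = describe_ld_imm64(0x2)
```

A verifier or JIT compiler could key off the same table, using the 'dst type' entry to decide how the destination register may be used afterwards.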
447+
448+ Maps
449+ ~~~~
450+
451+ Maps are shared memory regions accessible by eBPF programs on some platforms.
452+ A map can have various semantics as defined in a separate document, and may or
453+ may not have a single contiguous memory region, but 'map_val(map)' is
454+ currently only defined for maps that do have a single contiguous memory region.
455+
456+ Each map can have a file descriptor (fd) if supported by the platform, where
457+ 'map_by_fd(imm)' means to get the map with the specified file descriptor. Each
458+ BPF program can also be defined to use a set of maps associated with the
459+ program at load time, and 'map_by_idx(imm)' means to get the map with the given
460+ index in the set associated with the BPF program containing the instruction.
461+
462+ Platform Variables
463+ ~~~~~~~~~~~~~~~~~~
464+
465+ Platform variables are memory regions, identified by integer ids, exposed by
466+ the runtime and accessible by BPF programs on some platforms. The
467+ 'var_addr(imm)' operation means to get the address of the memory region
468+ identified by the given id.
382469
383470Legacy BPF Packet access instructions
384471-------------------------------------