
Commit e0cea7c

borkmann authored and Alexei Starovoitov committed
bpf: implement ld_abs/ld_ind in native bpf
The main part of this work is to finally allow removal of LD_ABS and LD_IND from the BPF core by reimplementing them through native eBPF instead. Both LD_ABS/LD_IND were carried over from cBPF, and keeping them around in native eBPF caused way more trouble than they were actually worth. To list just some of the security issues in the past:

 * fdfaf64 ("x86: bpf_jit: support negative offsets")
 * 35607b0 ("sparc: bpf_jit: fix loads from negative offsets")
 * e0ee9c1 ("x86: bpf_jit: fix two bugs in eBPF JIT compiler")
 * 07aee94 ("bpf, sparc: fix usage of wrong reg for load_skb_regs after call")
 * 6d59b7d ("bpf, s390x: do not reload skb pointers in non-skb context")
 * 87338c8 ("bpf, ppc64: do not reload skb pointers in non-skb context")

For programs in native eBPF, LD_ABS/LD_IND are pretty much legacy these days due to their limitations and the more efficient/flexible alternatives that have been developed over time, such as direct packet access. LD_ABS/LD_IND only cover 1/2/4 byte loads into a register, the load happens in host endianness, and their exception handling can yield unexpected behavior. The latter is explained in depth in f6b1b3b ("bpf: fix subprog verifier bypass by div/mod by 0 exception") along with similar exception cases we had. In native eBPF, more recent program types disable LD_ABS/LD_IND altogether through may_access_skb() in the verifier, and given the limitations in terms of exception handling, they are also disabled in programs that use BPF to BPF calls.

In terms of cBPF, LD_ABS/LD_IND is used in networking programs to access packet data. It is not used in seccomp-BPF, but it is used by programs doing socket filtering or reuseport demuxing with cBPF. This is mostly relevant for applications that have not yet migrated to native eBPF.

The main complexity and source of bugs in LD_ABS/LD_IND comes from their implementation in the various JITs. Most of them keep the model around from cBPF times by implementing a fastpath written in asm. They typically use two CPU registers, hidden from the BPF program, for caching the skb's headlen (skb->len - skb->data_len) and skb->data. Throughout the JIT phase this requires keeping track of whether LD_ABS/LD_IND are used and, if so, the two registers need to be recached each time a BPF helper would change the underlying packet data in the native eBPF case. At least in the eBPF case, available CPU registers are rare, and the additional exit path out of the asm-written JIT helper also makes it inflexible, since not all parts of the JITer are under control from plain C. An LD_ABS/LD_IND implementation in eBPF therefore allows the complexity in the JITs to be significantly reduced, with comparable performance results for them, e.g.:

               test_bpf       tcpdump port 22    tcpdump complex
  x64
   - before    15 21 10       14 19 18
   - after      7 10 10        7 10 15
  arm64
   - before    40 91 92       40 91 151
   - after     51 64 73       51 62 113

For cBPF we now track any usage of LD_ABS/LD_IND in bpf_convert_filter() and cache the skb's headlen and data in the cBPF prologue. BPF_REG_TMP gets remapped from R8 to R2 since it is mainly just used as a local temporary variable. This also slightly shrinks the image on x86_64, even for seccomp programs, since R2 maps to %rsi, which is not an ereg. In the callee-saved R8 and R9 we now track skb data and headlen, respectively. For normal prologue emission in the JITs this does not add any extra instructions, since R8 and R9 are pushed to the stack in any case from the eBPF side.
cBPF uses the convert_bpf_ld_abs() emitter, which probes the fast path inline and falls back to the bpf_skb_load_helper_{8,16,32}() helpers, which likewise rely on the cached skb data and headlen. R8 and R9 never need to be reloaded due to bpf_helper_changes_pkt_data(), since all skb access in cBPF is read-only. For the case of native eBPF, we use the bpf_gen_ld_abs() emitter, which calls the bpf_skb_load_helper_{8,16,32}_no_cache() helpers unconditionally and neither caches skb data and headlen nor has an inlined fast path. The reason for the latter is that native eBPF does not have any extra registers available anyway, but even if it had, this avoids any reload of skb data and headlen in the first place. Additionally, for negative offsets, we provide an alternative bpf_skb_load_bytes_relative() helper in eBPF which operates similarly to bpf_skb_load_bytes() and allows for more flexibility.

Tested myself on x64, arm64, s390x; tested by Sandipan on ppc64.

Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
1 parent 93731ef commit e0cea7c
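
For context on the cBPF side that bpf_convert_filter() now rewrites: classic socket filters still express packet loads as BPF_LD | BPF_ABS / BPF_IND instructions. The snippet below is an illustrative sketch only, not part of this commit; it builds a minimal classic filter that accepts IPv4 frames via an absolute halfword load at the EtherType offset and attaches it with SO_ATTACH_FILTER, assuming an AF_PACKET socket so that offset 12 really is the EtherType. After this change such loads are converted to native eBPF inside the kernel, while the userspace ABI stays unchanged.

/* Illustrative sketch, not from this commit: a classic BPF socket
 * filter using BPF_LD | BPF_ABS, the very construct that
 * bpf_convert_filter() now rewrites into native eBPF. Assumes an
 * AF_PACKET socket so that offset 12 is the EtherType field.
 */
#include <linux/filter.h>
#include <linux/if_ether.h>
#include <sys/socket.h>

static struct sock_filter ipv4_only[] = {
        /* A = ntohs(*(u16 *)(pkt + 12))  -- load EtherType */
        BPF_STMT(BPF_LD  | BPF_H   | BPF_ABS, 12),
        /* if (A == ETH_P_IP) goto accept; else goto drop */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETH_P_IP, 0, 1),
        /* accept: pass up to 64k of the packet */
        BPF_STMT(BPF_RET | BPF_K, 0xffff),
        /* drop */
        BPF_STMT(BPF_RET | BPF_K, 0),
};

static const struct sock_fprog ipv4_prog = {
        .len    = sizeof(ipv4_only) / sizeof(ipv4_only[0]),
        .filter = ipv4_only,
};

/* Attach the filter to an already opened AF_PACKET socket fd. */
static int attach_ipv4_filter(int fd)
{
        return setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
                          &ipv4_prog, sizeof(ipv4_prog));
}

The filter itself continues to work exactly as before; only the in-kernel translation of its LD_ABS instruction differs.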

5 files changed: +262 −100 lines changed


include/linux/bpf.h

Lines changed: 2 additions & 0 deletions
@@ -235,6 +235,8 @@ struct bpf_verifier_ops {
                                 struct bpf_insn_access_aux *info);
         int (*gen_prologue)(struct bpf_insn *insn, bool direct_write,
                             const struct bpf_prog *prog);
+        int (*gen_ld_abs)(const struct bpf_insn *orig,
+                          struct bpf_insn *insn_buf);
         u32 (*convert_ctx_access)(enum bpf_access_type type,
                                   const struct bpf_insn *src,
                                   struct bpf_insn *dst,
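
The new ->gen_ld_abs() callback lets a program type's verifier ops supply an instruction emitter that the verifier uses to rewrite LD_ABS/LD_IND in place. The code below is a rough sketch of the shape such an emitter can take, not the actual bpf_gen_ld_abs() added to net/core/filter.c by this series; the bpf_skb_load_helper_{8,16,32}_no_cache() helpers named in the commit message are forward-declared here only so the sketch is self-contained, and the exception path is a guess at mimicking the historic LD_ABS behavior.

/* Rough sketch of a ->gen_ld_abs() emitter; the real one lives in
 * net/core/filter.c. It rewrites BPF_LD | BPF_ABS/IND into a plain
 * helper call: R1 = ctx (skb), R2 = offset, R0 = loaded value or a
 * negative error that routes to an exit path.
 */
#include <linux/bpf.h>
#include <linux/filter.h>

/* Normally defined via BPF_CALL_2() in net/core/filter.c; declared
 * here only to keep the sketch standalone.
 */
u64 bpf_skb_load_helper_8_no_cache(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
u64 bpf_skb_load_helper_16_no_cache(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
u64 bpf_skb_load_helper_32_no_cache(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);

static int example_gen_ld_abs(const struct bpf_insn *orig,
                              struct bpf_insn *insn_buf)
{
        bool indirect = BPF_MODE(orig->code) == BPF_IND;
        struct bpf_insn *insn = insn_buf;

        /* R2 = offset: imm for ABS, src_reg + imm for IND */
        if (!indirect) {
                *insn++ = BPF_MOV64_IMM(BPF_REG_2, orig->imm);
        } else {
                *insn++ = BPF_MOV64_REG(BPF_REG_2, orig->src_reg);
                if (orig->imm)
                        *insn++ = BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, orig->imm);
        }
        /* R1 = ctx; the verifier enforces that R6 holds the skb here */
        *insn++ = BPF_MOV64_REG(BPF_REG_1, BPF_REG_CTX);

        switch (BPF_SIZE(orig->code)) {
        case BPF_B:
                *insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_8_no_cache);
                break;
        case BPF_H:
                *insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_16_no_cache);
                break;
        case BPF_W:
                *insn++ = BPF_EMIT_CALL(bpf_skb_load_helper_32_no_cache);
                break;
        }

        /* On a failed load, mimic the historic LD_ABS exception
         * behavior: zero R0 and exit the program.
         */
        *insn++ = BPF_JMP_IMM(BPF_JSGE, BPF_REG_0, 0, 2);
        *insn++ = BPF_MOV32_IMM(BPF_REG_0, 0);
        *insn++ = BPF_EXIT_INSN();

        return insn - insn_buf;
}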

include/linux/filter.h

Lines changed: 3 additions & 1 deletion
@@ -47,7 +47,9 @@ struct xdp_buff;
 /* Additional register mappings for converted user programs. */
 #define BPF_REG_A       BPF_REG_0
 #define BPF_REG_X       BPF_REG_7
-#define BPF_REG_TMP     BPF_REG_8
+#define BPF_REG_TMP     BPF_REG_2       /* scratch reg */
+#define BPF_REG_D       BPF_REG_8       /* data, callee-saved */
+#define BPF_REG_H       BPF_REG_9       /* hlen, callee-saved */
 
 /* Kernel hidden auxiliary/helper register for hardening step.
  * Only used by eBPF JITs. It's nothing more than a temporary
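
The remapped BPF_REG_TMP plus the new BPF_REG_D/BPF_REG_H mappings are what the cBPF conversion uses to cache skb->data and the linear headlen once in the prologue, as described in the commit message. The snippet below is only a sketch of that idea, not the exact sequence bpf_convert_filter() emits; the function name is invented for illustration and offsets are simply taken via offsetof() on struct sk_buff.

/* Sketch only: cache skb->data in BPF_REG_D and the linear headlen
 * (skb->len - skb->data_len) in BPF_REG_H at the start of a converted
 * cBPF program, so later LD_ABS/LD_IND fast paths can bounds-check
 * against them without reloading. The real emission happens in
 * bpf_convert_filter() in net/core/filter.c.
 */
#include <linux/filter.h>
#include <linux/skbuff.h>

static struct bpf_insn *emit_ld_abs_prologue(struct bpf_insn *insn)
{
        /* BPF_REG_D = skb->data */
        *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, data),
                              BPF_REG_D, BPF_REG_CTX,
                              offsetof(struct sk_buff, data));
        /* BPF_REG_H = skb->len */
        *insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_H, BPF_REG_CTX,
                              offsetof(struct sk_buff, len));
        /* BPF_REG_TMP = skb->data_len; BPF_REG_H -= BPF_REG_TMP */
        *insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_TMP, BPF_REG_CTX,
                              offsetof(struct sk_buff, data_len));
        *insn++ = BPF_ALU32_REG(BPF_SUB, BPF_REG_H, BPF_REG_TMP);

        return insn;
}

Because all skb access in cBPF is read-only, these two registers never need to be refreshed after helper calls, which is exactly why callee-saved R8/R9 were chosen.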

kernel/bpf/core.c

Lines changed: 8 additions & 88 deletions
@@ -634,23 +634,6 @@ static int bpf_jit_blind_insn(const struct bpf_insn *from,
                 *to++ = BPF_JMP_REG(from->code, from->dst_reg, BPF_REG_AX, off);
                 break;
 
-        case BPF_LD | BPF_ABS | BPF_W:
-        case BPF_LD | BPF_ABS | BPF_H:
-        case BPF_LD | BPF_ABS | BPF_B:
-                *to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
-                *to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
-                *to++ = BPF_LD_IND(from->code, BPF_REG_AX, 0);
-                break;
-
-        case BPF_LD | BPF_IND | BPF_W:
-        case BPF_LD | BPF_IND | BPF_H:
-        case BPF_LD | BPF_IND | BPF_B:
-                *to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ from->imm);
-                *to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
-                *to++ = BPF_ALU32_REG(BPF_ADD, BPF_REG_AX, from->src_reg);
-                *to++ = BPF_LD_IND(from->code, BPF_REG_AX, 0);
-                break;
-
         case BPF_LD | BPF_IMM | BPF_DW:
                 *to++ = BPF_ALU64_IMM(BPF_MOV, BPF_REG_AX, imm_rnd ^ aux[1].imm);
                 *to++ = BPF_ALU64_IMM(BPF_XOR, BPF_REG_AX, imm_rnd);
@@ -891,14 +874,7 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
         INSN_3(LDX, MEM, W),                    \
         INSN_3(LDX, MEM, DW),                   \
         /* Immediate based. */                  \
-        INSN_3(LD, IMM, DW),                    \
-        /* Misc (old cBPF carry-over). */       \
-        INSN_3(LD, ABS, B),                     \
-        INSN_3(LD, ABS, H),                     \
-        INSN_3(LD, ABS, W),                     \
-        INSN_3(LD, IND, B),                     \
-        INSN_3(LD, IND, H),                     \
-        INSN_3(LD, IND, W)
+        INSN_3(LD, IMM, DW)
 
 bool bpf_opcode_in_insntable(u8 code)
 {
@@ -908,6 +884,13 @@ bool bpf_opcode_in_insntable(u8 code)
                 [0 ... 255] = false,
                 /* Now overwrite non-defaults ... */
                 BPF_INSN_MAP(BPF_INSN_2_TBL, BPF_INSN_3_TBL),
+                /* UAPI exposed, but rewritten opcodes. cBPF carry-over. */
+                [BPF_LD | BPF_ABS | BPF_B] = true,
+                [BPF_LD | BPF_ABS | BPF_H] = true,
+                [BPF_LD | BPF_ABS | BPF_W] = true,
+                [BPF_LD | BPF_IND | BPF_B] = true,
+                [BPF_LD | BPF_IND | BPF_H] = true,
+                [BPF_LD | BPF_IND | BPF_W] = true,
         };
 #undef BPF_INSN_3_TBL
 #undef BPF_INSN_2_TBL
@@ -938,8 +921,6 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
 #undef BPF_INSN_3_LBL
 #undef BPF_INSN_2_LBL
         u32 tail_call_cnt = 0;
-        void *ptr;
-        int off;
 
 #define CONT     ({ insn++; goto select_insn; })
 #define CONT_JMP ({ insn++; goto select_insn; })
@@ -1266,67 +1247,6 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
                 atomic64_add((u64) SRC, (atomic64_t *)(unsigned long)
                              (DST + insn->off));
                 CONT;
-        LD_ABS_W: /* BPF_R0 = ntohl(*(u32 *) (skb->data + imm32)) */
-                off = IMM;
-        load_word:
-                /* BPF_LD + BPD_ABS and BPF_LD + BPF_IND insns are only
-                 * appearing in the programs where ctx == skb
-                 * (see may_access_skb() in the verifier). All programs
-                 * keep 'ctx' in regs[BPF_REG_CTX] == BPF_R6,
-                 * bpf_convert_filter() saves it in BPF_R6, internal BPF
-                 * verifier will check that BPF_R6 == ctx.
-                 *
-                 * BPF_ABS and BPF_IND are wrappers of function calls,
-                 * so they scratch BPF_R1-BPF_R5 registers, preserve
-                 * BPF_R6-BPF_R9, and store return value into BPF_R0.
-                 *
-                 * Implicit input:
-                 *   ctx == skb == BPF_R6 == CTX
-                 *
-                 * Explicit input:
-                 *   SRC == any register
-                 *   IMM == 32-bit immediate
-                 *
-                 * Output:
-                 *   BPF_R0 - 8/16/32-bit skb data converted to cpu endianness
-                 */
-
-                ptr = bpf_load_pointer((struct sk_buff *) (unsigned long) CTX, off, 4, &tmp);
-                if (likely(ptr != NULL)) {
-                        BPF_R0 = get_unaligned_be32(ptr);
-                        CONT;
-                }
-
-                return 0;
-        LD_ABS_H: /* BPF_R0 = ntohs(*(u16 *) (skb->data + imm32)) */
-                off = IMM;
-        load_half:
-                ptr = bpf_load_pointer((struct sk_buff *) (unsigned long) CTX, off, 2, &tmp);
-                if (likely(ptr != NULL)) {
-                        BPF_R0 = get_unaligned_be16(ptr);
-                        CONT;
-                }
-
-                return 0;
-        LD_ABS_B: /* BPF_R0 = *(u8 *) (skb->data + imm32) */
-                off = IMM;
-        load_byte:
-                ptr = bpf_load_pointer((struct sk_buff *) (unsigned long) CTX, off, 1, &tmp);
-                if (likely(ptr != NULL)) {
-                        BPF_R0 = *(u8 *)ptr;
-                        CONT;
-                }
-
-                return 0;
-        LD_IND_W: /* BPF_R0 = ntohl(*(u32 *) (skb->data + src_reg + imm32)) */
-                off = IMM + SRC;
-                goto load_word;
-        LD_IND_H: /* BPF_R0 = ntohs(*(u16 *) (skb->data + src_reg + imm32)) */
-                off = IMM + SRC;
-                goto load_half;
-        LD_IND_B: /* BPF_R0 = *(u8 *) (skb->data + src_reg + imm32) */
-                off = IMM + SRC;
-                goto load_byte;
 
         default_label:
                 /* If we ever reach this, we have a bug somewhere. Die hard here

kernel/bpf/verifier.c

Lines changed: 24 additions & 0 deletions
@@ -3884,6 +3884,11 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
                 return -EINVAL;
         }
 
+        if (!env->ops->gen_ld_abs) {
+                verbose(env, "bpf verifier is misconfigured\n");
+                return -EINVAL;
+        }
+
         if (env->subprog_cnt) {
                 /* when program has LD_ABS insn JITs and interpreter assume
                  * that r1 == ctx == skb which is not the case for callees
@@ -5519,6 +5524,25 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
                         continue;
                 }
 
+                if (BPF_CLASS(insn->code) == BPF_LD &&
+                    (BPF_MODE(insn->code) == BPF_ABS ||
+                     BPF_MODE(insn->code) == BPF_IND)) {
+                        cnt = env->ops->gen_ld_abs(insn, insn_buf);
+                        if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf)) {
+                                verbose(env, "bpf verifier is misconfigured\n");
+                                return -EINVAL;
+                        }
+
+                        new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+                        if (!new_prog)
+                                return -ENOMEM;
+
+                        delta    += cnt - 1;
+                        env->prog = prog = new_prog;
+                        insn      = new_prog->insnsi + i + delta;
+                        continue;
+                }
+
                 if (insn->code != (BPF_JMP | BPF_CALL))
                         continue;
                 if (insn->src_reg == BPF_PSEUDO_CALL)