Commit 35dfaad

borkmann authored and Martin KaFai Lau committed
netkit, bpf: Add bpf programmable net device
This work adds a new, minimal BPF-programmable device called "netkit" (former PoC code-name "meta") we recently presented at LSF/MM/BPF. The core idea is that BPF programs are executed within the driver's xmit routine, which, e.g. in the case of containers/Pods, moves BPF processing closer to the source.

One of the goals was that, for Pod egress traffic, BPF programs can be moved from hostns tcx ingress into the device itself, providing earlier drop or forward mechanisms. For example, if the BPF program determines that the skb must be sent out of the node, then a redirect to the physical device can take place directly without going through the per-CPU backlog queue. This helps to shift processing for such traffic from softirq to process context, leading to better scheduling decisions/performance (see measurements in the slides).

In this initial version, the netkit device ships as a pair, but we plan to extend this further so it can also operate in single device mode. The pair comes with a primary and a peer device. Only the primary device, typically residing in hostns, can manage BPF programs for itself and its peer. The peer device is designated for containers/Pods and cannot attach/detach BPF programs. Upon device creation, the user can set the default policy to 'pass' or 'drop' for the case when no BPF program is attached. Additionally, the device can be operated in L3 (default) or L2 mode.

The management of BPF programs is done via bpf_mprog, so that multi-attach is supported right from the beginning with similar API and dependency controls as tcx. For details on the latter see commit 053c8e1 ("bpf: Add generic attach/detach/query API for multi-progs"). tc BPF compatibility is provided, so that existing programs can be easily migrated.

Going forward, we plan to use netkit devices in Cilium as the main device type for connecting Pods. They will be operated in L3 mode in order to simplify a Pod's neighbor management, and the peer will operate in default drop mode, so that no traffic leaves the Pod between the time it is brought up by the CNI plugin and the time the agent attaches its programs. Additionally, the programs we attach via tcx on the physical devices use bpf_redirect_peer() for inbound traffic into the netkit device, hence the latter also supports the ndo_get_peer_dev callback. Similarly, we use bpf_redirect_neigh() for the way out, pushing from the netkit peer to the phys device directly. Also, BIG TCP is supported on the netkit device.

For the follow-up work in single device mode, we plan to convert Cilium's cilium_host/_net devices into a single one.

An extensive test suite for checking device operations and the BPF program and link management API comes as BPF selftests in this series.

Co-developed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Acked-by: Stanislav Fomichev <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://github.com/borkmann/iproute2/tree/pr/netkit
Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.)
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
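
The primary/peer rules and the default policy described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver code: every name in it (sketch_dev, sketch_attach, sketch_xmit, the SKETCH_* verdicts) is hypothetical, a "program" is a plain C callback rather than a BPF program, and a single program slot stands in for the bpf_mprog multi-attach machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical verdicts for this sketch; not the kernel's UAPI values. */
enum sketch_verdict { SKETCH_PASS, SKETCH_DROP };

/* Stand-in for a BPF program: a callback that inspects the packet length. */
typedef enum sketch_verdict (*sketch_prog)(size_t pkt_len);

struct sketch_dev {
	bool primary;               /* only the primary may manage programs */
	enum sketch_verdict policy; /* verdict used when nothing is attached */
	sketch_prog prog;           /* NULL means no program attached */
	struct sketch_dev *peer;    /* other end of the pair */
};

/* Link two devices into a primary/peer pair, each with its own default policy. */
static void sketch_pair_init(struct sketch_dev *primary, struct sketch_dev *peer,
			     enum sketch_verdict primary_policy,
			     enum sketch_verdict peer_policy)
{
	assert(primary && peer);
	*primary = (struct sketch_dev){ true, primary_policy, NULL, peer };
	*peer = (struct sketch_dev){ false, peer_policy, NULL, primary };
}

/*
 * Attach prog to target on behalf of caller.
 * Returns 0 on success, -1 if the caller is not the primary or the
 * target is neither the caller nor its peer.
 */
static int sketch_attach(struct sketch_dev *caller, struct sketch_dev *target,
			 sketch_prog prog)
{
	assert(caller && target);
	if (!caller->primary)
		return -1;
	if (target != caller && target != caller->peer)
		return -1;
	target->prog = prog;
	return 0;
}

/* Transmit-path decision: run the program if present, else the default policy. */
static enum sketch_verdict sketch_xmit(const struct sketch_dev *dev, size_t pkt_len)
{
	assert(dev);
	if (!dev->prog)
		return dev->policy;
	return dev->prog(pkt_len);
}

/* Example program for the sketch: drop anything larger than 1500 bytes. */
static enum sketch_verdict sketch_drop_large(size_t pkt_len)
{
	return pkt_len > 1500 ? SKETCH_DROP : SKETCH_PASS;
}
```

In this model, a peer initialised with SKETCH_DROP drops everything until the primary attaches a program to it, which mirrors the stated intent that no traffic leaves a Pod before the agent has attached its programs. An attach attempt made through the peer itself is refused.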
1 parent 42d31dd commit 35dfaad

9 files changed, +1074 -5 lines changed

MAINTAINERS

Lines changed: 9 additions & 0 deletions
@@ -3795,6 +3795,15 @@ L: [email protected]
 S: Odd Fixes
 K: (?:\b|_)bpf(?:\b|_)
 
+BPF [NETKIT] (BPF-programmable network device)
+M: Daniel Borkmann <[email protected]>
+M: Nikolay Aleksandrov <[email protected]>
+
+
+S: Supported
+F: drivers/net/netkit.c
+F: include/net/netkit.h
+
 BPF [NETWORKING] (struct_ops, reuseport)
 M: Martin KaFai Lau <[email protected]>

drivers/net/Kconfig

Lines changed: 9 additions & 0 deletions
@@ -448,6 +448,15 @@ config NLMON
 	  diagnostics, etc. This is mostly intended for developers or support
 	  to debug netlink issues. If unsure, say N.
 
+config NETKIT
+	bool "BPF-programmable network device"
+	depends on BPF_SYSCALL
+	help
+	  The netkit device is a virtual networking device where BPF programs
+	  can be attached to the device(s) transmission routine in order to
+	  implement the driver's internal logic. The device can be configured
+	  to operate in L3 or L2 mode. If unsure, say N.
+
 config NET_VRF
 	tristate "Virtual Routing and Forwarding (Lite)"
 	depends on IP_MULTIPLE_TABLES

drivers/net/Makefile

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ obj-$(CONFIG_MDIO) += mdio.o
 obj-$(CONFIG_NET) += loopback.o
 obj-$(CONFIG_NETDEV_LEGACY_INIT) += Space.o
 obj-$(CONFIG_NETCONSOLE) += netconsole.o
+obj-$(CONFIG_NETKIT) += netkit.o
 obj-y += phy/
 obj-y += pse-pd/
 obj-y += mdio/
