
bpf: tracing multi-link support #5383

Closed

Conversation

kernel-patches-daemon-bpf-rc[bot]

Pull request for series with
subject: bpf: tracing multi-link support
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=966845

Kernel Patches Daemon and others added 26 commits May 27, 2025 20:21
For now, there isn't a way to set and get per-function metadata with low
overhead, which is inconvenient in some situations. Take the BPF
trampoline as an example: we need to create a trampoline for each kernel
function, as we have to store some information about the function in the
trampoline, such as the BPF progs, the function arg count, etc. Creating
all of these trampolines can add significant performance overhead and
memory consumption. With per-function metadata storage, we can store this
information in the metadata and create one global BPF trampoline for all
the kernel functions. In the global trampoline, we get the information
that we need from the function metadata through the ip (function address)
with almost no overhead.

Another beneficiary can be fprobe. For now, fprobe adds all the functions
that it hooks to a hash table, and fprobe_entry() looks up all the
handlers of the function in that hash table. The performance can suffer
from the hash table lookup. We can optimize this by storing the handlers
in the function metadata instead.

Support per-function metadata storage in the function padding; the
previous discussion can be found in [1]. Generally speaking, we have two
ways to implement this feature:

1. Create a function metadata array, and prepend an insn which holds the
index of the function metadata in the array. Store the insn in the
function padding.

2. Allocate the function metadata with kmalloc(), and prepend an insn
which holds the pointer to the metadata. Store the insn in the function
padding.

Compared with way 2, way 1 consumes less space, but we need to do more
work to manage the global function metadata array. We implement this
feature with way 1.
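
The following is a minimal sketch of the array-based scheme (way 1). The
names are modeled on this series, but the layout and the
KFUNC_MD_INSN_OFFSET constant are illustrative assumptions, not the exact
implementation:

  /* sketch: a global metadata array indexed by the immediate that
   * the prepended insn stores in the function padding
   */
  struct kfunc_md {
          unsigned long func;       /* kernel function address (ip) */
          u8 nr_args;               /* function arg count */
          /* BPF progs, flags, ... */
  };

  static struct kfunc_md *kfunc_mds;  /* global metadata array */

  static inline struct kfunc_md *kfunc_md_get_by_ip(unsigned long ip)
  {
          u32 index;

          /* the 4-byte index is the immediate of the insn stored in
           * the padding right before the function entry (assumed
           * offset; see the next patch for the real layout)
           */
          index = *(u32 *)(ip - KFUNC_MD_INSN_OFFSET);
          return &kfunc_mds[index];
  }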

Link: https://lore.kernel.org/bpf/CADxym3anLzM6cAkn_z71GDd_VeKiqqk1ts=xuiP7pr4PO6USPA@mail.gmail.com/ [1]
Signed-off-by: Menglong Dong <[email protected]>
With CONFIG_CALL_PADDING enabled, there is a 16-byte padding space before
every kernel function, and some kernel features can use it, such as
MITIGATION_CALL_DEPTH_TRACKING, CFI_CLANG, FINEIBT, etc.

In my research, MITIGATION_CALL_DEPTH_TRACKING consumes the tail 9 bytes
of the function padding, CFI_CLANG consumes the head 5 bytes, and FINEIBT
consumes all 16 bytes if it is enabled. So there is no space for us if
MITIGATION_CALL_DEPTH_TRACKING and CFI_CLANG are both enabled, or if
FINEIBT is enabled.

In order to implement the padding-based function metadata, we need 5 bytes
on x86_64 to prepend a "mov %eax, xxx" insn, which can hold a 4-byte
index. So we have the following logic:

1. use the head 5 bytes if CFI_CLANG is not enabled
2. use the tail 5 bytes if MITIGATION_CALL_DEPTH_TRACKING and FINEIBT are
   not enabled
3. otherwise, probe dynamically after the kernel boots whether FineIBT or
   the call thunks are enabled

In the third case, we implement the function metadata with a hash table if
"cfi_mode == CFI_FINEIBT || thunks_initialized". Therefore, we need to
make thunks_initialized global in arch/x86/kernel/callthunks.c.
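
The following is a rough sketch of how the 5-byte insn could be written
into the head of the padding on x86_64 (the CFI_CLANG-disabled case). The
helper name and the plain memcpy() are illustrative assumptions; real code
has to patch live kernel text:

  /* "mov $imm32, %eax" is encoded as B8 <imm32>, i.e. 5 bytes */
  #define KFUNC_MD_INSN_SIZE  5

  static void kfunc_md_write_index(void *padding, u32 index)
  {
          u8 insn[KFUNC_MD_INSN_SIZE];

          insn[0] = 0xb8;                          /* mov imm32, %eax */
          memcpy(&insn[1], &index, sizeof(index));
          /* assumption: a real implementation would go through
           * text_poke()-style helpers instead of a plain memcpy()
           */
          memcpy(padding, insn, sizeof(insn));
  }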

Signed-off-by: Menglong Dong <[email protected]>
Per-function metadata storage is already used by ftrace if
CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS is enabled: it stores the pointer to
the callback directly in the function padding, which consumes 8 bytes,
since commit
baaf553 ("arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS").
So we can store the index directly in the function padding too, without
prepending an insn. With CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS enabled,
functions are 8-byte aligned, and we compile the kernel with an extra
8 bytes (2 NOPs) of padding space. Otherwise, functions are 4-byte
aligned, and only an extra 4 bytes (1 NOP) is needed.

However, we have the same problem that Mark describes in the commit above:
we can't use the function padding together with CFI_CLANG, as it can make
clang compute a wrong offset to the pre-function type hash. So we fall
back to the hash table mode for function metadata if CFI_CLANG is enabled.

Signed-off-by: Menglong Dong <[email protected]>
Introduce the struct kfunc_md_tramp_prog for BPF_PROG_TYPE_TRACING, and
add the field "bpf_progs" to struct kfunc_md. These fields will be used
in the next patch for the bpf global trampoline.

The flag KFUNC_MD_FL_TRACING_ORIGIN is introduced to indicate that the
origin call is needed for this function.

Add the functions kfunc_md_bpf_link() and kfunc_md_bpf_unlink() to add a
bpf prog to, or remove it from, a kfunc_md. Meanwhile, introduce
kfunc_md_bpf_ips() to get all the kernel functions in kfunc_mds that
contain bpf progs.

The flag KFUNC_MD_FL_BPF_REMOVING indicates that a removal is in
progress, and we shouldn't return the function in kfunc_md_bpf_ips() if
"bpf_prog_cnt <= 1".
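
As a rough illustration, the prog lists could hang off the kfunc_md as
sketched below; the field names are assumptions based on the description
above, not the exact struct from the patch:

  struct kfunc_md_tramp_prog {
          struct kfunc_md_tramp_prog *next;   /* per-type prog list */
          struct bpf_prog *prog;
          u64 cookie;
  };

  struct kfunc_md {
          unsigned long func;                 /* traced function address */
          u32 flags;                          /* KFUNC_MD_FL_* */
          u16 bpf_prog_cnt;
          u8 nr_args;
          /* one list each for fentry, modify_return and fexit */
          struct kfunc_md_tramp_prog *bpf_progs[BPF_TRAMP_MAX];
  };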

Signed-off-by: Menglong Dong <[email protected]>
Implement the bpf global trampoline "bpf_global_caller" for x86_64. The
logic of it is similar to the bpf trampoline:

1. save the regs for the function args. For now, only functions with no
   more than 6 args are supported
2. save rbx and r12, which will be used to store the prog list and the
   return value of __bpf_prog_enter_recur
3. get the origin function address from the stack. To get the real
   function address, we make it "&= $0xfffffffffffffff0", as it is always
   16-bytes aligned
4. get the function metadata by calling kfunc_md_get_noref()
5. get the function args count from the kfunc_md and store it on the
   stack
6. get the kfunc_md flags and store them on the stack. Call
   kfunc_md_enter() if the origin call is needed
7. get the prog list for FENTRY, and run all the progs in the list with
   bpf_caller_prog_run
8. go to the end if the origin call is not necessary
9. get the prog list for MODIFY_RETURN, and run all the progs in the list
   with bpf_caller_prog_run
10. restore the regs and do the origin call. We get the ip of the origin
    function from the rip on the stack
11. save the return value of the origin call to the stack
12. get the prog list for FEXIT, and run all the progs in the list with
    bpf_caller_prog_run
13. restore rbx, r12 and r13. In order to rebalance the RSB, we call
    bpf_global_caller_rsb here.
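
In C-like pseudocode, the flow above might look roughly like this. The
real implementation is hand-written x86_64 assembly, and run_prog() /
call_origin() here are placeholders for the enter/exit bookkeeping and the
register restore + call, not real helpers:

  /* C-level sketch of bpf_global_caller; illustrative only */
  static notrace u64 bpf_global_caller_c(unsigned long ip, u64 *args)
  {
          struct kfunc_md *md = kfunc_md_get_noref(ip & ~0xfUL);
          struct kfunc_md_tramp_prog *p;
          u64 ret = 0;

          for (p = md->bpf_progs[BPF_TRAMP_FENTRY]; p; p = p->next)
                  run_prog(p->prog, args);

          if (md->flags & KFUNC_MD_FL_TRACING_ORIGIN) {
                  for (p = md->bpf_progs[BPF_TRAMP_MODIFY_RETURN]; p; p = p->next)
                          run_prog(p->prog, args);

                  ret = call_origin(ip, args);    /* restore regs, call func */
                  args[md->nr_args] = ret;        /* return value slot */

                  for (p = md->bpf_progs[BPF_TRAMP_FEXIT]; p; p = p->next)
                          run_prog(p->prog, args);
          }
          return ret;
  }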

An indirect call is used in bpf_caller_prog_run, as we load the function
address from the stack and call it in the origin call case. What's more,
we get the bpf progs from the kfunc_md and call them indirectly. We make
these indirect calls with CALL_NOSPEC, and I'm not sure if that is enough
to prevent Spectre. I just saw others do it the same way :/

We use r13 to keep the address on the stack where we put the return value
of the origin call. Its offset is "FUNC_ARGS_OFFSET + 8 * nr_args".

The call to kfunc_md_get_noref() should be within rcu_read_lock(), which
I don't do, as that would add the overhead of a function call. I'm
considering running the bpf prog lists within the rcu lock instead:

  rcu_read_lock()
  kfunc_md_get_noref()
  call fentry progs
  call modify_return progs
  rcu_read_unlock()

  call origin

  rcu_read_lock()
  call fexit progs
  rcu_read_unlock()

I'm not sure why the regular bpf trampoline doesn't do it this way. Is it
because this would make the trampoline hold the rcu lock for too long?

Signed-off-by: Menglong Dong <[email protected]>
Factor out ftrace_direct_update() from register_ftrace_direct(); it is
used to add new entries to direct_functions. This function will be used
in a later patch.

Signed-off-by: Menglong Dong <[email protected]>
For now, we can change the address of a direct ftrace_ops with
modify_ftrace_direct(). However, we can't change the functions that a
direct ftrace_ops filters on. Therefore, we introduce the function
reset_ftrace_direct_ips(), which resets the filtered functions of a
direct ftrace_ops.

This function works in the following steps:

1. pick out the functions in ips that don't exist in
   ops->func_hash->filter_hash and add them to a new hash.
2. add all the functions in the new ftrace_hash to direct_functions with
   ftrace_direct_update().
3. reset the filtered functions of the ftrace_ops to ips with
   ftrace_set_filter_ips().
4. remove the functions that are in the old ftrace_hash, but not in the
   new ftrace_hash, from direct_functions.
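
A hedged sketch of how a caller might use the new helper; the exact
prototype below is an assumption modeled on ftrace_set_filter_ips(), not
taken from the patch:

  /* assumed prototype, mirroring ftrace_set_filter_ips() */
  int reset_ftrace_direct_ips(struct ftrace_ops *ops, unsigned long *ips,
                              unsigned int cnt);

  /* e.g. change the set of functions a direct ops is attached to,
   * without unregistering and re-registering the ops
   */
  static int update_direct_filter(struct ftrace_ops *ops,
                                  unsigned long *new_ips, unsigned int cnt)
  {
          /* swap the ops' filter to exactly new_ips, updating
           * direct_functions as described in the steps above
           */
          return reset_ftrace_direct_ips(ops, new_ips, cnt);
  }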

Signed-off-by: Menglong Dong <[email protected]>
Introduce the struct bpf_gtramp_link, which is used to attach a bpf prog
to multiple functions. Meanwhile, introduce the corresponding functions
bpf_gtrampoline_{link,unlink}_prog.

The lock global_tr_lock is held during global trampoline link and unlink.
Why do we define global_tr_lock as a rw_semaphore? Well, it should be a
mutex here, but we will use the rw_semaphore in a later patch for the
trampoline override case :/

When unlinking the global trampoline link, we mark all the functions in
the bpf_gtramp_link with KFUNC_MD_FL_BPF_REMOVING and update the global
trampoline with bpf_gtrampoline_update(). If this is the last bpf prog
in the kfunc_md, the function will be removed from the filter_hash of the
ftrace_ops of bpf_global_trampoline. Then, we remove the bpf prog from
the kfunc_md, and free the kfunc_md if necessary.

Signed-off-by: Menglong Dong <[email protected]>
In this commit, we add the 'accessed_args' field to struct bpf_prog_aux,
which is used to record the indexes of the function args accessed in
btf_ctx_access().

Meanwhile, we add the function btf_check_func_part_match() to compare the
accessed function args of two function prototypes. This function will be
used in a following commit.

Signed-off-by: Menglong Dong <[email protected]>
Refactor the struct modules_array into the more general struct ptr_array,
which is used to store pointers.

Meanwhile, introduce bpf_try_add_ptr(), which checks whether the ptr
already exists in the array before adding it.

It seems this should live in some file under "lib", but I'm not sure
where to add it yet, so let's move it to kernel/bpf/syscall.c for now.
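
A minimal sketch of the ptr_array / bpf_try_add_ptr() idea described
above; the field names and the growth step are assumptions, not the exact
code from the patch:

  struct ptr_array {
          void **ptrs;
          int cnt;
          int cap;
  };

  /* add ptr to the array only if it is not already there */
  static int bpf_try_add_ptr(struct ptr_array *arr, void *ptr)
  {
          int i;

          for (i = 0; i < arr->cnt; i++) {
                  if (arr->ptrs[i] == ptr)
                          return -EEXIST;
          }

          if (arr->cnt == arr->cap) {
                  void **tmp;

                  tmp = krealloc_array(arr->ptrs, arr->cap + 16,
                                       sizeof(*tmp), GFP_KERNEL);
                  if (!tmp)
                          return -ENOMEM;
                  arr->ptrs = tmp;
                  arr->cap += 16;
          }

          arr->ptrs[arr->cnt++] = ptr;
          return 0;
  }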

Signed-off-by: Menglong Dong <[email protected]>
Add a target btf argument to bpf_check_attach_target(), so that the
caller can specify the btf to check against.

Signed-off-by: Menglong Dong <[email protected]>
Move the checking of btf_id_deny and noreturn_deny from
check_attach_btf_id() to bpf_check_attach_target(). Therefore, we can do
this checking during attach for tracing multi-link in later patches.

Signed-off-by: Menglong Dong <[email protected]>
Factor out the function __arch_get_bpf_regs_nr() to get the number of
regs used by the function args.

arch_get_bpf_regs_nr() will return -ENOTSUPP if there are not enough regs
to hold the function args.
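
A hedged sketch of what this could look like on x86_64, where at most 6
registers carry args; the prototypes are assumptions based on the
description above:

  /* count how many argument registers the function model needs */
  static int __arch_get_bpf_regs_nr(const struct btf_func_model *m)
  {
          int i, nr_regs = 0;

          for (i = 0; i < m->nr_args; i++)
                  /* args wider than 8 bytes occupy two registers */
                  nr_regs += (m->arg_size[i] + 7) / 8;

          return nr_regs;
  }

  int arch_get_bpf_regs_nr(const struct btf_func_model *m)
  {
          int nr_regs = __arch_get_bpf_regs_nr(m);

          /* only 6 registers are used for args on x86_64 */
          return nr_regs > 6 ? -ENOTSUPP : nr_regs;
  }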

Signed-off-by: Menglong Dong <[email protected]>
In this commit, we add support for attaching a tracing BPF program to
multiple hooks, which is similar to BPF_TRACE_KPROBE_MULTI.

The use case is obvious. For now, we have to create a BPF program for
each kernel function that we want to trace, even though all the programs
have the same (or similar) logic. This can consume extra memory and make
program loading slow if we have plenty of kernel functions to trace.
KPROBE_MULTI may be an alternative, but it can't do what TRACING does.
For example, a kretprobe can't obtain the function args, but FEXIT can.

For now, we support creating a multi-link for fentry/fexit/modify_return
with the following new attach types that we introduce:

  BPF_TRACE_FENTRY_MULTI
  BPF_TRACE_FEXIT_MULTI
  BPF_MODIFY_RETURN_MULTI

We introduce the struct bpf_tracing_multi_link for this purpose, which
can hold all the kernel modules, the target bpf program (for attaching to
a bpf program) or the target btf (for attaching to kernel functions) that
we reference.

During loading, the first target is used for verification by the
verifier. And during attaching, we check the consistency of all the
targets with the first target.
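
As a rough illustration of the container described above, such a link
could look something like the sketch below; the field names are
assumptions, not the actual layout from the patch:

  struct bpf_tracing_multi_link {
          struct bpf_link link;          /* embedded generic bpf link */
          enum bpf_attach_type attach_type;
          struct module **mods;          /* referenced kernel modules */
          u32 mod_cnt;
          struct bpf_prog *tgt_prog;     /* target prog, when attaching to a bpf program */
          struct btf *btf;               /* target btf, when attaching to kernel functions */
  };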

Signed-off-by: Menglong Dong <[email protected]>
Factor out __unregister_ftrace_direct, which doesn't hold the direct_mutex
lock.

Signed-off-by: Menglong Dong <[email protected]>
Introduce the function replace_ftrace_direct(). This is used to replace
the direct ftrace_ops for a function, and will be used in the next patch.

Let's call the original ftrace_ops A, and the new ftrace_ops B. First, we
register B directly, and the callbacks of the functions in A and B will
fall back to the ftrace_ops_list case.

Then, we modify the address of the entry in direct_functions to
B->direct_call, and remove it from A. This updates the dyn_rec and makes
the functions call B->direct_call directly. If no functions remain in
A->filter_hash, just unregister it.

So a record can have more than one direct ftrace_ops, and we need to
check whether any direct ops remain for the record before removing
FTRACE_OPS_FL_DIRECT in __ftrace_hash_rec_update().

Signed-off-by: Menglong Dong <[email protected]>
For now, the bpf global trampoline can't work together with the bpf
trampoline. For example, attaching FENTRY_MULTI to a function where
FENTRY already exists will fail, and attaching FENTRY will also fail if
FENTRY_MULTI exists.

We make the global trampoline work together with the trampoline in this
commit.

It is not easy. The most difficult part is the synchronization between
bpf_gtrampoline_link_prog and bpf_trampoline_link_prog, and we use a
rw_semaphore here, which is quite ugly. We hold the write lock in
bpf_gtrampoline_link_prog and the read lock in bpf_trampoline_link_prog.

We introduce the function bpf_gtrampoline_link_tramp() to make
bpf_gtramp_link fit the bpf_trampoline; it is called in
bpf_gtrampoline_link_prog(). If the bpf_trampoline of the function exists
in the kfunc_md, or we find it with bpf_trampoline_lookup_exist(), it
means that we need to do the fitting. The fitting is simple: we create a
bpf_shim_tramp_link for our prog and link it to the bpf_trampoline with
__bpf_trampoline_link_prog().

The bpf_trampoline_link_prog() case is a little more complex. We create a
bpf_shim_tramp_link for each bpf prog in the kfunc_md and add them to the
bpf_trampoline before we call __bpf_trampoline_link_prog() in
bpf_gtrampoline_replace(). And we fall back in
bpf_gtrampoline_replace_finish() if an error is returned by
__bpf_trampoline_link_prog().

In __bpf_gtrampoline_unlink_prog(), we call bpf_gtrampoline_remove() to
release the bpf_shim_tramp_link, and the bpf prog will be unlinked in
bpf_link_free() if it was ever linked successfully.

Another solution is to fit into the existing trampoline. For example,
when we attach a tracing bpf prog, we can add it to the kfunc_md if a
tracing_multi bpf prog is already attached to the target function. And we
can also add the tracing_multi prog to the trampoline if a tracing prog
already exists on the target function. I think this would make the
compatibility much easier.

The code in this part is very ugly and messy, and I think it would be a
relief to split it out into another series :/

Signed-off-by: Menglong Dong <[email protected]>
By default, the kernel btfs that we load during program loading are freed
after the programs are loaded in bpf_object_load(). However, we still
need these btfs for tracing multi-link during attach. Therefore, if any
bpf programs of the multi-link tracing type exist, we don't free the btfs
until the bpf object is closed.

Meanwhile, introduce the new API bpf_object__free_btf() to manually free
the btfs after attaching.
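
A hedged user-space sketch of the intended usage; bpf_object__free_btf()
is the new API named above, while the object file name and the attach
step are placeholders:

  struct bpf_object *obj;

  obj = bpf_object__open_file("tracing_multi.bpf.o", NULL);
  if (!obj || bpf_object__load(obj))
          return -1;

  /* ... attach the multi-link tracing programs here ... */

  /* the kernel btfs were kept alive for the multi-link attach;
   * release them once attaching is done
   */
  bpf_object__free_btf(obj);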

Signed-off-by: Menglong Dong <[email protected]>
Add support for the following attach types:

BPF_TRACE_FENTRY_MULTI
BPF_TRACE_FEXIT_MULTI
BPF_MODIFY_RETURN_MULTI

Signed-off-by: Menglong Dong <[email protected]>
For now, libbpf finds a btf type id by looping over all the btf types and
comparing names, which is inefficient if we have many functions to look
up.

We add a "use_hash" argument to find_kernel_btf_id() to indicate whether
we should look up the btf type id via a hash table. The hash table is
initialized if it hasn't been yet.

Signed-off-by: Menglong Dong <[email protected]>
We add the skip_invalid and attach_tracing options to tracing_multi for
the selftests.

When we try to attach all the functions in available_filter_functions
with tracing_multi, we can't tell whether a target symbol can be attached
successfully, and the attach will fail. When skip_invalid is set to true,
libbpf will check whether each symbol can be attached and skip the
invalid entries.

We will skip the symbols in the following cases:

1. the btf type doesn't exist
2. the btf type is not a function proto
3. the function has more than 6 args
4. the return type is a struct or union
5. any of the function args is a struct or union

The 5th rule may wrongly skip some attachable symbols, but that's OK for
the testing.

"attach_tracing" is used to convert a TRACING prog to TRACING_MULTI. For
example, we can set the attach type to FENTRY_MULTI before we load the
skel. And we can attach the prog with
bpf_program__attach_trace_multi_opts() with "attach_tracing=1". The libbpf
will attach the target btf type of the prog automatically. This is also
used to reuse the selftests of tracing.
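
A hedged sketch of what the selftest-side usage could look like; the opts
struct name and field spelling below are assumptions based on the
description above, not a confirmed libbpf API:

  /* prog was loaded with its attach type set to BPF_TRACE_FENTRY_MULTI */
  LIBBPF_OPTS(bpf_trace_multi_opts, opts,
          .attach_tracing = true,   /* reuse the prog's own target btf id */
          .skip_invalid = true,     /* silently skip symbols that can't attach */
  );
  struct bpf_link *link;

  link = bpf_program__attach_trace_multi_opts(prog, &opts);
  if (!link)
          return -errno;            /* libbpf sets errno on failure */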

(Oh my goodness! What am I doing?)

Signed-off-by: Menglong Dong <[email protected]>
The glob_match() in test_progs.c has almost the same logic as the
glob_match() in libbpf.c, so we replace it to keep the code simple.

Signed-off-by: Menglong Dong <[email protected]>
We sometimes need to get all the kernel functions that can be traced, so
we move get_syms() and get_addrs() from kprobe_multi_test.c to
test_progs.c and rename them to bpf_get_ksyms() and bpf_get_addrs().

Signed-off-by: Menglong Dong <[email protected]>
In this commit, we add some testcases for the following attach types:

BPF_TRACE_FENTRY_MULTI
BPF_TRACE_FEXIT_MULTI
BPF_MODIFY_RETURN_MULTI

We reuse the tests in fentry_test.c, fexit_test.c and modify_return.c by
attaching the tracing bpf progs as tracing_multi.

We also add some functions that tracing progs should skip to
bpf_get_ksyms() in this commit.

Signed-off-by: Menglong Dong <[email protected]>
Add a testcase for the performance of the tracing bpf progs. In this
testcase, bpf_fentry_test1() is called 10000000 times in
bpf_testmod_bench_run, and the time consumed is returned. The following
cases are considered:

- nop: nothing is attached to bpf_fentry_test1()
- fentry: an empty FENTRY bpf program is attached to bpf_fentry_test1()
- fentry_multi_single: an empty FENTRY_MULTI bpf program is attached to
  bpf_fentry_test1()
- fentry_multi_all: an empty FENTRY_MULTI bpf program is attached to all
  the kernel functions
- kprobe_multi_single: an empty KPROBE_MULTI bpf program is attached to
  bpf_fentry_test1()
- kprobe_multi_all: an empty KPROBE_MULTI bpf program is attached to all
  the kernel functions

And we can get the result by running:

  ./test_progs -t tracing_multi_bench -v | grep time

Signed-off-by: Menglong Dong <[email protected]>
@kernel-patches-daemon-bpf-rc
Author

Upstream branch: c5cebb2
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=966845
version: 1

@kernel-patches-daemon-bpf-rc
Author

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=966845 expired. Closing PR.
