BPF Standard Streams #5374

Closed

Conversation

kernel-patches-daemon-bpf-rc[bot]

Pull request for series with
subject: BPF Standard Streams
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=966065

@kernel-patches-daemon-bpf-rc

Upstream branch: 079e5c5
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=966065
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: db22b13
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=966065
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: 1ae7a84
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=966065
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: 86bc9c7
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=966065
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: d496557
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=966065
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: ca56fbd
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=966065
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: 5ffb537
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=966065
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: c5cebb2
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=966065
version: 2

Add support for a stream API to the kernel and expose related kfuncs to
BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These
can be used for printing messages that can be consumed from user space,
thus it's similar in spirit to the existing trace_pipe interface.

The kernel will use the BPF_STDERR stream to notify the program of any
errors encountered at runtime. BPF programs themselves may use both
streams for writing debug messages. BPF library-like code may use
BPF_STDERR to print warnings or errors on misuse at runtime.

The implementation of a stream is as follows. Every time a message is
emitted from the kernel (directly, or through a BPF program), a record
is allocated by bump allocating from a per-CPU region backed by a page
obtained using try_alloc_pages(). This ensures that we can allocate
memory from any context. The eventual plan is to discard this scheme in
favor of Alexei's kmalloc_nolock() [0].
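
As a rough user-space illustration of the bump allocation idea described above, the sketch below carves records out of a single page-sized region by advancing an offset. The names (`bump_region`, `bump_alloc`), the 4096-byte region size, and the 8-byte alignment are assumptions made for this example; they are not taken from the patch, which works on per-CPU kernel pages.

```c
#include <assert.h>
#include <stddef.h>

#define REGION_SIZE 4096u /* one page; size assumed for illustration */

/* Hypothetical stand-in for a per-CPU region: one page plus a bump offset. */
struct bump_region {
	_Alignas(8) unsigned char page[REGION_SIZE];
	size_t off; /* offset of the next free byte */
};

/* Round n up to a multiple of 8 so every record stays aligned. */
static size_t align8(size_t n)
{
	return (n + 7u) & ~(size_t)7u;
}

/* Carve len bytes out of the region, or return NULL when it does not fit. */
static void *bump_alloc(struct bump_region *r, size_t len)
{
	size_t need = align8(len);

	assert(r->off <= REGION_SIZE);
	/* need < len catches wrap-around in align8() for huge requests. */
	if (need < len || need > REGION_SIZE - r->off)
		return NULL;

	void *p = r->page + r->off;
	r->off += need;
	return p;
}
```

Allocation is just a bounds check and an addition, which is why this style of allocator is attractive in contexts where taking locks or sleeping is not an option. Freeing individual records is not modelled here.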

This record is then locklessly inserted into a list (llist_add()) so
that the printing side doesn't require holding any locks, and works in
any context. Each stream has a maximum capacity of 4MB of text, and each
printed message is accounted against this limit.
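
The lockless insertion and the capacity accounting can be approximated in user space with C11 atomics. This is only a sketch: `stream_push`, the `stream` and `elem` layouts, and the reserve-then-undo accounting are assumptions for illustration, and the kernel uses llist_add() rather than the hand-written compare-and-swap loop shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define STREAM_CAP (4u * 1024 * 1024) /* 4MB of text, as stated above */

struct elem {
	struct elem *next;
	size_t len;        /* length of text, charged against the capacity */
	const char *text;
};

struct stream {
	_Atomic(struct elem *) log; /* LIFO head: newest element first */
	atomic_size_t used;         /* bytes currently accounted */
};

/* Push e onto the stream without taking a lock; false if over capacity. */
static bool stream_push(struct stream *s, struct elem *e)
{
	/* Reserve the bytes first and undo the reservation on overflow. */
	size_t prev = atomic_fetch_add(&s->used, e->len);

	if (prev + e->len > STREAM_CAP) {
		atomic_fetch_sub(&s->used, e->len);
		return false;
	}

	/* Retry until the head we linked against is still the current head. */
	struct elem *head = atomic_load(&s->log);
	do {
		e->next = head;
	} while (!atomic_compare_exchange_weak(&s->log, &head, e));
	return true;
}
```

Because each push only swings the head pointer, the newest message always sits at the front of the list, which is the LIFO property the reader side has to undo.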

Messages from a program are emitted using the bpf_stream_vprintk kfunc,
which takes a stream_id argument and otherwise works similarly to
bpf_trace_vprintk.

The bprintf buffer helpers are extracted out to be reused for printing
the string into them before copying it into the stream, so that we can
(with the defined max limit) format a string and know its true length
before performing allocations of the stream element.
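
The "format first, allocate second" approach can be sketched in plain C as follows. The message is formatted into a bounded scratch buffer, and only then is a record of exactly the right size allocated. `record_format`, `struct record`, and the 512-byte limit are hypothetical names and values chosen for this example; the kernel code uses the bprintf buffer helpers rather than vsnprintf() on a stack buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FMT_BUF_MAX 512 /* assumed maximum formatted length */

struct record {
	size_t len;  /* length of text, excluding the terminating NUL */
	char text[]; /* flexible array member sized at allocation time */
};

/* Format into a scratch buffer, then allocate exactly what is needed. */
static struct record *record_format(const char *fmt, ...)
{
	char buf[FMT_BUF_MAX];
	va_list ap;

	va_start(ap, fmt);
	int n = vsnprintf(buf, sizeof(buf), fmt, ap);
	va_end(ap);
	if (n < 0)
		return NULL;

	/* vsnprintf() reports the untruncated length; clamp to what fits. */
	size_t len = (size_t)n < sizeof(buf) ? (size_t)n : sizeof(buf) - 1;

	struct record *r = malloc(sizeof(*r) + len + 1);
	if (!r)
		return NULL;
	r->len = len;
	memcpy(r->text, buf, len + 1); /* copies the NUL terminator too */
	return r;
}
```

A caller would use it as `struct record *r = record_format("cpu=%d", 3);` and release the record with `free(r)` once it has been consumed.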

For consuming elements from a stream, we expose a bpf(2) syscall command
named BPF_PROG_STREAM_READ_BY_FD, which allows reading data from the
stream of a given prog_fd into a user space buffer. The main logic is
implemented in bpf_stream_read(). The log messages are queued in
bpf_stream::log by the bpf_stream_vprintk kfunc, and then pulled and
ordered correctly in the stream backlog.

For this purpose, we hold a lock around bpf_stream_backlog_peek(), as
llist_del_first() (if we maintained a second lockless list for the
backlog) wouldn't be safe from multiple threads anyway. Then, if we
fail to find something in the backlog log, we splice out everything from
the lockless log, and place it in the backlog log, and then return the
head of the backlog. Once the full length of the element is consumed, we
will pop it and free it.

The lockless list bpf_stream::log is a LIFO stack. Elements obtained
using a llist_del_all() operation are in LIFO order, thus would break
the chronological ordering if printed directly. Hence, this batch of
messages is first reversed. Then, it is stashed into a separate list in
the stream, i.e. the backlog_log. The head of this list is the actual
message that should always be returned to the caller. All of this is
done in bpf_stream_backlog_fill().
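
The reader-side ordering logic can be illustrated with a single-threaded sketch. The function names below loosely mirror the ones mentioned above, but the bodies and struct layouts are assumptions for illustration: the atomic llist_del_all() is replaced by a plain pointer swap, and the lock around the peek is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct elem {
	struct elem *next;
	const char *text;
};

struct stream {
	struct elem *log;          /* LIFO: newest message first */
	struct elem *backlog_head; /* oldest unread message */
	struct elem *backlog_tail;
};

/* Reverse a singly linked list in place and return the new head. */
static struct elem *reverse(struct elem *head)
{
	struct elem *prev = NULL;

	while (head) {
		struct elem *next = head->next;
		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}

/* Move everything from the log to the backlog in chronological order. */
static void backlog_fill(struct stream *s)
{
	struct elem *batch = s->log; /* stands in for llist_del_all() */

	s->log = NULL;
	if (!batch)
		return;

	struct elem *tail = batch; /* newest element ends up last */
	struct elem *head = reverse(batch);

	if (s->backlog_tail)
		s->backlog_tail->next = head;
	else
		s->backlog_head = head;
	s->backlog_tail = tail;
}

/* Return the oldest unread message, refilling the backlog if it is empty. */
static struct elem *backlog_peek(struct stream *s)
{
	if (!s->backlog_head)
		backlog_fill(s);
	return s->backlog_head;
}

/* Drop the head once its full length has been consumed. */
static void backlog_pop(struct stream *s)
{
	struct elem *e = s->backlog_head;

	if (!e)
		return;
	s->backlog_head = e->next;
	if (!s->backlog_head)
		s->backlog_tail = NULL;
}
```

With messages written in the order a, b, c, the log holds c, b, a; after the fill, successive peek and pop calls return a, then b, then c. Freeing popped elements is left out of the sketch.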

From the kernel side, the writing into the stream will be a bit more
involved than the typical printk. First, the kernel typically may print
a collection of messages into the stream, and parallel writers into the
stream may suffer from interleaving of messages. To ensure each group of
messages is visible atomically, we can take advantage of using a
lockless list for pushing in messages.

To enable this, we add a bpf_stream_stage() macro, and require kernel
users to use bpf_stream_printk statements for the passed expression to
write into the stream. Underneath the macro, we have a message staging
API, where a bpf_stream_stage object on the stack accumulates the
messages being printed into a local llist_head, and then a commit
operation splices the whole batch into the stream's lockless log list.
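
The staging idea can be sketched with C11 atomics: messages collect in a private list, and one compare-and-swap publishes the whole batch. `struct stage`, `stage_add`, and `stage_commit` are hypothetical names for this example and do not reflect the actual macro or its helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct elem {
	struct elem *next;
	const char *text;
};

/* On-stack staging area: a private LIFO list plus its oldest element. */
struct stage {
	struct elem *head; /* newest staged message */
	struct elem *tail; /* oldest staged message */
};

/* Add a message to the private list; no other writer can see it yet. */
static void stage_add(struct stage *st, struct elem *e)
{
	e->next = st->head;
	if (!st->head)
		st->tail = e;
	st->head = e;
}

/* Publish the whole batch with one successful compare-and-swap. */
static void stage_commit(struct stage *st, _Atomic(struct elem *) *stream_log)
{
	if (!st->head)
		return;

	struct elem *old = atomic_load(stream_log);
	do {
		/* Link the batch in front of whatever is currently published. */
		st->tail->next = old;
	} while (!atomic_compare_exchange_weak(stream_log, &old, st->head));

	st->head = st->tail = NULL;
}
```

Since the batch becomes reachable from the shared head in a single step, a concurrent writer's message lands either entirely before or entirely after the batch, never in the middle of it.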

This is especially pertinent for rqspinlock deadlock messages printed to
program streams. After this change, we see each deadlock invocation as a
non-interleaving contiguous message without any confusion on the
reader's part, improving their user experience in debugging the fault.

While programs cannot benefit from this staged stream writing API, they
could just as well hold an rqspinlock around their print statements to
serialize messages, hence this is kept kernel-internal for now.

Overall, this infrastructure provides NMI-safe, any-context printing of
messages to two dedicated streams.

Later patches will add support for printing splats in case of BPF arena
page faults, rqspinlock deadlocks, and cond_break timeouts, and
integration of this facility into bpftool for dumping messages to user
space.

  [0]: https://lore.kernel.org/bpf/[email protected]

Reviewed-by: Eduard Zingerman <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
kkdwivedi added 5 commits May 27, 2025 23:16
Prepare a function for use in future patches that can extract the file
info, line info, and the source line number for a BPF program, given
its program counter.

Only the basename of the file path is provided, given it can be
excessively long in some cases.
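
Trimming a path down to its basename amounts to finding the last slash. The helper below is a user-space sketch with a hypothetical name, `path_basename`; which helper the patch itself uses is not stated here.

```c
#include <assert.h>
#include <string.h>

/* Return the part of path after the last '/', or path itself if none. */
static const char *path_basename(const char *path)
{
	assert(path != NULL);

	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}
```

For example, `path_basename("/a/b/prog.bpf.c")` yields `"prog.bpf.c"`, and a bare file name is returned unchanged.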

This will be used in later patches to print source info to the BPF
stream. The source line number is indicated by the return value, and the
file and line info are provided through out parameters.

Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
In preparation of figuring out the closest program that led to the
current point in the kernel, implement a function that scans through the
stack trace and finds out the closest BPF program when walking down the
stack trace.

Special care needs to be taken to skip over kernel and BPF subprog
frames. We basically scan until we find a BPF main prog frame. The
assumption is that if a program calls into us transitively, we'll
hit it along the way. If not, we end up returning NULL.
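
The scan can be modelled as a walk over an array of already-classified frames. Everything here is an assumption for illustration: the `frame_kind` classification, the array representation, and the name `find_closest_main_prog`. The real code classifies each frame during an arch_bpf_stack_walk() traversal instead of receiving a prepared array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a stack frame. */
enum frame_kind { FRAME_KERNEL, FRAME_BPF_SUBPROG, FRAME_BPF_MAIN };

struct frame {
	enum frame_kind kind;
	const char *name;
};

/*
 * frames[0] is the innermost frame. Skip kernel and subprog frames and
 * return the first BPF main program found, or NULL if there is none.
 */
static const struct frame *find_closest_main_prog(const struct frame *frames,
						  size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (frames[i].kind == FRAME_BPF_MAIN)
			return &frames[i];
	}
	return NULL;
}
```

Stopping at the first main program frame gives the program closest to the current point of execution, even when other programs appear further out in the trace.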

Contextually the function will be used in places where we know the
program may have called into us.

Due to reliance on arch_bpf_stack_walk(), this function only works on
x86 with CONFIG_UNWINDER_ORC, arm64, and s390. Remove the warning from
arch_bpf_stack_walk as well since we call it outside bpf_throw()
context.

Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
bpf_ksym_find() must be called with RCU read protection. Wrap the call
to bpf_ksym_find() in bpf_prog_ksym_find() with the RCU read lock so
that callers do not have to care about holding it specifically.

Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Introduce a kernel function which is the analogue of dump_stack(),
printing some useful information and the stack trace. This is not
exposed to BPF programs yet, but can be made available in the future.

When we have a program counter for a BPF program in the stack trace,
additionally output the filename and line number to make the trace
helpful. The rest of the trace can be passed into ./decode_stacktrace.sh
to obtain the line numbers for kernel symbols.

Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Begin reporting may_goto timeouts to the BPF program's stderr stream.
Make sure that we don't end up spamming too many errors if the
program keeps failing repeatedly and filling up the stream, hence
emit at most 512 error messages from the kernel for a given stream.
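
A cap like this can be expressed as an atomic counter that is checked before each report. The sketch assumes a per-stream counter and uses hypothetical names (`stream_errs`, `stream_may_report`); only the limit of 512 comes from the text above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define STREAM_MAX_ERRORS 512 /* limit stated in the commit message */

/* Hypothetical per-stream error counter. */
struct stream_errs {
	atomic_int count;
};

/* Return true if one more kernel error may be written to the stream. */
static bool stream_may_report(struct stream_errs *s)
{
	/* atomic_fetch_add() returns the value before the increment. */
	if (atomic_fetch_add(&s->count, 1) >= STREAM_MAX_ERRORS) {
		/* Undo the increment so the counter cannot grow unbounded. */
		atomic_fetch_sub(&s->count, 1);
		return false;
	}
	return true;
}
```

A reporting path would call `stream_may_report()` first and skip the message when it returns false, so a program that fails in a tight loop stops producing kernel messages after the 512th one.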

Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
kkdwivedi added 5 commits May 27, 2025 23:16
Begin reporting rqspinlock deadlocks and timeouts to the BPF program's
stderr.

Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Add a convenience macro to print data to the BPF streams. BPF_STDOUT and
BPF_STDERR stream IDs in the vmlinux.h can be passed to the macro to
print to the respective streams.

Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Introduce a libbpf API so that users can read data from a given BPF
stream for a BPF prog fd. For now, only the low-level syscall wrapper
is provided; we can add a bpf_program__* accessor as a follow-up if
needed.

Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Add support for printing the BPF stream contents of a program in
bpftool. The bpftool prog tracelog command is extended to take
stdout and stderr arguments, followed by the prog specification.

The bpf_prog_stream_read() API added in the previous patch is simply
reused to grab data, which is then dumped to the respective file. The
stdout data is sent to stdout, and stderr is printed to stderr.

Cc: Quentin Monnet <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Add selftests to stress test the various facets of the stream API and
the memory allocation pattern, and to ensure that dumping support is
tested and functional.

Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
@kernel-patches-daemon-bpf-rc kernel-patches-daemon-bpf-rc bot force-pushed the bpf-next_base branch 3 times, most recently from 18af9fe to 653831c Compare June 1, 2025 18:00
@kernel-patches-daemon-bpf-rc

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=966065 expired. Closing PR.
