Split glibc and linux ucontext_t definitions #23802

Open
rootbeer opened this issue May 5, 2025 · 1 comment
Labels: bug, os-linux, standard library
Comments

@rootbeer
Contributor

rootbeer commented May 5, 2025

Zig Version

0.15.0-dev.451+bd230215f

Steps to Reproduce and Observed Behavior

Following on from #23601 (splitting the Linux kernel sigset_t from the glibc sigset_t), some work may be needed to define ucontext_t separately for the Zig Linux and glibc APIs. At the least, there is an embedded sigset_t in it. Currently for Linux, Zig defines a glibc-compatible ucontext_t at std.os.linux.<arch>.ucontext_t, aliases that to std.os.linux.ucontext_t, and then aliases that again as std.c.ucontext_t.

There are two variations of ucontext_t in a Linux glibc environment. The first is the ucontext that the Linux kernel fills out and puts on the stack for a signal handler (this is what the third parameter of a sigaction signal handler points to). This kernel-created state is implicitly the machine state to be restored when returning from the signal handler. The second is the ucontext_t filled out by the glibc getcontext() and makecontext() calls.

The kernel and glibc structures are slightly incompatible when compared directly, but are technically compatible if used "correctly". They share a consistent layout up to the uc_sigmask signal mask field. The sigset_t field has a different size in each structure (1024 bits in glibc, 64 or 128 bits in the kernel), and the fields after it are not consistent (e.g., floating point state, alternative stack state, etc.); see the structure declarations below. Used correctly, only the valid first 64 or 128 bits of the sigset_t should be accessed, and the extra state beyond that should be accessed via embedded pointers in the common fields. Specifically, the fpregs field in the mcontext_t is a (maybe NULL) pointer to the (variable-sized) floating point state, and the extra shadow stack state is accessed via a pointer in the stack_t field.
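To make the first variation concrete, here is a minimal C sketch, assuming Linux with glibc. It installs an SA_SIGINFO handler and reads the kernel-filled uc_sigmask through the third handler parameter, touching only low-numbered signals that fit in the kernel's mask. The names on_usr1 and probe_saved_sigmask are made up for this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

/* Results recorded by the handler; -1 means "handler has not run". */
static volatile sig_atomic_t saw_usr2_blocked = -1;
static volatile sig_atomic_t saw_usr1_blocked = -1;

/* Third parameter: pointer to the ucontext the kernel pushed on the stack. */
static void on_usr1(int signo, siginfo_t *info, void *context) {
    (void)signo;
    (void)info;
    const ucontext_t *uc = context;
    /* uc_sigmask is the mask to restore when the handler returns.
     * Only signals inside the kernel's 64/128-bit mask are queried. */
    saw_usr2_blocked = sigismember(&uc->uc_sigmask, SIGUSR2);
    saw_usr1_blocked = sigismember(&uc->uc_sigmask, SIGUSR1);
}

/* Hypothetical helper: returns 1 if the saved mask matched expectations. */
int probe_saved_sigmask(void) {
    sigset_t block, old_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR2);
    int rc = sigprocmask(SIG_BLOCK, &block, &old_mask);
    assert(rc == 0);

    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_usr1;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, &old_sa);
    assert(rc == 0);

    raise(SIGUSR1); /* handler runs before raise() returns */

    /* Restore the previous handler and mask. */
    rc = sigaction(SIGUSR1, &old_sa, NULL);
    assert(rc == 0);
    rc = sigprocmask(SIG_SETMASK, &old_mask, NULL);
    assert(rc == 0);
    (void)rc;

    return saw_usr2_blocked == 1 && saw_usr1_blocked == 0;
}
```

Calling probe_saved_sigmask() from a process that has not already blocked SIGUSR1 should report that SIGUSR2 (blocked before the signal was raised) is in the saved mask, while SIGUSR1 (only blocked for the duration of the handler) is not.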

Additionally, I believe this amorphous state can grow over time. For example, Rust ran into problems because glibc v2.28 added the __ssp state to their ucontext_t (see rust-lang/libc#1410). I believe the kernel has added more floating point state to its saved state over time, too, as processors add more CPU state to be saved and restored. I'm mostly familiar with the x86 flavors, but each architecture has its own history here. This variable size is annoying because callers are expected to allocate the ucontext_t instance, and then invoke getcontext() to initialize it. So the caller really must have the correct size structure.

Note that OpenBSD and Android do not support getcontext(). I don't think Darwin supports it either. (These systems generally do support sigsetjmp, which is a predecessor to getcontext.) Also, getcontext and its related functions were removed from POSIX; see https://man7.org/linux/man-pages/man3/getcontext.3.html:

   POSIX.1-2008 removes these functions, citing portability issues,  
   and recommending that applications be rewritten to use POSIX  
   threads instead.

I see a couple ways of fixing the ucontext_t declaration in Zig:

  1. Zig could remove support for getcontext(), so the only usage of ucontext_t would be in a signal handler. This removes the need for a Zig translation of the glibc ucontext_t. The Linux ucontext_t structure can just contain the "public" fields and leave the variable-sized ones (fpregs, ssp, etc.) off. This structure would be the wrong size if allocated directly, but for decoding state in a signal handler it should be sufficient. AFAICT, the only current use case for getcontext() in Zig is the backtrace code in std.debug. I believe that code does not need a full context with signal masks.
  2. Pad the glibc ucontext_t with opaque padding bytes so the structure can still be allocated by a caller of getcontext(), but the specific layout is more clearly opaque to callers. Having a little extra padding (e.g., to cover the shadow stack state) should be harmless. It is not clear whether future glibc versions will add more state to this structure, or how Zig should handle that.
  3. Zig could drop getcontext() (like option 1) but provide a moral equivalent to getcontext() and friends. That is, Zig provides a set of functions that return the full machine state and signal mask, but in a structure that is amenable to copies and to future changes. It would probably capture only the relevant machine state, depending on the caller's requirements (e.g., signal mask, floating point state, shadow stacks, etc. could be optional).

Kernel and glibc ucontext_t declarations

The Linux kernel's "uapi" struct ucontext is a subset of the state actually pushed on the stack, and contains the bits they're committed to maintaining compatibility for: (linux include/uapi/asm-generic/ucontext.h):

struct ucontext {  
	unsigned long	  uc_flags;
	struct ucontext  *uc_link;  
	stack_t		  uc_stack;  
	struct sigcontext uc_mcontext;  
	sigset_t	  uc_sigmask;	/* mask last for extensibility */  
}; 

Note that the sigset_t here is a kernel-defined signal mask (so 64 or 128 bits), not a glibc 1024-bit mask. Also note that the floating point register state is not explicitly mentioned, even though it is adjacent to this structure in memory (and could be the last field of the struct) -- it's meant to be accessed via the uc_mcontext.

The glibc x86 ucontext_t (glibc sysdeps/unix/sysv/linux/x86/sys/ucontext.h):

typedef struct ucontext_t {  
    unsigned long int  __ctx(uc_flags);
    struct ucontext_t *uc_link;
    stack_t            uc_stack;
    mcontext_t         uc_mcontext; 
    sigset_t           uc_sigmask;
    struct _libc_fpstate __fpregs_mem;
    __extension__ unsigned long long int __ssp[4];
  } ucontext_t;

Note here that the uc_sigmask is a 1024-bit glibc signal mask. This struct does include the floating point and shadow stack state, but callers are expected to access these via pointers in the uc_mcontext and uc_stack fields rather than directly. For example, consider casting a pointer to the signal state pushed on the stack by the kernel into a pointer to this struct. The actual fp register content probably overlaps with the 1024-bit sigmask, because the kernel only pushed a 64-bit signal mask. But as long as the pointers in the uc_mcontext are used to access that fp state, it should be fine.
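As a rough C illustration of "use the pointer, not the embedded field", the sketch below reads MXCSR from the saved floating point state through uc_mcontext.fpregs inside a signal handler. It assumes x86-64 Linux with glibc (the fpregs and mxcsr names come from glibc's headers and are only visible with the default or GNU feature macros); on other targets the read is compiled out. The names on_usr1_fp and probe_fp_state are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

static volatile sig_atomic_t fp_handler_ran = 0;
/* 1: state read via pointer, 0: pointer was NULL, -1: not attempted. */
static volatile sig_atomic_t fp_read_state = -1;
static volatile unsigned int saved_mxcsr = 0;

static void on_usr1_fp(int signo, siginfo_t *info, void *context) {
    (void)signo;
    (void)info;
    ucontext_t *uc = context;
    (void)uc;
#if defined(__x86_64__) && defined(__GLIBC__)
    /* Follow the (maybe NULL) pointer in the common fields. Do NOT read
     * uc->__fpregs_mem: in a kernel-pushed frame that offset need not
     * hold the floating point state at all. */
    if (uc->uc_mcontext.fpregs != NULL) {
        saved_mxcsr = uc->uc_mcontext.fpregs->mxcsr;
        fp_read_state = 1;
    } else {
        fp_read_state = 0;
    }
#endif
    fp_handler_ran = 1;
}

/* Hypothetical helper: returns 1 once the handler has run. */
int probe_fp_state(void) {
    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_usr1_fp;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGUSR1, &sa, &old_sa);
    assert(rc == 0);

    raise(SIGUSR1); /* handler runs before raise() returns */

    rc = sigaction(SIGUSR1, &old_sa, NULL);
    assert(rc == 0);
    (void)rc;
    return fp_handler_ran;
}
```

The handler never assumes where the floating point state lives relative to the ucontext_t it was handed, which is the property that keeps the kernel and glibc views compatible.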

Expected Behavior

ucontext_t is relatively safe and useful both with and without a C library linked.

@rootbeer rootbeer added the bug Observed behavior contradicts documented or intended behavior label May 5, 2025
@alexrp
Member

alexrp commented May 6, 2025

I think option 1 is fine. Even the async-await-demo branch doesn't use getcontext() for green threads, IIRC.

@alexrp alexrp added standard library This issue involves writing Zig code for the standard library. os-linux labels May 6, 2025
@alexrp alexrp added this to the 0.15.0 milestone May 6, 2025