| 1 | +# Proposal: Go general dynamic TLS |
| 2 | + |
| 3 | +Author: Alexander Musman (Advanced Software Technology Lab, Huawei) |
| 4 | + |
| 5 | +Last updated: 2025-01-27 |
| 6 | + |
| 7 | +Discussion at [golang.org/issue/54805](https://github.com/golang/go/issues/54805). |
| 8 | + |
| 9 | +## Abstract |
| 10 | + |
| 11 | +The Go runtime currently relies on Thread Local Storage (TLS) to preserve goroutine state when interacting with C code, but lacks support for the general dynamic [TLS model](https://uclibc.org/docs/tls.pdf). This limitation hinders the use of certain C libraries, such as Musl, and restricts loading of Go shared libraries without `LD_PRELOAD`. We propose extending the Go assembler and linker to support the general dynamic TLS model, focusing initially on the Arm64 architecture on Linux systems. This enhancement will enable seamless interoperability with a wider range of C libraries and improve the flexibility of deploying Go `c-shared` libraries. |
| 12 | + |
| 13 | +## Background |
| 14 | + |
| 15 | +The current Go runtime leverages a Thread Local Storage (TLS) variable for preserving the current goroutine (`g`) when interacting with C code. This is particularly relevant in scenarios such as CGO interactions and certain runtime functions like race detection, where the code switches to C. To facilitate this, Go uses the `runtime.save_g` function to store the goroutine in the `runtime·tls_g` TLSBSS variable. The `runtime.load_g` function then retrieves it, typically upon returning from C code execution. |
| 16 | +The Go assembler and linker currently support two TLS access models: _initial exec_ and _local exec_. The _local exec_ model is used most often, especially in build modes such as `exe`, and the Go linker supports it natively. The _initial exec_ model, by contrast, requires an external linker such as `bfd-ld`, `lld`, or `gold`. The absence of a dynamic TLS model is generally benign with glibc, whose TLS allocation scheme is adaptable, but it becomes a problem with the Musl C library: Musl's more rigid TLS allocation exposes the limitation, as highlighted in issue [golang.org/issue/54805](https://github.com/golang/go/issues/54805). |
| 17 | + |
| 18 | +## Proposal |
| 19 | + |
| 20 | +Introduce general dynamic TLS support in the Go assembler and linker, and update the runtime assembly, currently the sole user of TLS variables, to accommodate this model. Activate this feature in the assembler with the explicit option `-tls=GD`, while keeping `-tls=IE` as the default for `shared` mode. Additionally, pass `-D=TLS_GD` to enable architecture-specific macro expansion in the runtime's assembly when the general dynamic model is employed. The linker support will depend on external linking, consistent with the existing initial exec TLS approach. |
| 21 | + |
| 22 | +The `cmd/go` command will enable the general dynamic TLS model by default in scenarios that require it, based on the combination of `GOOS`/`GOARCH` and `buildmode`. Initially, this model will be supported only on Linux/arm64, and only for `buildmode=c-shared` and `buildmode=c-archive`. |
| 23 | + |
| 24 | +## Rationale |
| 25 | + |
| 26 | +To enable loading a Go `c-shared` module without relying on `LD_PRELOAD`, it is essential to support the _general dynamic_ model. Since the variable resides within the same runtime package as its users, any relaxation of a _general dynamic_ variable reference to _local dynamic_ is automatically identified and executed by the external linker. |
| 27 | +One could avoid the `-D` flag by having the assembler generate the save/restore of the return address itself when lowering the `MOVD` instruction. This approach seems less convenient, because the clobbered register would no longer be visible in the assembly source. |
| 28 | +Another consideration would be to modify the runtime functions that interact with TLS variables to have a stack frame. However, this option is not ideal because these functions are sometimes executed in performance-critical paths, such as during race detection. |
| 29 | + |
| 30 | +## Compatibility |
| 31 | + |
| 32 | +There is no change in exported APIs. |
| 33 | +The build modes affected are `c-shared` and `c-archive`. Archives built with `c-archive` may be used in a `c-shared` library, which in turn might be loaded without `LD_PRELOAD`. |
| 34 | +The assembler needs to support a new flag, `-tls=`, which selects the TLS model explicitly. This flag will be passed by `cmd/go` and will also be useful for testing the TLS lowering. |
| 35 | +A new relocation type `R_ARM64_TLS_GD` would be needed in objabi, along with potentially other architecture-specific relocation types. |
| 36 | + |
| 37 | +## Implementation |
| 38 | + |
| 39 | +A prototype implementation is complete and has been tested with Musl on arm64 Linux (TBD: preparing a review). |
| 40 | +### Changes to `cmd/go` for Supported Platforms |
| 41 | +For compatible GOOS/GOARCH combinations and applicable build modes, the following flags are passed to the assembler: |
| 42 | +``` |
| 43 | +-tls=GD -D=TLS_GD |
| 44 | +``` |
| 45 | +These flags allow conditional use of a register to retain the return address across calls, as detailed below for arm64. |
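
The flag selection described above could look roughly like the following Go sketch. The helper name `tlsAsmFlags` is hypothetical and is not part of `cmd/go`; the platform and build-mode list simply mirrors the initial scope stated in this proposal (Linux/arm64 with `c-shared` or `c-archive`).

```go
package main

import "fmt"

// tlsAsmFlags returns the extra assembler flags for a build configuration.
// Hypothetical helper for illustration only: the name and signature are
// assumptions, and the supported set mirrors the proposal's initial scope.
func tlsAsmFlags(goos, goarch, buildmode string) []string {
	if goos != "linux" || goarch != "arm64" {
		return nil
	}
	switch buildmode {
	case "c-shared", "c-archive":
		// Select the general dynamic model and enable the runtime macro.
		return []string{"-tls=GD", "-D=TLS_GD"}
	}
	return nil
}

func main() {
	fmt.Println(tlsAsmFlags("linux", "arm64", "c-shared"))
	fmt.Println(tlsAsmFlags("linux", "amd64", "c-shared"))
	fmt.Println(tlsAsmFlags("linux", "arm64", "pie"))
}
```

Returning no flags for every other configuration keeps the existing initial exec and local exec behavior untouched, which matches the compatibility goal of the proposal.
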
| 46 | +### Modifications in the Runtime for arm64 Assembly |
| 47 | +In the arm64 assembly code, we propose updating references to the thread-local variable in `runtime·save_g`/`runtime·load_g`: |
| 48 | +``` |
| 49 | +LOAD_TLS_G_R0 // get the offset of tls_g from the thread pointer |
| 50 | +MRS TPIDR_EL0, R27 // get the thread pointer into R27 |
| 51 | +MOVD g, (R0)(R27) // use the address in R0+R27 |
| 52 | +``` |
| 53 | +The TLS accesses occur in frameless functions, and the general dynamic sequence contains a call that overwrites the link register. The macro therefore saves the return address in `R25` before the access and restores it afterwards: |
| 54 | +``` |
| 55 | +#ifdef TLS_GD |
| 56 | + #define LOAD_TLS_G_R0 \ |
| 57 | + MOVD LR, R25 \ |
| 58 | + MOVD runtime·tls_g(SB), R0 \ |
| 59 | + MOVD R25, LR |
| 60 | +#else |
| 61 | + #define LOAD_TLS_G_R0 \ |
| 62 | + MOVD runtime·tls_g(SB), R0 |
| 63 | +#endif |
| 64 | +``` |
| 65 | +### Assembler Flag Additions and Instruction Lowering |
| 66 | +We introduce a `-tls=[IE,LE,GD]` flag in the asm tool. A new `MOVD` instruction variant, `C_TLS_GD`, is defined, which lowers to the following four-instruction sequence using a new `R_ARM64_TLS_GD` relocation type: |
| 67 | +``` |
| 68 | +ADRP var, R0 // Address of the GOT entry |
| 69 | +LDR [R0], R27 // Load stub from GOT |
| 70 | +ADD #0, R0, R0 // Argument to call |
| 71 | +BLR (R27) // Call, R0 returns offset from TP to variable |
| 72 | +``` |
| 73 | +The `C_TLS_GD` variant would be used for `TLSBSS` symbols only when the `-tls=GD` flag is passed to the assembler. The default in `shared` mode remains `C_TLS_IE`. |
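
As a rough illustration of the selection rule above, the following Go sketch maps the value of the proposed `-tls` flag to an operand class for a `TLSBSS` symbol. The function `tlsOperandClass` is hypothetical and the classes are represented as plain strings; the fallback to `C_TLS_LE` outside `shared` mode is an assumption based on the Background section rather than something this proposal specifies.

```go
package main

import "fmt"

// tlsOperandClass picks the operand class for a TLSBSS symbol reference.
// Hypothetical sketch: the real assembler uses numeric operand classes,
// and the non-shared default (C_TLS_LE) is an assumption.
func tlsOperandClass(tlsFlag string, shared bool) (string, error) {
	switch tlsFlag {
	case "":
		// No explicit flag: keep the existing defaults.
		if shared {
			return "C_TLS_IE", nil
		}
		return "C_TLS_LE", nil
	case "LE":
		return "C_TLS_LE", nil
	case "IE":
		return "C_TLS_IE", nil
	case "GD":
		return "C_TLS_GD", nil
	}
	return "", fmt.Errorf("unknown TLS model %q", tlsFlag)
}

func main() {
	for _, flag := range []string{"", "IE", "GD", "XX"} {
		class, err := tlsOperandClass(flag, true)
		fmt.Println(flag, class, err)
	}
}
```

Treating an unknown value as an error rather than falling back silently makes a mistyped `-tls=` value visible at assembly time.
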
| 74 | +### Linker Enhancements for New Relocation Support |
| 75 | +The linker will support the `R_ARM64_TLS_GD` relocation type. The assembler attaches this relocation to the start of the sequence, and the linker translates it into the following ELF relocations for the referenced TLS symbol: |
| 76 | +``` |
| 77 | +ADRP var, R0 // R_AARCH64_TLSDESC_ADR_PAGE21 |
| 78 | +LDR [R0], R27 // R_AARCH64_TLSDESC_LD64_LO12_NC |
| 79 | +ADD #0, R0, R0 // R_AARCH64_TLSDESC_ADD_LO12_NC |
| 80 | +BLR (R27) // R_AARCH64_TLSDESC_CALL |
| 81 | +``` |
| 82 | +In PIE mode, `TLS_IE` is optimized to `TLS_LE`, which allows internal linking. No similar optimization is supported for `TLS_GD`, because `-tls=GD` is not passed to the assembler in this mode. |
| 83 | + |