|
| 1 | +# Proposal: Go general dynamic TLS |
| 2 | + |
| 3 | +Author: Alexander Musman (Advanced Software Technology Lab, Huawei) |
| 4 | + |
| 5 | +Last updated: 2025-01-28 |
| 6 | + |
| 7 | +Discussion at [golang.org/issue/71953](https://github.com/golang/go/issues/71953). |
| 8 | + |
| 9 | +## Abstract |
| 10 | + |
| 11 | +The Go runtime currently relies on Thread Local Storage (TLS) to preserve |
| 12 | +goroutine state when interacting with C code, |
| 13 | +but lacks support for the |
| 14 | +general dynamic [TLS model](https://uclibc.org/docs/tls.pdf). |
| 15 | +This limitation hinders the use of certain C libraries, |
| 16 | +such as Musl, |
| 17 | +and restricts loading of Go shared libraries without `LD_PRELOAD`. |
| 18 | +We propose extending the Go assembler and linker to support |
| 19 | +the general dynamic TLS model, |
| 20 | +focusing initially on the Arm64 architecture |
| 21 | +on Linux systems. |
| 22 | +This enhancement will enable seamless interoperability with |
| 23 | +a wider range of C libraries |
| 24 | +and improve the flexibility of deploying Go `c-shared` libraries. |
| 25 | + |
| 26 | +## Background |
| 27 | + |
| 28 | +The current Go runtime leverages a Thread Local Storage (TLS) variable |
| 29 | +for preserving the current goroutine (`g`) |
| 30 | +when interacting with C code. |
| 31 | +This is particularly relevant in scenarios such as |
| 32 | +CGO interactions |
| 33 | +and certain runtime functions like |
| 34 | +race detection, |
| 35 | +where the code switches to C. |
| 36 | +To facilitate this, |
| 37 | +Go uses the `runtime.save_g` function |
| 38 | +to store the goroutine in the `runtime·tls_g` TLSBSS variable. |
| 39 | +The `runtime.load_g` function then retrieves it, |
| 40 | +typically upon returning from C code execution. |
| 41 | +The Go assembler and linker currently support two TLS access models: |
| 42 | +_initial exec_ |
| 43 | +and _local exec_. |
| 44 | +The _local exec_ model is predominantly utilized, |
| 45 | +especially in build modes like `exe`, |
| 46 | +and is natively supported by the Go linker. |
| 47 | +Conversely, the _initial exec_ model requires external linkers |
| 48 | +like `bfd-ld`, `lld`, or `gold` |
| 49 | +for support. |
| 50 | +While the absence of a dynamic TLS model is generally benign with |
| 51 | +glibc, |
| 52 | +owing to its adaptable TLS allocation scheme, |
| 53 | +this shortcoming becomes problematic with the Musl C library. |
| 54 | +Musl's more rigid TLS allocation exposes this limitation, |
| 55 | +as highlighted in issue |
| 56 | +[golang.org/issue/54805](https://github.com/golang/go/issues/54805). |
| 57 | + |
| 58 | +## Proposal |
| 59 | + |
| 60 | +Introduce general dynamic TLS (Thread Local Storage) support in the Go |
| 61 | +assembler/linker, |
| 62 | +and update the runtime assembly— |
| 63 | +currently the sole user of TLS variables— |
| 64 | +to accommodate this model. |
| 65 | +Activate this feature in the assembler |
| 66 | +with the explicit option `-tls=GD`, |
| 67 | +while keeping `-tls=IE` as the default for `shared` mode. |
| 68 | +Additionally, |
| 69 | +pass `-D=TLS_GD` to enable architecture-specific |
| 70 | +macro expansion in the runtime's assembly |
| 71 | +when the general dynamic model is employed. |
| 72 | +The linker support will depend on external linking, |
| 73 | +consistent with the existing initial exec TLS approach. |
| 74 | + |
| 75 | +The `cmd/go` command will enable the general dynamic TLS model by default |
| 76 | +in scenarios that require it, |
| 77 | +based on the combination of `GOOS`/`GOARCH` |
| 78 | +and `buildmode`. |
| 79 | +Initially, |
| 80 | +this model will be supported by the Arm64 architecture on Linux systems, |
| 81 | +specifically for `buildmode=c-shared` and `buildmode=c-archive`. |
| 82 | + |
| 83 | +## Rationale |
| 84 | + |
| 85 | +To enable loading a Go `c-shared` module without relying on `LD_PRELOAD`, |
| 86 | +it is essential to support the _general dynamic_ model. |
| 87 | +Since the variable resides within the same runtime package as its users, |
| 88 | +any relaxation of a _general dynamic_ variable reference to _local dynamic_ |
| 89 | +is automatically identified and executed by the external linker. |
| 90 | +While one could avoid using the `-D` flag by generating the save/restore |
| 91 | +of the return address directly in the assembler |
| 92 | +(when lowering the `MOVD` instruction), |
| 93 | +this approach seems less convenient |
| 94 | +because it does not explicitly show the clobbered register in the assembly code. |
| 95 | +Another consideration would be to modify the runtime functions |
| 96 | +that interact with TLS variables to have a stack frame. |
| 97 | +However, |
| 98 | +this option is not ideal |
| 99 | +because these functions are sometimes executed in performance-critical paths, |
| 100 | +such as during race detection. |
| 101 | + |
| 102 | +## Compatibility |
| 103 | + |
| 104 | +There is no change in exported APIs. |
| 105 | +The build modes affected are `c-shared` and `c-archive`. |
| 106 | +Archives built with `c-archive` may be used in a `c-shared` library, |
| 107 | +which in turn might be loaded without `LD_PRELOAD`. |
| 108 | +The assembler needs to support a new flag `-tls=`, |
| 109 | +which allows the TLS model to be chosen explicitly. |
| 110 | +This flag will be passed by `cmd/go` and will also be useful |
| 111 | +for testing the TLS lowering. |
| 112 | +A new relocation type `R_ARM64_TLS_GD` would be needed in objabi, |
| 113 | +along with potentially other architecture-specific relocation types. |
| 114 | + |
| 115 | +## Implementation |
| 116 | + |
| 117 | +A prototype of the implementation has been built and tested |
| 118 | +with Musl C |
| 119 | +for arm64 |
| 120 | +Linux |
| 121 | +(please see [review 644975](https://go-review.googlesource.com/c/go/+/644975)). |
| 122 | + |
| 123 | +### Changes to `cmd/go` for Supported Platforms |
| 124 | +For compatible GOOS/GOARCH combinations and applicable build modes, |
| 125 | +the following flags are passed to the assembler: |
| 126 | +``` |
| 127 | +-tls=GD -D=TLS_GD |
| 128 | +``` |
| 129 | +These flags allow conditional use of a register to retain |
| 130 | +the return address across calls, |
| 131 | +as detailed below for arm64. |
| 132 | + |
| 133 | +### Modifications in the Runtime for arm64 Assembly |
| 134 | +In assembly code, |
| 135 | +specifically for arm64, |
| 136 | +we propose updating references to the thread-local variable |
| 137 | +in `runtime·save_g`/`runtime·load_g`: |
| 138 | +``` |
| 139 | +LOAD_TLS_G_R0 // get the offset of tls_g from the thread pointer |
| 140 | +MRS TPIDR_EL0, R27 // get the thread pointer into R27 |
| 141 | +MOVD g, (R0)(R27) // use the address in R0+R27 |
| 142 | +``` |
| 143 | +The TLS access occurs in frameless functions, |
| 144 | +and the general dynamic sequence contains a call that overwrites the link register (LR), |
| 145 | +so the return address is saved in R25 and restored afterwards |
| 146 | +using a macro definition as follows: |
| 147 | +``` |
| 148 | +#ifdef TLS_GD |
| 149 | + #define LOAD_TLS_G_R0 \ |
| 150 | + MOVD LR, R25 \ |
| 151 | + MOVD runtime·tls_g(SB), R0 \ |
| 152 | + MOVD R25, LR |
| 153 | +#else |
| 154 | + #define LOAD_TLS_G_R0 \ |
| 155 | + MOVD runtime·tls_g(SB), R0 |
| 156 | +#endif |
| 157 | +``` |
| 158 | + |
| 159 | +### Assembler Flag Additions and Instruction Lowering |
| 160 | +We introduce a `-tls=[IE,LE,GD]` flag in the asm tool. |
| 161 | +A new `MOVD` instruction variant, `C_TLS_GD`, is defined, |
| 162 | +which lowers to the following four-instruction sequence |
| 163 | +using a new `R_ARM64_TLS_GD` relocation type: |
| 164 | +``` |
| 165 | +ADRP var, R0 // Address of the GOT entry |
| 166 | +LDR [R0], R27 // Load stub from GOT |
| 167 | +ADD #0,R0, R0 // Argument to call |
| 168 | +BLR (R27) // Call, R0 returns offset from TP to variable |
| 169 | +``` |
| 170 | +The `C_TLS_GD` variant would be used for `TLSBSS` symbols |
| 171 | +only when the `-tls=GD` flag is passed to the assembler. |
| 172 | +The default in `shared` mode remains `C_TLS_IE`. |
| 173 | + |
| 174 | +### Linker Enhancements for New Relocation Support |
| 175 | +The linker will support the `R_ARM64_TLS_GD` relocation type, |
| 176 | +which the assembler attaches |
| 177 | +to the start of the sequence. |
| 178 | +For the referenced TLS symbol, |
| 179 | +the linker translates it into the following ELF relocations: |
| 180 | +``` |
| 181 | +ADRP var, R0 // R_AARCH64_TLSDESC_ADR_PAGE21 |
| 182 | +LDR [R0], R27 // R_AARCH64_TLSDESC_LD64_LO12_NC |
| 183 | +ADD #0,R0, R0 // R_AARCH64_TLSDESC_ADD_LO12_NC |
| 184 | +BLR (R27) // R_AARCH64_TLSDESC_CALL |
| 185 | +``` |
| 186 | +In PIE mode, while `TLS_IE` is optimized to `TLS_LE` |
| 187 | +(allowing internal linking), |
| 188 | +a similar optimization for `TLS_GD` isn't supported |
| 189 | +as `-tls=GD` isn't passed to the assembler in this mode. |
| 190 | + |
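The flag selection described under "Changes to `cmd/go` for Supported Platforms" can be sketched as a small Go function. This is only an illustration under stated assumptions: `tlsAsmFlags` is a hypothetical helper name, not an existing `cmd/go` function, and the condition simply encodes the initial scope given in the proposal (Linux on arm64, with `buildmode=c-shared` or `buildmode=c-archive`).

```go
package main

import "fmt"

// tlsAsmFlags is a hypothetical helper, not part of cmd/go.
// It returns the extra assembler flags for configurations that the
// proposal initially builds with the general dynamic TLS model, and
// nil for every other configuration (which keeps the existing default).
func tlsAsmFlags(goos, goarch, buildmode string) []string {
	if goos != "linux" || goarch != "arm64" {
		return nil
	}
	switch buildmode {
	case "c-shared", "c-archive":
		// -tls=GD selects the TLS lowering in the assembler;
		// -D=TLS_GD enables the LR-preserving macro in the runtime assembly.
		return []string{"-tls=GD", "-D=TLS_GD"}
	}
	return nil
}

func main() {
	fmt.Println(tlsAsmFlags("linux", "arm64", "c-shared")) // general dynamic flags
	fmt.Println(tlsAsmFlags("linux", "arm64", "exe"))      // no extra flags
	fmt.Println(tlsAsmFlags("linux", "amd64", "c-shared")) // no extra flags
}
```

Keeping the decision in one place like this would make it straightforward to extend the supported `GOOS`/`GOARCH` list later, which matches the proposal's intent of starting with a single platform.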