
Commit 123f7dd

runtime: zero upper bit of Y registers in asyncPreempt on darwin/amd64
Apparently, the signal handling code path in the darwin kernel leaves the upper bits of Y registers in a dirty state, which causes many SSE operations (128-bit and narrower) to become much slower. Clear the upper bits to get to a clean state.

We do it at the entry of asyncPreempt, which runs immediately after the kernel's signal handling code returns, if we actually injected a call. It does not cover other exits where we don't inject a call, e.g. failed preemption, profiling signal, or other async signals. But it does cover an important use case of async signals, preempting a tight numerical loop, which we introduced in this cycle.

Running the benchmark in issue #37174:

    name    old time/op  new time/op  delta
    Fast-8  90.0ns ± 1%  46.8ns ± 3%  -47.97%  (p=0.000 n=10+10)
    Slow-8   188ns ± 5%    49ns ± 1%  -73.82%  (p=0.000 n=10+9)

There is no more slowdown due to preemption signals.

For #37174.

Change-Id: I8b83d083fade1cabbda09b4bc25ccbadafaf7605
Reviewed-on: https://go-review.googlesource.com/c/go/+/219131
Run-TryBot: Cherry Zhang <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
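The commit message cites the benchmark in issue #37174 without reproducing it. As a hypothetical sketch of the use case it names, preempting a tight numerical loop, the Go program below runs a float64 loop on a single P, so the sleeping main goroutine can only resume if the runtime preempts the loop with a signal (asynchronous preemption, new in Go 1.14). On darwin/amd64 that signal path is the one that returns into asyncPreempt. The names spinFloat and preemptDemo are illustrative assumptions, and the program only shows that preemption happens; it does not measure the SSE slowdown, which would need a timed benchmark on darwin/amd64.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// spinFloat runs a tight float64 loop (hypothetical helper). Apart from the
// atomic load, which is a compiler intrinsic on amd64, the loop makes no
// calls, so it has no cooperative preemption points. On amd64 the additions
// compile to scalar SSE instructions.
func spinFloat(stop *int32) float64 {
	x := 0.0
	for atomic.LoadInt32(stop) == 0 {
		x += 1.5
	}
	return x
}

// preemptDemo starts spinFloat while only one P is available, then sleeps.
// The sleep can only return if the runtime preempts the spinning goroutine
// asynchronously, since the loop never yields on its own.
func preemptDemo() float64 {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var stop int32
	result := make(chan float64)
	go func() { result <- spinFloat(&stop) }()

	time.Sleep(50 * time.Millisecond)
	atomic.StoreInt32(&stop, 1)
	return <-result
}

func main() {
	fmt.Println("loop was preempted and stopped; sum positive:", preemptDemo() > 0)
}
```

With asynchronous preemption disabled (for example via GODEBUG=asyncpreemptoff=1), this program would be expected to hang, which is what makes it a small probe of the preemption path the commit changes.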
1 parent a0c9fb6 commit 123f7dd


2 files changed: +12 -0 lines changed


src/runtime/mkpreempt.go

Lines changed: 9 additions & 0 deletions
@@ -244,6 +244,15 @@ func genAMD64() {
 
 	// TODO: MXCSR register?
 
+	// Apparently, the signal handling code path in darwin kernel leaves
+	// the upper bits of Y registers in a dirty state, which causes
+	// many SSE operations (128-bit and narrower) become much slower.
+	// Clear the upper bits to get to a clean state. See issue #37174.
+	// It is safe here as Go code don't use the upper bits of Y registers.
+	p("#ifdef GOOS_darwin")
+	p("VZEROUPPER")
+	p("#endif")
+
 	p("PUSHQ BP")
 	p("MOVQ SP, BP")
 	p("// Save flags before clobbering them")

src/runtime/preempt_amd64.s

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@
 #include "textflag.h"
 
 TEXT ·asyncPreempt(SB),NOSPLIT|NOFRAME,$0-0
+	#ifdef GOOS_darwin
+	VZEROUPPER
+	#endif
 	PUSHQ BP
 	MOVQ SP, BP
 	// Save flags before clobbering them
