Description
What version of Go are you using (go version)?
$ go version
go version go1.21.3 linux/arm64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/home/ec2-user/.cache/go-build'
GOENV='/home/ec2-user/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/ec2-user/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/ec2-user/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/ec2-user/sdk/go1.21.3'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/ec2-user/sdk/go1.21.3/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='go1.21.3'
GCCGO='gccgo'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/ec2-user/fpcrash/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1020512040=/tmp/go-build -gno-record-gcc-switches'
What did you do?
Used the runtime execution tracer with frame pointer unwinding enabled.
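For readers unfamiliar with the setup, here is a minimal sketch of running the execution tracer over a workload that is likely to be asynchronously preempted. It is not the reproducer from the gist linked in this report. The helper names `spin` and `traceWorkload` are hypothetical, and the sketch assumes that tight, call-free loops are what push the scheduler to use async preemption. It only shows the shape of the setup and is not expected to trigger the crash on its own.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/trace"
	"sync"
)

// spin is a call-free loop. Loops like this have no cooperative preemption
// points, so the scheduler may need async preemption to interrupt them.
func spin(n int) int {
	s := 0
	for i := 0; i < n; i++ {
		s += i
	}
	return s
}

// traceWorkload (hypothetical helper) runs the workers under the execution
// tracer and returns the number of trace bytes that were written.
func traceWorkload(workers, iters int) (int, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return 0, err
	}
	results := make([]int, workers) // keeps the loop results live
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			results[w] = spin(iters)
		}(w)
	}
	wg.Wait()
	trace.Stop()
	return buf.Len(), nil
}

func main() {
	n, err := traceWorkload(4, 50_000_000)
	if err != nil {
		fmt.Println("trace.Start failed:", err)
		return
	}
	fmt.Println("trace bytes written:", n)
}
```

Each preemption recorded while tracing is active captures a stack for the preempted goroutine, which is the code path shown in the crash output of this report.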
What did you expect to see?
No crashes.
What did you see instead?
Crashes when recording an event for async preemption. Example:
SIGSEGV: segmentation violation
PC=0x471c64 m=38 sigcode=1
goroutine 0 [idle]:
runtime.fpTracebackPCs(...)
/root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/trace.go:1018
runtime.traceStackID(0x57c27cf78c4c3?, {0xfffe8dfd0018, 0x57c27ce950808?, 0x80}, 0x1?)
/root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/trace.go:991 +0x224 fp=0xffff10bcb4f0 sp=0xffff10bcb4a0 pc=0x471c64
runtime.traceEventLocked(0xffff10bcb5e8?, 0x4739e8?, 0x10bcb5c8?, 0x40000c65d8, 0x12, 0x0, 0x1, {0x0, 0x0, 0x0?})
/root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/trace.go:834 +0x240 fp=0xffff10bcb570 sp=0xffff10bcb4f0 pc=0x4712d0
runtime.traceEvent(0x0?, 0x1, {0x0, 0x0, 0x0})
/root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/trace.go:770 +0x90 fp=0xffff10bcb5e0 sp=0xffff10bcb570 pc=0x471030
runtime.traceGoPreempt(...)
/root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/trace.go:1609
runtime.gopreempt_m(0x100ab9d6?)
/root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/proc.go:3786 +0x50 fp=0xffff10bcb620 sp=0xffff10bcb5e0 pc=0x457270
traceback: unexpected SPWRITE function runtime.mcall // <--- I don't believe this is actually related to the issue?
runtime.mcall()
/root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/asm_arm64.s:192 +0x54 fp=0xffff10bcb630 sp=0xffff10bcb620 pc=0x482224
goroutine 549321 [running]:
runtime.asyncPreempt2()
/root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/preempt.go:307 +0x3c fp=0x4004013a40 sp=0x4004013a20 pc=0x44e26c
runtime.asyncPreempt()
/root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/preempt_arm64.s:47 +0x9c fp=0x4004013c30 sp=0x4004013a40 pc=0x4853ec
golang.org/x/net/http2.(*PingFrame).Header(0x4004013c78?)
<autogenerated>:1 +0x54 fp=0x4004013c40 sp=0x4004013c40 pc=0x9ab404
golang.org/x/net/http2.(*Framer).checkFrameOrder(0x40d7cf0d20, {0x31947f0?, 0x4128ebc498?})
/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:547 +0x74 fp=0x4004013d60 sp=0x4004013c40 pc=0x97f854
golang.org/x/net/http2.(*Framer).ReadFrame(0x40d7cf0d20)
/go/pkg/mod/golang.org/x/[email protected]/http2/frame.go:516 +0x258 fp=0x4004013e10 sp=0x4004013d60 pc=0x97f5f8
google.golang.org/grpc/internal/transport.(*http2Client).reader(0x4003b71d40, 0x40d95567a8?)
/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:1594 +0x1b8 fp=0x4004013fb0 sp=0x4004013e10 pc=0x9cb6f8
google.golang.org/grpc/internal/transport.newHTTP2Client.func11()
/go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:397 +0x2c fp=0x4004013fd0 sp=0x4004013fb0 pc=0x9c231c
runtime.goexit()
/root/.gimme/versions/go1.21.3.linux.arm64/src/runtime/asm_arm64.s:1197 +0x4 fp=0x4004013fd0 sp=0x4004013fd0 pc=0x484834
created by google.golang.org/grpc/internal/transport.newHTTP2Client in goroutine 549519
This happens infrequently (observed ~20 times over the course of a week, across a significant number of processes). I believe I have identified the cause. Essentially:
Code is generated for a method like golang.org/x/net/http2.(FrameHeader).Header (example generated code here) which does this:
1. Decrement SP to allocate a stack frame
2. Save X29 (the frame pointer register) into the frame and update X29 to point to where the previous one is saved
3. Do the function body
4. Increment SP to free the stack frame
5. Restore X29 and return
If the async preemption signal is delivered after step 4 but before step 5 (a window of a single instruction), the async preemption signal handler allocates a new stack frame over the memory that the interrupted function's frame occupied, and saves all of the registers into it. X29, however, still points into the old, now overwritten frame, so the address it points to holds whatever register value happened to be saved there rather than a saved frame pointer. In particular, it points to where the floating point registers are saved (ref).
I can reproduce this with the test program here: https://gist.github.com/nsrip-dd/d85ff0d05d2afa6ca0c12796e992ea91
I'm not sure what the right fix here is. Make sure the frame pointer is restored before incrementing the stack pointer at the end of the function, perhaps?
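The sequence described above, and the reordering suggested in the previous paragraph, can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `machine` type, the frame sizes, and the placement of the saved frame pointer at the bottom of each frame are all hypothetical and do not reproduce the real arm64 frame layout or the runtime's unwinder. In both cases the signal is assumed to arrive in the single-instruction window between the two epilogue steps.

```go
package main

import "fmt"

// garbage stands in for a spilled register value (for example a floating
// point register) that is not a valid stack address.
const garbage = 0xdeadbeefdeadbeef

// machine is a toy CPU: a downward-growing stack of 64-bit words plus SP and
// FP registers, both held as indices into stack. FP == 0 ends the chain.
type machine struct {
	stack  []uint64
	sp, fp int
}

func newMachine(words int) *machine {
	return &machine{stack: make([]uint64, words), sp: words}
}

// prologue models steps 1 and 2: allocate a frame, save the caller's FP in
// it, and point FP at the saved value.
func (m *machine) prologue(words int) {
	m.sp -= words
	m.stack[m.sp] = uint64(m.fp)
	m.fp = m.sp
}

// freeFrame models step 4: increment SP to release the frame.
func (m *machine) freeFrame(words int) { m.sp += words }

// restoreFP models step 5: reload the caller's FP from the frame.
func (m *machine) restoreFP() { m.fp = int(m.stack[m.fp]) }

// preempt models the signal handler: it takes a frame directly below SP and
// fills it with saved register contents.
func (m *machine) preempt(words int) {
	for i := 1; i <= words; i++ {
		m.stack[m.sp-i] = garbage
	}
}

// walk follows the chain of saved frame pointers. It reports ok == false when
// a link points outside the stack, where a real unwinder would fault.
func (m *machine) walk() (frames int, ok bool) {
	fp := uint64(m.fp)
	for fp != 0 {
		if fp >= uint64(len(m.stack)) {
			return frames, false
		}
		frames++
		fp = m.stack[fp]
	}
	return frames, true
}

// simulate preempts a callee between its two epilogue instructions, using
// either the order from the report or the suggested reordering.
func simulate(restoreFPFirst bool) (frames int, ok bool) {
	m := newMachine(64)
	m.prologue(4) // caller
	m.prologue(4) // callee
	if restoreFPFirst {
		m.restoreFP() // suggested order: FP first, SP still protects the frame
	} else {
		m.freeFrame(4) // reported order: SP first, FP still points into the freed frame
	}
	m.preempt(8)
	return m.walk()
}

func main() {
	for _, fixed := range []bool{false, true} {
		frames, ok := simulate(fixed)
		fmt.Printf("restoreFPFirst=%v frames=%d ok=%v\n", fixed, frames, ok)
	}
	// Output:
	// restoreFPFirst=false frames=1 ok=false
	// restoreFPFirst=true frames=1 ok=true
}
```

In the first case the walk starts from a frame pointer that lies below SP, reads a spilled register value as the next link, and fails. In the second case FP already refers to the caller's frame, which the handler's frame cannot overlap because SP has not moved yet.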