strange panic SIGSEGV inside lock_futex.go lock #32448

Closed
skinass opened this issue Jun 5, 2019 · 6 comments

@skinass

skinass commented Jun 5, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.1 linux/amd64

Does this issue reproduce with the latest release?

I only tried 1.12.1.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/a.sulaev/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/a.sulaev/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/lib/golang"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build911936445=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Background:
I have a big backend app that was running on Go 1.9.
I switched to Go 1.12.1 and deployed the app to production.
After the switch, I started seeing very rare panics.

What the code does:
I create a new WithCancel context (using the old golang.org/x/net/context package),
and then I start several goroutines to run MySQL queries in parallel.

What did you expect to see?

I expect no panic here, as was the case with 1.9.

What did you see instead?

After switching to 1.12.1, the panic sometimes happens inside the Go standard library.
Here is the stack trace of the running goroutine:

f490 1559662618:000 fatal error: unexpected signal during runtime execution
f490 1559662618:001 [signal SIGSEGV: segmentation violation code=0x2 addr=0x16b3378 pc=0x441dbf]
f490 1559662618:002
f490 1559662618:003 goroutine 324964 [running]:
f490 1559662618:004 runtime.throw(0x1937a2e, 0x2a)
f490 1559662618:005     /usr/lib/golang/src/runtime/panic.go:617 +0x72 fp=0xc016a1ecc8 sp=0xc016a1ec98 pc=0x464ed2
f490 1559662618:006 runtime.sigpanic()
f490 1559662618:007     /usr/lib/golang/src/runtime/signal_unix.go:374 +0x4a9 fp=0xc016a1ecf8 sp=0xc016a1ecc8 pc=0x47a539
f490 1559662618:008 runtime.lock(0x16b3378)
f490 1559662618:009     /usr/lib/golang/src/runtime/lock_futex.go:55 +0x4f fp=0xc016a1ed40 sp=0xc016a1ecf8 pc=0x441dbf
f490 1559662618:010 runtime.sellock(0xc016a1ef40, 0x3, 0x3, 0xc016a1eed2, 0x3, 0x3)
f490 1559662618:011     /usr/lib/golang/src/runtime/select.go:51 +0x76 fp=0xc016a1ed68 sp=0xc016a1ed40 pc=0x475856
f490 1559662618:012 runtime.selectgo(0xc016a1ef40, 0xc016a1eecc, 0x3, 0x0, 0xc0223bcf01)
f490 1559662618:013     /usr/lib/golang/src/runtime/select.go:205 +0x3a5 fp=0xc016a1ee90 sp=0xc016a1ed68 pc=0x475df5
f490 1559662618:014 github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc0223a9320, 0xc0224da900, 0xc026840e40)
f490 1559662618:015     /builddir/build/BUILD/mailapi/vendor/src/github.com/go-sql-driver/mysql/connection.go:625 +0x1a1 fp=0xc016a1efc8 sp=0xc016a1ee90 pc=0x863671
f490 1559662618:016 runtime.goexit()
f490 1559662618:017     /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc016a1efd0 sp=0xc016a1efc8 pc=0x494371
f490 1559662618:018 created by github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher
f490 1559662618:019     /builddir/build/BUILD/mailapi/vendor/src/github.com/go-sql-driver/mysql/connection.go:616 +0xbe

I can't reproduce this panic myself; it only happens in production under high rpm.
I looked at the Go runtime source and I don't understand how &c.lock could be nil in runtime.sellock(), so that it panics at lock_futex.go:55.

@agnivade
Contributor

agnivade commented Jun 6, 2019

A few questions:

  • Have you ruled out the possibility of races in the code? Does running the code in race-enabled mode report any errors?
  • Can you provide us with code so we can reproduce it ourselves?

@skinass
Author

skinass commented Jun 6, 2019

We found a way to reproduce these panics. The problem is a race condition around custom contexts and the GC (I suppose).
I can fix it now by using a mutex,
but I think there is still a problem with the GC collecting memory that is still in use.

Here is the code, so you can decide whether this problem is worth investigating
(higher GOMAXPROCS == better reproduction):

package main

import (
	"context"
	"fmt"
	"os"
	"runtime"
	"time"
)

type Context interface {
	context.Context
	BaseContext() *Base
}

type Base struct {
	context context.Context
}

func (c *Base) Done() <-chan struct{} {
	return c.context.Done()
}

func (c *Base) BaseContext() *Base {
	return c
}

func (c *Base) Deadline() (deadline time.Time, ok bool) {
	return c.context.Deadline()
}

func (c *Base) Err() error {
	return c.context.Err()
}

func (c *Base) Value(key interface{}) interface{} {
	return c.context.Value(key)
}

func WithValue(parent Context, key, val interface{}) Context {
	baseCtx := parent.BaseContext()
	return &Base{
		context: context.WithValue(baseCtx.context, key, val),
	}
}

func WithCancelBase(parent Context) (Context, context.CancelFunc) {
	baseCtx := parent.BaseContext()
	newNetContext, cancel := context.WithCancel(baseCtx.context)

	ret := &Base{
		context: newNetContext,
	}

	return ret, cancel
}

type ACTX struct {
	Base
}

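// SetValue overwrites ctx.Base in place without any synchronization.
// This write races with the goroutines in main that call ctx.Done().
func (ctx *ACTX) SetValue(key, value interface{}) {
	ctx.Base = *(WithValue(ctx.BaseContext(), key, value).BaseContext())
}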

func WithCancel(parent *ACTX) (*ACTX, context.CancelFunc) {
	baseCtx := parent.BaseContext()

	bwCancelCtx, cancel := WithCancelBase(baseCtx.BaseContext())
	return &ACTX{
		Base: *(bwCancelCtx.BaseContext()),
	}, cancel
}

func newACTX() *ACTX {
	return &ACTX{Base{context.Background()}}
}

func test(ctx *ACTX, key, value interface{}) {
	ctx.SetValue(key, value)
}

func main() {
	for i := 0; i < 1000000; i++ {
		ctx := newACTX()
		ctx, cancel := WithCancel(ctx)

		runtime.GOMAXPROCS(4)
		for k := 0; k < 10; k++ {
			go func() {
				for {
					select {
					default:
					case <-ctx.Done():
						return
					}
				}
			}()
		}
		runtime.Gosched()
		for j := 0; j < 50; j++ {
			test(ctx, j, j)
			x := ctx.Value(j)
			if intX, ok := x.(int); !ok || intX != j {
				fmt.Printf("Unexpected value: %v", x)
				os.Exit(0)
			}
		}
		cancel()
	}
}

@agnivade
Contributor

agnivade commented Jun 6, 2019

Yes, you can fix this with a mutex, but I don't understand what the GC issue is here. Can you expand on that? What exactly are you seeing? I believe that should be orthogonal to the race issue we have here.

@skinass
Author

skinass commented Jun 6, 2019

OK, I tried with GOGC=off and it still panics. So the problem is only the data race?

@agnivade
Contributor

agnivade commented Jun 6, 2019

Yes, if you have a race in your program, it can crash regardless of whether GC is enabled or not.
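
One plausible mechanism for the crash, offered as an assumption rather than something confirmed in this thread: the Go memory model warns that races on multiword values such as interfaces can produce inconsistent (type, pointer) pairs and, from there, arbitrary memory corruption. The Base struct in the reproduction holds a single interface field, and SetValue overwrites it while other goroutines call Done through it. The sketch below only shows that such a field spans more than one machine word; holder and interfaceWords are hypothetical names used for illustration.

```go
package main

import (
	"context"
	"fmt"
	"unsafe"
)

// holder mirrors the shape of the Base struct from the reproduction:
// a single field of interface type.
type holder struct {
	ctx context.Context
}

// interfaceWords reports how many machine words the interface-typed
// field occupies. With the standard gc toolchain this is 2: one word
// for the type information and one for the data pointer.
func interfaceWords() uintptr {
	var h holder
	return unsafe.Sizeof(h.ctx) / unsafe.Sizeof(uintptr(0))
}

func main() {
	fmt.Println("words per interface value:", interfaceWords())
}
```

Because the two words are not written as a single atomic unit, an unsynchronized reader may see the type word of one context paired with the data pointer of another, which is consistent with a SIGSEGV deep inside a channel operation and independent of the garbage collector.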

@skinass
Author

skinass commented Jun 6, 2019

We know what the problem is, so I'm closing the issue. Thank you.

@skinass skinass closed this as completed Jun 6, 2019
@golang golang locked and limited conversation to collaborators Jun 5, 2020