context: cancelCtx exclusive lock causes extreme contention #42564

What version of Go are you using (go version)?

$ go version
go version go1.15.3 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/niaow/.cache/go-build"
GOENV="/home/niaow/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/niaow/go/pkg/mod"
GONOPROXY="github.com/molecula"
GONOSUMDB="github.com/molecula"
GOOS="linux"
GOPATH="/home/niaow/go"
GOPRIVATE="github.com/molecula"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build497931594=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Passed a single cancellable context to a bunch of goroutines.
These goroutines ran a cold-path compute task, interleaved with calls to ctx.Err() to detect cancellation.
The loop looks something like:

var out []Thing
for iterator.Next() {
	if err := ctx.Err(); err != nil {
		// caller doesn't need a result anymore.
		return nil, err
	}

	// Fetch thing from iterator, apply some filtering functions, and append it to out.
}

What did you expect to see?

A bit of a slowdown from the context check maybe?

What did you see instead?

Slightly over 50% of the CPU time was spent in runtime.findrunnable. The cancelCtx struct guards its state with a sync.Mutex, and under extreme lock contention (64 CPU threads hammering it) the calls were falling into sync.Mutex.lockSlow. From poking at pprof, it appears that about 86% of CPU time was spent in functions related to acquiring this lock.
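
For anyone who wants to reproduce this, here is a minimal benchmark sketch (the package and benchmark names are made up for illustration); run it on a machine with many hardware threads to see the contention:

package contention_test

import (
	"context"
	"testing"
)

// All goroutines share one cancellable context; in go1.15 every
// ctx.Err() call on it acquires the same cancelCtx sync.Mutex, so
// the parallel loop serializes on that lock.
func BenchmarkCtxErrParallel(b *testing.B) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			if err := ctx.Err(); err != nil {
				return // never cancelled here; the lock acquire itself is the cost
			}
		}
	})
}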

I was able to work around this by adding a counter and checking the context less frequently. However, I do not think this is an intended performance degradation path. Theoretically the read path could be made lock-free with sync/atomic, although I think a sync.RWMutex would still be more than sufficient.
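
For reference, the workaround looks roughly like the sketch below (checkInterval is a made-up name and 1024 an arbitrary value; iterator, Thing, and out are as in the loop above):

const checkInterval = 1024 // arbitrary; tune to the workload

var out []Thing
for i := 0; iterator.Next(); i++ {
	// Only touch the contended mutex once every checkInterval iterations.
	if i%checkInterval == 0 {
		if err := ctx.Err(); err != nil {
			// caller doesn't need a result anymore.
			return nil, err
		}
	}

	// Fetch thing from iterator, apply some filtering functions, and append it to out.
}

This bounds cancellation latency to checkInterval iterations of the loop body while cutting lock traffic by roughly that factor.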
