Skip to content

time: Timer reset broken under heavy use since go1.16 timer optimizations added #47329

Closed
@andrewvc

Description

@andrewvc

What version of Go are you using (go version)?

Tracked the specific issue to commit b4b0144 via git bisect

$ go version
go version devel go1.17-d568e6e075 Tue Jul 20 19:54:36 2021 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
andrewvc@LAPTOP-80O11FM2 ~/p/b/heartbeat (fix-timer-failure)> go env
warning: GOPATH set to GOROOT (/home/andrewvc/projects/go) has no effect
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/andrewvc/.cache/go-build"
GOENV="/home/andrewvc/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/andrewvc/projects/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/andrewvc/projects/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/andrewvc/projects/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/andrewvc/projects/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/andrewvc/projects/beats/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build674387485=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Longstanding unit tests (~1.5 years old) started sporadically failing against go 1.16.x

Our program (Heartbeat) rapidly stops and resets a single timer based on user submitted jobs, effectively testing the golang timer. Further investigation revealed that timer.Reset was no longer resetting the timer consistently. Every 10-40k iterations or so it would have no effect, resulting in a non-triggering timer, and in our case, a deadlocked program.

We tracked the specific issue to a change introduced in golang commit b4b0144 via git bisect

The failure can be reproduced by running from the special branch below, which contains an enhanced test suite for Heartbeat using a watchdog timer to catch the failed timer.

# Examples use a zip download to prevent a full repo clone
curl -L https://github.com/andrewvc/beats/archive/refs/heads/broken-timer.zip -o broken-timer.zip
unzip -q broken-timer.zip
cd beats-broken-timer/heartbeat
go test -timeout 30s -run '^TestStress$' github.com/elastic/beats/v7/heartbeat/scheduler/timerqueue

We are now avoiding Reset in favor of NewTimer in a workaround PR elastic/beats#27006 . You can validate this by deleting the Reset call here and replacing it with the NewTimer call here

Digging into the golang source code I discovered that I could fix the issue by commenting out the optimization on this line inside adjusttimers . It seems that the accounting of that variable may have an issue somewhere. The code is quite tricky, heavily concurrent, etc, and could use the eye of someone familiar with it.

What did you expect to see?

I expected the timer to fire consistently when reset.

What did you see instead?

Nothing, after enhancing the test suite for heartbeat to dump traces it was apparent that the program was in an idle state, with no timer scheduled, and no other code blocked.

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions