
runtime: debug.Stack() and runtime.Callers() PCs differ on panic trigger site #34123

Closed
@lggomez

Description


What version of Go are you using (go version)?

go1.12.4 linux/amd64

Does this issue reproduce with the latest release?

Can't tell for the time being: the only repro we have is in a web app, and we don't have Docker images with Go 1.13/1.12.7 on our end yet.

What operating system and processor architecture are you using (go env)?

Linux e5d326139ac1 4.9.184-linuxkit #1 SMP Tue Jul 2 22:58:16 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build601602077=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Quoting my original issue on newrelic/go-agent#100:

I'm working on a custom gin recovery middleware that prints the offending stack trace whenever a panic goes uncaught. In this scenario, we noticed a code path in the application where the line number in the printed stack trace is wrong: it reports line 59 instead of line 61 (it took me a long time to realize that the problem was in the stack trace itself and not at the reported line):

func (ls LocationsMutableService) getChildren(ctx context.Context, tree string, loc string) (LocationChildren, error) {
	url := ""
	var children LocationChildren

	// line 59: panic not possible here
	response, err := ls.Client.GET(url, &children)

	// line 61: actual place of panic (response == nil)
	if response.StatusCode == http.StatusNotFound {
		return nil, errors.New("")
	}

	if err != nil {
		return nil, err
	}

	return children, err
}

We fixed this by writing an implementation based on debug.Stack(), but, surprisingly, after deploying the fix to verify it, I'm observing that the internal stack trace used by txn.NoticeError has the same inaccuracy that our original middleware implementation had:

2019/08/30 14:33:09 /go/src/github.com/.../src/api/vendor/github.com/.../gingonic/mlhandlers/nr_stack.go 19 mlhandlers.GetStackTrace
/go/src/github.com/.../src/api/vendor/github.com/.../gingonic/mlhandlers/recovery.go 58 mlhandlers.RecoveryWithWriter.func1.1
/usr/local/go/src/runtime/panic.go 522 runtime.gopanic
/usr/local/go/src/runtime/panic.go 82 runtime.panicmem
/usr/local/go/src/runtime/signal_unix.go 390 runtime.sigpanic
/go/src/github.com/.../src/api/gateways/locations_mutable_service.go 59 gateways.LocationsMutableService.getChildren
/go/src/github.com/.../src/api/gateways/locations_mutable_service.go 51 gateways.LocationsMutableService.GetChildren.func1

After some research, the main culprit seems to be a wrong PC being used when retrieving the stack frames, but I'm still not certain of the concrete cause. In newrelic/go-agent#101 I changed the stack trace generator code that built the frames manually, but I keep seeing the same issue, so I suspect it boils down to a difference in how runtime.Stack and runtime.Callers handle the system stack.

The scenario shown above comes from a specific branch of a web app we use at work, so I can't share it here, and unfortunately my attempts to create a local repro case have failed so far.

Metadata

Assignees: No one assigned

Labels: FrozenDueToAge, NeedsInvestigation, release-blocker