internal/trace/v2: TestTraceCgoCallback failures #64060

gopherbot · 2023-11-10T16:35:41Z

#!watchflakes
default <- pkg == "internal/trace/v2" && test == "TestTraceCgoCallback"

Issue created automatically to collect these failures.

Example (log):

--- FAIL: TestTraceCgoCallback (0.00s)
    --- FAIL: TestTraceCgoCallback/Default (0.00s)
        trace_test.go:548: stderr: go: no Go source files
        trace_test.go:550: exit status 1

— watchflakes

The text was updated successfully, but these errors were encountered:

gopherbot · 2023-11-10T16:35:45Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "internal/trace/v2" && test == "TestTraceCgoCallback"

2023-11-10 15:49 linux-amd64-nocgo go@43ffe2a8 internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (0.00s)
    --- FAIL: TestTraceCgoCallback/Default (0.00s)
        trace_test.go:548: stderr: go: no Go source files
        trace_test.go:550: exit status 1

2023-11-10 15:50 linux-amd64-nocgo go@9d914015 internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (0.00s)
    --- FAIL: TestTraceCgoCallback/Default (0.00s)
        trace_test.go:548: stderr: go: no Go source files
        trace_test.go:550: exit status 1

2023-11-10 15:51 linux-amd64-nocgo go@3b303fa9 internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (0.00s)
    --- FAIL: TestTraceCgoCallback/Default (0.00s)
        trace_test.go:548: stderr: go: no Go source files
        trace_test.go:550: exit status 1

— watchflakes

mknyszek · 2023-11-10T16:48:48Z

This is fixed as of abf8422.

gopherbot · 2023-11-10T16:56:29Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "internal/trace/v2" && test == "TestTraceCgoCallback"

2023-11-10 15:49 linux-ppc64-sid-buildlet go@43ffe2a8 internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (0.04s)
    --- FAIL: TestTraceCgoCallback/Default (0.04s)
        trace_test.go:548: stderr: go: no Go source files
        trace_test.go:550: exit status 1

2023-11-10 15:50 darwin-amd64-nocgo go@9d914015 internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (0.05s)
    --- FAIL: TestTraceCgoCallback/Default (0.05s)
        trace_test.go:548: stderr: go: no Go source files
        trace_test.go:550: exit status 1

2023-11-10 15:50 linux-ppc64-sid-buildlet go@9d914015 internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (0.05s)
    --- FAIL: TestTraceCgoCallback/Default (0.05s)
        trace_test.go:548: stderr: go: no Go source files
        trace_test.go:550: exit status 1

2023-11-10 15:51 darwin-amd64-nocgo go@3b303fa9 internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (0.05s)
    --- FAIL: TestTraceCgoCallback/Default (0.04s)
        trace_test.go:548: stderr: go: no Go source files
        trace_test.go:550: exit status 1

2023-11-10 15:51 linux-ppc64-sid-buildlet go@3b303fa9 internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (0.03s)
    --- FAIL: TestTraceCgoCallback/Default (0.03s)
        trace_test.go:548: stderr: go: no Go source files
        trace_test.go:550: exit status 1

— watchflakes

gopherbot · 2023-11-11T07:26:23Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "internal/trace/v2" && test == "TestTraceCgoCallback"

2023-11-10 15:49 darwin-amd64-nocgo go@43ffe2a8 internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (0.06s)
    --- FAIL: TestTraceCgoCallback/Default (0.06s)
        trace_test.go:548: stderr: go: no Go source files
        trace_test.go:550: exit status 1

— watchflakes

gopherbot · 2023-11-11T08:49:13Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "internal/trace/v2" && test == "TestTraceCgoCallback"

2023-11-10 15:51 openbsd-riscv64-jsing go@3b303fa9 internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (0.05s)
    --- FAIL: TestTraceCgoCallback/Default (0.05s)
        trace_test.go:548: stderr: go: no Go source files
        trace_test.go:550: exit status 1

— watchflakes

gopherbot · 2023-11-11T11:14:13Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "internal/trace/v2" && test == "TestTraceCgoCallback"

2023-11-10 15:50 openbsd-riscv64-jsing go@9d914015 internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (0.05s)
    --- FAIL: TestTraceCgoCallback/Default (0.05s)
        trace_test.go:548: stderr: go: no Go source files
        trace_test.go:550: exit status 1

— watchflakes

gopherbot · 2023-11-17T21:45:36Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "internal/trace/v2" && test == "TestTraceCgoCallback"

2023-11-17 20:40 dragonfly-amd64-622 go@3ff5632d internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (0.92s)
    --- FAIL: TestTraceCgoCallback/Default (0.92s)
        reader_test.go:65: unexpected error while reading the trace: broken trace: failed to advance: frontier:
            [gen=1]

            M 24 [ProcStart time=318751161344 p=5 p_seq=12]
            M 5 [GoSyscallEndBlocked time=318751246208]
            M 11 [ProcStart time=318751358144 p=2 p_seq=29]
            M 10 [ProcStart time=318751388992 p=3 p_seq=26]
            M 12 [ProcStart time=318751464256 p=4 p_seq=9]
...
            String id=89
            	data="internal/poll.(*FD).Write"
            String id=90
            	data="os.(*File).write"
            String id=91
            	data="/tmp/workdir/go/src/os/file_posix.go"
            String id=92
            	data="os.(*File).Write"
            String id=93
            	data="/tmp/workdir/go/src/os/file.go"

— watchflakes

gopherbot · 2023-11-17T23:50:09Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "internal/trace/v2" && test == "TestTraceCgoCallback"

2023-11-17 23:15 dragonfly-amd64-622 go@f67b2d8f internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (1.50s)
    --- FAIL: TestTraceCgoCallback/Default (1.49s)
        reader_test.go:65: unexpected error while reading the trace: expected no proc but had one
        trace_test.go:569: found bad trace; dumping to test log...
        trace_test.go:570: Trace Go1.22
            EventBatch gen=1 m=8 time=7013092587 size=140
            GoCreateSyscall dt=100 new_g=34
            ProcStart dt=68 p=6 p_seq=1
            GoSyscallEndBlocked dt=22
            GoStart dt=16 g=34 g_seq=1
...
            String id=86
            	data="runtime.(*scavengerState).park"
            String id=87
            	data="/tmp/workdir/go/src/runtime/mgcscavenge.go"
            String id=88
            	data="runtime.bgscavenge"
            String id=89
            	data="runtime.(*traceAdvancerState).start"
            String id=90
            	data="main._Cfunc_foo"

— watchflakes

gopherbot · 2023-11-21T16:49:33Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "internal/trace/v2" && test == "TestTraceCgoCallback"

2023-11-21 16:20 dragonfly-amd64-622 go@8be8bfea internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (1.53s)
    --- FAIL: TestTraceCgoCallback/Default (1.52s)
        reader_test.go:65: unexpected error while reading the trace: expected no proc but had one
        trace_test.go:569: found bad trace; dumping to test log...
        trace_test.go:570: Trace Go1.22
            EventBatch gen=1 m=13 time=6544938370 size=121
            GoCreateSyscall dt=107 new_g=35
            ProcStart dt=64 p=7 p_seq=1
            GoSyscallEndBlocked dt=23
            GoStart dt=16 g=35 g_seq=1
...
            String id=86
            	data="runtime.traceStartReadCPU"
            String id=87
            	data="runtime.(*scavengerState).park"
            String id=88
            	data="/tmp/workdir/go/src/runtime/mgcscavenge.go"
            String id=89
            	data="runtime.bgscavenge"
            String id=90
            	data="runtime.traceLocker.Gomaxprocs"

— watchflakes

gopherbot · 2023-11-21T22:23:02Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "internal/trace/v2" && test == "TestTraceCgoCallback"

2023-11-21 21:29 dragonfly-amd64-622 go@4e3ac99a internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (1.50s)
    --- FAIL: TestTraceCgoCallback/Default (1.49s)
        reader_test.go:65: unexpected error while reading the trace: expected no proc but had one
        trace_test.go:569: found bad trace; dumping to test log...
        trace_test.go:570: Trace Go1.22
            EventBatch gen=1 m=25 time=7125512681 size=78
            GoCreateSyscall dt=85 new_g=72
            ProcStart dt=62 p=5 p_seq=12
            GoSyscallEndBlocked dt=16
            GoStart dt=12 g=72 g_seq=1
...
            String id=84
            	data="runtime.gopark"
            String id=85
            	data="runtime.(*traceAdvancerState).start.func1"
            String id=86
            	data="runtime.startTheWorld"
            String id=87
            	data="main._Cfunc_foo"
            String id=88
            	data="runtime.traceLocker.Gomaxprocs"

— watchflakes

mknyszek · 2023-11-21T22:29:35Z

Some of these issues may be resolved by https://go.dev/cl/544215.

mknyszek · 2023-11-22T00:06:33Z

The staticlockranking builder found an issue that I think might explain a lot of these failures. See https://go.dev/cl/544396.

EDIT: It wasn't a real issue.

gopherbot · 2023-11-25T16:33:02Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "internal/trace/v2" && test == "TestTraceCgoCallback"

2023-11-22 02:20 ios-arm64-corellium go@5f7a4085 internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (1.88s)
    --- FAIL: TestTraceCgoCallback/Default (1.86s)
        reader_test.go:65: unexpected error while reading the trace: expected no proc but had one
        trace_test.go:570: found bad trace; dumping to test log...
        trace_test.go:581: Trace Go1.22
            EventBatch gen=1 m=6105411584 time=21723120546 size=68
            GoCreateSyscall dt=130 new_g=38
            GoSyscallEndBlocked dt=14
            ProcStart dt=6276 p=1 p_seq=28
            GoStart dt=10 g=38 g_seq=1
...
            String id=84
            	data="os.(*File).Write"
            String id=85
            	data="/tmp/workdir-host-ios-arm64-corellium-ios/go/src/os/file.go"
            String id=86
            	data="runtime.gopark"
            String id=87
            	data="runtime.chansend1"
            String id=88
            	data="runtime.(*traceAdvancerState).start"

— watchflakes

gopherbot · 2023-11-28T17:14:53Z

Change https://go.dev/cl/545515 mentions this issue: internal/trace/v2: tolerate having a P in GoCreateSyscall

mknyszek · 2023-11-28T20:10:52Z

https://go.dev/cl/545515 fixes all the "expected no proc but had one" issues.

The only one that doesn't fit the pattern is:

2023-11-17 20:40 dragonfly-amd64-622 go@3ff5632d internal/trace/v2.TestTraceCgoCallback (log)

But I'm fairly certain that's fixed by https://go.dev/cl/544215. I will try and confirm.

mknyszek · 2023-11-28T20:20:49Z

Huh. Actually, that dragonfly failure has something really weird going on. It kind of looks like there's more than one active thread with the same procid, which is typically the system thread ID. Is it not unique on that platform?

EDIT: It's not that. The ID is just reused. I see the problem.

On non-pthread platforms, it's totally possible for the same M to GoCreateSyscall/GoDestroySyscall on the same thread multiple times. That same thread may hold onto its P through all those calls. For #64060. Change-Id: Ib968bfd439ecd5bc24fc98d78c06145b0d4b7802 Reviewed-on: https://go-review.googlesource.com/c/go/+/545515 Reviewed-by: Michael Pratt <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>

gopherbot · 2023-11-29T18:23:09Z

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "internal/trace/v2" && test == "TestTraceCgoCallback"

2023-11-29 16:01 dragonfly-amd64-622 go@b9a4eaa6 internal/trace/v2.TestTraceCgoCallback (log)

--- FAIL: TestTraceCgoCallback (0.67s)
    --- FAIL: TestTraceCgoCallback/Default (0.67s)
        reader_test.go:65: unexpected error while reading the trace: broken trace: failed to advance: frontier:
            [gen=1]

            M 17 [ProcStart time=329731484928 p=3 p_seq=17]
            M 14 [ProcStart time=329731592512 p=0 p_seq=16]
            M 21 [ProcStart time=329731702144 p=2 p_seq=12]
            M 12 [ProcStart time=329731794816 p=5 p_seq=20]
            M 13 [ProcStart time=329731878464 p=3 p_seq=19]
...
            String id=84
            	data="os.(*File).write"
            String id=85
            	data="/tmp/workdir/go/src/os/file_posix.go"
            String id=86
            	data="os.(*File).Write"
            String id=87
            	data="/tmp/workdir/go/src/os/file.go"
            String id=88
            	data="main._Cfunc_foo"

— watchflakes

gopherbot · 2023-11-29T22:46:21Z

Change https://go.dev/cl/546096 mentions this issue: internal/trace/v2: handle thread ID reuse correctly

To begin with, CL 545515 made the trace parser tolerant of GoCreateSyscall having a P, but that was wrong. Because dropm trashes the M's syscalltick, that case should never be possible. So the first thing this change does is it rewrites the test that CL introduced to expect a failure instead of a success. What I'd misinterpreted as a case that should be allowed was actually the same as the other issues causing golang#64060, which is that the parser doesn't correctly implement what happens to Ps when a thread calls back into Go on non-pthread platforms, and what happens when a thread dies on pthread platorms (or more succinctly, what the runtime does when it calls dropm). Specifically, the GoDestroySyscall event implies that if any P is still running on that M when it's called, that the P stops running. This is what is intended by the runtime trashing the M's syscalltick; when it calls back into Go, the tracer models that thread as obtaining a new P from scratch. Handling this incorrectly manifests in one of two ways. On pthread platforms, GoDestroySyscall is only emitted when a C thread that previously called into Go is destroyed. However, that thread ID can be reused. Because we have no thread events, whether it's the same thread or not is totally ambiguous to the tracer. Therefore, the tracer may observe a thread that previously died try to start running with a new P under the same identity. The association to the old P is still intact because the ID is the same, and the tracer gets confused -- it appears as if two Ps are running on the same M! On non-pthread platforms, GoDestroySyscall is emitted on every return to C from Go code. In this case, the same thread with the same identity is naturally going to keep calling back into Go. But again, since the runtime trashes syscalltick in dropm, it's always going to acquire a P from the tracer's perspective. But if this is a different P than before, just like the pthread case, the parser is going to get confused, since it looks like two Ps are running on the same M! The case that CL 545515 actually handled was actually the non-pthread case, specifically where the same P is reacquired by an M calling back into Go. In this case, if we tolerate having a P, then what we'll observe is the M stealing its own P from itself, then running with it. Now that we know what the problem is, how do we fix it? This change addresses the problem by emitting an extra event when encountering a GoDestroySyscall with an active P in its context. In this case, it emits an additional ProcSteal event to steal from itself, indicating that the P stopped running. This removes any association between that M and that P, resolving any ambiguity in the tracer. There's one other minor detail that needs to be worked out, and that's what happens to any *real* ProcSteal event that stole the P we're now emitting an extra ProcSteal event for. Since, this event is going to look for an M that may have moved on already and the P at this point is already idle. Luckily, we have *exactly* the right fix for this. The handler for GoDestroySyscall now moves any active P it has to the ProcSyscallAbandoned state, indicating that we've lost information about the P and that it should be treated as already idle. Conceptually this all makes sense: this is a P in _Psyscall that has been abandoned by the M it was previously bound to. It's unfortunate how complicated this has all ended up being, but we can take a closer look at that in the future. Fixes golang#64060. Change-Id: Ie9e6eb9cf738607617446e3487392643656069a2 Reviewed-on: https://go-review.googlesource.com/c/go/+/546096 Reviewed-by: Michael Pratt <[email protected]> TryBot-Result: Gopher Robot <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Run-TryBot: Michael Knyszek <[email protected]> Auto-Submit: Michael Knyszek <[email protected]>

gopherbot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 10, 2023

gopherbot added this to Test Flakes Nov 10, 2023

mknyszek closed this as completed Nov 10, 2023

github-project-automation bot moved this to Done in Test Flakes Nov 10, 2023

gopherbot reopened this Nov 17, 2023

mknyszek self-assigned this Nov 17, 2023

mknyszek added this to the Go1.22 milestone Nov 17, 2023

mknyszek added this to Go Compiler / Runtime Nov 17, 2023

mknyszek added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 17, 2023

mknyszek moved this to In Progress in Go Compiler / Runtime Nov 17, 2023

gopherbot closed this as completed in 6d7b3c8 Dec 5, 2023

github-project-automation bot moved this from In Progress to Done in Go Compiler / Runtime Dec 5, 2023

mknyszek removed this from Go Compiler / Runtime Feb 28, 2024

gabyhelp mentioned this issue Jun 19, 2024

internal/trace/v2: TestTraceGCStress/AllocFree failures #68077

Open

golang locked and limited conversation to collaborators Dec 4, 2024

gopherbot added the FrozenDueToAge label Dec 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

internal/trace/v2: TestTraceCgoCallback failures #64060

internal/trace/v2: TestTraceCgoCallback failures #64060

gopherbot commented Nov 10, 2023

gopherbot commented Nov 10, 2023

mknyszek commented Nov 10, 2023

gopherbot commented Nov 10, 2023

gopherbot commented Nov 11, 2023

gopherbot commented Nov 11, 2023

gopherbot commented Nov 11, 2023

gopherbot commented Nov 17, 2023

gopherbot commented Nov 17, 2023

gopherbot commented Nov 21, 2023

gopherbot commented Nov 21, 2023

mknyszek commented Nov 21, 2023

mknyszek commented Nov 22, 2023 •

edited

Loading

gopherbot commented Nov 25, 2023

gopherbot commented Nov 28, 2023

mknyszek commented Nov 28, 2023

mknyszek commented Nov 28, 2023 •

edited

Loading

gopherbot commented Nov 29, 2023

gopherbot commented Nov 29, 2023

internal/trace/v2: TestTraceCgoCallback failures #64060

internal/trace/v2: TestTraceCgoCallback failures #64060

Comments

gopherbot commented Nov 10, 2023

gopherbot commented Nov 10, 2023

mknyszek commented Nov 10, 2023

gopherbot commented Nov 10, 2023

gopherbot commented Nov 11, 2023

gopherbot commented Nov 11, 2023

gopherbot commented Nov 11, 2023

gopherbot commented Nov 17, 2023

gopherbot commented Nov 17, 2023

gopherbot commented Nov 21, 2023

gopherbot commented Nov 21, 2023

mknyszek commented Nov 21, 2023

mknyszek commented Nov 22, 2023 • edited Loading

gopherbot commented Nov 25, 2023

gopherbot commented Nov 28, 2023

mknyszek commented Nov 28, 2023

mknyszek commented Nov 28, 2023 • edited Loading

gopherbot commented Nov 29, 2023

gopherbot commented Nov 29, 2023

mknyszek commented Nov 22, 2023 •

edited

Loading

mknyszek commented Nov 28, 2023 •

edited

Loading