runtime: staticlockranking builders failing on release branches on LUCI #64722

Open
@prattmic

Example failure:

https://ci.chromium.org/ui/p/golang/builders/try/go1.21-linux-amd64-staticlockranking/b8762252922810888305/test-results?sortby=&groupby=

        65878  ======
        0 : rwmutexW 18 0x111d488
        1 : fin 26 0x111cec0
        fatal error: lock ordering problem
        
        runtime stack:
        runtime.throw({0xbb398e?, 0xffffffffffffe000?})
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/panic.go:1077 +0x5c fp=0x7ffcd92875b8 sp=0x7ffcd9287588 pc=0x43fd9c
        runtime.checkRanks(0xc0000081a0, 0x7ffcd9287638?, 0x111d460?)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/lockrank_on.go:162 +0x236 fp=0x7ffcd9287618 sp=0x7ffcd92875b8 pc=0x411eb6
        runtime.lockWithRankMayAcquire.func1()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/lockrank_on.go:235 +0x85 fp=0x7ffcd9287648 sp=0x7ffcd9287618 pc=0x4125c5
        traceback: unexpected SPWRITE function runtime.systemstack
        runtime.systemstack()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/asm_amd64.s:509 +0x4a fp=0x7ffcd9287658 sp=0x7ffcd9287648 pc=0x47452a
        
        goroutine 1 [running]:
        runtime.systemstack_switch()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/asm_amd64.s:474 +0x8 fp=0xc00013f3c8 sp=0xc00013f3b8 pc=0x4744c8
        runtime.lockWithRankMayAcquire(0x100c00013f488?, 0x7fc6e535f658?)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/lockrank_on.go:224 +0x5a fp=0xc00013f400 sp=0xc00013f3c8 pc=0x4124fa
        runtime.lockRankMayQueueFinalizer(...)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/mfinal.go:91
        runtime.mallocgc(0x10, 0xb244e0, 0x0)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/malloc.go:963 +0x4b fp=0xc00013f468 sp=0xc00013f400 pc=0x413eeb
        runtime.convTnoptr(0xb244e0, 0x2f?)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/iface.go:348 +0x2b fp=0xc00013f4a0 sp=0xc00013f468 pc=0x41096b
        syscall.Setrlimit(0x7, 0x30?)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/syscall/rlimit.go:47 +0x4f fp=0xc00013f4e8 sp=0xc00013f4a0 pc=0x48830f
        syscall.Exec({0xc00003c7c0?, 0xc00013f5e8?}, {0xc000036040, 0x2, 0x2}, {0xc000006c00, 0x30, 0x30})
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/syscall/exec_unix.go:283 +0x16c fp=0xc00013f580 sp=0xc00013f4e8 pc=0x486d4c
        cmd/go/internal/toolchain.execGoToolchain({0xc00003a0cc, 0x7}, {0xc000040037, 0x28}, {0xc00003c7c0, 0x40})
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/cmd/go/internal/toolchain/exec.go:53 +0x345 fp=0xc00013f618 sp=0xc00013f580 pc=0x9c0f45
        cmd/go/internal/toolchain.Exec({0xc00003a0cc, 0x7})
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/cmd/go/internal/toolchain/select.go:280 +0x345 fp=0xc00013f8c8 sp=0xc00013f618 pc=0x9c2145
        cmd/go/internal/toolchain.Select()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/cmd/go/internal/toolchain/select.go:230 +0xb0f fp=0xc00013fa00 sp=0xc00013f8c8 pc=0x9c1d6f
        cmd/go.main()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/cmd/go/main.go:97 +0x34 fp=0xc00013fb10 sp=0xc00013fa00 pc=0xa19454
        cmd/go.Main(...)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/cmd/go/export_test.go:7
        cmd/go_test.TestMain(0x4760fa?)
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/cmd/go/go_test.go:160 +0x14dc fp=0xc00013fe88 sp=0xc00013fb10 pc=0xa5b85c
        main.main()
        	_testmain.go:193 +0x1c6 fp=0xc00013ff40 sp=0xc00013fe88 pc=0xa7f3a6
        runtime.main()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/proc.go:267 +0x2bb fp=0xc00013ffe0 sp=0xc00013ff40 pc=0x44283b
        runtime.goexit()
        	/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00013ffe8 sp=0xc00013ffe0 pc=0x4764a1

This specific ordering violation is not problematic, though it is unclear to me why it fails only on LUCI, and even there only on the release branches. More importantly, digging into this reveals fundamental problems with the way we model rwmutex.

  1. We treat all rwmutexes the same. That is, they all use rwmutexR and rwmutexW even though they are semantically different locks. This is technically OK, but it reduces precision in the lock ranking and makes the ranking more difficult to understand.
  2. rwmutexR is not actually held across read locks; it is just an internal implementation detail, held temporarily when there is contention. As a result, the read lock rank is not consistently modeled, because the lock is so rarely taken. We should have a rank that is always acquired on read lock.
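
To make the two points concrete, here is a minimal sketch of what per-instance ranks with an always-recorded read rank could look like. This is a toy model, not the runtime's implementation: `checker`, `rankedRWMutex`, the rank constants, and the partial order are all hypothetical names and values invented for illustration, and no real locking happens (only rank bookkeeping on a single goroutine).

```go
package main

import "fmt"

// rank is a toy lock rank. These constants are hypothetical and do not
// correspond to the runtime's real rank values.
type rank int

const (
	rankFin   rank = iota // stand-in for a finalizer-queue lock
	rankExecR             // read rank for one specific rwmutex instance
	rankExecW             // write rank for that same instance
)

// checker records the ranks currently held and validates each new
// acquisition against an explicit partial order.
type checker struct {
	held []rank
	// allowed[next][h] reports whether next may be acquired while h is held.
	allowed map[rank]map[rank]bool
}

func (c *checker) acquire(r rank) error {
	for _, h := range c.held {
		if !c.allowed[r][h] {
			return fmt.Errorf("lock ordering problem: acquiring rank %d while holding rank %d", r, h)
		}
	}
	c.held = append(c.held, r)
	return nil
}

func (c *checker) release(r rank) {
	for i := len(c.held) - 1; i >= 0; i-- {
		if c.held[i] == r {
			c.held = append(c.held[:i], c.held[i+1:]...)
			return
		}
	}
}

// rankedRWMutex carries its own read and write ranks (point 1) and records
// the read rank on every rlock, contended or not (point 2).
type rankedRWMutex struct {
	c         *checker
	readRank  rank
	writeRank rank
}

func (m *rankedRWMutex) rlock() error { return m.c.acquire(m.readRank) }
func (m *rankedRWMutex) runlock()     { m.c.release(m.readRank) }
func (m *rankedRWMutex) lock() error  { return m.c.acquire(m.writeRank) }
func (m *rankedRWMutex) unlock()      { m.c.release(m.writeRank) }

// demo tries to take rankFin under the read lock and then under the write
// lock, returning the checker's verdict for each case.
func demo() (readErr, writeErr error) {
	c := &checker{allowed: map[rank]map[rank]bool{
		// Invented order: fin is permitted under the read rank only.
		rankFin: {rankExecR: true},
	}}
	execLock := &rankedRWMutex{c: c, readRank: rankExecR, writeRank: rankExecW}

	execLock.rlock()
	if readErr = c.acquire(rankFin); readErr == nil {
		c.release(rankFin)
	}
	execLock.runlock()

	execLock.lock()
	if writeErr = c.acquire(rankFin); writeErr == nil {
		c.release(rankFin)
	}
	execLock.unlock()
	return readErr, writeErr
}

func main() {
	readErr, writeErr := demo()
	fmt.Println("fin under read lock: ", readErr)
	fmt.Println("fin under write lock:", writeErr)
}
```

Because the read rank is recorded on every `rlock` in this sketch, the checker sees the read side on uncontended paths too, and each rwmutex instance can be given its own place in the partial order.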

cc @mknyszek

Labels

NeedsFix: The path to resolution is known, but the work has not been done.
compiler/runtime: Issues related to the Go compiler and/or runtime.

Projects

Status

In Progress
