Description
CL 432535 caused a 75% increase in total runtime for go tests on plan9-arm builders. I believe the effect will have been similar on other builders with slow filesystems, for example openbsd-arm. Looking at local logs for a raspberry pi in the plan9-arm cluster around the time the CL was merged, I see this [the last column is total runtime in minutes]:
pi4h.1666769752 2022/10/26 16:28:48 79
pi4h.1666812046 2022/10/26 21:42:13 81
pi4h.1666823232 2022/10/27 00:47:48 79
pi4h.1666839373 2022/10/27 06:54:21 80
pi4h.1666850781 2022/10/27 12:57:53 79
pi4h.1666892893 2022/10/27 20:09:07 80
pi4h.1666912765 2022/10/28 04:08:04 82
pi4h.1666972603 2022/10/28 20:01:41 84
pi4h.1667253819 2022/11/01 00:26:54 142
pi4h.1667262442 2022/11/01 02:48:46 140
pi4h.1667270954 2022/11/01 05:09:20 139
pi4h.1667519311 2022/11/04 03:44:27 140
pi4h.1667538040 2022/11/04 07:36:40 76
pi4h.1667572836 2022/11/04 17:02:27 141
pi4h.1667582128 2022/11/04 19:37:00 141
pi4h.1667593858 2022/11/04 23:15:51 164
pi4h.1667603779 2022/11/05 01:38:11 141
pi4h.1667612320 2022/11/05 04:00:40 141
pi4h.1667620868 2022/11/05 06:22:59 140
pi4h.1667631635 2022/11/05 09:23:03 142
pi4h.1667686330 2022/11/07 13:04:59 141
pi4h.1667844527 2022/11/07 20:31:07 142
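As a rough check on the 75% figure, the runtimes above can be split into two groups and compared by median. This is only a sketch: it assumes the CL took effect between the 2022/10/28 and 2022/11/01 entries, which the log suggests but does not prove. It uses the median rather than the mean so that the 76 and 164 minute outliers do not skew the result. The names `before`, `after`, `median` and `pctIncrease` are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
)

// Total runtimes in minutes, copied from the log above.
// Assumption: the split point is between 2022/10/28 and 2022/11/01.
var (
	before = []float64{79, 81, 79, 80, 79, 80, 82, 84}
	after  = []float64{142, 140, 139, 140, 76, 141, 141, 164, 141, 141, 140, 142, 141, 142}
)

// median returns the median of xs without modifying the caller's slice.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// pctIncrease returns the percentage increase from old to new.
func pctIncrease(old, new float64) float64 {
	return (new - old) / old * 100
}

func main() {
	b, a := median(before), median(after)
	fmt.Printf("median before: %.0f min, median after: %.0f min\n", b, a)
	fmt.Printf("increase: about %.0f%%\n", pctIncrease(b, a))
}
```

With these groupings the medians are 80 and 141 minutes, an increase of roughly 76%, which is consistent with the figure quoted above.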
The increased time comes from this new call in cmd/dist/test.go:
// The cache used by dist when building is different from that used when
// running dist test, so rebuild (but don't install) std and cmd to make
// sure packages without install targets are cached so they are not stale.
goCmd("go", "build", "std", "cmd") // make sure dependencies of targets are cached
(Parenthetically, could someone explain what "packages without install targets" refers to?)
Note that this extra build is performed at the start of every dist test invocation in a test run. Only the first one after the make.bash/make.rc might actually build something; subsequent builds will just amount to a staleness test. On a fast Linux machine with aggressive caching of directories and inodes, the staleness test may seem like a "no-op", but at last count it involves 9999 open and 6326 stat calls on 4580 files and directories. On a plan9-arm builder this adds nearly a minute of overhead to each dist test:
cpu% time go test -count 1 container/list
ok container/list 0.076s
0.27u 0.24s 3.83r go test -count 1 container/list ...
cpu% time go tool dist test --no-rebuild go_test:container/list
##### Test execution environment.
# GOARCH: arm
# CPU:
# GOOS: plan9
# OS Version: 2000
##### Testing packages.
ok container/list 0.083s
ALL TESTS PASSED (some were excluded)
0.02u 0.09s 59.67r go tool dist test --no-rebuild ...
Because the tests in a run are partitioned (badly, and in my opinion unnecessarily - see #49343), each test run of 275 tests currently involves 131 invocations of dist test - so, potentially more than two hours spent on redundant dist test overhead.
Solutions? I would suggest one or more of:
- use the same cache for building and running dist test
- use a parameter to dist test to indicate that a build has just been done so no staleness test is needed (maybe - radical idea - just respect --no-rebuild?)
- stop partitioning the tests for slow builders without helpers
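The second suggestion could be sketched roughly as follows. This is a hypothetical, self-contained illustration and not the actual cmd/dist code: the `warmupCommand` helper and the local flag handling are invented for this example. Only the `go build std cmd` command line and the `--no-rebuild` flag name come from the report above.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// warmupCommand returns the cache-warming command to run before the tests,
// or nil when the caller has said that a build was just done.
// Hypothetical helper: cmd/dist is not structured this way.
func warmupCommand(noRebuild bool) []string {
	if noRebuild {
		// The caller vouches for a fresh build, so skip the staleness check.
		return nil
	}
	return []string{"go", "build", "std", "cmd"}
}

func main() {
	noRebuild := flag.Bool("no-rebuild", false, "assume std and cmd were just built")
	flag.Parse()

	cmd := warmupCommand(*noRebuild)
	if cmd == nil {
		fmt.Println("skipping staleness check")
		return
	}
	// A real runner would execute the command; this sketch only prints it.
	fmt.Println("would run:", strings.Join(cmd, " "))
}
```

The point of the sketch is only that the decision can be made before any files are opened or stat'ed, so a run with the flag set would avoid the per-invocation cost entirely.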