cmd/dist: huge increase in runtime for 'dist test' on slow builders after CL 432535 #57734

@millerresearch

CL 432535 caused a 75% increase in total runtime for go tests on plan9-arm builders. I expect the effect is similar on other builders with slow filesystems, for example openbsd-arm. Local logs for a raspberry pi in the plan9-arm cluster, from around the time the CL was merged, show this (the last column is total runtime in minutes):

pi4h.1666769752 2022/10/26 16:28:48 79
pi4h.1666812046 2022/10/26 21:42:13 81
pi4h.1666823232 2022/10/27 00:47:48 79
pi4h.1666839373 2022/10/27 06:54:21 80
pi4h.1666850781 2022/10/27 12:57:53 79
pi4h.1666892893 2022/10/27 20:09:07 80
pi4h.1666912765 2022/10/28 04:08:04 82
pi4h.1666972603 2022/10/28 20:01:41 84
pi4h.1667253819 2022/11/01 00:26:54 142
pi4h.1667262442 2022/11/01 02:48:46 140
pi4h.1667270954 2022/11/01 05:09:20 139
pi4h.1667519311 2022/11/04 03:44:27 140
pi4h.1667538040 2022/11/04 07:36:40 76
pi4h.1667572836 2022/11/04 17:02:27 141
pi4h.1667582128 2022/11/04 19:37:00 141
pi4h.1667593858 2022/11/04 23:15:51 164
pi4h.1667603779 2022/11/05 01:38:11 141
pi4h.1667612320 2022/11/05 04:00:40 141
pi4h.1667620868 2022/11/05 06:22:59 140
pi4h.1667631635 2022/11/05 09:23:03 142
pi4h.1667686330 2022/11/07 13:04:59 141
pi4h.1667844527 2022/11/07 20:31:07 142

The increased time comes from this new call in cmd/dist/test.go:

// The cache used by dist when building is different from that used when
// running dist test, so rebuild (but don't install) std and cmd to make
// sure packages without install targets are cached so they are not stale.
goCmd("go", "build", "std", "cmd") // make sure dependencies of targets are cached

(Parenthetically, could someone explain what "packages without install targets" refers to?)

Note that this extra build is performed at the start of every dist test invocation in a test run. Only the first one after the make.bash/make.rc might actually build something; subsequent builds will just amount to a staleness test. On a fast linux machine with aggressive caching of directories and inodes, the staleness test may seem like a "no-op", but at last count it involves 9999 open and 6326 stat calls on 4580 files and directories. On a plan9-arm builder this adds nearly a minute of overhead to each dist test:

cpu% time go test -count 1 container/list
ok  	container/list	0.076s
0.27u 0.24s 3.83r 	 go test -count 1 container/list ...
cpu% time go tool dist test --no-rebuild go_test:container/list

##### Test execution environment.
# GOARCH: arm
# CPU: 
# GOOS: plan9
# OS Version: 2000

##### Testing packages.
ok  	container/list	0.083s

ALL TESTS PASSED (some were excluded)
0.02u 0.09s 59.67r 	 go tool dist test --no-rebuild ...

Because the tests in a run are partitioned (badly, and in my opinion unnecessarily - see #49343), each test run of 275 tests currently involves 131 invocations of dist test, so potentially more than two hours are spent on redundant dist test overhead.

Solutions? I would suggest one or more of:

  • use the same cache for building and running dist test
  • use a parameter to dist test to indicate that a build has just been done so no staleness test is needed (maybe - radical idea - just respect --no-rebuild?)
  • stop partitioning the tests for slow builders without helpers

Labels: FrozenDueToAge, NeedsFix (the path to resolution is known, but the work has not been done), ToolSpeed
