
x/benchmarks: cockroachdb failing after CL 564197 (runtime: only poll network from one P at a time in findRunnable) #73474

Open
prattmic opened this issue Apr 22, 2025 · 6 comments
Labels
Builders x/build issues (builders, bots, dashboards) compiler/runtime Issues related to the Go compiler and/or runtime. Soon This needs action soon. (recent regressions, service outages, unusual time-sensitive situations)

@prattmic
Member

As of https://go.dev/cl/564197 (for #65064), all 4 perf builders are failing due to cockroachdb errors.

e.g., https://ci.chromium.org/ui/p/golang/builders/ci/gotip-linux-arm64_c4ah72-perf_vs_release/b8716897839307984241/overview

[sweet] error: run benchmark cockroachdb for config experiment: exit status 1
Tail of log (/home/swarming/.swarming/w/ir/x/t/go-sweet2521400662/results/cockroachdb/experiment.log):
external I/O path:   /home/swarming/.swarming/w/ir/x/t/go-sweet2521400662/work/cockroachdb/experiment/tmp/roach-node/extern
store[0]:            path=/home/swarming/.swarming/w/ir/x/t/go-sweet2521400662/work/cockroachdb/experiment/tmp/roach-node
storage engine:      pebble
clusterID:           2d2165f0-849e-4d6d-952c-6a7068f2b375
status:              initialized new cluster
nodeID:              1

=== Instance "roach-node" original stderr ===
I250422 11:36:15.334032 1 util/log/file_sync_buffer.go:242 ⋮ [config]   file created at: 2025/04/22 11:36:15
I250422 11:36:15.334043 1 util/log/file_sync_buffer.go:242 ⋮ [config]   running on machine: ‹golang-ciw-c4a-72-linux-arm64-bookworm-us-east4-a-0-149a›
I250422 11:36:15.334050 1 util/log/file_sync_buffer.go:242 ⋮ [config]   binary: CockroachDB CCL v24.2.0-alpha.00000000-dev (linux arm64, built , devel 352dd2d932c1c1c6dbc3e112fcdfface07d4fffb)
I250422 11:36:15.334056 1 util/log/file_sync_buffer.go:242 ⋮ [config]   arguments: [‹/home/swarming/.swarming/w/ir/x/t/go-sweet2521400662/work/cockroachdb/experiment/bin/cockroach› ‹start-single-node› ‹--insecure› ‹--listen-addr› ‹localhost:26257› ‹--http-addr› ‹localhost:26258› ‹--cache› ‹0.25› ‹--store› ‹/home/swarming/.swarming/w/ir/x/t/go-sweet2521400662/work/cockroachdb/experiment/tmp/roach-node› ‹--log-dir› ‹/home/swarming/.swarming/w/ir/x/t/go-sweet2521400662/work/cockroachdb/experiment/tmp/roach-node-log›]
I250422 11:36:15.334072 1 util/log/file_sync_buffer.go:242 ⋮ [config]   log format (utf8=✓): crdb-v2
I250422 11:36:15.334074 1 util/log/file_sync_buffer.go:242 ⋮ [config]   line format: [IWEF]yymmdd hh:mm:ss.uuuuuu goid [chan@]file:line redactionmark \[tags\] [counter] msg
I250422 11:36:15.333849 1 util/log/flags.go:222  [-] 1  stderr capture started

error: workload failed to become available within timeout: error: signal: killed: output:
I250422 11:36:40.865692 1 workload/cli/run.go:640  [-] 1  random seed: 6233835059417767123
I250422 11:36:40.865770 1 workload/cli/run.go:432  [-] 2  creating load generator...

At the moment, this CL is the most recent commit these builders have run, so more data would help confirm whether the failure is consistent. But given that all four builders failed, a flake seems unlikely.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 22, 2025
@gopherbot gopherbot added this to the Unreleased milestone Apr 22, 2025
@prattmic prattmic added Soon This needs action soon. (recent regressions, service outages, unusual time-sensitive situations) and removed compiler/runtime Issues related to the Go compiler and/or runtime. labels Apr 22, 2025
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 22, 2025
@gabyhelp gabyhelp added the Builders x/build issues (builders, bots, dashboards) label Apr 22, 2025
@gopherbot
Contributor

Change https://go.dev/cl/667336 mentions this issue: Revert "runtime: only poll network from one P at a time in findRunnable"

gopherbot pushed a commit that referenced this issue Apr 22, 2025
This reverts commit 352dd2d.

Reason for revert: cockroachdb benchmark failing. Likely due to CL 564197.

For #73474

Change-Id: Id5d83cd8bb8fe9ee7fddb8dc01f1a01f2d40154e
Reviewed-on: https://go-review.googlesource.com/c/go/+/667336
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
Reviewed-by: Mauri de Souza Meneguzzo <[email protected]>
Auto-Submit: Carlos Amedee <[email protected]>
@mknyszek mknyszek moved this from Todo to In Progress in Go Compiler / Runtime Apr 23, 2025
@gopherbot
Contributor

Change https://go.dev/cl/668735 mentions this issue: sweet/harnesses: update the version of Cockroachdb benchmarked

gopherbot pushed a commit to golang/benchmarks that referenced this issue Apr 30, 2025
This change updates the version of CockroachDB benchmarked by sweet.
When we submitted a change to how the network is polled in CL 564197,
we discovered that older versions of CockroachDB modify the scheduler
in a way that is incompatible with the submitted change. This updates
the version of CockroachDB to a version where that incompatibility no
longer exists.

For golang/go#73474
For golang/go#65064

Change-Id: Ifae845e025a5b64ff2cfff65e8c508c999ffbcb4
Reviewed-on: https://go-review.googlesource.com/c/benchmarks/+/668735
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
@gopherbot
Contributor

Change https://go.dev/cl/670416 mentions this issue: sweet/harnesses: update the version of CockroachDB benchmarked

@cagedmantis
Contributor

While we were working to address Issue #65064, we submitted CL 564197. Once that CL was submitted, the perf builders began failing because the CockroachDB benchmark timed out. The root cause was that CockroachDB used an internal copy/clone of the runtime's schedt struct, which became out of sync with the Go runtime after the change. CockroachDB has since stopped using an internal copy/clone of schedt. However, running the benchmark against the tip version of CockroachDB fails because of another issue, cockroachdb/cockroach#124021. We should update the version we test against and attempt to exclude the failing benchmark BenchmarkSQLCatchVectorizedRuntimeError.

@cagedmantis
Copy link
Contributor

Following up on my previous message: we should update the version we test against and attempt to mitigate the effects of the root cause of the error condition found in cockroachdb/cockroach#124021. I will mail a CL updating the version of CockroachDB we benchmark against, then send a follow-up CL that retries failing benchmarks. The error we are experiencing is logged as:

/home/swarming/.swarming/w/ir/tmp/go-sweet978346786/work/cockroachdb/baseline/bin/cockroach workload init kv postgres://root@localhost:26257?sslmode=disable
/home/swarming/.swarming/w/ir/tmp/go-sweet978346786/work/cockroachdb/baseline/bin/cockroach workload run kv --read-percent=50 --min-block-bytes=1024 --max-block-bytes=1024 --concurrency=10000 --max-rate=30000 --scatter --splits=5 --ramp=0s --duration=500ms postgres://root@localhost:26257?sslmode=disable
/home/swarming/.swarming/w/ir/tmp/go-sweet978346786/work/cockroachdb/baseline/bin/cockroach workload run kv --read-percent=50 --min-block-bytes=1024 --max-block-bytes=1024 --concurrency=10000 --max-rate=30000 --scatter --splits=5 --ramp=15s --duration=1m postgres://root@localhost:26257?sslmode=disable
exit status 1

I250507 20:06:46.930810 1 workload/cli/run.go:649  [-] 1  random seed: -6780016327593225364
I250507 20:06:48.349557 1 workload/cli/run.go:649  [-] 1  random seed: -3211894271932946231
I250507 20:06:48.349665 1 workload/cli/run.go:460  [-] 2  creating load generator...
I250507 20:06:48.780665 1 workload/cli/run.go:499  [-] 3  creating load generator... done (took 430.998554ms)
W250507 20:06:48.783756 49098 workload/pgx_helpers.go:235  [-] 4  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: database "kv" does not exist (SQLSTATE 3D000)
W250507 20:06:48.784155 49099 workload/pgx_helpers.go:235  [-] 5  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: database "kv" does not exist (SQLSTATE 3D000)
W250507 20:06:48.785971 49100 workload/pgx_helpers.go:235  [-] 7  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: database "kv" does not exist (SQLSTATE 3D000)
W250507 20:06:48.785957 49098 workload/pgx_helpers.go:235  [-] 6  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: database "kv" does not exist (SQLSTATE 3D000)
W250507 20:06:51.268981 51800 workload/pgx_helpers.go:235  [-] 9158  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: relation "kv" does not exist (SQLSTATE 42P01)
W250507 20:06:51.269238 50710 workload/pgx_helpers.go:235  [-] 9159  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: relation "kv" does not exist (SQLSTATE 42P01)
W250507 20:06:51.269225 50648 workload/pgx_helpers.go:235  [-] 9161  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: database "kv" does not exist (SQLSTATE 3D000)
W250507 20:06:51.269239 51320 workload/pgx_helpers.go:235  [-] 9160  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: database "kv" does not exist (SQLSTATE 3D000)
...
W250507 20:06:51.269378 51156 workload/pgx_helpers.go:235  [-] 9162  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: relation "kv" does not exist (SQLSTATE 42P01)
W250507 20:06:51.269472 52025 workload/pgx_helpers.go:235  [-] 9163  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: relation "kv" does not exist (SQLSTATE 42P01)
...
W250507 20:06:51.269653 51631 workload/pgx_helpers.go:235  [-] 9164  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: database "kv" does not exist (SQLSTATE 3D000)
W250507 20:06:51.269804 50887 workload/pgx_helpers.go:235  [-] 9165  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: database "kv" does not exist (SQLSTATE 3D000)
W250507 20:06:51.270143 50621 workload/pgx_helpers.go:235  [-] 9166  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: database "kv" does not exist (SQLSTATE 3D000)
W250507 20:06:51.894682 50869 workload/pgx_helpers.go:235  [-] 10405  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: database "kv" does not exist (SQLSTATE 3D000)
W250507 20:06:51.894684 50754 workload/pgx_helpers.go:235  [-] 10406  error preparing statement. name=kv-2 sql=SELECT k, v FROM kv AS OF SYSTEM TIME follower_read_timestamp() WHERE k IN ($1) ERROR: database "kv" does not exist (SQLSTATE 3D000)
E250507 20:07:02.748440 1 workload/cli/run.go:583  [-] 10407  workload run error: failed to connect to `host=localhost user=root database=kv`: dial error (dial tcp [::1]:26257: connect: cannot assign requested address)
E250507 20:07:02.748440 1 workload/cli/run.go:583  [-] 10407 +(1) failed to connect to `host=localhost user=root database=kv`: dial error (dial tcp [::1]:26257: connect: cannot assign requested address)
E250507 20:07:02.748440 1 workload/cli/run.go:583  [-] 10407 +Wraps: (2) dial tcp [::1]:26257
E250507 20:07:02.748440 1 workload/cli/run.go:583  [-] 10407 +Wraps: (3) connect
E250507 20:07:02.748440 1 workload/cli/run.go:583  [-] 10407 +Wraps: (4) cannot assign requested address
E250507 20:07:02.748440 1 workload/cli/run.go:583  [-] 10407 +Error types: (1) *pgconn.ConnectError (2) *net.OpError (3) *os.SyscallError (4) syscall.Errno
Error: failed to connect to `host=localhost user=root database=kv`: dial error (dial tcp [::1]:26257: connect: cannot assign requested address)

Projects
Status: In Progress

4 participants