Flaky test failure in staging/src/k8s.io/client-go/util/workqueue #125581

@tatsuhiro-t

Description

What happened?

$ go test -run TestAddTwoFireEarly -race -count=10000 ./staging/src/k8s.io/client-go/util/workqueue/...
--- FAIL: TestAddTwoFireEarly (10.01s)
    delaying_queue_test.go:160: unexpected err: timed out waiting for the condition
FAIL
FAIL	k8s.io/client-go/util/workqueue	100.405s
FAIL

What did you expect to happen?

The test passes on every run.

How can we reproduce it (as minimally and precisely as possible)?

Run

go test -run TestAddTwoFireEarly -race -count=10000 ./staging/src/k8s.io/client-go/util/workqueue/...

Repeat until it fails.

Anything else we need to know?

If FakeClock.Step is called after https://github.com/kubernetes/client-go/blob/d2f5fba1f82d119e09288c70ce87bccb3dc119e4/util/workqueue/delaying_queue.go#L279 and before https://github.com/kubernetes/client-go/blob/d2f5fba1f82d119e09288c70ce87bccb3dc119e4/util/workqueue/delaying_queue.go#L300,
the timer created there fires at entry.readyAt.Sub(now) + FakeClock.time, where FakeClock.time has already been advanced by FakeClock.Step.
Note that now is the value of FakeClock.time from before the step, so the timer's deadline ends up later than entry.readyAt. Because the test does not call FakeClock.Step with a positive duration again, this timer never fires.

The proposed fix is as follows.
First, add NewDeadlineTimer(t *time.Time) Timer to k8s.io/utils/clock.Clock, which returns a timer that fires at the given time t.
Second, replace NewTimer with NewDeadlineTimer at https://github.com/kubernetes/client-go/blob/d2f5fba1f82d119e09288c70ce87bccb3dc119e4/util/workqueue/delaying_queue.go#L300
Finally, call FakeClock.Step(0) at https://github.com/kubernetes/client-go/blob/d2f5fba1f82d119e09288c70ce87bccb3dc119e4/util/workqueue/delaying_queue_test.go#L234 so that FakeClock processes its waiters.
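A rough sketch of how a deadline-based timer avoids the problem is shown below. It is not the actual k8s.io/utils/clock implementation: fakeClock, fakeTimer, and deadlineTimerFix are hypothetical, and NewDeadlineTimer here takes a time.Time value for simplicity, whereas the proposal above uses *time.Time.

```go
package main

import (
	"fmt"
	"time"
)

// fakeTimer and fakeClock are hypothetical, single-goroutine stand-ins
// for illustration; they are not the k8s.io/utils/clock types.
type fakeTimer struct {
	fireAt time.Time
	fired  bool
}

type fakeClock struct {
	now     time.Time
	waiters []*fakeTimer
}

func (c *fakeClock) Now() time.Time { return c.now }

// NewDeadlineTimer (assumed API) arms a timer at an absolute time, so the
// deadline does not depend on when the clock was last stepped.
func (c *fakeClock) NewDeadlineTimer(deadline time.Time) *fakeTimer {
	t := &fakeTimer{fireAt: deadline}
	c.waiters = append(c.waiters, t)
	return t
}

// Step advances the fake time and fires every waiter that is now due.
// Step(0) leaves the time unchanged but still processes the waiters.
func (c *fakeClock) Step(d time.Duration) {
	c.now = c.now.Add(d)
	for _, t := range c.waiters {
		if !t.fired && !t.fireAt.After(c.now) {
			t.fired = true
		}
	}
}

// deadlineTimerFix replays the same ordering as the failing case and
// reports whether the timer had fired before and after Step(0).
func deadlineTimerFix() (firedBefore, firedAfter bool) {
	start := time.Unix(0, 0)
	c := &fakeClock{now: start}
	readyAt := start.Add(50 * time.Millisecond)

	_ = c.Now()                      // the loop samples the time
	c.Step(60 * time.Millisecond)    // the test steps past readyAt in between
	t := c.NewDeadlineTimer(readyAt) // deadline is absolute, already in the past
	firedBefore = t.fired
	c.Step(0) // lets the clock notice the overdue waiter
	firedAfter = t.fired
	return firedBefore, firedAfter
}

func main() {
	before, after := deadlineTimerFix()
	fmt.Println(before, after)
}
```

In this toy model the overdue timer is only delivered once Step(0) runs, which mirrors why the proposal pairs the new timer with an extra FakeClock.Step(0) call in the test.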

Metadata

Labels

kind/flake: Categorizes issue or PR as related to a flaky test.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.