Reflector seems to hang for long periods of time #239

Closed · scarby opened this issue Dec 3, 2021 · 4 comments

Comments

@scarby commented Dec 3, 2021

We noticed recently that our certificates sometimes take up to 30 minutes to reflect into new namespaces, yet at other times it happens instantly.
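
For context, the source secret is annotated for auto-reflection roughly along these lines (a sketch; the annotation keys are the ones documented by Reflector, but the exact values in our cluster may differ):

$ kubectl -n trsy-certificates annotate secret trsy-cert \
    reflector.v1.k8s.emberstack.com/reflection-allowed="true" \
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled="true"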

As an example of the happy path:

$ date; kubectl create ns adam-cert-test
Thu Dec  2 17:32:34 PST 2021
namespace/adam-cert-test created

logs:

2021-12-03 01:31:02.678 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Requesting V1Secret resources
2021-12-03 01:31:02.934 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected trsy-certificates/trsy-cert where permitted. Created 1 - Updated 0 - Deleted 0 - Validated 36.
2021-12-03 01:32:35.089 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Created adam-cert-test/trsy-cert as a reflection of trsy-certificates/trsy-cert 

This one took around a second.

The unhappy path:

$ date; kubectl create ns adam-cert-test3
Thu Dec  2 17:43:11 PST 2021
namespace/adam-cert-test3 created

logs:

2021-12-03 01:41:39.885 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Session closed. Duration: 00:45:25.8514941. Faulted: False.
2021-12-03 01:41:39.885 +00:00 [INF] (ES.Kubernetes.Reflector.Core.ConfigMapWatcher) Requesting V1ConfigMap resources
2021-12-03 01:44:51.286 +00:00 [INF] (ES.Kubernetes.Reflector.Core.NamespaceWatcher) Session closed. Duration: 00:36:53.8147881. Faulted: False.
2021-12-03 01:44:51.286 +00:00 [INF] (ES.Kubernetes.Reflector.Core.NamespaceWatcher) Requesting V1Namespace resources

Still going as of Fri Dec 3 02:08:12 UTC 2021.

Note: all cert reflection stalls at this point. It then catches up, with log entries like:

2021-12-03 01:44:51.286 +00:00 [INF] (ES.Kubernetes.Reflector.Core.NamespaceWatcher) Requesting V1Namespace resources
2021-12-03 02:09:37.293 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Session closed. Duration: 00:38:34.6152188. Faulted: False.
2021-12-03 02:09:37.294 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretWatcher) Requesting V1Secret resources
2021-12-03 02:09:37.555 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Auto-reflected trsy-certificates/trsy-cert where permitted. Created 1 - Updated 0 - Deleted 0 - Validated 39.
2021-12-03 02:09:37.563 +00:00 [INF] (ES.Kubernetes.Reflector.Core.SecretMirror) Created adam-cert-test3/trsy-cert as a reflection of trsy-certificates/trsy-cert

So this one took about 26 minutes. Unfortunately I'm not proficient in C#, so I haven't gotten close to working out why this could happen.

cluster info:

Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.12-gke.1500
@Tomasz-Kluczkowski

Any ideas about this one? If this happens all the time, it pretty much means I cannot use reflector, which would be really sad :(.

@winromulus (Contributor)

Hi,
As mentioned in previous issues, this is not a Reflector issue but a k8s one.
Basically, older (not that old, but older) versions of k8s had a bug where events were not pushed by the API server. It was fixed in 1.21+.
Reflector relies on those events being pushed (it does not scrape). What you're seeing as "hanging" is the API server not sending anything; eventually the idle connection closes, and on reconnect everything gets sent at once. There is no way for Reflector to detect whether the API server is not sending events or there simply are no events.
Have a look at #228
My suggestion is upgrading your version of k8s to the latest supported by your platform.
BTW, this is not an issue that affects Reflector only; there are a ton of extensions that rely on those events and do not get them. Most of them have switched from subscribing to events to scraping the data (querying k8s), but that is problematic because, depending on the size of the cluster and the number of resources, it can become a serious performance issue. (Reflector is also installed on clusters with hundreds of namespaces and thousands of configmaps and secrets, so querying all of those every... 1 minute?... would kill the API server.)
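
If you want to check from your side whether the API server is actually delivering watch events, plain kubectl can show the same mechanism (nothing Reflector-specific, just a rough diagnostic):

$ kubectl get namespaces --watch --output-watch-events

then, in a second terminal:

$ kubectl create ns watch-test

If the ADDED event for watch-test does not appear promptly in the first terminal, the API server is not pushing watch events, and Reflector will only catch up once its watch connection is closed and re-established.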

I'll keep this issue open for a while in the hope that others facing this problem can share before-and-after k8s upgrade insights.

@Tomasz-Kluczkowski commented Dec 6, 2021 via email

@winromulus (Contributor)

Closing this as related to #246
