Kubernetes Namespace pod watch can hang sometimes #1134


Closed
gunpuz opened this issue Dec 14, 2022 · 13 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@gunpuz

gunpuz commented Dec 14, 2022

Describe the bug

It seems that watch:

var podlistResp = kubeClient.CoreV1.ListNamespacedPodWithHttpMessagesAsync(configurationSettings.DeploymentNamespace, watch: true);

await foreach (var (type, item) in podlistResp.WatchAsync<V1Pod, V1PodList>())
{
   ...
}

can hang from time to time. There doesn't seem to be a clear scenario for how and when to reproduce it, though :(

Server Kubernetes Version
v1.24.3

Dotnet Runtime Version
net6

To Reproduce
It can hang "sometimes". It's not clear why.

KubeConfig

Default configuration

KubernetesClientConfiguration k8sConfig;
if (!_configurationSettings.UseKubeConfig)
{
    // Running inside a k8s cluster
    k8sConfig = KubernetesClientConfiguration.InClusterConfig();
}

Additional info

It could be related to this issue: #884

@tg123
Member

tg123 commented Dec 14, 2022

What do you mean by "hang"?

@gunpuz
Author

gunpuz commented Dec 14, 2022

I think that new pod events are not dispatched by the library in await foreach, but I don't know how to reproduce it. I would say that this happens rarely.

@tg123
Member

tg123 commented Dec 14, 2022

There is an onError param in WatchAsync<V1Pod, V1PodList>(); maybe something happened that you did not notice.
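
A minimal sketch of wiring up that callback might look like the following (this assumes the WatchAsync<V1Pod, V1PodList> overload that accepts an onError Action<Exception>; kubeClient and the namespace are the same objects as in the snippets above):

// Sketch only — requires using k8s; using k8s.Models;
// Assumes the WatchAsync overload with an onError callback, so watch
// failures are at least logged instead of being swallowed silently.
async Task WatchPodsAsync(IKubernetes kubeClient, string podNamespace)
{
    var podlistResp = kubeClient.CoreV1.ListNamespacedPodWithHttpMessagesAsync(
        podNamespace, watch: true);

    await foreach (var (type, item) in podlistResp.WatchAsync<V1Pod, V1PodList>(
        onError: ex => Console.WriteLine($"watch error: {ex}")))
    {
        Console.WriteLine($"{type}: {item.Metadata?.Name}");
    }
}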

@gunpuz
Author

gunpuz commented Dec 14, 2022

Should I use the error param? It seems that if the error parameter is not used, then the exception is thrown. I don't use the error param.

Should this work? I have something like this:

while (true)
{
    try
    {
        var podlistResp = kubeClient.CoreV1.ListNamespacedPodWithHttpMessagesAsync(configurationSettings.DeploymentNamespace, watch: true);

        await foreach (var (type, item) in podlistResp.WatchAsync<V1Pod, V1PodList>())
        {
            ...
        }
    }
    catch
    {
    }
}

@brendandburns
Contributor

The above code will not work properly. The reason is that you need to do an explicit List of the resources when the watch breaks.

There is a race where if a resource is created after the watch ends and before the next watch begins you will miss that resource.

The proper approach is:

while (true)
{
    var list = // list all resources here
    foreach (var item in list)
    {
        ...
    }
    // watch resources here
}
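
A slightly fuller sketch of that list-then-watch pattern (assuming the generated list methods accept a resourceVersion parameter; HandlePod is a hypothetical handler, and kubeClient / podNamespace come from the earlier snippets):

// Sketch only — list first, then watch from the list's resourceVersion so
// nothing created between the list and the watch is missed.
while (true)
{
    try
    {
        // 1. Explicit list: process everything that already exists.
        var list = await kubeClient.CoreV1.ListNamespacedPodAsync(podNamespace);
        foreach (var pod in list.Items)
        {
            HandlePod(WatchEventType.Added, pod);
        }

        // 2. Watch, starting from the resourceVersion the list returned.
        var watchResp = kubeClient.CoreV1.ListNamespacedPodWithHttpMessagesAsync(
            podNamespace,
            resourceVersion: list.Metadata.ResourceVersion,
            watch: true);

        await foreach (var (type, pod) in watchResp.WatchAsync<V1Pod, V1PodList>(
            onError: ex => Console.WriteLine($"watch error: {ex}")))
        {
            HandlePod(type, pod);
        }
    }
    catch (Exception ex)
    {
        // The watch broke (timeout, connection reset, resourceVersion too old, ...):
        // log, back off, then re-list and start a fresh watch.
        Console.WriteLine($"watch loop restarting: {ex.Message}");
        await Task.Delay(TimeSpan.FromSeconds(5));
    }
}

If the saved resourceVersion is too old, the API server rejects the watch with 410 Gone; re-listing on every iteration, as above, recovers from that case as well.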

@gunpuz
Author

gunpuz commented Dec 17, 2022

@brendandburns It seems that ListNamespacedPodWithHttpMessagesAsync already lists the existing resources and then streams back any new resources that are created.
Why do I need to list resources separately?

And in this example of yours, wouldn't you miss any new resources created between the list and watch calls, given that you think ListNamespacedPodWithHttpMessagesAsync does not list existing resources? Maybe I am missing something and you could provide a more detailed example with the methods you use? It seems like this option is bad?

The function name already says it: ListNamespacedPod + WithHttpMessages

var podlistResp = kubeClient.CoreV1.ListNamespacedPodWithHttpMessagesAsync(configurationSettings.DeploymentNamespace, watch: true);

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Mar 17, 2023
@Kruti-Joshi

@gunpuz, did you figure out how to stop this from happening? I have watch code that randomly stops receiving events and is stuck until I restart my service.

@tg123
Member

tg123 commented Mar 22, 2023

@Kruti-Joshi you can check whether this is the cause:
#1099

@gunpuz
Author

gunpuz commented Mar 25, 2023

@Kruti-Joshi, I think this issue went away when you "restart" the listener and listen again in a loop...

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Apr 24, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned May 24, 2023
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
