
Conversation

gouthamve (Contributor) commented:

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase

Essentially, if something is evicted or exits with a non-zero code, it gets rescheduled. The failed pod then sticks around until --terminated-pod-gc-threshold kicks in.

> The only exception to this rule is that Pods with a phase of Succeeded or Failed for more than some duration (determined by terminated-pod-gc-threshold in the master) will expire and be automatically destroyed

```
--terminated-pod-gc-threshold int32     Default: 12500
    Number of terminated pods that can exist before the terminated pod garbage
    collector starts deleting terminated pods. If <= 0, the terminated pod
    garbage collector is disabled.
```
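
For context, a quick way to see how many terminated pods are currently sitting around (and hence how close the cluster is to that threshold), assuming kube-state-metrics is deployed and exposing its standard `kube_pod_status_phase` metric:

```
# Count of pods currently in a terminal phase, broken out by phase.
sum by (phase) (kube_pod_status_phase{phase=~"Failed|Succeeded"})
```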

This is causing us some alerts like:
[screenshot: firing alerts, 2018-08-28 16:06]

* If a node flaps and comes back, its pods are marked Failed and keep firing the alert (see the sketch below)
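
A rough sketch of the kind of rule change this implies: alert only on pods stuck in a non-terminal phase, so that lingering Failed/Succeeded pods no longer fire. The alert name, threshold, and `for` duration below are illustrative assumptions, not necessarily what this PR ships:

```yaml
# Hypothetical Prometheus alert rule, assuming kube-state-metrics metrics.
# Terminal phases (Succeeded/Failed) are deliberately excluded, since those
# pods linger until --terminated-pod-gc-threshold is hit.
alert: KubePodNotReady
expr: sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown"}) > 0
for: 1h
labels:
  severity: warning
annotations:
  message: '{{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready, non-terminal phase for more than an hour.'
```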

Signed-off-by: Goutham Veeramachaneni <[email protected]>
gouthamve (Contributor, Author) commented:

@brancz @tomwilkie

brancz (Member) commented Aug 28, 2018:

Yeah, we actually had a similar case with Jobs that were leaving behind lots of Completed pods.

This lgtm 👍
