Closed
Description
Evicted pods can end up in phase Failed, and I would expect the KubePodNotReady alert to catch them.
Happy to submit a PR to resolve this, but I wanted to check first whether I am right to think that KubePodNotReady should catch {phase=Failed}, or if there should be a new alert, KubePodFailed or similar.
Current alert:
{
  expr: |||
    sum by (namespace, pod) (kube_pod_status_phase{%(prefixedNamespaceSelector)s%(kubeStateMetricsSelector)s, phase=~"Pending|Unknown"}) > 0
  ||| % $._config,
  labels: {
    severity: 'critical',
  },
  annotations: {
    message: 'Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than an hour.',
  },
  'for': '1h',
  alert: 'KubePodNotReady',
},
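For illustration, the first option (having KubePodNotReady also cover Failed pods) might amount to widening the phase matcher. This is only a sketch, not a tested change: the `config` object and its selector values are hypothetical stand-ins for the mixin's `$._config`, so that the snippet evaluates on its own.

```jsonnet
// Hypothetical stand-in for the mixin's $._config; the selector values are assumptions.
local config = {
  prefixedNamespaceSelector: '',
  kubeStateMetricsSelector: 'job="kube-state-metrics"',
};

{
  alert: 'KubePodNotReady',
  // Same expression as the current alert, with "Failed" added to the phase matcher.
  expr: |||
    sum by (namespace, pod) (kube_pod_status_phase{%(prefixedNamespaceSelector)s%(kubeStateMetricsSelector)s, phase=~"Failed|Pending|Unknown"}) > 0
  ||| % config,
  'for': '1h',
  labels: {
    severity: 'critical',
  },
  annotations: {
    message: 'Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than an hour.',
  },
}
```

One thing to weigh with this approach: a pod stays in phase Failed until it is deleted, so the alert would keep firing at critical severity for any Failed pod that lingers for more than an hour, including pods belonging to failed Jobs.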
There is also a KubeJobFailed alert that may influence the decision for how to manage Failed pods.
{
  alert: 'KubeJobFailed',
  expr: |||
    kube_job_status_failed{%(prefixedNamespaceSelector)s%(kubeStateMetricsSelector)s} > 0
  ||| % $._config,
  'for': '1h',
  labels: {
    severity: 'warning',
  },
  annotations: {
    message: 'Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete.',
  },
},
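The alternative, a separate alert modelled on KubeJobFailed, could look roughly like the sketch below. The name KubePodFailed comes from the question above; the `warning` severity, the `1h` duration, the message text, and the `config` stand-in for `$._config` are all assumptions made for illustration, not an agreed design.

```jsonnet
// Hypothetical stand-in for the mixin's $._config; the selector values are assumptions.
local config = {
  prefixedNamespaceSelector: '',
  kubeStateMetricsSelector: 'job="kube-state-metrics"',
};

{
  alert: 'KubePodFailed',
  // Matches only pods whose phase is Failed, leaving KubePodNotReady unchanged.
  expr: |||
    sum by (namespace, pod) (kube_pod_status_phase{%(prefixedNamespaceSelector)s%(kubeStateMetricsSelector)s, phase="Failed"}) > 0
  ||| % config,
  // Duration and severity mirror KubeJobFailed; both are assumptions.
  'for': '1h',
  labels: {
    severity: 'warning',
  },
  annotations: {
    message: 'Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in phase Failed for longer than an hour.',
  },
}
```

Keeping this as its own alert would let Failed pods carry a different severity from Pending or Unknown pods, at the cost of possibly firing alongside KubeJobFailed for pods owned by a failed Job.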