Description
Describe the bug
We have deployed v1.16.3 of the node termination handler. Memory usage climbs steadily over time until it reaches the pod memory limit and the pod is OOMKilled. Is there a memory leak somewhere?
Application Logs
The following logs are from the aws-node-termination-handler pod; they contain nothing erroneous:
2022/07/28 07:15:01 INF Starting to serve handler /metrics, port 9092
2022/07/28 07:15:01 INF Starting to serve handler /healthz, port 8080
2022/07/28 07:15:01 INF Startup Metadata Retrieved metadata={"accountId":"xxxx","availabilityZone":"us-west-2b","instanceId":"i-xxxx","instanceLifeCycle":"on-demand","instanceType":"c6i.4xlarge","localHostname":"xxxx.us-west-2.compute.internal","privateIp":"x.x.x.x","publicHostname":"","publicIp":"","region":"us-west-2"}
2022/07/28 07:15:01 INF aws-node-termination-handler arguments:
dry-run: false,
node-name: xxxx.us-west-2.compute.internal,
pod-name: aws-node-termination-handler-b56bf578b-79x5m,
metadata-url: http://abcd,
kubernetes-service-host: x.x.x.X,
kubernetes-service-port: 443,
delete-local-data: true,
ignore-daemon-sets: true,
pod-termination-grace-period: -1,
node-termination-grace-period: 120,
enable-scheduled-event-draining: false,
enable-spot-interruption-draining: false,
enable-sqs-termination-draining: true,
enable-rebalance-monitoring: false,
enable-rebalance-draining: false,
metadata-tries: 3,
cordon-only: false,
taint-node: true,
taint-effect: NoSchedule,
exclude-from-load-balancers: false,
json-logging: false,
log-level: info,
webhook-proxy: ,
webhook-headers: ,
webhook-url: ,
webhook-template: ,
uptime-from-file: ,
enable-prometheus-server: true,
prometheus-server-port: 9092,
emit-kubernetes-events: true,
kubernetes-events-extra-annotations: ,
aws-region: us-west-2,
queue-url: https://xxxx,
check-asg-tag-before-draining: false,
managed-asg-tag: aws-node-termination-handler/managed,
use-provider-id: false,
aws-endpoint: ,
2022/07/28 07:15:01 INF Started watching for interruption events
2022/07/28 07:15:01 INF Kubernetes AWS Node Termination Handler has started successfully!
2022/07/28 07:15:01 INF Started watching for event cancellations
2022/07/28 07:15:01 INF Started monitoring for events event_type=SQS_TERMINATE
2022/07/28 07:20:12 INF Adding new event to the event store event={"AutoScalingGroupName":"xxxx","Description":"EC2 State Change event received. Instance i-xxxx went into shutting-down at 2022-07-28 07:20:11 +0000 UTC \n","EndTime":"0001-01-01T00:00:00Z","EventID":"ec2-state-change-event-xxxx","InProgress":false,"InstanceID":"i-xxxx","IsManaged":true,"Kind":"SQS_TERMINATE","NodeLabels":null,"NodeName":"xxxx.us-west-2.compute.internal","NodeProcessed":false,"Pods":null,"ProviderID":"aws:///us-west-2c/xxxx","StartTime":"2022-07-28T07:20:11Z","State":""}
- Kubernetes version: v1.21