
Ensure NKG shuts down gracefully #563

Closed as not planned

Description

@ja20222

Graceful = catch SIGTERM -> no errors in the logs -> return 0
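
To make that contract concrete, here is a minimal Go sketch of the expected behavior, assuming a `signal.NotifyContext`-style entrypoint; `run` is a hypothetical stand-in for starting the manager and is not NKG's actual code:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Cancel ctx on SIGTERM (what Kubernetes sends on pod deletion)
	// or SIGINT, instead of letting the signal kill the process.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	if err := run(ctx); err != nil {
		// Non-graceful path: an error escaped shutdown.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Graceful path: all workers drained, nothing logged at error
	// level, exit code 0.
	os.Exit(0)
}

// run is a placeholder for starting the manager/event loop and
// blocking until the shutdown context is canceled.
func run(ctx context.Context) error {
	<-ctx.Done()
	return nil
}
```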

Below is an example of a known problem:

How to reproduce:

Deploy the cafe example.

Generate some updates through the Kubernetes API, for example, scale the coffee pods to 10 (`kubectl scale deployment coffee --replicas=10`) or back to 0.

At (roughly) the same time, delete the NKG pod with `kubectl delete pod`.

If you’re lucky (seriously, it depends on the timing of the events), you might get this:

{"level":"info","ts":1666645520.3277802,"msg":"The resource was not upserted because the context was canceled","controller":"gatewayclass","controllerGroup":"gateway.networking.k8s.io","controllerKind":"GatewayClass","GatewayClass":{"name":"nginx"},"namespace":"","name":"nginx","reconcileID":"8b59d3c4-f023-4e28-bf4e-7fcef51d62e2"}
{"level":"info","ts":1666645520.3278348,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"secret","controllerGroup":"","controllerKind":"Secret"}
{"level":"info","ts":1666645520.3278677,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"service","controllerGroup":"","controllerKind":"Service"}
{"level":"info","ts":1666645520.327872,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"gatewayclass","controllerGroup":"gateway.networking.k8s.io","controllerKind":"GatewayClass"}
{"level":"info","ts":1666645520.327875,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"gateway","controllerGroup":"gateway.networking.k8s.io","controllerKind":"Gateway"}
{"level":"info","ts":1666645520.3278775,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"httproute","controllerGroup":"gateway.networking.k8s.io","controllerKind":"HTTPRoute"}
{"level":"info","ts":1666645520.32788,"msg":"Shutdown signal received, waiting for all workers to finish","controller":"endpointslice","controllerGroup":"discovery.k8s.io","controllerKind":"EndpointSlice"}
{"level":"error","ts":1666645520.3280203,"logger":"statusUpdater","msg":"Failed to update status","namespace":"default","name":"gateway","kind":"Gateway","error":"Put "https://10.96.0.1:443/apis/gateway.networking.k8s.io/v1beta1/namespaces/default/gateways/gateway/status?timeout=10s\": context canceled","stacktrace":"github.com/nginxinc/nginx-kubernetes-gateway/internal/status.(*updaterImpl).update\n\tgithub.com/nginxinc/nginx-kubernetes-gateway/internal/status/updater.go:154\ngithub.com/nginxinc/nginx-kubernetes-gateway/internal/status.(*updaterImpl).Update\n\tgithub.com/nginxinc/nginx-kubernetes-gateway/internal/status/updater.go:97\ngithub.com/nginxinc/nginx-kubernetes-gateway/internal/events.(*EventHandlerImpl).HandleEventBatch\n\tgithub.com/nginxinc/nginx-kubernetes-gateway/internal/events/handler.go:88\ngithub.com/nginxinc/nginx-kubernetes-gateway/internal/events.(*EventLoop).Start.func1.1\n\tgithub.com/nginxinc/nginx-kubernetes-gateway/internal/events/loop.go:61"}
{"level":"info","ts":1666645520.328146,"logger":"eventLoop","msg":"Finished handling the batch"}
{"level":"info","ts":1666645520.3281548,"msg":"All workers finished","controller":"service","controllerGroup":"","controllerKind":"Service"}
{"level":"info","ts":1666645520.328158,"msg":"All workers finished","controller":"secret","controllerGroup":"","controllerKind":"Secret"}
{"level":"info","ts":1666645520.3281612,"msg":"All workers finished","controller":"gateway","controllerGroup":"gateway.networking.k8s.io","controllerKind":"Gateway"}
{"level":"info","ts":1666645520.3281643,"msg":"All workers finished","controller":"httproute","controllerGroup":"gateway.networking.k8s.io","controllerKind":"HTTPRoute"}
{"level":"info","ts":1666645520.328167,"msg":"All workers finished","controller":"endpointslice","controllerGroup":"discovery.k8s.io","controllerKind":"EndpointSlice"}
{"level":"info","ts":1666645520.3281696,"msg":"All workers finished","controller":"gatewayclass","controllerGroup":"gateway.networking.k8s.io","controllerKind":"GatewayClass"}
{"level":"info","ts":1666645520.3281732,"msg":"Stopping and waiting for caches"}
{"level":"info","ts":1666645520.328385,"msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":1666645520.328418,"msg":"Wait completed, proceeding to shutdown the manager"}
rpc error: code = NotFound desc = an error occurred when try to find container "c71dccbaa6c5ee3987461541bbafdaebf6a95b0e9c080029765e46eb92798c79": not found

Note the error:

{"level":"error","ts":1666645520.3280203,"logger":"statusUpdater","msg":"Failed to update status","namespace":"default","name":"gateway","kind":"Gateway","error":"Put "https://10.96.0.1:443/apis/gateway.networking.k8s.io/v1beta1/namespaces/default/gateways/gateway/status?timeout=10s\": context canceled","stacktrace":"github.com/nginxinc/nginx-kubernetes-gateway/internal/status.(*updaterImpl).update\n\tgithub.com/nginxinc/nginx-kubernetes-gateway/internal/status/updater.go:154\ngithub.com/nginxinc/nginx-kubernetes-gateway/internal/status.(*updaterImpl).Update\n\tgithub.com/nginxinc/nginx-kubernetes-gateway/internal/status/updater.go:97\ngithub.com/nginxinc/nginx-kubernetes-gateway/internal/events.(*EventHandlerImpl).HandleEventBatch\n\tgithub.com/nginxinc/nginx-kubernetes-gateway/internal/events/handler.go:88\ngithub.com/nginxinc/nginx-kubernetes-gateway/internal/events.(*EventLoop).Start.func1.1\n\tgithub.com/nginxinc/nginx-kubernetes-gateway/internal/events/loop.go:61"}

It happened because the context of the status update API call was canceled mid-request during shutdown.
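
One way to keep this path quiet is to treat `context.Canceled` as an expected shutdown condition rather than a failure. The sketch below shows the pattern only; `updateStatus` is a hypothetical helper, not NKG's `(*updaterImpl).update`, and it assumes a controller-runtime client, a logr logger, and an error chain that preserves `context.Canceled`:

```go
package status

import (
	"context"
	"errors"

	"github.com/go-logr/logr"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// updateStatus writes obj's status and downgrades cancellation,
// which is expected during shutdown, from error to info level.
func updateStatus(ctx context.Context, c client.Client, logger logr.Logger, obj client.Object) {
	err := c.Status().Update(ctx, obj)
	switch {
	case err == nil:
		// Updated successfully.
	case errors.Is(err, context.Canceled):
		// Shutdown canceled the request: expected, so no
		// error-level log line.
		logger.Info("Status not updated because the context was canceled",
			"namespace", obj.GetNamespace(), "name", obj.GetName())
	default:
		logger.Error(err, "Failed to update status",
			"namespace", obj.GetNamespace(), "name", obj.GetName())
	}
}
```

An alternative is to give the final status write its own short, detached timeout context so it can still complete after SIGTERM; either way, an expected shutdown path should not produce error-level log lines.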

Expected:

The graceful shutdown should actually be graceful: no errors should be printed in the logs, and the process should exit with code 0.

Metadata

Labels

area/control-plane (General control plane issues), bug (Something isn't working)
