Description
We recently merged our logging guidelines and a first PR ensuring all the logs from a reconcile have consistent keys.
Now we can make a pass over our controllers and improve existing log messages to take advantage of key-value pairs. A good way to do so is to focus on a simple workflow supported by CAPI and make sure the logs represent what happens, including all the dependencies across objects.
For example, I created a management cluster with logging enabled using Tilt, created a cluster, and looked at the logs documenting a machine being provisioned, using the following query: `{app="capi-controller-manager",controller="machine"} | json | machine_name="classy1-23696-sxcsn-2jkvs"`
msg="Bootstrap provider is not ready, requeuing" v=0
msg="Infrastructure provider is not ready, requeuing" v=0
msg="Cannot reconcile Machine's Node, no valid ProviderID yet" v=0
...
msg="Set Machine's NodeRef" v=0
They are OK, but we can do better by making it explicit what we are waiting for and by adding more details when provisioning completes. So I created a small PR that produces the following output:
msg="Waiting for bootstrap provider to generate data secret and report status.ready, requeuing" v=0 (with a key-value pair for the bootstrap object)
msg="Waiting for infrastructure provider to create machine infrastructure and report status.ready, requeuing" v=0 (with a key-value pair for the infrastructure object)
msg="Waiting for infrastructure provider to report spec.ProviderID, requeuing" v=0 (with a key-value pair for the infrastructure object)
...
msg="Bootstrap provider generated data secret and reports status.ready" v=0 (with a key-value pair for the bootstrap object and one for the secret)
...
msg="Infrastructure provider completed machine infrastructure provisioning and reports status.ready" v=0 (with a key-value pair for the infrastructure object)
msg="Infrastructure provider reporting spec.ProviderID, Kubernetes node is now available" v=0 (with a key-value pair for the infrastructure object, one for the providerID, and one for the node)
The idea behind this issue is to rally the community to create similar PRs, each making small, incremental improvements to one of the Cluster API workflows:
- Cluster controller provisioning infrastructure
- Cluster deletion (@valaparthvi)
- Machine deletion
- Node deletion
- KCP creating/deleting Machine
- KCP remediating a Machine
- MD creating/deleting MS 🌱 [WIP] Improve logging for MachineDeployment scale up&down workflow #7168 (@furkatgofurov7)
- MS creating/deleting Machines 🌱 Improve logging for the MachineSet scale up/down workflow #7026 (@fabriziopandini)
- MS remediating a Machine
- MHC triggering machine remediation
- structuredmerge package
- (probably more, let me know).
This is also a great opportunity for anyone who wants to dig into CAPI and learn how things work.
/help wanted
/kind cleanup