✨ Emit Kubernetes Events when Cluster Phase, ControlPlaneReady, or InfrastructureReady change #7786
Conversation
This is valuable information and I'm +1 to events. Some questions: Is there a reason not to infer this from the conditions ControlPlaneInitializedCondition / ControlPlaneReadyCondition / BootstrapReadyCondition / InfrastructureReadyCondition? Also note that events do not follow contractual policies; they might change at any time. I'm curious: for the automation workflows described in the issue, have you considered consuming the conditions directly, or metrics/alerting instead?
@mtougeron thanks for working on this PR!
For the sake of keeping a clean code organization and a simple main reconcile loop, what do you think about moving this change into reconcilePhase?
@enxebre The informer pattern only works with events, not with object status conditions. There isn't a way to follow changes to the status conditions of a resource that I'm aware of. This change will allow for being informed of events and then inspecting the Cluster resource to determine what action(s) need to be taken based on its status at that time.
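For reference, a minimal client-go sketch of how a consumer might watch for these Cluster events. Everything here (the kubeconfig path, the field selector, the handler body) is an illustrative assumption, not part of this PR:

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a clientset from the local kubeconfig (assumed location, for illustration).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Only watch Events whose involved object is a Cluster.
	factory := informers.NewSharedInformerFactoryWithOptions(clientset, 10*time.Minute,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.FieldSelector = "involvedObject.kind=Cluster"
		}))

	factory.Core().V1().Events().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			e := obj.(*corev1.Event)
			// React to the new event, e.g. look up the Cluster and inspect its status.
			fmt.Printf("%s %s/%s: %s\n", e.Reason, e.InvolvedObject.Namespace, e.InvolvedObject.Name, e.Message)
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop
}
```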
Sorry for the delays on this. I'm hoping to pick it back up next week.
Force-pushed 9b7a1c9 to 07dd857
Updated details about the events emitted:
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events --sort-by=.metadata.creationTimestamp -o json
@vincepri @CecileRobertMichon sorry for pinging, but I would really like to have this feature in CAPI - could you review the PR? Thank you so much!
```go
controllerutil.RemoveFinalizer(cluster, clusterv1.ClusterFinalizer)
r.recorder.Eventf(cluster, corev1.EventTypeNormal, "Deleted", "Cluster %s has been deleted", cluster.Name)
```
is this a common pattern? Wondering if pods do this too, and/or if there isn't some kind of general Object deleted event that gets emitted
For a pod, there's the Killing event that's emitted when it is deleted. Here are the lifecycle events from creating a StatefulSet and then running a kubectl delete against the pod.
LAST SEEN TYPE REASON OBJECT MESSAGE
60s Normal Scheduled pod/touge-debug-0 Successfully assigned default/touge-debug-0 to ip-10-158-59-46.us-west-2.compute.internal
4s Normal SuccessfulCreate statefulset/touge-debug create Pod touge-debug-0 in StatefulSet touge-debug successful
59s Normal Pulling pod/touge-debug-0 Pulling image "mtougeron/touge-debug"
46s Normal Pulled pod/touge-debug-0 Successfully pulled image "mtougeron/touge-debug" in 13.117780542s
46s Normal Created pod/touge-debug-0 Created container ubuntu
46s Normal Started pod/touge-debug-0 Started container ubuntu
3s Normal Killing pod/touge-debug-0 Stopping container ubuntu
4s Normal Pulled pod/touge-debug-0 Container image "mtougeron/touge-debug" already present on machine
4s Normal Scheduled pod/touge-debug-0 Successfully assigned default/touge-debug-0 to ip-10-158-59-46.us-west-2.compute.internal
3s Normal Created pod/touge-debug-0 Created container ubuntu
3s Normal Started pod/touge-debug-0 Started container ubuntu
this lgtm overall, just a question about what's out there for delete events, but I'll defer to @fabriziopandini since he was the one who suggested a different approach
/lgtm
/assign @fabriziopandini
LGTM label has been added. Git tree hash: 550b5ac5296d8f9cfc560f319c9faa7de87af55e
I'm writing again to get an update on the review of this PR. @fabriziopandini @vincepri @mtougeron, do you have any updates or topics to discuss?
/retest
Force-pushed 07dd857 to 8aa4797
```go
controllerutil.RemoveFinalizer(cluster, clusterv1.ClusterFinalizer)
r.recorder.Eventf(cluster, corev1.EventTypeNormal, "Deleted", "Cluster %s has been deleted", cluster.Name)
```
I think this event could be emitted multiple times as it depends on how quickly the garbage collector in kube-controller-manager actually removes the object.
But I assume we don't care about how often this event is emitted?
Personally I think that's acceptable. From the CAPI perspective the action happened so it's appropriate for us to emit the event.
I think this is acceptable too
From a client-go perspective it should be possible. An informer gives you an old and a new object, and you can compare them to see if conditions have changed. The same is possible by adding predicates to a Reconciler in controller-runtime; we do something like this in CAPI already (a sketch of the predicate approach follows below).
I think what Alberto wanted to make you aware of is that building on top of events can be very brittle, as we don't consider events part of our API (similar to logs). This means that any automation built on events might break at any time (~ CAPI minor releases).
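A minimal sketch of that predicate approach, assuming controller-runtime and the CAPI v1beta1 types. ClusterWatcher and the choice of condition are illustrative and this is not the CAPI code referenced above:

```go
package controllers

import (
	"context"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// controlPlaneReadyChanged only lets update events through when the
// ControlPlaneReady condition differs between the old and new object.
var controlPlaneReadyChanged = predicate.Funcs{
	UpdateFunc: func(e event.UpdateEvent) bool {
		oldCluster, okOld := e.ObjectOld.(*clusterv1.Cluster)
		newCluster, okNew := e.ObjectNew.(*clusterv1.Cluster)
		if !okOld || !okNew {
			return false
		}
		return conditions.IsTrue(oldCluster, clusterv1.ControlPlaneReadyCondition) !=
			conditions.IsTrue(newCluster, clusterv1.ControlPlaneReadyCondition)
	},
}

// ClusterWatcher is a hypothetical reconciler that is only triggered
// when the condition above flips.
type ClusterWatcher struct{}

func (r *ClusterWatcher) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fetch the Cluster and react to the condition change here.
	return ctrl.Result{}, nil
}

func (r *ClusterWatcher) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&clusterv1.Cluster{}, builder.WithPredicates(controlPlaneReadyChanged)).
		Complete(r)
}
```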
I don't consider the events part of the API contract either. IMO, these new events aren't any different from the ones CAPI already emits for the kubeadm provider. The same limitations and conditions apply.
Agree with this statement. What if we did start to define exported event names, so that any changes to them are picked up by apidiff (like the Killing example mentioned above)? This would then at least give some stability for people wanting to rely on them for some reason and make it less brittle.
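As a rough illustration of that idea, exported event names might look something like this. The package and constant names here are hypothetical and not defined by this PR:

```go
// Package events would collect event reasons as exported constants, so that
// renaming or removing one surfaces as a Go API change that apidiff can flag.
package events

// Hypothetical event reasons, for illustration only.
const (
	ClusterPhaseChangedReason        = "ClusterPhaseChanged"
	ControlPlaneReadyChangedReason   = "ControlPlaneReady"
	InfrastructureReadyChangedReason = "InfrastructureReady"
	ClusterDeletedReason             = "Deleted"
)
```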
That's a good idea. What about the ones that are based on the
My comment was only regarding what folks are building on top of this PR, not a blocker in any way to get this PR merged. It's probably better to move the discussion as a follow-up to a separate issue to not block this PR. I think the discussion is fundamentally "do we want to guarantee API stability for events" (and what does API stability for events exactly mean) and not just "how do we get the apidiff to fail" or "how do we define the events in our API package so they are technically a Go API".
Since I have some use cases, I agree with opening a discussion on this topic, but I think it's not only related to the CAPI controllers; it applies more generally to how controllers are implemented.
I agree with these statements; I mentioned it originally to start a discussion like this, and not just "how do we get the apidiff to fail". Created discussion #8254 to carry this on.
Sounds good. Coming back to this PR.
LGTM label has been added. Git tree hash: 9beb937cd41da71178da75ce49b799b40c326f8b
I also think: /lgtm
/lgtm also. +1 to opening a discussion thread on the stability of events; it would be nice to be explicit about this.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: fabriziopandini
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Looks like a flake. /test pull-cluster-api-e2e-main
What this PR does / why we need it:
Emit Kubernetes Events when Cluster status.Phase, status.ControlPlaneReady, or status.InfrastructureReady change, and when the Cluster is deleted.
Which issue(s) this PR fixes:
relates to #7424 (but does not resolve it)
Output of events:
Note: In this example, the SuccessfulSetNodeRefs events were already being emitted by Cluster API.
I will be doing a follow-up PR for the remaining aspects of #7424 as it will involve more work and potential coordination with the Cluster API Providers.
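To make the emit-on-change behavior concrete, here is a rough sketch of the idea, not the PR's actual diff. The helper name, event reason, and the trimmed-down reconciler struct are assumptions:

```go
package controllers

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// ClusterReconciler is trimmed down to the one field this sketch needs.
type ClusterReconciler struct {
	recorder record.EventRecorder
}

// emitPhaseChangedEvent is a hypothetical helper: compare the phase observed
// before reconciliation with the phase computed afterwards and emit an event
// only when they differ.
func (r *ClusterReconciler) emitPhaseChangedEvent(cluster *clusterv1.Cluster, previousPhase clusterv1.ClusterPhase) {
	if cluster.Status.GetTypedPhase() == previousPhase {
		return
	}
	r.recorder.Eventf(cluster, corev1.EventTypeNormal, "ClusterPhaseChanged",
		"Cluster %s is now in phase %s", cluster.Name, string(cluster.Status.GetTypedPhase()))
}
```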