Description
As documented in #5957, it is possible to configure MachineDeployment selectors in Cluster API minimally, e.g. using only the Cluster name. Such selectors let the objects be created correctly, but they are not unique enough to guarantee correct adoption, for example after a backup and restore. The default selector, which is set when the MachineDeployment does not specify one, contains both the Cluster name and the MachineDeployment name and is sufficient to ensure adoption works correctly.
This configuration can cause unexpected rollouts when a restored management cluster attempts to adopt a cluster. It can also cause Machines to be adopted by the wrong MachineSet, leaving the cluster working but in an invalid state.
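To make the overlap concrete, here is a minimal Go sketch that models `matchLabels` semantics only: every key/value pair in the selector must be present on the object, and `matchExpressions` are ignored. It assumes two MachineDeployments, `md-a` and `md-b`, in a Cluster named `prod`, and that each Machine carries both a cluster-name and a deployment-name label. The label keys are modeled on Cluster API's well-known labels, but treat them, along with the helper names `matches` and `matchingMachines`, as illustrative assumptions. A cluster-name-only selector for `md-a` also matches the Machine owned by `md-b`, while the default selector matches only `md-a`'s Machine.

```go
package main

import (
	"fmt"
	"sort"
)

// Label keys modeled on Cluster API's well-known labels (assumption).
const (
	clusterNameLabel    = "cluster.x-k8s.io/cluster-name"
	deploymentNameLabel = "cluster.x-k8s.io/deployment-name"
)

// matches applies matchLabels semantics: every key/value pair in the
// selector must be present in the object's labels. matchExpressions
// are not modelled here.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		got, ok := labels[k]
		if !ok || got != v {
			return false
		}
	}
	return true
}

// matchingMachines returns the sorted names of the machines whose labels
// satisfy the selector.
func matchingMachines(selector map[string]string, machines map[string]map[string]string) []string {
	var names []string
	for name, labels := range machines {
		if matches(selector, labels) {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random; keep output stable
	return names
}

func main() {
	// Two MachineDeployments (md-a, md-b) in the same Cluster "prod".
	machines := map[string]map[string]string{
		"md-a-machine-1": {clusterNameLabel: "prod", deploymentNameLabel: "md-a"},
		"md-b-machine-1": {clusterNameLabel: "prod", deploymentNameLabel: "md-b"},
	}

	// A minimal, cluster-name-only selector intended for md-a.
	minimal := map[string]string{clusterNameLabel: "prod"}
	// The default selector names both the Cluster and the MachineDeployment.
	defaulted := map[string]string{clusterNameLabel: "prod", deploymentNameLabel: "md-a"}

	fmt.Println("minimal:", matchingMachines(minimal, machines)) // [md-a-machine-1 md-b-machine-1]
	fmt.Println("default:", matchingMachines(defaulted, machines)) // [md-a-machine-1]
}
```

When a restored management cluster recreates MachineSets, any MachineSet whose selector is this loose can claim Machines that belong to a sibling MachineDeployment.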
Right now, the correct form of selector is opaque to users, and the problem only surfaces on Day N, when MachineDeployment and MachineSet adoption unexpectedly reports errors.
To solve this, we could (from strongest to weakest):
- Add the default selector labels to the MachineDeployment on creation, even if a custom selector is defined.
- Block the creation of MachineDeployments with insufficiently specific selectors.
- Create a webhook warning to advise users when they create MachineDeployments with insufficiently specific selectors (requires controller-runtime#1788, "Support warnings in webhook utils").
- Document the requirement for MachineDeployment selectors.
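The first three options could look roughly like the following Go sketch. `machineDeployment`, `requiredLabels`, `defaultSelector` and `selectorProblems` are hypothetical names operating on a trimmed-down stand-in for the real type; this is not Cluster API's actual webhook code, and the label keys are assumptions modeled on its well-known labels. `defaultSelector` corresponds to the first option and always merges the Cluster and MachineDeployment name labels into both the selector and the template labels while keeping any other user-supplied labels. `selectorProblems` lists required labels that a selector is missing, which a webhook could use either to reject the object (second option) or to return warnings (third option).

```go
package main

import (
	"fmt"
	"sort"
)

// Label keys modeled on Cluster API's well-known labels (assumption).
const (
	clusterNameLabel    = "cluster.x-k8s.io/cluster-name"
	deploymentNameLabel = "cluster.x-k8s.io/deployment-name"
)

// machineDeployment is a hypothetical, trimmed-down stand-in for the real
// MachineDeployment type, keeping only the fields this sketch needs.
type machineDeployment struct {
	Name           string
	ClusterName    string
	SelectorLabels map[string]string // spec.selector.matchLabels
	TemplateLabels map[string]string // spec.template.metadata.labels
}

// requiredLabels returns the labels every selector should contain so that
// it identifies exactly one MachineDeployment in one Cluster.
func requiredLabels(md machineDeployment) map[string]string {
	return map[string]string{
		clusterNameLabel:    md.ClusterName,
		deploymentNameLabel: md.Name,
	}
}

// defaultSelector (first option) merges the required labels into both the
// selector and the template, keeping any other user-supplied labels.
// A conflicting user value for a required key is overwritten.
func defaultSelector(md *machineDeployment) {
	if md.SelectorLabels == nil {
		md.SelectorLabels = map[string]string{}
	}
	if md.TemplateLabels == nil {
		md.TemplateLabels = map[string]string{}
	}
	for k, v := range requiredLabels(*md) {
		md.SelectorLabels[k] = v
		md.TemplateLabels[k] = v
	}
}

// selectorProblems (second and third options) lists required labels that are
// missing from, or differ in, the selector. A webhook could reject the object
// when the result is non-empty, or surface the entries as warnings.
func selectorProblems(md machineDeployment) []string {
	var problems []string
	for k, v := range requiredLabels(md) {
		if got, ok := md.SelectorLabels[k]; !ok || got != v {
			problems = append(problems, fmt.Sprintf("spec.selector.matchLabels[%q] should be %q", k, v))
		}
	}
	sort.Strings(problems) // map iteration order is random; keep output stable
	return problems
}

func main() {
	// A MachineDeployment whose selector only names the Cluster.
	md := machineDeployment{
		Name:           "md-a",
		ClusterName:    "prod",
		SelectorLabels: map[string]string{clusterNameLabel: "prod"},
		TemplateLabels: map[string]string{clusterNameLabel: "prod"},
	}
	fmt.Println("before defaulting:", selectorProblems(md))

	defaultSelector(&md)
	fmt.Println("after defaulting:", selectorProblems(md))
	fmt.Println("selector:", md.SelectorLabels)
}
```

This sketch glosses over at least two design questions: it silently overwrites a conflicting user-supplied value for either label instead of rejecting it, and it ignores the fact that Kubernetes label values are limited to 63 characters, so a real implementation would need to handle longer MachineDeployment names (for example by hashing them).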
I'd prefer to go with one of the stronger options to preclude any future occurrence of this issue. However, I'm not sure whether there are workflows that would be blocked by making strong assumptions about the MachineDeployment / MachineSet selector.
Is there any reason we shouldn't default the selector to always include both the Cluster name and the MachineDeployment name?