Enforce good selectors on MachineDeployments #6283

@killianmuldoon

As documented in #5957, Cluster API allows MachineDeployment selectors to be minimal, e.g. matching on the cluster name only. Such MachineDeployments are created correctly, but their selectors are not specific enough to guarantee correct adoption, for example after a backup and restore. The default selector, which is set when the MachineDeployment has no selector, contains both the Cluster name and the MachineDeployment name and is specific enough for adoption to work correctly.

This configuration can cause unexpected rollouts when a restored management cluster attempts to adopt a cluster. It can also cause Machines to be adopted by the wrong MachineSet, leaving the cluster in a working but invalid state.
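To illustrate the overlap, here is a minimal stdlib-only Go sketch of Kubernetes `matchLabels` semantics: a selector matches when every key/value pair it lists is present on the object. The label keys are assumed to mirror Cluster API's cluster-name and deployment-name labels, and the cluster and MachineDeployment names are made up. A cluster-name-only selector on `md-0` also matches a Machine that belongs to `md-1`, while the default-style selector does not.

```go
package main

import "fmt"

// Label keys assumed to mirror Cluster API's cluster-name and
// MachineDeployment-name labels (illustrative only).
const (
	clusterNameLabel    = "cluster.x-k8s.io/cluster-name"
	deploymentNameLabel = "cluster.x-k8s.io/deployment-name"
)

// matches implements matchLabels semantics: every key/value pair in the
// selector must be present, with the same value, in the object's labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	// A Machine owned by MachineDeployment "md-1" in cluster "prod".
	machine := map[string]string{
		clusterNameLabel:    "prod",
		deploymentNameLabel: "md-1",
	}

	// Minimal selector on MachineDeployment "md-0": cluster name only.
	weak := map[string]string{clusterNameLabel: "prod"}
	// Default-style selector on "md-0": cluster name plus MachineDeployment name.
	strong := map[string]string{clusterNameLabel: "prod", deploymentNameLabel: "md-0"}

	fmt.Println("weak selector matches md-1's Machine:", matches(weak, machine))
	fmt.Println("default selector matches md-1's Machine:", matches(strong, machine))
}
```

Under these semantics an empty selector matches every Machine, which is the extreme form of the same problem.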

Right now the correct form of selector is opaque to users, and the problem only surfaces on Day N, when MachineDeployment and MachineSet adoption unexpectedly fails with errors.

To solve this we could (from strongest to weakest):

  1. Add the default selector labels to the MachineDeployment on creation, even if another selector is defined.
  2. Block the creation of MachineDeployments with insufficiently specific selectors.
  3. Return a webhook warning (requires "Support warnings in webhook utils", controller-runtime#1788) to advise users when they create MachineDeployments.
  4. Document the requirement for MachineDeployment selectors.
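Option 2 could look roughly like the following check, run in the MachineDeployment validating webhook. This is a hypothetical, stdlib-only sketch: `validateSelector` is not an existing Cluster API function, the label keys are assumptions, and only `matchLabels` is inspected (a real check would also need to decide how to treat `matchExpressions`).

```go
package main

import (
	"fmt"
	"sort"
)

// Label keys assumed to mirror Cluster API's cluster-name and
// MachineDeployment-name labels (illustrative only).
const (
	clusterNameLabel    = "cluster.x-k8s.io/cluster-name"
	deploymentNameLabel = "cluster.x-k8s.io/deployment-name"
)

// validateSelector (hypothetical) rejects a MachineDeployment whose
// matchLabels do not pin both the Cluster name and the MachineDeployment name.
// Extra user labels are allowed; matchExpressions are ignored in this sketch.
func validateSelector(clusterName, mdName string, matchLabels map[string]string) error {
	required := map[string]string{
		clusterNameLabel:    clusterName,
		deploymentNameLabel: mdName,
	}
	var missing []string
	for k, v := range required {
		if matchLabels[k] != v {
			missing = append(missing, fmt.Sprintf("%s=%s", k, v))
		}
	}
	if len(missing) > 0 {
		sort.Strings(missing) // deterministic error message
		return fmt.Errorf("selector for MachineDeployment %q is not specific enough; missing %v", mdName, missing)
	}
	return nil
}

func main() {
	weak := map[string]string{clusterNameLabel: "prod"}
	fmt.Println(validateSelector("prod", "md-0", weak))

	strong := map[string]string{clusterNameLabel: "prod", deploymentNameLabel: "md-0"}
	fmt.Println(validateSelector("prod", "md-0", strong))
}
```

The same predicate could back option 3 instead: return its message as a webhook warning rather than an error.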

I'd prefer to go with the strongest option to preclude any future occurrence of this issue. I'm not sure whether there are workflows that would be blocked by making strong assumptions about the MachineDeployment / MachineSet selector.

Is there any reason we shouldn't default the selector to always include both the Cluster name and the MachineDeployment name?
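Defaulting the selector (option 1) might be sketched as below. The hypothetical `mergeDefaults` helper copies the user's labels and adds the Cluster-name and MachineDeployment-name labels; it is applied to both the selector and the Machine template labels so that the selector keeps matching the Machines created from the template. The label keys are assumptions, and treating a conflicting user value as an error rather than overwriting it is a design choice of this sketch, not existing Cluster API behavior.

```go
package main

import "fmt"

// Label keys assumed to mirror Cluster API's cluster-name and
// MachineDeployment-name labels (illustrative only).
const (
	clusterNameLabel    = "cluster.x-k8s.io/cluster-name"
	deploymentNameLabel = "cluster.x-k8s.io/deployment-name"
)

// mergeDefaults (hypothetical) returns a copy of labels with every default
// added. A key the user already set to a different value is reported as a
// conflict instead of being silently overwritten.
func mergeDefaults(labels, defaults map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(labels)+len(defaults))
	for k, v := range labels {
		out[k] = v
	}
	for k, v := range defaults {
		if existing, ok := out[k]; ok && existing != v {
			return nil, fmt.Errorf("label %s=%s conflicts with default %s=%s", k, existing, k, v)
		}
		out[k] = v
	}
	return out, nil
}

func main() {
	defaults := map[string]string{
		clusterNameLabel:    "prod",
		deploymentNameLabel: "md-0",
	}
	// A user-supplied selector and template labels that only pin the cluster.
	selector := map[string]string{clusterNameLabel: "prod", "pool": "workers"}
	template := map[string]string{clusterNameLabel: "prod", "pool": "workers"}

	sel, err := mergeDefaults(selector, defaults)
	if err != nil {
		panic(err)
	}
	tmpl, err := mergeDefaults(template, defaults)
	if err != nil {
		panic(err)
	}
	fmt.Println("selector:", sel)
	fmt.Println("template labels:", tmpl)
}
```

Because user labels are kept, existing selectors only become more specific; they never stop matching Machines they already selected through their own labels.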

Labels: help wanted, kind/feature, needs-triage, priority/important-longterm