How to categorize this issue?
/area usability
/kind enhancement
/priority 3
What would you like to be added:
Today, `maxSurge` and `maxUnavailable` values are configured at the worker pool level (ref). Provider extensions usually distribute the configured values if multiple zones are configured (ref).
Although distributing these numbers is generally acceptable, it is unclear to end-users and can thus result in unacceptable and unexpected cluster upgrade behavior. This is especially true when `maxSurge < len(zones)`, and even more so when `maxSurge < len(zones) && maxUnavailable < maxSurge`.
Example:

```yaml
workers:
- name: worker
  machine:
    type: n1-standard-4
    image:
      name: gardenlinux
      version: 318.8.0
  maximum: 5
  minimum: 3
  maxSurge: 1
  maxUnavailable: 0
  zones:
  - europe-west1-a
  - europe-west1-b
  - europe-west1-c
```
This will result in 3 `MachineDeployment`s:

| MachineDeployment | Zone | maxSurge | maxUnavailable |
|---|---|---|---|
| worker-z1 | europe-west1-a | 1 | 0 |
| worker-z2 | europe-west1-b | 0 | 0 |
| worker-z3 | europe-west1-c | 0 | 0 |
While the workers in europe-west1-a are upgraded in a rolling fashion, the ones in europe-west1-b and europe-west1-c are just replaced. During the upgrade procedure, the cluster will have fewer `Node`s than configured in `workers[*].minimum`.
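The distribution that produces the table above can be sketched as follows. This is a simplified illustration, not the actual Gardener helper: each zone receives the integer share of the pool-level value, and the remainder goes to the first zones in order.

```go
package main

import "fmt"

// distributeOverZones splits a worker-pool-level value across zones the way
// provider extensions commonly do: every zone gets value/zoneCount, and the
// remainder is handed out to the first zones. Simplified sketch only.
func distributeOverZones(zoneIndex, value, zoneCount int) int {
	share := value / zoneCount
	if zoneIndex < value%zoneCount {
		share++
	}
	return share
}

func main() {
	zones := []string{"europe-west1-a", "europe-west1-b", "europe-west1-c"}
	maxSurge, maxUnavailable := 1, 0
	for i, zone := range zones {
		fmt.Printf("%s: maxSurge=%d maxUnavailable=%d\n",
			zone,
			distributeOverZones(i, maxSurge, len(zones)),
			distributeOverZones(i, maxUnavailable, len(zones)))
	}
}
```

With `maxSurge: 1` over three zones, only the first zone gets a surge budget; the other two end up with `maxSurge: 0` and `maxUnavailable: 0`, which forces plain replacement there.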
We see the following options to improve this user experience (only when `maxSurge < len(zones)`):

- Change the API validation so that `maxSurge >= len(zones)` is required --> incompatible and will probably break many automation functionalities around Gardener.
- Automatically set `maxSurge: 1` for each zone (suggested by @AxiomSamarth @himanshu-kun) --> solves many "standard" cases in which `maxUnavailable` is not used.
- The worker actuator sets the configured values zone by zone when an upgrade is performed --> comes close to what is expected by end-users but implies long-running worker reconciliations.
- Other thoughts?
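The second option could be sketched as a small change to the distribution: compute the per-zone share as today, but raise it to at least 1 so every zone's `MachineDeployment` can surge. This is a hypothetical illustration of the proposal, not existing Gardener code.

```go
package main

import "fmt"

// surgeForZone distributes maxSurge across zones, then raises each zone's
// share to a minimum of 1 (option 2 in this issue), so no zone falls back
// to plain node replacement during a rolling update. Hypothetical sketch.
func surgeForZone(zoneIndex, maxSurge, zoneCount int) int {
	share := maxSurge / zoneCount
	if zoneIndex < maxSurge%zoneCount {
		share++
	}
	if share < 1 {
		share = 1 // option 2: guarantee surge capacity per zone
	}
	return share
}

func main() {
	// With the example pool (maxSurge: 1, three zones), every zone now
	// gets maxSurge=1 instead of 1/0/0.
	for i := 0; i < 3; i++ {
		fmt.Printf("zone %d: maxSurge=%d\n", i, surgeForZone(i, 1, 3))
	}
}
```

Note this effectively raises the pool's total surge from 1 to `len(zones)`, which is the trade-off of this option: it only cleanly covers the "standard" case where `maxUnavailable` is not used.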
Why is this needed:
Needed for a better user experience and to avoid unexpected outages.