From 6b15d549ebf7398c0a2181a6497279873d898eeb Mon Sep 17 00:00:00 2001 From: Piotr Zaniewski Date: Tue, 4 Nov 2025 15:33:35 +0100 Subject: [PATCH 1/2] fix(vcluster): etcd migration and recovery --- .../backing-store/etcd/embedded.mdx | 144 ++++++++++++------ 1 file changed, 97 insertions(+), 47 deletions(-) diff --git a/vcluster/configure/vcluster-yaml/control-plane/components/backing-store/etcd/embedded.mdx b/vcluster/configure/vcluster-yaml/control-plane/components/backing-store/etcd/embedded.mdx index d8a9581cf..5d550457d 100644 --- a/vcluster/configure/vcluster-yaml/control-plane/components/backing-store/etcd/embedded.mdx +++ b/vcluster/configure/vcluster-yaml/control-plane/components/backing-store/etcd/embedded.mdx @@ -157,102 +157,152 @@ kubectl logs [[VAR:VCLUSTER NAME:my-vcluster]]-0 -n [[VAR:NAMESPACE:vcluster-my- - - -Stop all vCluster instances: +:::warning +Before attempting any recovery procedure, create a backup of the virtual cluster namespace on the host cluster. If using namespace syncing, back up all synced namespaces as well. +::: - -
+ + -Confirm all pods have terminated: + + +Delete the corrupted pod and PVC for replica-0: - + +
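For example, assuming the vCluster is named `my-vcluster` in the host namespace `vcluster-my-team` (a sketch; adjust names to your install):

```bash
# Delete the corrupted volume claim for replica-0
kubectl delete pvc data-my-vcluster-0 -n vcluster-my-team

# Delete the pod; the StatefulSet controller recreates it with a fresh PVC
kubectl delete pod my-vcluster-0 -n vcluster-my-team
```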
+ +The pod restarts with a new empty PVC. After 1-3 pod restarts, the automatic recovery adds it back to the etcd cluster.
- -Delete the corrupted PVC for the first replica: + +Monitor the recovery process: -
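A sketch using the same example namespace:

```bash
# Watch the pods; replica-0 may restart a few times before it rejoins
kubectl get pods -n vcluster-my-team -w
```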
-Verify the PVC has been deleted: +Check the logs to verify the pod rejoins successfully: -
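For example, assuming the pod and namespace names used earlier:

```bash
kubectl logs my-vcluster-0 -n vcluster-my-team -f
```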
+
- -Create a new PVC by [copying from a working replica](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#volume-cloning): +
- + +:::caution +If more than one pod is down with `podManagementPolicy: OrderedReady`, migrate to `Parallel` first before attempting recovery. +::: + + + +Check that the StatefulSet retains PVCs on deletion: + +
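One way to check, assuming the example StatefulSet name and namespace:

```bash
kubectl get statefulset my-vcluster -n vcluster-my-team \
  -o jsonpath='{.spec.persistentVolumeClaimRetentionPolicy}'
```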
-Apply the PVC: +The policy should be `Retain`. This is the default but can be overridden by `controlPlane.statefulSet.persistence.volumeClaim.retentionPolicy` in your configuration. +
- +Delete the StatefulSet without deleting the pods: + +
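A sketch with the example names:

```bash
# --cascade=orphan removes the StatefulSet object but leaves its pods and PVCs in place
kubectl delete statefulset my-vcluster -n vcluster-my-team --cascade=orphan
```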
- -Start with one replica to verify the restored data: + +Update your virtual cluster configuration to use `Parallel` pod management policy. -
-Monitor the startup: +Add or update the following configuration: - + +
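The setting in question is the pod management policy in `vcluster.yaml` (the same path shown later in this patch series):

```yaml
controlPlane:
  statefulSet:
    scheduling:
      podManagementPolicy: Parallel
```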
+ +If using Helm, update your `values.yaml` and run: + + + +
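A sketch of the upgrade, assuming a Helm release named `my-vcluster` installed from the `loft-sh` chart repository; adjust the release name, namespace, and chart source to your install:

```bash
helm upgrade my-vcluster vcluster \
  --repo https://charts.loft.sh \
  --namespace vcluster-my-team \
  --values values.yaml
```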
+ +The StatefulSet is recreated with `Parallel` policy and pods pick up the existing PVCs. +
+ + +Now follow the same procedure as for `Parallel` mode: + +
-After it's stable, scale up to the desired number of replicas. +The pod restarts with a new empty PVC and automatic recovery adds it back to the cluster after 1-3 pod restarts.
+:::warning
+Never clone PVCs from other replicas. Cloning PVCs causes etcd member ID conflicts and results in data loss.
+:::
+
+
 ### Complete data loss recovery

 :::warning

From 29373327b577234fd7d9289d339bc14a326ba1a8 Mon Sep 17 00:00:00 2001
From: Piotr Zaniewski
Date: Wed, 5 Nov 2025 10:53:57 +0100
Subject: [PATCH 2/2] refactor: address pr feedback

---
 .../backing-store/etcd/embedded.mdx           | 257 ++++++++++++++----
 1 file changed, 210 insertions(+), 47 deletions(-)

diff --git a/vcluster/configure/vcluster-yaml/control-plane/components/backing-store/etcd/embedded.mdx b/vcluster/configure/vcluster-yaml/control-plane/components/backing-store/etcd/embedded.mdx
index 5d550457d..87e28142c 100644
--- a/vcluster/configure/vcluster-yaml/control-plane/components/backing-store/etcd/embedded.mdx
+++ b/vcluster/configure/vcluster-yaml/control-plane/components/backing-store/etcd/embedded.mdx
@@ -9,6 +9,7 @@ description: Configure an embedded etcd instance as the virtual cluster's backin
 import ConfigReference from '../../../../../../_partials/config/controlPlane/backingStore/etcd/embedded.mdx'
 import ProAdmonition from '../../../../../../_partials/admonitions/pro-admonition.mdx'
 import InterpolatedCodeBlock from "@site/src/components/InterpolatedCodeBlock";
+import PageVariables from "@site/src/components/PageVariables";
 import Flow, { Step } from '@site/src/components/Flow';
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
@@ -95,6 +96,26 @@ Normal pod restarts or terminations do not require manual recovery. These events

 Recovery procedures depend on whether the first replica (the pod ending with `-0`) is among the failing replicas.

+:::note
+The recovery procedure for the first replica also depends on your StatefulSet's `podManagementPolicy` configuration (`Parallel` or `OrderedReady`). See the [first replica recovery section](#migrate-to-parallel) for details on migrating between policies if needed.
+:::
+
+:::info Find your vCluster namespace
+If using a VirtualClusterInstance (platform), the vCluster StatefulSet runs in a different namespace than the VirtualClusterInstance itself. Find the StatefulSet namespace with:
+```bash
+kubectl get virtualclusterinstance <vcluster-instance-name> -n <project-namespace> -o jsonpath='{.spec.clusterRef.namespace}'
+```
+For example, if your VirtualClusterInstance is named `my-vcluster` in the `p-default` namespace, the StatefulSet might be in `vcluster-my-vcluster-p-default`.
+
+If using Helm, the namespace is what you specified during installation (e.g., `vcluster-my-team`).
+:::
+
+
 Use the following procedures when some replicas are still functioning:
@@ -106,7 +127,7 @@ Use the following procedures when some replicas are still functioning: Scale the StatefulSet to one replica: @@ -115,7 +136,7 @@ Scale the StatefulSet to one replica: Verify only one pod is running: @@ -124,7 +145,7 @@ Verify only one pod is running: Monitor the rebuild process: @@ -137,7 +158,7 @@ Watch for log messages indicating etcd is ready and the cluster is in good condi Scale back up to your target replica count: @@ -146,8 +167,8 @@ Scale back up to your target replica count: Verify all replicas are running: @@ -158,15 +179,28 @@ kubectl logs [[VAR:VCLUSTER NAME:my-vcluster]]-0 -n [[VAR:NAMESPACE:vcluster-my- :::warning -Before attempting any recovery procedure, create a backup of the virtual cluster namespace on the host cluster. If using namespace syncing, back up all synced namespaces as well. +Before attempting any recovery procedure, [create a backup](../../../../../../manage/backup-restore/backup.mdx) of your virtual cluster using `vcluster snapshot create --include-volumes`. This ensures both the virtual cluster's etcd data and persistent volumes are backed up. + +If the virtual cluster's etcd is in a bad state and the snapshot command fails, you can still back up from the host cluster (which has its own functioning etcd). Use your preferred backup solution (e.g., Velero, Kasten, or cloud-native backup tools) to back up the host cluster namespace containing the vCluster resources. Ensure the backup includes: +- All Kubernetes resources in the vCluster namespace (StatefulSet, Services, etc.) +- PersistentVolumeClaims and their associated volume data (contains the virtual cluster's etcd data) +- Secrets and ConfigMaps + +When restored, the vCluster pods will restart and the virtual cluster will be recreated from the backed-up etcd data. + +If using namespace syncing, back up all synced namespaces on the host cluster as well. ::: The recovery procedure depends on your StatefulSet `podManagementPolicy` configuration. vCluster version 0.20 and later use `Parallel` by default. Earlier versions used `OrderedReady`. +:::info +If more than one pod is down with `podManagementPolicy: OrderedReady`, you must first [migrate to `Parallel`](#migrate-to-parallel) before attempting recovery. +::: + Check your configuration: @@ -175,24 +209,33 @@ Check your configuration: -Delete the corrupted pod and PVC for replica-0: +First, identify the PVC for replica-0:
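For example, list the claims in the host namespace (assumed example namespace):

```bash
kubectl get pvc -n vcluster-my-team
```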
-The pod restarts with a new empty PVC. After 1-3 pod restarts, the automatic recovery adds it back to the etcd cluster.
+The PVC name typically follows the pattern `data-<vcluster-name>-0` but may vary if customized in your configuration. Note the exact name from the output above, then delete the corrupted pod and its PVC:
+
+
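For example, with the assumed names `my-vcluster` and `vcluster-my-team`:

```bash
kubectl delete pvc data-my-vcluster-0 -n vcluster-my-team
kubectl delete pod my-vcluster-0 -n vcluster-my-team
```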
+ +The pod restarts with a new empty PVC. The initial attempts fail because the new member tries to join the existing etcd cluster but lacks the required data. After 1-3 pod restarts, vCluster's automatic recovery detects the empty member and properly adds it as a new learner, allowing it to sync data from healthy members and join the cluster.
Monitor the recovery process: @@ -201,7 +244,7 @@ Monitor the recovery process: Check the logs to verify the pod rejoins successfully: @@ -220,7 +263,7 @@ If more than one pod is down with `podManagementPolicy: OrderedReady`, migrate t Check that the StatefulSet retains PVCs on deletion: @@ -233,41 +276,53 @@ The policy should be `Retain`. This is the default but can be overridden by `con Delete the StatefulSet without deleting the pods: + + Update your virtual cluster configuration to use `Parallel` pod management policy. -If using a VirtualClusterInstance: +If using a VirtualClusterInstance, edit the instance and update the `podManagementPolicy`: +Then add or update this section in the spec: + +```yaml +spec: + template: + helmRelease: + values: | + controlPlane: + statefulSet: + scheduling: + podManagementPolicy: Parallel +``` +
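For the "edit the instance" step above, a sketch assuming the instance is named `my-vcluster` in the platform project namespace `p-default`:

```bash
kubectl edit virtualclusterinstance my-vcluster -n p-default
```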
-Add or update the following configuration:
+If using Helm, update your `values.yaml` to set the pod management policy:

-
-
+
+```yaml
+controlPlane:
+  statefulSet:
+    scheduling:
+ podManagementPolicy: Parallel +``` -If using Helm, update your `values.yaml` and run: +Then apply the update: -Now follow the same procedure as for `Parallel` mode: +Now follow the same procedure as for `Parallel` mode. + +First, identify the PVC for replica-0:
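For example, with the assumed host namespace:

```bash
kubectl get pvc -n vcluster-my-team
```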
-The pod restarts with a new empty PVC and automatic recovery adds it back to the cluster after 1-3 pod restarts.
+The PVC name typically follows the pattern `data-<vcluster-name>-0` but may vary if customized in your configuration. Note the exact name from the output above, then delete the corrupted pod and its PVC:
+
+
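The same deletion commands as in the `Parallel` tab apply; for example:

```bash
kubectl delete pvc data-my-vcluster-0 -n vcluster-my-team
kubectl delete pod my-vcluster-0 -n vcluster-my-team
```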
+ +The pod restarts with a new empty PVC. The initial attempts fail because the new member tries to join the existing etcd cluster but lacks the required data. After 1-3 pod restarts, vCluster's automatic recovery detects the empty member and properly adds it as a new learner, allowing it to sync data from healthy members and join the cluster.
@@ -311,19 +377,28 @@ This recovery method results in data loss up to the last backup point. Only proc When the majority of etcd member replicas become corrupted or deleted simultaneously, the entire cluster requires recovery from backup. +:::info Prerequisites +Before starting recovery, ensure you have: +- Created a snapshot using `vcluster snapshot create --include-volumes ` +- The snapshot location URL (for example, `s3://my-bucket/backup` or `oci://registry/repo:tag`) +- Access to the host cluster namespace where the vCluster is deployed + +For detailed snapshot creation instructions, see [Create snapshots](../../../../../../manage/backup-restore/backup). +::: + Verify all PVCs are corrupted or inaccessible:
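For example (assumed names), list the claims and inspect one for errors:

```bash
kubectl get pvc -n vcluster-my-team
kubectl describe pvc data-my-vcluster-0 -n vcluster-my-team
```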
@@ -332,53 +407,141 @@ Verify all PVCs are corrupted or inaccessible: Stop all vCluster instances before beginning recovery: + +
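A sketch, assuming a single vCluster named `my-vcluster` in `vcluster-my-team`:

```bash
kubectl scale statefulset my-vcluster -n vcluster-my-team --replicas=0
```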
+ +Verify all pods have terminated: + + + +:::warning PVC deletion timing +After scaling down, wait a few seconds to ensure pods have fully terminated before deleting PVCs. If a pod restarts immediately after PVC deletion, the PVC may get stuck in a "Terminating" state. If this happens, delete the pod again to allow the PVC deletion to complete. +::: + Delete all corrupted PVCs: + +
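Assuming three replicas with the default PVC naming (adjust names to your install):

```bash
kubectl delete pvc data-my-vcluster-0 data-my-vcluster-1 data-my-vcluster-2 -n vcluster-my-team
```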
+ +Verify PVCs are deleted: + + + +Expected output: `No resources found`
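A minimal check that should produce the output above, assuming the example namespace:

```bash
kubectl get pvc -n vcluster-my-team
```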
- -Follow a backup restoration procedure. This typically involves restoring PVCs from your backup solution (Velero, CSI snapshots, or similar tools). + + +:::info Why scale up before restore? +The vCluster CLI requires an accessible vCluster instance to execute the restore command. Scaling up creates a new, empty vCluster that the CLI can connect to. The `vcluster restore` command will then scale it back down automatically, restore the etcd data from the snapshot, and restart the vCluster with restored data. +::: + +Scale up to the desired number of replicas: + +
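For example, scaling back to three replicas (assumed target count):

```bash
kubectl scale statefulset my-vcluster -n vcluster-my-team --replicas=3
```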
-Restore from snapshot: +Wait for pods to be running: + +
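A sketch using the example namespace:

```bash
kubectl get pods -n vcluster-my-team -w
```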
+ +Expected output showing all replicas running: +``` +NAME READY STATUS RESTARTS AGE +my-vcluster-0 1/1 Running 0 45s +my-vcluster-1 1/1 Running 0 43s +my-vcluster-2 1/1 Running 0 41s +``` +
+ + +Use the vCluster CLI to restore from your snapshot. The restore process will: +1. Pause the vCluster (scale down to 0) +2. Delete the current PVCs +3. Start a snapshot pod to restore etcd data +4. Restore PVCs from volume snapshots +5. Resume the vCluster (scale back up) + + + +
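A sketch of the restore command; the snapshot URL is an assumed example, and flags may differ by CLI version, so check `vcluster restore --help`:

```bash
# Restore the vCluster named my-vcluster from an example S3 snapshot location
vcluster restore my-vcluster "s3://my-bucket/backup" -n vcluster-my-team
```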
+ +Expected output: +``` +16:16:38 info Pausing vCluster my-vcluster +16:16:38 info Scale down statefulSet vcluster-my-team/my-vcluster... +16:16:39 info Deleting vCluster pvc vcluster-my-team/data-my-vcluster-0 +16:16:39 info Deleting vCluster pvc vcluster-my-team/data-my-vcluster-1 +16:16:39 info Deleting vCluster pvc vcluster-my-team/data-my-vcluster-2 +16:16:39 info Starting snapshot pod for vCluster vcluster-my-team/my-vcluster... +... +Successfully restored snapshot +16:16:42 info Resuming vCluster my-vcluster +``` + +:::note Authentication for remote storage +If using S3 or OCI registry, ensure you have the appropriate credentials configured: +- **S3**: Use AWS CLI credentials or pass credentials in the URL +- **OCI**: Use Docker login or pass credentials in the URL + +See [Create snapshots](../../../../../../manage/backup-restore/backup) for authentication details. +:::
- -Scale up to a single replica to verify the restoration: + +Connect to the vCluster and verify your workloads are restored:
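For example, assuming the vCluster name and host namespace used earlier:

```bash
vcluster connect my-vcluster -n vcluster-my-team
```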
-Monitor logs and verify the cluster starts successfully: +Check that your resources are present:
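While connected, a quick check of the virtual cluster's contents:

```bash
kubectl get namespaces
kubectl get pods --all-namespaces
```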
-After it's verified, scale to the desired number of replicas. +If everything looks correct, disconnect: + +
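For example:

```bash
vcluster disconnect
```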