CRS controller should wait for the 'kubernetes' Service to be ready before reconciling CRS objects #7804

@jessehu

What steps did you take and what happened:
This issue was reported in Slack: https://kubernetes.slack.com/archives/C8TSNPY4T/p1667740494784379
Has anyone hit an error where the ClusterResourceSet controller applies the objects in a ClusterResourceSet too early, before the Service kubernetes is created? I hit this once this week while using a ClusterResourceSet to deploy kapp-controller, which contains a Service kapp-controller/packaging-api. That Service was assigned the IP "10.96.0.1", and creating the Service kubernetes then failed due to a service IP conflict.

# k logs -n kube-system       kube-apiserver-mycluster-controlplane-pl4vn
E1106 12:09:27.196308       1 controller.go:240] unable to sync kubernetes service: Service "kubernetes" is invalid: spec.clusterIPs: Invalid value: []string{"10.96.0.1"}: failed to allocate IP 10.96.0.1: provided IP is already allocated
E1106 12:09:37.197558       1 controller.go:240] unable to sync kubernetes service: Service "kubernetes" is invalid: spec.clusterIPs: Invalid value: []string{"10.96.0.1"}: failed to allocate IP 10.96.0.1: provided IP is already allocated

# k get svc -A
NAMESPACE         NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kapp-controller   packaging-api   ClusterIP   10.96.0.1    <none>        443/TCP                  2d2h
kube-system       kube-dns        ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   2d2h

# k get node
NAME                           STATUS     ROLES           AGE    VERSION
mycluster-controlplane-pl4vn   NotReady   control-plane   2d3h   v1.24.4
mycluster-workergroup1-ccfcz   NotReady   <none>          2d3h   v1.24.4
mycluster-workergroup1-lmx7b   NotReady   <none>          2d3h   v1.24.4

The service object creation timestamp:
# k get svc -n kapp-controller   packaging-api -oyaml |grep creationTimestamp
  creationTimestamp: "2022-11-04T09:37:14Z"
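
The output above shows packaging-api holding 10.96.0.1, the address the API server then fails to allocate for the Service kubernetes. The API server gives that Service the first address of the service IP range, so any Service created earlier can take it. The sketch below only illustrates that arithmetic. The range 10.96.0.0/12 is an assumption (it is the kubeadm default and is consistent with kube-dns at 10.96.0.10, but the issue does not state it), and firstServiceIP is a hypothetical helper, not part of Cluster API or Kubernetes.

```go
package main

import (
	"fmt"
	"net/netip"
)

// firstServiceIP is a hypothetical helper: it returns the network base
// address of the given service CIDR plus one, which is the address the
// API server uses for the default "kubernetes" Service.
func firstServiceIP(cidr string) (netip.Addr, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return netip.Addr{}, err
	}
	// Masked() zeroes the host bits to get the base address; Next() adds one.
	return prefix.Masked().Addr().Next(), nil
}

func main() {
	// ASSUMPTION: the workload cluster uses the kubeadm default service CIDR.
	ip, err := firstServiceIP("10.96.0.0/12")
	if err != nil {
		panic(err)
	}
	fmt.Println(ip) // prints 10.96.0.1
}
```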

It seems the CRS controller only gets the remote client for the workload cluster and does not check whether the Service kubernetes has been created in that cluster:
https://github.com/kubernetes-sigs/cluster-api/blob/v1.2.7/exp/addons/internal/controllers/clusterresourceset_controller.go#L239-L247

What did you expect to happen:
The kapp-controller CRS should be applied successfully.

Anything else you would like to add:
We tried to work around this issue by adding wait logic before applying the CRS objects, like this:

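// Gate CRS reconciliation on the default "kubernetes" Service existing in the workload cluster.
err = wlcClient.Get(ctx, apitypes.NamespacedName{
	Namespace: metav1.NamespaceDefault,
	Name:      "kubernetes",
}, &corev1.Service{})
if apierrors.IsNotFound(err) {
	// Not created yet: requeue instead of applying CRS objects now.
	ctx.Logger.Info("Waiting for the Service kubernetes to be created")
	return reconcile.Result{RequeueAfter: NormalRequeueTimeout}, nil
}
if err != nil {
	// Any other error is returned so the reconcile is retried with backoff.
	return reconcile.Result{}, err
}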
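
To check the three outcomes of the gate above without a cluster, the decision can be isolated from the client. The following is a minimal standard-library sketch, not the controller's code: serviceGetter, errNotFound, errUnreachable, requeueInterval and kubernetesServiceGate are all hypothetical stand-ins (errNotFound plays the role of an error for which apierrors.IsNotFound returns true, and requeueInterval plays the role of NormalRequeueTimeout, whose actual value is not given in the issue).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical stand-ins for errors returned by the workload cluster client.
var (
	errNotFound    = errors.New("service not found")  // models an IsNotFound error
	errUnreachable = errors.New("connection refused") // models any other failure
)

// requeueInterval is an assumed value standing in for NormalRequeueTimeout.
const requeueInterval = 10 * time.Second

// serviceGetter abstracts "get this Service from the workload cluster".
type serviceGetter func(namespace, name string) error

// kubernetesServiceGate reports whether CRS objects may be applied now.
// Not found: not ready, requeue later, no error. Other errors: returned as-is.
func kubernetesServiceGate(get serviceGetter) (ready bool, requeueAfter time.Duration, err error) {
	err = get("default", "kubernetes")
	switch {
	case err == nil:
		return true, 0, nil
	case errors.Is(err, errNotFound):
		return false, requeueInterval, nil
	default:
		return false, 0, err
	}
}

func main() {
	cases := []struct {
		name string
		err  error
	}{
		{"service exists", nil},
		{"service missing", errNotFound},
		{"cluster unreachable", errUnreachable},
	}
	for _, c := range cases {
		result := c.err
		ready, after, err := kubernetesServiceGate(func(_, _ string) error { return result })
		fmt.Printf("%s: ready=%v requeueAfter=%v err=%v\n", c.name, ready, after, err)
	}
}
```

Only the "service exists" case lets reconciliation proceed; the missing case requeues without reporting an error, which matches the intent of the workaround.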

Environment:

  • Cluster-api version: 1.2.7
  • minikube/kind version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]

Labels: kind/bug (categorizes issue or PR as related to a bug), triage/accepted (indicates an issue or PR is ready to be actively worked on)