
Conversation

@sbueringer
Member

@sbueringer sbueringer commented Aug 2, 2019

All kubeadm configs are now generated via go code. This allows us to
generate different configs for the first control plane node, every other
control plane node and normal worker nodes.

This also adds a disablePortSecurity flag, and it's now possible to
set KUBECONFIG and KUBECONTEXT via environment variables.
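As a sketch of the idea described above (all names and commands here are illustrative, not the provider's actual API), generating the kubeadm configs in Go instead of from static templates lets the generator branch on node role:

```go
package main

import "fmt"

// Hypothetical sketch of per-role kubeadm bootstrap, per the PR description:
// the first control plane node runs `kubeadm init`, every other control plane
// node joins as a control plane, and workers do a plain join. Type and
// function names are made up for illustration.

type NodeRole int

const (
	FirstControlPlane NodeRole = iota
	AdditionalControlPlane
	Worker
)

// generateKubeadmCommand returns the bootstrap command a node of the given
// role would run against the generated config file.
func generateKubeadmCommand(role NodeRole) string {
	const cfg = "/etc/kubernetes/kubeadm_config.yaml"
	switch role {
	case FirstControlPlane:
		// Full ClusterConfiguration + InitConfiguration
		return "kubeadm init --config " + cfg
	case AdditionalControlPlane:
		// JoinConfiguration with the control-plane section set
		return "kubeadm join --config " + cfg + " # controlPlane join"
	default:
		// Plain JoinConfiguration for workers
		return "kubeadm join --config " + cfg
	}
}

func main() {
	for _, r := range []NodeRole{FirstControlPlane, AdditionalControlPlane, Worker} {
		fmt.Println(generateKubeadmCommand(r))
	}
}
```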

What this PR does / why we need it:
This PR finalizes the multi-node control plane feature.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #382

Notes:

  • I verified it with a 3-master/3-worker configuration using Neutron LBaaS and disabled port security
  • Let's discuss what kind of documentation we need

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 2, 2019
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 2, 2019
@sbueringer
Member Author

/assign @jichenjc
/assign @hidekazuna
/assign @chrigl
/assign @CamelCaseNotation

@k8s-ci-robot
Contributor

@sbueringer: GitHub didn't allow me to assign the following users: CamelCaseNotation.

Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @jichenjc
/assign @hidekazuna
/assign @chrigl
/assign @CamelCaseNotation

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member Author

sbueringer commented Aug 3, 2019

@hidekazuna
I get the following errors in Prow:

go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/apiserver/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/csi-translation-lib/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/metrics/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/cri-api/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/component-base/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/kube-aggregator/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/cluster-bootstrap/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/kube-controller-manager/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/code-generator/@v/v0.0.0.info): 410 Gone
go: error loading module requirements
Makefile:29: recipe for target 'vendor' failed

I get the same when I execute make vendor locally. Do you know how to fix it?

EDIT: I think I fixed it by pinning all of these dependencies to kubernetes-1.13.4 and then adding replace entries for them in the go.mod file
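The approach described here (pin to the kubernetes-1.13.4 release tag plus replace entries) would look roughly like the go.mod fragment below. The pseudo-versions are placeholders, not the real revisions; in practice they are what `go mod edit -replace k8s.io/apiserver=k8s.io/apiserver@kubernetes-1.13.4` resolves them to:

```
// go.mod (illustrative fragment; pseudo-versions are placeholders)
require k8s.io/kubernetes v1.13.4

replace (
	// Pin each failing staging repo to the commit tagged kubernetes-1.13.4,
	// e.g. via: go mod edit -replace k8s.io/apiserver=k8s.io/apiserver@kubernetes-1.13.4
	k8s.io/apiserver => k8s.io/apiserver v0.0.0-20190000000000-000000000000
	k8s.io/component-base => k8s.io/component-base v0.0.0-20190000000000-000000000000
	// ...and likewise for the other k8s.io modules in the error output
)
```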

@hidekazuna
Contributor

hidekazuna commented Aug 3, 2019

> @hidekazuna
> I get the following errors in Prow: (same 410 Gone log as quoted in full above)
>
> I get the same when I execute make vendor locally. Do you know how to fix it?

@sbueringer Adding k8s.io/kubernetes v1.13.4 to go.mod resolved the error in my local env.

@sbueringer
Member Author

@hidekazuna Only adding k8s.io/kubernetes wasn't enough in my case, but I found a solution. I think we can remove the replace entries for v1alpha2 anyway; they seem to have fixed it in 1.14 (at least the CAPA go.mod file suggests so).

@sbueringer sbueringer changed the title from "finalze multi-node control plane implementation" to "finalize multi-node control plane implementation" Aug 5, 2019
@hidekazuna
Contributor

hidekazuna commented Aug 6, 2019

> @hidekazuna Only adding k8s.io/kubernetes wasn't enough in my case, but I found a solution. I think we can remove the replace entries for v1alpha2 anyway; they seem to have fixed it in 1.14 (at least the CAPA go.mod file suggests so).

I had to set the GOPROXY environment variable to reproduce the errors in my local env. To fix this, I referred to CAPZ.

@jichenjc
Contributor

jichenjc commented Aug 6, 2019

thanks a lot for the PR :)
I am still struggling with the LB issue, so I can't test this locally with LB. Just to confirm: we previously supported Ubuntu, RHEL, and CoreOS. With this change, all of those will still be supported (tested locally for now, with CI in the future), right?

@sbueringer
Member Author

> thanks a lot for the PR :)
> I am still struggling with the LB issue, so I can't test this locally with LB. Just to confirm: we previously supported Ubuntu, RHEL, and CoreOS. With this change, all of those will still be supported (tested locally for now, with CI in the future), right?

Yup. The idea, at least mine, is to keep supporting the same operating systems for now. But somebody has to help me test :).

I didn't try it, but the goal is that this PR also works without an API server load balancer and with a single-node control plane.

If you want, we can schedule a meeting and I can take a look at your LB issue.

@sbueringer
Member Author

I'll add some documentation later today, with example YAMLs showing how to deploy all of this :)

@hidekazuna hidekazuna mentioned this pull request Aug 10, 2019
@sbueringer
Member Author

> I'll add some documentation later today, with example YAMLs showing how to deploy all of this :)

I'm currently deploying this by specifying multiple control plane machines and the following Cluster CRD:

apiVersion: "cluster.k8s.io/v1alpha1"
kind: Cluster
metadata:
  name: test
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["10.254.0.0/16"]
    pods:
      cidrBlocks: ["10.6.0.0/16"]
    serviceDomain: "cluster.local"
  providerSpec:
    value:
      apiVersion: "openstackproviderconfig/v1alpha1"
      kind: "OpenstackProviderSpec"
      nodeCidr: "10.6.0.0/24"
      managedAPIServerLoadBalancer: true
      apiServerLoadBalancerFloatingIP: 75.12.112.86
      apiServerLoadBalancerPort: 6443
      apiServerLoadBalancerAdditionalPorts:
        - 22
      dnsNameservers:
        - 75.12.7.144
      externalNetworkId: 49ab8e48-c542-410f-929c-ec9b1830f40c
      externalRouterIPs:
        - fixedIP: 75.12.112.81
          subnet:
            filter:
              name: ext_net_test
      managedSecurityGroups: false
      disablePortSecurity: true
      disableServerTags: true
      clusterConfiguration:
        controlPlaneEndpoint: 75.12.112.86:6443
        kubernetesVersion: 1.15.0

The control plane machines are configured like this:

  apiVersion: "cluster.k8s.io/v1alpha1"
  kind: Machine
  metadata:
    name: test-kube-master-01
    labels:
      set: master
      cluster.k8s.io/cluster-name: test
  spec:
    providerSpec:
      value:
        apiVersion: "openstackproviderconfig/v1alpha1"
        kind: "OpenstackProviderSpec"
        flavor: large
        image: coreos-2023.5.0
        keyName: cluster-api-provider-openstack
        availabilityZone: nova
        networks:
          - filter:
              name: k8s-cluster-default-test
            subnets:
              - filter:
                  name: k8s-cluster-default-test
        userDataSecret:
          name: master-user-data
          namespace: openstack-provider-system
        disableServerTags: true
    versions:
      kubelet: 1.15.0
      controlPlane: 1.15.0

Let's see which configuration we land on after the other PRs and once you've got this PR working (@jichenjc @hidekazuna). Then I'll add some docs for the multi-node control plane.

@sbueringer sbueringer force-pushed the pr-multi-node-control-plane branch from 9aa690d to 8221be7 Compare August 11, 2019 08:43
@sbueringer
Member Author

@hidekazuna Rebased onto current master (incl octavia support)

@jichenjc jichenjc mentioned this pull request Aug 12, 2019
PodCIDR string
ServiceCIDR string
GetMasterEndpoint func() (string, error)
KubeadmConfig string
Contributor

We used .Cluster.Spec.ClusterNetwork.ServiceDomain in the master startup scripts. Not defining it here leads to an error about not finding .Cluster.Spec.ClusterNetwork.ServiceDomain.

Member Author

@sbueringer sbueringer Aug 12, 2019

You're right. But this is now handled via ClusterConfiguration:

kubeadm.WithClusterNetworkFromClusterNetworkingConfig(cluster.Spec.ClusterNetwork),

kubeadm will add it automatically to the config file under: /var/lib/kubelet/config.yaml

So I removed the leftover code from master-user-data which set it manually.
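The call above follows Go's functional-options pattern for building the kubeadm ClusterConfiguration. A minimal self-contained sketch of that pattern (types and names are illustrative, not the real kubeadm or provider API):

```go
package main

import "fmt"

// Illustrative functional-options sketch, in the spirit of
// kubeadm.WithClusterNetworkFromClusterNetworkingConfig(...).
// Field and function names are hypothetical.

type ClusterConfiguration struct {
	ServiceSubnet string
	PodSubnet     string
	DNSDomain     string
}

// Option mutates a ClusterConfiguration under construction.
type Option func(*ClusterConfiguration)

// WithClusterNetwork copies the cluster networking settings into the config,
// so the user-data template no longer needs to set them manually.
func WithClusterNetwork(serviceCIDR, podCIDR, serviceDomain string) Option {
	return func(c *ClusterConfiguration) {
		c.ServiceSubnet = serviceCIDR
		c.PodSubnet = podCIDR
		c.DNSDomain = serviceDomain
	}
}

// NewClusterConfiguration applies all options in order.
func NewClusterConfiguration(opts ...Option) *ClusterConfiguration {
	c := &ClusterConfiguration{}
	for _, o := range opts {
		o(c)
	}
	return c
}

func main() {
	cfg := NewClusterConfiguration(
		WithClusterNetwork("10.254.0.0/16", "10.6.0.0/16", "cluster.local"),
	)
	fmt.Printf("%+v\n", cfg)
}
```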

Contributor

I had the same error.

E0813 07:53:51.188082       1 actuator.go:424] Machine error openstack-master-cgpmj: error creating Openstack instance: template: startUp:11:30: executing "startUp" at <.Cluster.Spec.ClusterNetwork.ServiceDomain>: can't evaluate field Cluster in type userdata.setupParams

Actually Cluster-API v1alpha1 validates this:
https://github.com/kubernetes-sigs/cluster-api/blob/release-0.1/pkg/apis/cluster/v1alpha1/cluster_types.go#L131-L138

Member Author

@sbueringer sbueringer Aug 13, 2019

Have you tried my latest commit? ca0332b

It's still necessary to set it in the Cluster CRD but then it's just used to generate a kubeadm config via go code here:

kubeadm.WithClusterNetworkFromClusterNetworkingConfig(cluster.Spec.ClusterNetwork),

So we just don't need it in the user data script anymore because it's configured via kubeadm config.

Contributor

I got the latest commit e5cab15 and created the cluster successfully!

Member Author

Great! Can you check whether the CIDRs etc. are set in /var/lib/kubelet/config.yaml?

Contributor

/var/lib/kubelet/config.yaml and /etc/kubernetes/kubeadm_config.yaml seem fine. I added another master node successfully, meaning kubectl get nodes shows it with Ready status.
But I failed to add a worker node. The same missing .Cluster.Spec.ClusterNetwork.ServiceDomain error happened again.

I0814 04:56:58.396114       1 kubeadm.go:136] Joining a worker node to the cluster
E0814 04:56:58.409532       1 actuator.go:424] Machine error octavia-machinedeployment-5559b9ddd4-skpfm: error creating Openstack instance: template: startUp:10:30: executing "startUp" at <.Cluster.Spec.ClusterNetwork.ServiceDomain>: can't evaluate field Cluster in type userdata.setupParams
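For context, this failure mode is easy to reproduce with Go's text/template (struct name and fields below are illustrative): the template parses fine, but execution fails as soon as it dereferences a field the params struct doesn't have, which is exactly the "can't evaluate field Cluster in type userdata.setupParams" error above.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Minimal stand-in for the provider's user-data params struct; it has no
// Cluster field, so any template referencing .Cluster fails at Execute time.
type setupParams struct {
	PodCIDR string
}

// render parses and executes a user-data template against the given params.
func render(tmplText string, p setupParams) (string, error) {
	t, err := template.New("startUp").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = t.Execute(&buf, p)
	return buf.String(), err
}

func main() {
	// Field exists on setupParams: renders cleanly.
	out, err := render("POD_CIDR={{ .PodCIDR }}", setupParams{PodCIDR: "10.6.0.0/16"})
	fmt.Println(out, err) // POD_CIDR=10.6.0.0/16 <nil>

	// Field missing from setupParams: parse succeeds, execution fails.
	_, err = render("DOMAIN={{ .Cluster.Spec.ClusterNetwork.ServiceDomain }}", setupParams{})
	fmt.Println(err != nil) // true
}
```

This is why removing the leftover `.Cluster...` references from the worker user-data templates (or adding the field back to the params) fixes the join.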

Member Author

Okay, fixed it there too, sorry.

All kubeadm configs are now generated via go code. This allows us to
generate different configs for the first control plane node, every other
control plane node and normal worker nodes.

This also adds a flag to disablePortSecurity and it's now possible to
set the KUBECONFIG and KUBECONTEXT via environment variable.
@sbueringer sbueringer force-pushed the pr-multi-node-control-plane branch from 7bb1ca9 to ca0332b Compare August 12, 2019 18:33
@sbueringer
Member Author

This PR on top might help to verify it: #435 (because it should overcome the restriction of creating floating ips)

Contributor

@jichenjc jichenjc left a comment

@sbueringer
My test machine containing all the code/settings was deleted by someone else. Before that, I found this issue (cacert) and was still struggling with one last issue (the kubeconfig has port 443).

Restoring the system might take me a few days, so testing this will slow down a little; I can do some code review in the meantime.

I hope to merge this after my test, or @hidekazuna can do some testing if he has time.

mountPath: /etc/kubernetes/cloud.conf
name: cloud
readOnly: true
- hostPath: "/etc/certs/cacert"
Contributor

this was missed in userdata/kubeadm.go

Member Author

@sbueringer sbueringer Aug 13, 2019

Just out of curiosity, do you know why this is needed? We don't need this on-premise, but our registry has a regular certificate

Contributor

Are you using https or http? For me it's an https environment, and you connect to the OpenStack cloud through the cacert.
Unfortunately my env just broke, so I can't show the error I had. It was actually the kube-controller container failing to start because it couldn't find this file.

So if we don't want the machines we create to be able to talk to the OpenStack cloud, we can avoid this, but then we need a fix somewhere else (I need to find it later) to make kube-controller able to start. Or we add the cacert here; after all, it was there before this PR.

Member Author

@sbueringer sbueringer Aug 13, 2019

No, it's okay, I'll add it back. I just wanted to understand what it's used for. I'll try to find out how it's done in our environments.

Member Author

@sbueringer sbueringer Aug 13, 2019

Okay, I overlooked it in our on-prem installation because we're using the OpenStack cloud controller manager there instead.

But I'm not sure how my cluster installation on CoreOS currently works. I'm:

  • using https
  • not configuring a cacert and not ignoring TLS verification
  • the CoreOS example user data uses the mount point but never actually writes the cacert file

(Maybe it's not working on CoreOS right now, but I'll get Ready nodes; never tried Cinder though)

Contributor

@jichenjc Unfortunately I do not have a self-signed keystone environment right now.
@sbueringer This cacert is for using a self-signed keystone; it becomes ca-file in cloud.conf.
https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/#global
I tested only Ubuntu though.
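For reference, a hypothetical cloud.conf fragment (all values are placeholders) showing where the mounted cacert ends up, per the cloud-provider docs linked above:

```
[Global]
auth-url=https://keystone.example.com:5000/v3
username=demo
password=secret
tenant-name=demo
domain-name=Default
region=RegionOne
# CA bundle for a self-signed keystone endpoint; matches the
# hostPath /etc/certs/cacert mounted into the control plane pods above.
ca-file=/etc/certs/cacert
```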

Contributor

I can paste my test env later. I definitely need the cacert, otherwise the whole OpenStack API can't be called.

Member Author

I added the mountpoint to controller-manager config

@jichenjc
Contributor

I have an https env but I need more time to restore my test env.
Since @hidekazuna tested it, how about we merge this, continue testing, and submit fixes later?
@sbueringer ?

@hidekazuna
Contributor

@jichenjc I'm OK.
Reminder: I deployed one master node and added another master node on OpenStack Stein with Octavia and http keystone, but failed to add a worker node.

@jichenjc
Contributor

OK, then we need to wait for the fix; a cluster without workers is not an acceptable gate...

@jichenjc
Contributor

jichenjc commented Aug 14, 2019

I made some modifications in worker-user-data, including removing the ServiceDomain mentioned above and the ServiceCIDR (newly found). After that I was able to boot a cluster without LB (one master + one node).

So @sbueringer, can you fix these so we can merge the code?

commit e5cab154382ce0579c3cc0ec48013b9400201f6d
Author: Stefan Bueringer <[email protected]>
Date:   Tue Aug 13 20:53:21 2019 +0200

NAME                     STATUS   ROLES    AGE     VERSION
openstack-master-v2f7h   Ready    master   9m58s   v1.15.0
openstack-node-gvwhl     Ready    node     5m6s    v1.15.0

@sbueringer
Member Author

sbueringer commented Aug 14, 2019

@jichenjc fixed it for Ubuntu & CentOS. Please let me know if anything else goes wrong; I'm happy to fix it, of course :)

cacert has also been added to controller-manager & apiserver. Not sure why the apiserver would access OpenStack, but there are cloud-provider properties for it, so I'm sure they know what they're doing ;)

@jichenjc
Contributor

Let me add a follow-up patch based on your settings, since you don't have an env to test Ubuntu and CentOS.
@sbueringer

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 15, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jichenjc, sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 15, 2019
MACHINE+={{ .Machine.ObjectMeta.Name }}
CLUSTER_DNS_DOMAIN={{ .Cluster.Spec.ClusterNetwork.ServiceDomain }}
POD_CIDR={{ .PodCIDR }}
SERVICE_CIDR={{ .ServiceCIDR }}
Contributor

still here... @sbueringer

@k8s-ci-robot k8s-ci-robot merged commit 42af510 into kubernetes-sigs:master Aug 15, 2019
@sbueringer
Member Author

> Let me add a follow-up patch based on your settings, since you don't have an env to test Ubuntu and CentOS.
> @sbueringer
>
> /approve
> /lgtm

Yup, that's okay. I pushed it to the wrong branch/PR :/

See: #435

@sbueringer sbueringer deleted the pr-multi-node-control-plane branch August 15, 2019 04:28
pierreprinetti pushed a commit to shiftstack/cluster-api-provider-openstack that referenced this pull request Apr 22, 2024
* finalze multi-node control plane implementation

All kubeadm configs are now generated via go code. This allows us to
generate different configs for the first control plane node, every other
control plane node and normal worker nodes.

This also adds a flag to disablePortSecurity and it's now possible to
set the KUBECONFIG and KUBECONTEXT via environment variable.

* Fix dependencies

* removed old test

* review fixes

* review fixes

* add cacert to controller manager

* fixup format


Development

Successfully merging this pull request may close these issues.

Support for multi-node control plane

6 participants