
Conversation

@sbueringer
Member

@sbueringer sbueringer commented Aug 2, 2019

All kubeadm configs are now generated via go code. This allows us to
generate different configs for the first control plane node, every other
control plane node and normal worker nodes.

This also adds a disablePortSecurity flag, and it's now possible to
set KUBECONFIG and KUBECONTEXT via environment variables.
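As a sketch of the idea described above (all names and commands here are illustrative, not the provider's actual API), generating the kubeadm configs in Go instead of from static templates lets the generator branch on node role:

```go
package main

import "fmt"

// Hypothetical sketch of per-role kubeadm bootstrap, per the PR description:
// the first control plane node runs `kubeadm init`, every other control plane
// node joins as a control plane, and workers do a plain join. Type and
// function names are made up for illustration.

type NodeRole int

const (
	FirstControlPlane NodeRole = iota
	AdditionalControlPlane
	Worker
)

// generateKubeadmCommand returns the bootstrap command a node of the given
// role would run against the generated config file.
func generateKubeadmCommand(role NodeRole) string {
	const cfg = "/etc/kubernetes/kubeadm_config.yaml"
	switch role {
	case FirstControlPlane:
		// Full ClusterConfiguration + InitConfiguration
		return "kubeadm init --config " + cfg
	case AdditionalControlPlane:
		// JoinConfiguration with the control-plane section set
		return "kubeadm join --config " + cfg + " # controlPlane join"
	default:
		// Plain JoinConfiguration for workers
		return "kubeadm join --config " + cfg
	}
}

func main() {
	for _, r := range []NodeRole{FirstControlPlane, AdditionalControlPlane, Worker} {
		fmt.Println(generateKubeadmCommand(r))
	}
}
```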

What this PR does / why we need it:
This PR finalizes the multi-node control plane feature.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #382

Notes:

  • I verified it with a 3-master/3-worker configuration using Neutron LBaaS and disabled port security
  • Let's discuss what kind of documentation we need

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 2, 2019
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 2, 2019
@sbueringer
Member Author

/assign @jichenjc
/assign @hidekazuna
/assign @chrigl
/assign @CamelCaseNotation

@k8s-ci-robot
Contributor

@sbueringer: GitHub didn't allow me to assign the following users: CamelCaseNotation.

Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @jichenjc
/assign @hidekazuna
/assign @chrigl
/assign @CamelCaseNotation

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sbueringer
Member Author

sbueringer commented Aug 3, 2019

@hidekazuna
I get the following errors in Prow:

go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/apiserver/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/csi-translation-lib/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/metrics/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/cri-api/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/component-base/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/kube-aggregator/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/cluster-bootstrap/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/kube-controller-manager/@v/v0.0.0.info): 410 Gone
go: k8s.io/[email protected]: unexpected status (https://proxy.golang.org/k8s.io/code-generator/@v/v0.0.0.info): 410 Gone
go: error loading module requirements
Makefile:29: recipe for target 'vendor' failed

I get the same when I execute make vendor locally. Do you know how to fix it?

EDIT: I think I fixed it by pinning all of these dependencies to kubernetes-1.13.4 and then adding replace entries for them in the go.mod file
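The approach described here (pin to the kubernetes-1.13.4 release tag plus replace entries) would look roughly like the go.mod fragment below. The pseudo-versions are placeholders, not the real revisions; in practice they are what `go mod edit -replace k8s.io/apiserver=k8s.io/apiserver@kubernetes-1.13.4` resolves them to:

```
// go.mod (illustrative fragment; pseudo-versions are placeholders)
require k8s.io/kubernetes v1.13.4

replace (
	// Pin each failing staging repo to the commit tagged kubernetes-1.13.4,
	// e.g. via: go mod edit -replace k8s.io/apiserver=k8s.io/apiserver@kubernetes-1.13.4
	k8s.io/apiserver => k8s.io/apiserver v0.0.0-20190000000000-000000000000
	k8s.io/component-base => k8s.io/component-base v0.0.0-20190000000000-000000000000
	// ...and likewise for the other k8s.io modules in the error output
)
```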

@hidekazuna
Contributor

hidekazuna commented Aug 3, 2019

> @hidekazuna
> I get the following errors in Prow: (same 410 Gone log as quoted in full above)
>
> I get the same when I execute make vendor locally. Do you know how to fix it?

@sbueringer Adding k8s.io/kubernetes v1.13.4 to go.mod resolved the error in my local env.

@sbueringer
Member Author

@hidekazuna Only adding k8s.io/kubernetes wasn't enough in my case, but I found a solution. I think we can remove the replace entries for v1alpha2 anyway; they seem to have fixed it in 1.14 (at least the CAPA go.mod file suggests so).

@sbueringer sbueringer changed the title from "finalze multi-node control plane implementation" to "finalize multi-node control plane implementation" Aug 5, 2019
@hidekazuna
Contributor

hidekazuna commented Aug 6, 2019

> @hidekazuna Only adding k8s.io/kubernetes wasn't enough in my case, but I found a solution. I think we can remove the replace entries for v1alpha2 anyway; they seem to have fixed it in 1.14 (at least the CAPA go.mod file suggests so).

I had to set the GOPROXY environment variable to reproduce the errors in my local env. To fix this, I referred to CAPZ.

@jichenjc
Contributor

jichenjc commented Aug 6, 2019

thanks a lot for the PR :)
I am still struggling with the LB issue, so I can't test this locally with LB. Just to confirm: we previously supported Ubuntu, RHEL, and CoreOS. With this change, all of those will still be supported (tested locally for now, with CI in the future), right?

@sbueringer
Member Author

> thanks a lot for the PR :)
> I am still struggling with the LB issue, so I can't test this locally with LB. Just to confirm: we previously supported Ubuntu, RHEL, and CoreOS. With this change, all of those will still be supported (tested locally for now, with CI in the future), right?

Yup. The idea, at least mine, is to keep supporting the same operating systems for now. But somebody has to help me test :).

I didn't try it, but the goal is that this PR also works without an API server load balancer and with a single-node control plane.

If you want, we can schedule a meeting and I can take a look at your LB issue.

@sbueringer
Member Author

I'll add some documentation later today, with example YAMLs showing how to deploy all of this :)

@hidekazuna hidekazuna mentioned this pull request Aug 10, 2019
@sbueringer
Member Author

> I'll add some documentation later today, with example YAMLs showing how to deploy all of this :)

I'm currently deploying this by specifying multiple control plane machines and the following Cluster CRD:

apiVersion: "cluster.k8s.io/v1alpha1"
kind: Cluster
metadata:
  name: test
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["10.254.0.0/16"]
    pods:
      cidrBlocks: ["10.6.0.0/16"]
    serviceDomain: "cluster.local"
  providerSpec:
    value:
      apiVersion: "openstackproviderconfig/v1alpha1"
      kind: "OpenstackProviderSpec"
      nodeCidr: "10.6.0.0/24"
      managedAPIServerLoadBalancer: true
      apiServerLoadBalancerFloatingIP: 75.12.112.86
      apiServerLoadBalancerPort: 6443
      apiServerLoadBalancerAdditionalPorts:
        - 22
      dnsNameservers:
        - 75.12.7.144
      externalNetworkId: 49ab8e48-c542-410f-929c-ec9b1830f40c
      externalRouterIPs:
        - fixedIP: 75.12.112.81
          subnet:
            filter:
              name: ext_net_test
      managedSecurityGroups: false
      disablePortSecurity: true
      disableServerTags: true
      clusterConfiguration:
        controlPlaneEndpoint: 75.12.112.86:6443
        kubernetesVersion: 1.15.0

The control plane machines are configured like this:

  apiVersion: "cluster.k8s.io/v1alpha1"
  kind: Machine
  metadata:
    name: test-kube-master-01
    labels:
      set: master
      cluster.k8s.io/cluster-name: test
  spec:
    providerSpec:
      value:
        apiVersion: "openstackproviderconfig/v1alpha1"
        kind: "OpenstackProviderSpec"
        flavor: large
        image: coreos-2023.5.0
        keyName: cluster-api-provider-openstack
        availabilityZone: nova
        networks:
          - filter:
              name: k8s-cluster-default-test
            subnets:
              - filter:
                  name: k8s-cluster-default-test
        userDataSecret:
          name: master-user-data
          namespace: openstack-provider-system
        disableServerTags: true
    versions:
      kubelet: 1.15.0
      controlPlane: 1.15.0

Let's see which configuration we land on after the other PRs and once you've got this PR working (@jichenjc @hidekazuna). Then I'll add some docs for the multi-node control plane.

@sbueringer sbueringer force-pushed the pr-multi-node-control-plane branch from 9aa690d to 8221be7 Compare August 11, 2019 08:43
@sbueringer
Member Author

@hidekazuna Rebased onto current master (incl octavia support)

@jichenjc jichenjc mentioned this pull request Aug 12, 2019
PodCIDR string
ServiceCIDR string
GetMasterEndpoint func() (string, error)
KubeadmConfig string
Contributor

We used .Cluster.Spec.ClusterNetwork.ServiceDomain in the master startup scripts. Not defining it here leads to an error about not finding .Cluster.Spec.ClusterNetwork.ServiceDomain.

Member Author

@sbueringer sbueringer Aug 12, 2019

You're right. But this is now handled via ClusterConfiguration:

kubeadm.WithClusterNetworkFromClusterNetworkingConfig(cluster.Spec.ClusterNetwork),

kubeadm will add it automatically to the config file under: /var/lib/kubelet/config.yaml

So I removed the leftover code from master-user-data which set it manually.
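The call above follows Go's functional-options pattern for building the kubeadm ClusterConfiguration. A minimal self-contained sketch of that pattern (types and names are illustrative, not the real kubeadm or provider API):

```go
package main

import "fmt"

// Illustrative functional-options sketch, in the spirit of
// kubeadm.WithClusterNetworkFromClusterNetworkingConfig(...).
// Field and function names are hypothetical.

type ClusterConfiguration struct {
	ServiceSubnet string
	PodSubnet     string
	DNSDomain     string
}

// Option mutates a ClusterConfiguration under construction.
type Option func(*ClusterConfiguration)

// WithClusterNetwork copies the cluster networking settings into the config,
// so the user-data template no longer needs to set them manually.
func WithClusterNetwork(serviceCIDR, podCIDR, serviceDomain string) Option {
	return func(c *ClusterConfiguration) {
		c.ServiceSubnet = serviceCIDR
		c.PodSubnet = podCIDR
		c.DNSDomain = serviceDomain
	}
}

// NewClusterConfiguration applies all options in order.
func NewClusterConfiguration(opts ...Option) *ClusterConfiguration {
	c := &ClusterConfiguration{}
	for _, o := range opts {
		o(c)
	}
	return c
}

func main() {
	cfg := NewClusterConfiguration(
		WithClusterNetwork("10.254.0.0/16", "10.6.0.0/16", "cluster.local"),
	)
	fmt.Printf("%+v\n", cfg)
}
```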

Contributor

I had the same error.

E0813 07:53:51.188082       1 actuator.go:424] Machine error openstack-master-cgpmj: error creating Openstack instance: template: startUp:11:30: executing "startUp" at <.Cluster.Spec.ClusterNetwork.ServiceDomain>: can't evaluate field Cluster in type userdata.setupParams

Actually Cluster-API v1alpha1 validates this:
https://github.com/kubernetes-sigs/cluster-api/blob/release-0.1/pkg/apis/cluster/v1alpha1/cluster_types.go#L131-L138

Member Author

@sbueringer sbueringer Aug 13, 2019

Have you tried my latest commit? ca0332b

It's still necessary to set it in the Cluster CRD but then it's just used to generate a kubeadm config via go code here:

kubeadm.WithClusterNetworkFromClusterNetworkingConfig(cluster.Spec.ClusterNetwork),

So we just don't need it in the user data script anymore because it's configured via kubeadm config.

Contributor

I got the latest commit e5cab15 and created the cluster successfully!

Member Author

Great! Can you check whether the CIDRs etc. are set in /var/lib/kubelet/config.yaml?

Contributor

/var/lib/kubelet/config.yaml and /etc/kubernetes/kubeadm_config.yaml seem fine. I added another master node successfully, meaning kubectl get nodes shows it with Ready status.
But I failed to add a worker node. The same missing .Cluster.Spec.ClusterNetwork.ServiceDomain error happened again.

I0814 04:56:58.396114       1 kubeadm.go:136] Joining a worker node to the cluster
E0814 04:56:58.409532       1 actuator.go:424] Machine error octavia-machinedeployment-5559b9ddd4-skpfm: error creating Openstack instance: template: startUp:10:30: executing "startUp" at <.Cluster.Spec.ClusterNetwork.ServiceDomain>: can't evaluate field Cluster in type userdata.setupParams
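For context, this failure mode is easy to reproduce with Go's text/template (struct name and fields below are illustrative): the template parses fine, but execution fails as soon as it dereferences a field the params struct doesn't have, which is exactly the "can't evaluate field Cluster in type userdata.setupParams" error above.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Minimal stand-in for the provider's user-data params struct; it has no
// Cluster field, so any template referencing .Cluster fails at Execute time.
type setupParams struct {
	PodCIDR string
}

// render parses and executes a user-data template against the given params.
func render(tmplText string, p setupParams) (string, error) {
	t, err := template.New("startUp").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = t.Execute(&buf, p)
	return buf.String(), err
}

func main() {
	// Field exists on setupParams: renders cleanly.
	out, err := render("POD_CIDR={{ .PodCIDR }}", setupParams{PodCIDR: "10.6.0.0/16"})
	fmt.Println(out, err) // POD_CIDR=10.6.0.0/16 <nil>

	// Field missing from setupParams: parse succeeds, execution fails.
	_, err = render("DOMAIN={{ .Cluster.Spec.ClusterNetwork.ServiceDomain }}", setupParams{})
	fmt.Println(err != nil) // true
}
```

This is why removing the leftover `.Cluster...` references from the worker user-data templates (or adding the field back to the params) fixes the join.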

Member Author

Okay, fixed it there too, sorry.

All kubeadm configs are now generated via go code. This allows us to
generate different configs for the first control plane node, every other
control plane node and normal worker nodes.

This also adds a flag to disablePortSecurity and it's now possible to
set the KUBECONFIG and KUBECONTEXT via environment variable.
@sbueringer sbueringer force-pushed the pr-multi-node-control-plane branch from 7bb1ca9 to ca0332b Compare August 12, 2019 18:33
@sbueringer
Member Author

This PR on top might help to verify it: #435 (because it should overcome the restriction of creating floating ips)

Contributor

@jichenjc jichenjc left a comment

@sbueringer
My test machine containing all the code/settings was deleted by someone else. Before that, I found this issue (cacert) and was still struggling with one last issue (the kubeconfig has port 443).

Restoring the system might take me a few days, so testing this will slow down a little; I can do some code review in the meantime.

I hope to merge this after my test, or @hidekazuna can do some testing if he has time.

mountPath: /etc/kubernetes/cloud.conf
name: cloud
readOnly: true
- hostPath: "/etc/certs/cacert"
Contributor

this was missed in userdata/kubeadm.go

Member Author

@sbueringer sbueringer Aug 13, 2019

Just out of curiosity, do you know why this is needed? We don't need this on-premise, but our registry has a regular certificate

Contributor

Are you using https or http? For me it's an https environment, and you connect to the OpenStack cloud through the cacert.
Unfortunately my env just broke, so I can't show the error I had. It was actually the kube-controller container failing to start because it couldn't find this file.

So if we don't want the machines we create to be able to talk to the OpenStack cloud, we can avoid this, but then we need a fix somewhere else (I need to find it later) to make kube-controller able to start. Or we add the cacert here; after all, it was there before this PR.

Member Author

@sbueringer sbueringer Aug 13, 2019

No, it's okay, I'll add it back. I just wanted to understand what it's used for. I'll try to find out how it's done in our environments.

Member Author

@sbueringer sbueringer Aug 13, 2019

Okay, I overlooked it in our on-prem installation because we're using the OpenStack cloud controller manager there instead.

But I'm not sure how my cluster installation on CoreOS currently works. I'm:

  • using https
  • not configuring a cacert and not ignoring TLS verification
  • the CoreOS example user data uses the mount point but never actually writes the cacert file

(Maybe it's not working on CoreOS right now, but I'll get Ready nodes; never tried Cinder though)

Contributor

@jichenjc Unfortunately I do not have a self-signed keystone environment right now.
@sbueringer This cacert is for using a self-signed keystone; it becomes ca-file in cloud.conf.
https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/#global
I tested only Ubuntu though.
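For reference, a hypothetical cloud.conf fragment (all values are placeholders) showing where the mounted cacert ends up, per the cloud-provider docs linked above:

```
[Global]
auth-url=https://keystone.example.com:5000/v3
username=demo
password=secret
tenant-name=demo
domain-name=Default
region=RegionOne
# CA bundle for a self-signed keystone endpoint; matches the
# hostPath /etc/certs/cacert mounted into the control plane pods above.
ca-file=/etc/certs/cacert
```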

Contributor

I can paste my test env later. I definitely need the cacert, otherwise the whole OpenStack API can't be called.

Member Author

I added the mountpoint to controller-manager config

@jichenjc
Contributor

I have an https env but I need more time to restore my test env.
Since @hidekazuna tested it, how about we merge this, continue testing, and submit fixes later?
@sbueringer ?

@hidekazuna
Contributor

@jichenjc I'm OK.
Reminder: I deployed one master node and added another master node on OpenStack Stein with Octavia and http keystone, but failed to add a worker node.

@jichenjc
Contributor

OK, then we need to wait for the fix; a cluster without workers is not an acceptable gate...

@jichenjc
Contributor

jichenjc commented Aug 14, 2019

I made some modifications in worker-user-data, including removing the ServiceDomain mentioned above and the ServiceCIDR (newly found). After that I was able to boot a cluster without LB (one master + one node).

So @sbueringer, can you fix these so we can merge the code?

commit e5cab154382ce0579c3cc0ec48013b9400201f6d
Author: Stefan Bueringer <[email protected]>
Date:   Tue Aug 13 20:53:21 2019 +0200

NAME                     STATUS   ROLES    AGE     VERSION
openstack-master-v2f7h   Ready    master   9m58s   v1.15.0
openstack-node-gvwhl     Ready    node     5m6s    v1.15.0

@sbueringer
Member Author

sbueringer commented Aug 14, 2019

@jichenjc fixed it for Ubuntu & CentOS. Please let me know if anything else goes wrong; I'm happy to fix it, of course :)

cacert has also been added to controller-manager & apiserver. Not sure why the apiserver would access OpenStack, but there are cloud-provider properties for it, so I'm sure they know what they're doing ;)

@jichenjc
Contributor

Let me add a follow-up patch based on your settings, since you don't have an env to test Ubuntu and CentOS.
@sbueringer

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 15, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jichenjc, sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 15, 2019
MACHINE+={{ .Machine.ObjectMeta.Name }}
CLUSTER_DNS_DOMAIN={{ .Cluster.Spec.ClusterNetwork.ServiceDomain }}
POD_CIDR={{ .PodCIDR }}
SERVICE_CIDR={{ .ServiceCIDR }}
Contributor

still here... @sbueringer

@k8s-ci-robot k8s-ci-robot merged commit 42af510 into kubernetes-sigs:master Aug 15, 2019
@sbueringer
Member Author

> Let me add a follow-up patch based on your settings, since you don't have an env to test Ubuntu and CentOS.
> @sbueringer
>
> /approve
> /lgtm

Yup, that's okay. I pushed it to the wrong branch/PR :/

See: #435

@sbueringer sbueringer deleted the pr-multi-node-control-plane branch August 15, 2019 04:28
pierreprinetti pushed a commit to shiftstack/cluster-api-provider-openstack that referenced this pull request Apr 22, 2024
* finalze multi-node control plane implementation

All kubeadm configs are now generated via go code. This allows us to
generate different configs for the first control plane node, every other
control plane node and normal worker nodes.

This also adds a flag to disablePortSecurity and it's now possible to
set the KUBECONFIG and KUBECONTEXT via environment variable.

* Fix dependencies

* removed old test

* review fixes

* review fixes

* add cacert to controller manager

* fixup format


Development

Successfully merging this pull request may close these issues.

Support for multi-node control plane

6 participants