finalize multi-node control plane implementation #421
Conversation
/assign @jichenjc
@sbueringer: GitHub didn't allow me to assign the following users: CamelCaseNotation. Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@hidekazuna I get the same when I execute it. EDIT: I think I fixed it by specifying all these dependencies with kubernetes-1.13.4 and then adding replace entries for them in the go.mod.
@sbueringer Adding k8s.io/kubernetes v1.13.4 to go.mod resolved the error in my local env.
@hidekazuna Only adding k8s.io/kubernetes wasn't enough in my case, but I found a solution. I think we can remove the replace stuff with v1alpha2 anyway; they seem to have fixed it with 1.14 (at least the CAPA go.mod file looks like it).

I had to add the GOPROXY environment variable to reproduce the errors in my local env. And to fix this, I referred to CAPZ.
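For anyone hitting the same build error, here is a rough sketch of the go.mod shape being discussed. The exact module list and versions are illustrative of the k8s.io staging-repo pinning pattern, not the actual diff from this PR:

```
require k8s.io/kubernetes v1.13.4

replace (
	// pin the staging repos to the tags published for the same release
	k8s.io/api => k8s.io/api kubernetes-1.13.4
	k8s.io/apimachinery => k8s.io/apimachinery kubernetes-1.13.4
	k8s.io/client-go => k8s.io/client-go kubernetes-1.13.4
)
```

The replace block is needed because k8s.io/kubernetes's own go.mod pins its staging repos to v0.0.0, which the consumer has to resolve manually.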
thanks a lot for the PR :)

Yup. The idea, at least mine, is to support the same operating systems for now. But somebody has to help me test :). I didn't try it, but the goal is that this PR also works without an APIServer LoadBalancer and with a single-node master. If you want we can schedule a meeting and I can take a look at your LB issue.
I'll add some documentation later today with example YAMLs on how to deploy all this :)

I'm currently deploying this by specifying multiple control plane machines and the following Cluster CRD. The control plane machines are configured like this. Let's see at which configuration we land after the other PRs and after you got this PR to work :) (@jichenjc @hidekazuna). Then I'll add some docs for the multi-node control plane.
Force-pushed 9aa690d to 8221be7
@hidekazuna Rebased onto current master (incl. Octavia support)
...sterctl/examples/openstack/provider-component/user-data/ubuntu/templates/master-user-data.sh
```go
PodCIDR           string
ServiceCIDR       string
GetMasterEndpoint func() (string, error)
KubeadmConfig     string
```
We used .Cluster.Spec.ClusterNetwork.ServiceDomain in the master startup scripts. Not defining it here leads to an error about not finding .Cluster.Spec.ClusterNetwork.ServiceDomain.
You're right. But this is now handled via ClusterConfiguration:

```go
kubeadm.WithClusterNetworkFromClusterNetworkingConfig(cluster.Spec.ClusterNetwork),
```

kubeadm will add it automatically to the config file under /var/lib/kubelet/config.yaml, so I removed the leftover code from master-user-data which set it manually.
I had the same error.

```
E0813 07:53:51.188082 1 actuator.go:424] Machine error openstack-master-cgpmj: error creating Openstack instance: template: startUp:11:30: executing "startUp" at <.Cluster.Spec.ClusterNetwork.ServiceDomain>: can't evaluate field Cluster in type userdata.setupParams
```

Actually Cluster-API v1alpha1 validates this:
https://github.com/kubernetes-sigs/cluster-api/blob/release-0.1/pkg/apis/cluster/v1alpha1/cluster_types.go#L131-L138
Have you tried my latest commit? ca0332b
It's still necessary to set it in the Cluster CRD, but then it's only used to generate a kubeadm config via Go code here:

```go
kubeadm.WithClusterNetworkFromClusterNetworkingConfig(cluster.Spec.ClusterNetwork),
```

So we just don't need it in the user data script anymore because it's configured via the kubeadm config.
I got the latest commit e5cab15 and created the cluster successfully!
Great! Can you check if the CIDRs etc. are set in /var/lib/kubelet/config.yaml?
/var/lib/kubelet/config.yaml and /etc/kubernetes/kubeadm_config.yaml seem fine. I added another master node successfully, meaning get nodes returns Ready status.
But I failed to add a worker node. Again the missing .Cluster.Spec.ClusterNetwork.ServiceDomain error happened:

```
I0814 04:56:58.396114 1 kubeadm.go:136] Joining a worker node to the cluster
E0814 04:56:58.409532 1 actuator.go:424] Machine error octavia-machinedeployment-5559b9ddd4-skpfm: error creating Openstack instance: template: startUp:10:30: executing "startUp" at <.Cluster.Spec.ClusterNetwork.ServiceDomain>: can't evaluate field Cluster in type userdata.setupParams
```
Okay also fixed it there, sorry.
...sterctl/examples/openstack/provider-component/user-data/ubuntu/templates/master-user-data.sh
All kubeadm configs are now generated via go code. This allows us to generate different configs for the first control plane node, every other control plane node, and normal worker nodes. This also adds a flag to disablePortSecurity, and it's now possible to set the KUBECONFIG and KUBECONTEXT via environment variables.
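For orientation, the three node roles correspond to three different kubeadm invocations (a sketch against kubeadm ~1.13 semantics; endpoint, token, and hash are placeholders, and in this PR the configs are generated by Go code rather than typed by hand):

```shell
# First control plane node: initialize the cluster
kubeadm init --config /etc/kubernetes/kubeadm_config.yaml

# Additional control plane nodes (1.13-era flag name)
kubeadm join <lb-endpoint>:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --experimental-control-plane

# Worker nodes: plain join
kubeadm join <lb-endpoint>:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
```

Generating the configs in Go means each role gets exactly the config it needs without maintaining three diverging shell templates.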
Force-pushed 7bb1ca9 to ca0332b
This PR on top might help to verify it: #435 (because it should overcome the restriction of creating floating IPs)
jichenjc left a comment
@sbueringer
My test machine, which contained all the code/settings, was deleted by someone else.
Before that, I found this issue (cacert) and was still struggling with one last issue (kubeconfig has a 443 port).
Restoring the system might take me a few days, so testing this will slow down a little bit; I can do some code review during that period.
I hope to merge this after my test, or @hidekazuna can do some testing if he has time.
```yaml
  mountPath: /etc/kubernetes/cloud.conf
  name: cloud
  readOnly: true
- hostPath: "/etc/certs/cacert"
```
This was missed in userdata/kubeadm.go.
Just out of curiosity, do you know why this is needed? We don't need this on-premise, but our registry has a regular certificate.
Are you using https or http? For me it's an https environment, and the OpenStack cloud is connected through the cacert.
Unfortunately my env just broke, so I can't show the error I had. It was actually the kube-controller container failing to start because it couldn't find this file.
So if we don't need the machines we create to be able to talk to the OpenStack cloud, we can avoid this, but then we need a fix somewhere else (I need to find it later) to make the kube-controller able to start. Or we add the cacert here; after all, it was there before this PR.
No, it's okay, I'll add it back. I just wanted to understand what it's used for. I'll try to find out how it's done in our environments.
Okay, I overlooked it in our on-prem installation because we're using the openstack cloud controller manager there instead.
But I'm not sure how my cluster installation on CoreOS currently works. I'm:
- using https
- not configuring a cacert, nor ignoring TLS
- the CoreOS example user data uses the mount point but actually never writes the cacert file

(Maybe it's not working on CoreOS right now, but I'll get Ready nodes; never tried Cinder though.)
@jichenjc Unfortunately I do not have a self-signed Keystone environment now.
@sbueringer This cacert is for using a self-signed Keystone; it becomes the ca-file entry in cloud.conf:
https://kubernetes.io/docs/concepts/cluster-administration/cloud-providers/#global
I tested only Ubuntu though.
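For context, this is roughly the cloud.conf shape the linked docs describe: the ca-file key in the [Global] section points at the CA bundle for a self-signed Keystone. All values below are placeholders:

```ini
[Global]
auth-url=https://keystone.example.test:5000/v3
username=demo
password=secret
tenant-name=demo
domain-name=Default
region=RegionOne
# CA bundle used to verify the (self-signed) Keystone endpoint
ca-file=/etc/certs/cacert
```

This is why the controller container needs /etc/certs/cacert mounted: without it, the cloud provider cannot verify the OpenStack API's TLS certificate.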
I can paste my test env later. I definitely need the cacert, otherwise the whole OpenStack API can't be called.
I added the mountpoint to the controller-manager config.
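A hypothetical sketch of what that controller-manager mount looks like in a static pod manifest (the paths match those discussed above; the volume name and surrounding structure are illustrative, not the PR's actual manifest):

```yaml
    volumeMounts:
    - mountPath: /etc/certs/cacert
      name: cacert
      readOnly: true
  volumes:
  - name: cacert
    hostPath:
      path: /etc/certs/cacert
```

The hostPath side has to be written by the user data script first; mounting it alone is not enough (which is the CoreOS gap mentioned above).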
I have an https env but I need more time to restore my test env...
@jichenjc I'm OK.
OK, then we need to wait for the fix; going without workers is not an acceptable gate...
I made some modifications in worker-user-data, so @sbueringer can you fix them? Then we can merge the code.
@jichenjc Fixed it for Ubuntu & CentOS. Please let me know if anything else goes wrong; I'm happy to fix it of course :) cacert has also been added to controller-manager & apiserver. Not sure why the apiserver might access OpenStack, but there are cloud-provider properties for it, so I'm sure they know what they're doing ;)
Let me add a follow-up patch based on your settings, as you don't have an env to test Ubuntu and CentOS.
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jichenjc, sbueringer

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
```shell
MACHINE+={{ .Machine.ObjectMeta.Name }}
CLUSTER_DNS_DOMAIN={{ .Cluster.Spec.ClusterNetwork.ServiceDomain }}
POD_CIDR={{ .PodCIDR }}
SERVICE_CIDR={{ .ServiceCIDR }}
```
still here... @sbueringer
Yup, that's okay. I pushed it on the wrong branch/PR :/ See: #435
* finalize multi-node control plane implementation: All kubeadm configs are now generated via go code. This allows us to generate different configs for the first control plane node, every other control plane node and normal worker nodes. This also adds a flag to disablePortSecurity and it's now possible to set the KUBECONFIG and KUBECONTEXT via environment variables.
* Fix dependencies
* removed old test
* review fixes
* review fixes
* add cacert to controller manager
* fixup format
All kubeadm configs are now generated via go code. This allows us to
generate different configs for the first control plane node, every other
control plane node and normal worker nodes.
This also adds a flag to disablePortSecurity and it's now possible to
set the KUBECONFIG and KUBECONTEXT via environment variable.
What this PR does / why we need it:
This PR finalizes the multi-node control plane feature.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #382
Notes: