Skip to content

Conversation

@lentzi90
Copy link
Contributor

@lentzi90 lentzi90 commented Nov 3, 2022

What this PR does / why we need it:

This adds e2e clusterctl upgrade tests from the CAPI e2e framework. Having a clusterctl upgrade test is very important for verifying upgrades between API versions.
The CAPI clusterctl upgrade test spec is a bit special since it creates not just a workload cluster, but a secondary management cluster also. The flow goes like this:

  1. From the bootstrap cluster, create secondary management cluster with older CAPI/CAPO
  2. From the secondary management cluster, create a workload cluster to check that the older versions are working correctly
  3. Run clusterctl upgrade for the secondary management cluster (this requires a way to get the e2e image to the node where there upgraded controller will run)
  4. Scale the workload cluster to verify that the upgraded versions are working.

The reason for the secondary management cluster is to avoid any issues with parallel jobs requiring different versions, so the bootstrap cluster can be shared as normal.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Part of #1363 but I would want to add some more tests before closing it.

Special notes for your reviewer:

  1. There are 2 commits in this PR for making it easier to discuss and review until we agree on the implementation:
    1. Adding the tests
    2. WIP upload the e2e image to the devstack controller and pre-pull it on the nodes.
      Note that this is only needed if we want to test the image built by CI (for example on PRs). If it is ok to just take the main tagged image, then we can drop this.
  2. I added tests only for v0.6 -> current as agreed in the office hours. The work required to add older versions would make things more complicated in a few different ways:
    1. They don't all support the latest kubernetes version, so we need to set the version separately for them
    2. We get new combinations of CAPI/CAPO API versions, which means that the current way to just divide by the CAPO API version here would not be enough. (We would have CAPI v1alpha4 + CAPO v1alpha4 and CAPI v1beta1 + CAPO v1alpha4.)
    3. Already for v0.5 we would have the issue that it does not support Openstack yoga, so we would need to run the devstack with some older version.
  3. Because of the secondary management cluster, we get quite a lot of VMs, which means we cannot run 2 of these tests in parallel as we currently do. We get a total of 1 bastion + 1 KCP + 1 worker for the management cluster + 1 bastion + 1 KCP + 2 workers (when scaling to test) in the workload cluster. If we run in parallel we are already above the devstack default quota of 10 instances.

TODOs:

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Nov 3, 2022
@netlify
Copy link

netlify bot commented Nov 3, 2022

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
🔨 Latest commit 40f1a33
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/637f4c9635a78200098b22a8
😎 Deploy Preview https://deploy-preview-1371--kubernetes-sigs-cluster-api-openstack.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 3, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @lentzi90. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 3, 2022
@lentzi90
Copy link
Contributor Author

lentzi90 commented Nov 3, 2022

NOTE: The e2e clusterctl test will not pass with the WIP commit (I couldn't get it to import the container image properly so it gets stuck in ImagePullBackOff). I have however verified that it is working when using the main tag instead of e2e.

@jichenjc
Copy link
Contributor

jichenjc commented Nov 7, 2022

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 7, 2022
@lentzi90 lentzi90 force-pushed the lentzi90/e2e-upgrade branch from 63b7795 to 38c8038 Compare November 7, 2022 06:50
@lentzi90
Copy link
Contributor Author

lentzi90 commented Nov 7, 2022

/retest

@lentzi90 lentzi90 force-pushed the lentzi90/e2e-upgrade branch from 38c8038 to e820c94 Compare November 7, 2022 08:41
@lentzi90
Copy link
Contributor Author

lentzi90 commented Nov 7, 2022

/retest

@lentzi90 lentzi90 force-pushed the lentzi90/e2e-upgrade branch from e820c94 to 619d372 Compare November 7, 2022 09:42
@lentzi90
Copy link
Contributor Author

lentzi90 commented Nov 7, 2022

/test pull-cluster-api-provider-openstack-e2e-test

@lentzi90 lentzi90 force-pushed the lentzi90/e2e-upgrade branch from 619d372 to f620925 Compare November 15, 2022 10:56
@lentzi90
Copy link
Contributor Author

/test pull-cluster-api-provider-openstack-e2e-test

@lentzi90 lentzi90 force-pushed the lentzi90/e2e-upgrade branch from f620925 to e9a5e2e Compare November 15, 2022 14:00
@lentzi90
Copy link
Contributor Author

/test pull-cluster-api-provider-openstack-e2e-test

1 similar comment
@lentzi90
Copy link
Contributor Author

/test pull-cluster-api-provider-openstack-e2e-test

@lentzi90 lentzi90 force-pushed the lentzi90/e2e-upgrade branch from 190276a to f49cee4 Compare November 17, 2022 08:45
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 17, 2022
These tests make use of the CAPI e2e framework. The test spec creates a
secondary management cluster with older versions of the controllers. A
workload cluster is created to test the functionality of the old
controllers before they are upgraded. Then clusterctl upgrade is used to
upgrade them and the workload cluster is scaled to check that things are
working also after the upgrade.
@lentzi90 lentzi90 force-pushed the lentzi90/e2e-upgrade branch from f49cee4 to 0a67fbf Compare November 17, 2022 10:03
@lentzi90
Copy link
Contributor Author

/test pull-cluster-api-provider-openstack-e2e-test

Comment on lines +89 to +99
source "${REPO_ROOT}/hack/ci/${RESOURCE_TYPE}.sh"
CONTAINER_ARCHIVE="${ARTIFACTS}/capo-e2e-image.tar"
SSH_KEY="$(get_ssh_private_key_file)"
SSH_ARGS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o PasswordAuthentication=no"
CONTROLLER_IP=${CONTROLLER_IP:-"10.0.3.15"}

make e2e-image
docker save -o "${CONTAINER_ARCHIVE}" gcr.io/k8s-staging-capi-openstack/capi-openstack-controller:e2e
scp -i "${SSH_KEY}" ${SSH_ARGS} "${CONTAINER_ARCHIVE}" "cloud@${CONTROLLER_IP}:capo-e2e-image.tar"
ssh -i "${SSH_KEY}" ${SSH_ARGS} "cloud@${CONTROLLER_IP}" -- sudo chown root:root capo-e2e-image.tar
ssh -i "${SSH_KEY}" ${SSH_ARGS} "cloud@${CONTROLLER_IP}" -- sudo chmod u=rw,g=r,o=r capo-e2e-image.tar
ssh -i "${SSH_KEY}" ${SSH_ARGS} "cloud@${CONTROLLER_IP}" -- sudo mv capo-e2e-image.tar /var/www/html/capo-e2e-image.tar
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions on how to improve this?
The point of this is to build the e2e image, save it as an archive and upload it to the controller where it can be fetched for the secondary management cluster. (In the KinD bootstrap cluster, we simply inject the image directly, but we cannot do this for the secondary management cluster that it used for the upgrade test.)

The main issue I have with it is that we build the e2e image twice. Once here and once when running make test-e2e below. For the upgrade test it may be okay but it is completely unnecessary for all the other tests.

@lentzi90 lentzi90 marked this pull request as ready for review November 21, 2022 08:00
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 21, 2022
@mdbooth
Copy link
Contributor

mdbooth commented Nov 21, 2022

/assign

Copy link
Contributor

@mdbooth mdbooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions quite possibly just for my own understanding.

Can we sort out the commit message of that second commit? I don't want to merge something with WIP in the title!

.PHONY: e2e-templates
e2e-templates: ## Generate cluster templates for e2e tests
e2e-templates: $(addprefix $(E2E_TEMPLATES_DIR)/, \
cluster-template-v1alpha5.yaml \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to use this to write v1alpha5 simulator tests!

Comment on lines +77 to +91
# This is only for clusterctl upgrade tests
- name: v0.6.3
value: "https://github.com/kubernetes-sigs/cluster-api-provider-openstack/releases/download/v0.6.3/infrastructure-components.yaml"
type: url
contract: v1beta1
files:
- sourcePath: "../data/shared/v1beta1_provider/metadata.yaml"
- sourcePath: "./infrastructure-openstack/cluster-template.yaml"
replacements:
- old: "imagePullPolicy: Always"
new: "imagePullPolicy: IfNotPresent"
- old: "--v=2"
new: "--v=4"
- old: "--leader-elect"
new: "--leader-elect=false\n - --sync-period=1m"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all honesty this remains magic to me. I'll work it out one day when it breaks.

@tobiasgiese @jichenjc @seanschneeweiss Are any of you able to give this stanza meaningful review?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything in particular that you are wondering about? My understanding of this is that it is used to create a repository that clusterctl is then configured to use as the source of truth instead of reaching out to github to check what releases we have. This is mostly useful for not-yet-released versions, but since the same config is also used for upgrade tests, we need to explicitly list the versions that should be "available" to the test.

I didn't think much about the additional config here (replacements and files) but I think they make sense and follow what we already have as well as what CAPI is doing.

OPENSTACK_VOLUME_TYPE_ALT: "test-volume-type"
CONFORMANCE_WORKER_MACHINE_COUNT: "5"
CONFORMANCE_CONTROL_PLANE_MACHINE_COUNT: "1"
INIT_WITH_KUBERNETES_VERSION: "v1.25.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used only for the management cluster? We still have the issue that our CI installation image is stuck on 1.18.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the secondary management cluster. It will be used when rendering the cluster-template here.
I also noticed that the installation image is stuck on 1.18, but we use the CI artifacts script to actually upgrade before kubeadm is run.

- op: add
path: /spec/kubeadmConfigSpec/postKubeadmCommands
value:
- /usr/local/bin/ci-artifacts-openstack.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this run? Is it on the test runner or on the target? I assume the latter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It runs on the target yes. I'm mimicking what we already use to to get the "CI artifacts".
For that we call GenerateCIArtifactsInjectedTemplateForDebian which injects a script that runs as part of the preKubeadmCommands.

At first I actually tried adding the new script to the ci-artifacts-platform-kustomization.yaml, but that turned out to be harder than expected since it is way too easy to overwrite each others file in those patches. 😬 So I ended up putting it here together with the common patches.

var _ = Describe("When testing clusterctl upgrades (v0.6=>current) [clusterctl-upgrade]", func() {
ctx := context.TODO()
shared.SetEnvVar("USE_CI_ARTIFACTS", "true", false)
shared.SetEnvVar("DOWNLOAD_E2E_IMAGE", "true", false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this setting an environment variable on the test runner? If so, how does that affect /usr/local/bin/ci-artifacts-openstack.sh which I'm assuming (possibly incorrectly) is executing on the target?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed! It is used when rendering the cluster-template. Both these variables are used in the cluster-template that we use in the tests. You can see the rendered result in _artifacts/templates. Here is the one from the previous (successful) e2e test run on this PR: https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-openstack/1371/pull-cluster-api-provider-openstack-e2e-test/1594601712893562880/artifacts/templates/cluster-template-ci-artifacts.yaml (and here is the complete _artifacts folder from that run for reference).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to have to look at the template renderer. I didn't think it would support bash syntax like ${DOWNLOAD_E2E_IMAGE:=false}.

If that question itself reveals my fundamental misunderstanding, please can you enlighten me?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The substitution language is linked from the clusterctl documentation here: https://cluster-api.sigs.k8s.io/clusterctl/commands/generate-yaml.html

@lentzi90 lentzi90 force-pushed the lentzi90/e2e-upgrade branch from 9c8b60c to 40f1a33 Compare November 24, 2022 10:50
@lentzi90
Copy link
Contributor Author

I expect the e2e tests to fail now on this PR until #1390 and kubernetes/test-infra#28082 are merged. This is because of resource constraints. We cannot run the clusterctl-upgrade test in parallel with other tests.

@lentzi90
Copy link
Contributor Author

Can we sort out the commit message of that second commit? I don't want to merge something with WIP in the title!

Done ✔️

@mdbooth
Copy link
Contributor

mdbooth commented Nov 30, 2022

/retest

@lentzi90
Copy link
Contributor Author

/test pull-cluster-api-provider-openstack-e2e-full-test

@lentzi90
Copy link
Contributor Author

lentzi90 commented Dec 1, 2022

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 1, 2022
@jichenjc
Copy link
Contributor

jichenjc commented Dec 1, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 1, 2022
@mdbooth
Copy link
Contributor

mdbooth commented Dec 1, 2022

I'd like to wait until I get my head round the DOWNLOAD_E2E_IMAGE thing. I'm sure it's fine, it's just eluding me right now.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 1, 2022
@mdbooth
Copy link
Contributor

mdbooth commented Dec 1, 2022

/hold cancel
/approve

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 1, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lentzi90, mdbooth

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 1, 2022
@k8s-ci-robot k8s-ci-robot merged commit 3430748 into kubernetes-sigs:main Dec 1, 2022
@lentzi90 lentzi90 deleted the lentzi90/e2e-upgrade branch December 1, 2022 11:44
@tobiasgiese tobiasgiese mentioned this pull request Dec 7, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants