-
Notifications
You must be signed in to change notification settings - Fork 30
OPRUN-4068: OTE: rewrite the upgrade incompatible operator test #427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OPRUN-4068: OTE: rewrite the upgrade incompatible operator test #427
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tmshort The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
125ef4a
to
b2acb60
Compare
@tmshort: This pull request references OPRUN-4068 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@@ -40,7 +40,7 @@ | |||
"environmentSelector": {} | |||
}, | |||
{ | |||
"name": "[sig-olmv1][OCPFeatureGate:NewOLM][Skipped:Disconnected] OLMv1 operator installation should block cluster upgrades if an incompatible operator is installed", | |||
"name": "[sig-olmv1][OCPFeatureGate:NewOLM][Skipped:Disconnected] OLMv1 operator installation should fail to install a non-existing cluster extension", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot change the names
If we do that, we break component readiness
The name is stored in a database and used we need to ensure them over the time
We might could rename:https://github.com/openshift/operator-framework-operator-controller/tree/main/openshift/tests-extension#how-to-rename-a-test
But I would say at this stage, let's just keep the same names and tests to avoid problems with the Sippy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not changing the names, because the test was moved to a new file, it's changing the ordering. Look at the list overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I validate thank you a lot !!!
b2acb60
to
b8d3694
Compare
}) | ||
}) | ||
|
||
func createClusterExtension(name, namespace, serviceaccount, bundle string) *olmv1.ClusterExtension { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update! One small ask:
Please use the helper and call cleanup with defer right after creating the CE:
https://github.com/openshift/operator-framework-operator-controller/blob/main/openshift/tests-extension/pkg/helpers/cluster_extension.go#L29
After creating, please wait for it to be installed:
https://github.com/openshift/operator-framework-operator-controller/blob/main/openshift/tests-extension/pkg/helpers/cluster_extension.go#L135-L158
This helps avoid CI flakes, race conditions, and noisy Component Readiness signals. It also prevents leaking resources into other tests or local runs. (all that we create here will be available for all ther other tests if not properly cleaned which can cause issues)
Deleting only the namespace can hang and get stuck.
Deleting the CE via the helper cleans up everything it created so that the namespace can terminate cleanly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See that we do not wait the ns be deleted and it might either take longer
that is why we need to ensure that all will run smoth as possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look, the code is waiting for the ClusterExtension to come up. I'm not using the current set of helper functions because they don't do exactly what I want them to do. Basically, I want to know what the unique
value is throughout the whole test (for all my resources) and unique
is not exposed.
See lines 182~189.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ClusterExtension is a non-namespace-scoped resource, and does not get cleaned up when when the namespace is deleted. So, there is an explicit clean up of the non-namespace-scoped resources.
Please look at all the cleanup that does happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, because each new package has a unique, random name, there's no point in ensuring the deployment of the cluster extension already exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have only one Operator/Content per cluster.
If we install a cluster extension in one test, then try to install the same one in another test (not installed by OLM), it will fail :-( for now we use this one only here but I think we should probably re-use more of it :-) Since it is amazing
Ideally, each test should delete everything it creates.
We use unique names to avoid conflicts. (We need to overthing because of Component Readiness) And I know, some resources will not cause problems if left over, but they make troubleshooting harder when many are left behind.
You did a fantastic job here and solved a big problem 🚀
In follow-up PRs, we’ll add more helpers. So, I propose
If this passes CI 💥, we’ll merge it.
Later, we can move some parts to pkg as utils, which will also can help QE tests a lot. They will need use the bindata as well.
All good to improve in future updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test create a uniquely named operator without a CRD, it can actually be installed multiple times,
I agree that the test should delete everything that it creates, and that's what the test does.
I did move over to the helper
routines, after adding a unique
argument to the create CE call.
b8d3694
to
b61c1b3
Compare
By("creating the ClusterExtension") | ||
ce := createClusterExtension(ceName, nsName, saName, bundleName) | ||
Expect(k8sClient.Create(ctx, ce)).To(Succeed(), "failed to create ClusterExtension") | ||
DeferCleanup(func() { | ||
By("deleting the ClusterExtension") | ||
Expect(k8sClient.Delete(context.Background(), ce)).To(Succeed()) | ||
}) | ||
waitForClusterExtensionInstalled(ctx, ceName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the cluster extension has a DeferCleanup
and a wait for it to be installed.
func waitForClusterExtensionInstalled(ctx SpecContext, name string) { | ||
k8sClient := env.Get().K8sClient | ||
Eventually(func(g Gomega) { | ||
ce := &olmv1.ClusterExtension{} | ||
err := k8sClient.Get(ctx, client.ObjectKey{Name: name}, ce) | ||
g.Expect(err).ToNot(HaveOccurred()) | ||
|
||
progressing := meta.FindStatusCondition(ce.Status.Conditions, olmv1.TypeProgressing) | ||
g.Expect(progressing).ToNot(BeNil()) | ||
g.Expect(progressing.Status).To(Equal(metav1.ConditionTrue)) | ||
|
||
installed := meta.FindStatusCondition(ce.Status.Conditions, olmv1.TypeInstalled) | ||
g.Expect(installed).ToNot(BeNil()) | ||
g.Expect(installed.Status).To(Equal(metav1.ConditionTrue)) | ||
}).WithTimeout(5 * time.Minute).WithPolling(1 * time.Second).Should(Succeed()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use instead:
operator-framework-operator-controller/openshift/tests-extension/pkg/helpers/cluster_extension.go
Lines 113 to 131 in 56f527a
func ExpectClusterExtensionToBeInstalled(ctx context.Context, name string) { | |
k8sClient := env.Get().K8sClient | |
Eventually(func(g Gomega) { | |
var ext olmv1.ClusterExtension | |
err := k8sClient.Get(ctx, client.ObjectKey{Name: name}, &ext) | |
g.Expect(err).ToNot(HaveOccurred(), fmt.Sprintf("failed to get ClusterExtension %q", name)) | |
conditions := ext.Status.Conditions | |
g.Expect(conditions).NotTo(BeEmpty(), fmt.Sprintf("ClusterExtension %q has empty status.conditions", name)) | |
progressing := meta.FindStatusCondition(conditions, string(olmv1.TypeProgressing)) | |
g.Expect(progressing).ToNot(BeNil(), "Progressing condition not found") | |
g.Expect(progressing.Status).To(Equal(metav1.ConditionTrue), "Progressing should be True") | |
installed := meta.FindStatusCondition(conditions, string(olmv1.TypeInstalled)) | |
g.Expect(installed).ToNot(BeNil(), "Installed condition not found") | |
g.Expect(installed.Status).To(Equal(metav1.ConditionTrue), "Installed should be True") | |
}).WithTimeout(5 * time.Minute).WithPolling(1 * time.Second).Should(Succeed()) | |
} |
So, we keep those centralised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I modified the code to use this.
b61c1b3
to
b70382b
Compare
Updated to use |
da55a78
to
80a82bc
Compare
/test openshift-e2e-aws |
80a82bc
to
dea990a
Compare
I added support to get the OCP version, so, now, we should never need to update this test! |
This test replaces the existing upgrade incompatible test. The main change is that operator and catalog bundles are created on-the-fly to support OCP 4.20. This means we are no longer dependent on public operators for this test. This creates new bundles in the OCP ImageRegistry, this requires using a number of OCP APIs, including using a raw API URL to invoke the build. This is done by invoking an external k8s client (either `oc` or `kubectl`), and passing it a tarball of the bundle to be created. So, it can't be done by the golang k8sClient normally available (i.e. the create input is a tarball not a YAML file). This introduces the use of go-bindata to store the bundle contents. It also pulls in openshift mage, buld and operator APIs. Signed-off-by: Todd Short <[email protected]>
dea990a
to
2560a11
Compare
rebased. |
/retest |
/lgtm |
/test openshift-e2e-aws |
At least the |
@tmshort: This pull request references OPRUN-4068 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@tmshort: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
2f8f54b
into
openshift:main
[ART PR BUILD NOTIFIER] Distgit: ose-olm-catalogd |
This test replaces the existing upgrade incompatible test. The main change is that operator and catalog bundles are created on-the-fly to support OCP 4.20. This means we are no longer dependent on public operators for this test.
This creates new bundles in the OCP ImageRegistry, this requires using a number of OCP APIs, including using a raw API URL to invoke the build. This is done by invoking an external k8s client (either
oc
orkubectl
), and passing it a tarball of the bundle to be created. So, it can't be done by the golang k8sClient normally available (i.e. the create input is a tarball not a YAML file).This introduces the use of go-bindata to store the bundle contents.
It also pulls in openshift image, buld and operator APIs.