
OPRUN-4068: OTE: rewrite the upgrade incompatible operator test #427

Merged on Aug 13, 2025 (1 commit)

tmshort
Contributor

@tmshort tmshort commented Aug 11, 2025

This test replaces the existing upgrade incompatible test. The main change is that operator and catalog bundles are created on-the-fly to support OCP 4.20. This means we are no longer dependent on public operators for this test.

This creates new bundles in the OCP ImageRegistry, which requires using a number of OCP APIs, including a raw API URL to invoke the build. This is done by invoking an external k8s client (either oc or kubectl) and passing it a tarball of the bundle to be created. It therefore can't be done with the golang k8sClient normally available (i.e. the create input is a tarball, not a YAML file).
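
To illustrate the tarball hand-off described above, here is a minimal sketch of packing bundle files into an in-memory tar archive. It is not the code from this PR: the names buildTarball and listTarball and the file map are hypothetical, the map stands in for the bundle contents, and the step that passes the archive to oc or kubectl is not shown.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"sort"
)

// buildTarball packs the given files (path -> contents) into an in-memory
// tar archive. The file map is a hypothetical stand-in for the bundle
// contents; it is not the layout used by the actual test.
func buildTarball(files map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)

	// Sort the paths so the archive layout is deterministic.
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		data := files[name]
		hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(data))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(data); err != nil {
			return nil, err
		}
	}
	// Close flushes the tar trailer; the archive is incomplete without it.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listTarball returns the entry names stored in a tar archive.
func listTarball(archive []byte) ([]string, error) {
	tr := tar.NewReader(bytes.NewReader(archive))
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

func main() {
	archive, err := buildTarball(map[string][]byte{
		"Dockerfile":             []byte("FROM scratch\nCOPY manifests /manifests\n"),
		"manifests/example.yaml": []byte("kind: ConfigMap\n"),
	})
	if err != nil {
		panic(err)
	}
	names, err := listTarball(archive)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Because the archive is a byte stream rather than a Kubernetes object, it has to be handed to the build endpoint as a file or stream, which is why the description falls back to an external client.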

This introduces the use of go-bindata to store the bundle contents.

It also pulls in the openshift image, build and operator APIs.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 11, 2025
Contributor

openshift-ci bot commented Aug 11, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tmshort

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 11, 2025
@tmshort tmshort force-pushed the incompatible-test2 branch from 125ef4a to b2acb60 Compare August 11, 2025 18:09
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 11, 2025
@tmshort tmshort changed the title from "UPSTREAM: <carry>: OTE: rewrite the upgrade incompatible operator test" to "OPRUN-4068: rewrite the upgrade incompatible operator test" Aug 11, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 11, 2025

openshift-ci-robot commented Aug 11, 2025

@tmshort: This pull request references OPRUN-4068 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

@@ -40,7 +40,7 @@
"environmentSelector": {}
},
{
- "name": "[sig-olmv1][OCPFeatureGate:NewOLM][Skipped:Disconnected] OLMv1 operator installation should block cluster upgrades if an incompatible operator is installed",
+ "name": "[sig-olmv1][OCPFeatureGate:NewOLM][Skipped:Disconnected] OLMv1 operator installation should fail to install a non-existing cluster extension",
Contributor

@camilamacedo86 camilamacedo86 Aug 11, 2025

We cannot change the names.
If we do that, we break Component Readiness.
The names are stored in a database, and we need to keep them stable over time.

We might be able to rename them: https://github.com/openshift/operator-framework-operator-controller/tree/main/openshift/tests-extension#how-to-rename-a-test

But I would say that at this stage we should just keep the same names and tests to avoid problems with Sippy.

Contributor Author

I'm not changing the names. Because the test was moved to a new file, the ordering changed. Look at the list overall.
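
One way to confirm that a metadata diff like this only reorders entries is to compare the old and new name lists as sets. The sketch below is illustrative and not part of this repository's tooling; renamedTests is a hypothetical helper, and the two names are the ones shown in the hunk above.

```go
package main

import (
	"fmt"
	"sort"
)

// renamedTests reports names that appear in only one of the two lists.
// Two empty results mean the lists differ by ordering at most.
func renamedTests(before, after []string) (removed, added []string) {
	beforeSet := make(map[string]bool, len(before))
	for _, n := range before {
		beforeSet[n] = true
	}
	afterSet := make(map[string]bool, len(after))
	for _, n := range after {
		afterSet[n] = true
	}
	for _, n := range before {
		if !afterSet[n] {
			removed = append(removed, n)
		}
	}
	for _, n := range after {
		if !beforeSet[n] {
			added = append(added, n)
		}
	}
	sort.Strings(removed)
	sort.Strings(added)
	return removed, added
}

func main() {
	const prefix = "[sig-olmv1][OCPFeatureGate:NewOLM][Skipped:Disconnected] OLMv1 operator installation "
	before := []string{
		prefix + "should block cluster upgrades if an incompatible operator is installed",
		prefix + "should fail to install a non-existing cluster extension",
	}
	// Same names, different order: what a moved test looks like in the diff.
	after := []string{before[1], before[0]}

	removed, added := renamedTests(before, after)
	fmt.Println("removed:", len(removed), "added:", len(added))
}
```

A line-by-line diff flags every shifted entry, while the set comparison only flags names that actually disappeared or appeared.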

Contributor

Yep, I validated it. Thanks a lot!

@tmshort tmshort force-pushed the incompatible-test2 branch from b2acb60 to b8d3694 Compare August 11, 2025 19:03
})
})

func createClusterExtension(name, namespace, serviceaccount, bundle string) *olmv1.ClusterExtension {
Contributor

@camilamacedo86 camilamacedo86 Aug 11, 2025

Thanks for the update! One small ask:

Please use the helper and call cleanup with defer right after creating the CE:
https://github.com/openshift/operator-framework-operator-controller/blob/main/openshift/tests-extension/pkg/helpers/cluster_extension.go#L29

After creating, please wait for it to be installed:
https://github.com/openshift/operator-framework-operator-controller/blob/main/openshift/tests-extension/pkg/helpers/cluster_extension.go#L135-L158

This helps avoid CI flakes, race conditions, and noisy Component Readiness signals. It also prevents leaking resources into other tests or local runs. (Everything we create here remains visible to all the other tests if it is not properly cleaned up, which can cause issues.)

Deleting only the namespace can hang and get stuck.
Deleting the CE via the helper cleans up everything it created so that the namespace can terminate cleanly.

Contributor

Note that we do not wait for the namespace to be deleted, and that deletion might take a while.
That is why we need to ensure that everything runs as smoothly as possible.

Contributor Author

If you look, the code is waiting for the ClusterExtension to come up. I'm not using the current set of helper functions because they don't do exactly what I want them to do. Basically, I want to know what the unique value is throughout the whole test (for all my resources) and unique is not exposed.

See lines 182~189.

Contributor Author

The ClusterExtension is a non-namespace-scoped resource, and does not get cleaned up when the namespace is deleted. So, there is an explicit clean up of the non-namespace-scoped resources.
Please look at all the cleanup that does happen.
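
The ordering concern in this thread can be sketched with a plain last-in, first-out cleanup stack. Ginkgo documents DeferCleanup callbacks as running in that order; the cleanupStack type below is a hypothetical stand-in so the example runs without Ginkgo or a cluster, and the log strings are illustrative only.

```go
package main

import "fmt"

// cleanupStack is a hypothetical stand-in for deferred cleanup handling:
// callbacks run in the reverse of the order in which they were registered.
type cleanupStack struct {
	fns []func()
}

// Defer registers a cleanup callback.
func (s *cleanupStack) Defer(fn func()) {
	s.fns = append(s.fns, fn)
}

// Run invokes the callbacks last-in, first-out and then clears the stack.
func (s *cleanupStack) Run() {
	for i := len(s.fns) - 1; i >= 0; i-- {
		s.fns[i]()
	}
	s.fns = nil
}

// runScenario registers each cleanup right after the matching create step
// and returns the resulting sequence of actions.
func runScenario() []string {
	var log []string
	var cleanups cleanupStack

	log = append(log, "create namespace")
	cleanups.Defer(func() { log = append(log, "delete namespace") })

	// The ClusterExtension is cluster-scoped, so deleting the namespace
	// does not remove it; it needs its own cleanup callback.
	log = append(log, "create ClusterExtension")
	cleanups.Defer(func() { log = append(log, "delete ClusterExtension") })

	cleanups.Run()
	return log
}

func main() {
	for _, step := range runScenario() {
		fmt.Println(step)
	}
}
```

Registering each cleanup immediately after its create step means the ClusterExtension is removed before the namespace that holds its service account, without any extra ordering logic.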

Contributor Author

Also, because each new package has a unique, random name, there's no point in ensuring the deployment of the cluster extension already exists.
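
As a rough illustration of sharing one unique value across every resource in a test, the sketch below generates a random suffix once and derives all names from it. The testNames type, the name prefixes, and the suffix length are hypothetical and are not taken from the test in this PR.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// testNames groups every resource name used by one test run. All fields
// share the same unique suffix, so related resources are easy to spot.
// The prefixes are illustrative only.
type testNames struct {
	Unique           string
	Namespace        string
	ServiceAccount   string
	ClusterExtension string
	Operator         string
}

// newUnique returns a random 8-character lowercase hex string.
func newUnique() (string, error) {
	b := make([]byte, 4)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// namesFor derives all resource names from a single unique value.
func namesFor(unique string) testNames {
	return testNames{
		Unique:           unique,
		Namespace:        "install-test-ns-" + unique,
		ServiceAccount:   "install-test-sa-" + unique,
		ClusterExtension: "install-test-ce-" + unique,
		Operator:         "install-test-op-" + unique,
	}
}

func main() {
	unique, err := newUnique()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", namesFor(unique))
}
```

Generating the value once at the top of the test and passing it down is what requires the helpers to accept the unique value as an argument instead of generating it internally.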

Contributor

@camilamacedo86 camilamacedo86 Aug 11, 2025

We can have only one Operator/Content per cluster.
If we install a cluster extension in one test and then try to install the same one in another test (not installed by OLM), it will fail :-( For now we use this one only here, but I think we should probably re-use more of it :-) since it is amazing.

Ideally, each test should delete everything it creates.
We use unique names to avoid conflicts. (We need to overthink this because of Component Readiness.) I know some resources will not cause problems if left over, but they make troubleshooting harder when many are left behind.

You did a fantastic job here and solved a big problem 🚀
In follow-up PRs, we’ll add more helpers. So, I propose:

If this passes CI 💥, we’ll merge it.
Later, we can move some parts to pkg as utils, which will also help the QE tests a lot. They will need to use the bindata as well.

All good to improve in future updates.

Contributor Author

This test creates a uniquely named operator without a CRD, so it can actually be installed multiple times.

I agree that the test should delete everything that it creates, and that's what the test does.

I did move over to the helper routines, after adding a unique argument to the create CE call.

@tmshort tmshort force-pushed the incompatible-test2 branch from b8d3694 to b61c1b3 Compare August 11, 2025 19:14
Comment on lines 182 to 189
By("creating the ClusterExtension")
ce := createClusterExtension(ceName, nsName, saName, bundleName)
Expect(k8sClient.Create(ctx, ce)).To(Succeed(), "failed to create ClusterExtension")
DeferCleanup(func() {
	By("deleting the ClusterExtension")
	Expect(k8sClient.Delete(context.Background(), ce)).To(Succeed())
})
waitForClusterExtensionInstalled(ctx, ceName)
Contributor Author

Here the cluster extension has a DeferCleanup and a wait for it to be installed.

Comment on lines 393 to 366
func waitForClusterExtensionInstalled(ctx SpecContext, name string) {
	k8sClient := env.Get().K8sClient
	Eventually(func(g Gomega) {
		ce := &olmv1.ClusterExtension{}
		err := k8sClient.Get(ctx, client.ObjectKey{Name: name}, ce)
		g.Expect(err).ToNot(HaveOccurred())

		progressing := meta.FindStatusCondition(ce.Status.Conditions, olmv1.TypeProgressing)
		g.Expect(progressing).ToNot(BeNil())
		g.Expect(progressing.Status).To(Equal(metav1.ConditionTrue))

		installed := meta.FindStatusCondition(ce.Status.Conditions, olmv1.TypeInstalled)
		g.Expect(installed).ToNot(BeNil())
		g.Expect(installed.Status).To(Equal(metav1.ConditionTrue))
	}).WithTimeout(5 * time.Minute).WithPolling(1 * time.Second).Should(Succeed())
}
Contributor

Could we use instead:

func ExpectClusterExtensionToBeInstalled(ctx context.Context, name string) {
	k8sClient := env.Get().K8sClient
	Eventually(func(g Gomega) {
		var ext olmv1.ClusterExtension
		err := k8sClient.Get(ctx, client.ObjectKey{Name: name}, &ext)
		g.Expect(err).ToNot(HaveOccurred(), fmt.Sprintf("failed to get ClusterExtension %q", name))
		conditions := ext.Status.Conditions
		g.Expect(conditions).NotTo(BeEmpty(), fmt.Sprintf("ClusterExtension %q has empty status.conditions", name))
		progressing := meta.FindStatusCondition(conditions, string(olmv1.TypeProgressing))
		g.Expect(progressing).ToNot(BeNil(), "Progressing condition not found")
		g.Expect(progressing.Status).To(Equal(metav1.ConditionTrue), "Progressing should be True")
		installed := meta.FindStatusCondition(conditions, string(olmv1.TypeInstalled))
		g.Expect(installed).ToNot(BeNil(), "Installed condition not found")
		g.Expect(installed.Status).To(Equal(metav1.ConditionTrue), "Installed should be True")
	}).WithTimeout(5 * time.Minute).WithPolling(1 * time.Second).Should(Succeed())
}

So, we keep those centralised.

Contributor Author

Yes, I modified the code to use this.

@tmshort tmshort force-pushed the incompatible-test2 branch from b61c1b3 to b70382b Compare August 11, 2025 20:36
Contributor Author

tmshort commented Aug 11, 2025

Updated to use the helper routines, tidied up the cleanup a bit, and renamed "bundle" to "operator", which is a bit more precise.

@tmshort tmshort force-pushed the incompatible-test2 branch 2 times, most recently from da55a78 to 80a82bc Compare August 11, 2025 20:54
@tmshort tmshort changed the title from "OPRUN-4068: rewrite the upgrade incompatible operator test" to "OPRUN-4068: OTE: rewrite the upgrade incompatible operator test" Aug 11, 2025
@camilamacedo86
Contributor

/test openshift-e2e-aws

@tmshort tmshort force-pushed the incompatible-test2 branch from 80a82bc to dea990a Compare August 12, 2025 16:41
Contributor Author

tmshort commented Aug 12, 2025

I added support for getting the OCP version, so we should now never need to update this test!
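
The comment does not say how the version is obtained or used. As a sketch under the assumption that the test needs the cluster's major.minor pair (for example to state which OCP release the generated operator is compatible with), the parsing step might look like this; majorMinor is a hypothetical helper and the version strings are made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor extracts "major.minor" from a version string such as
// "4.20.0" or "4.20.0-0.nightly-2025-08-12". It is a hypothetical helper,
// not the function added in this PR.
func majorMinor(version string) (string, error) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return "", fmt.Errorf("version %q has no minor component", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return "", fmt.Errorf("invalid major version in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", fmt.Errorf("invalid minor version in %q: %w", version, err)
	}
	return fmt.Sprintf("%d.%d", major, minor), nil
}

func main() {
	for _, v := range []string{"4.20.0", "4.20.0-0.nightly-2025-08-12", "garbage"} {
		mm, err := majorMinor(v)
		fmt.Printf("%q -> %q, err=%v\n", v, mm, err)
	}
}
```

Deriving the value at run time, rather than hard-coding a release number, is what would remove the need to touch the test for each new OCP version.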

This test replaces the existing upgrade incompatible test.
The main change is that operator and catalog bundles are created on-the-fly
to support OCP 4.20. This means we are no longer dependent on public
operators for this test.

This creates new bundles in the OCP ImageRegistry, which requires using
a number of OCP APIs, including using a raw API URL to invoke the build.
This is done by invoking an external k8s client (either `oc` or `kubectl`),
and passing it a tarball of the bundle to be created. So, it can't be done
by the golang k8sClient normally available (i.e. the create input is a
tarball not a YAML file).

This introduces the use of go-bindata to store the bundle contents.

It also pulls in the openshift image, build and operator APIs.

Signed-off-by: Todd Short <[email protected]>
@tmshort tmshort force-pushed the incompatible-test2 branch from dea990a to 2560a11 Compare August 12, 2025 18:47
Contributor Author

tmshort commented Aug 12, 2025

rebased.

Contributor Author

tmshort commented Aug 12, 2025

/retest

@camilamacedo86
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 13, 2025
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 5ad2b55 and 2 for PR HEAD 2560a11 in total

@camilamacedo86
Contributor

/test openshift-e2e-aws

Contributor Author

tmshort commented Aug 13, 2025

At least the openshift-e2e-aws error was not due to OLM

Contributor

openshift-ci bot commented Aug 13, 2025

@tmshort: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | 2560a11 | link | false | /test okd-scos-e2e-aws-ovn |

@openshift-merge-bot openshift-merge-bot bot merged commit 2f8f54b into openshift:main Aug 13, 2025
12 of 13 checks passed
@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: ose-olm-catalogd
This PR has been included in build ose-olm-catalogd-container-v4.20.0-202508131744.p0.g2f8f54b.assembly.stream.el9.
All builds following this will include this PR.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
5 participants