Skip to content

OPRUN-3941: [OTE] Add webhook tests (without should be tolerant to openshift-service-ca certificate rotation) #442

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

camilamacedo86
Copy link
Contributor

@camilamacedo86 camilamacedo86 commented Aug 15, 2025

PR Description

It reintroduces the reverted PR #434,
but without the failing test (as agreed in Slack thread).

The base reference for this file is #442, which was reverted [#434].


Delta Summary (New vs Old)

Removed test
It("should be tolerant to openshift-service-ca certificate rotation", …)
‣ Failed → reason for the revert

Added (requested in #424 review
Diagnostics & dumps

  • AfterEach failure hook → pod logs, resource describes, webhook config dumps
  • DeferCleanup in TLS secret test → dump cert details on failure
  • printTLSCertInfo() util → parse & log Subject/Issuer/Validity/SANs

Const

  • webhookServiceCert = "webhook-operator-webhook-service-cert" (instead of fixed string literal)

Renamed / Changed
🔄 webhookTestGVRV1webhookTestV1 (requested in #424 review
🔄 webhookTestGVRV2webhookTestV2 (requested in #424 review
🔄 newWebhookTestV1()newWebhookTest() (requested in #424 review
🔄 Namespace deletion → wait.PollUntilContextTimeoutEventually (5m timeout, avoid flakes)
🔄 TLS secret recreation wait → 2m → 5m (to avoid flakes)


📌 To check the delta, compare the webhook.go file from the old merged+reverted PR and the new version in this PR. All the other files are utils.

Tests

Tested locally
Passing in e2e / pre-merge
Verified with /payload-aggregate See details on https://pr-payload-tests.ci.openshift.org/runs/ci/af81eff0-7b94-11f0-93ec-12968849859f-0

@openshift-ci-robot
Copy link

openshift-ci-robot commented Aug 15, 2025

@camilamacedo86: This pull request references OPRUN-3941 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set.

In response to this:

  • Increase the timeout for "should be tolerant to openshift-service-ca certificate rotation" from 2 minutes to 5 minutes to reduce flakes.
  • Add dumping of container logs and kubectl describe pods output for better diagnostics.
  • Include targeted certificate details dump (tls.crt parse) when failures occur.
  • Add additional check to verify webhook responsiveness after certificate rotation.

This change is a refactor of code from openshift/origin#30059.

Assisted-by: Gemini

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 15, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Aug 15, 2025

@camilamacedo86: This pull request references OPRUN-3941 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set.

In response to this:

  • Skip "should be tolerant to openshift-service-ca certificate rotation". More info: https://issues.redhat.com/browse/OCPBUGS-60564
  • Add dumping of container logs and kubectl describe pods output for better diagnostics.
  • Include targeted certificate details dump (tls.crt parse) when failures occur.
  • Add additional check to verify webhook responsiveness after certificate rotation.

This change is a refactor of code from openshift/origin#30059.

Assisted-by: Gemini

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Aug 15, 2025

@camilamacedo86: This pull request references OPRUN-3941 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set.

In response to this:

  • Skip "should be tolerant to openshift-service-ca certificate rotation". More info: https://issues.redhat.com/browse/OCPBUGS-60564
  • Add dumping of container logs and kubectl describe pods output for better diagnostics.
  • Include targeted certificate details dump (tls.crt parse) when failures occur.
  • Add additional check to verify webhook responsiveness after certificate rotation.

This change is a refactor of code from openshift/origin#30059.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@camilamacedo86 camilamacedo86 force-pushed the add-webhook-tests-with-skip-certca branch from 112467e to 07d5c5a Compare August 15, 2025 17:42

It("should be tolerant to openshift-service-ca certificate rotation", func(ctx SpecContext) {
// FIXME: https://issues.redhat.com/browse/OCPBUGS-60564
Skip("Skipping this test until we ensure that it can work reliably")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test must be skipped for now.
We will work on it in a follow up

@camilamacedo86
Copy link
Contributor Author

/payload-aggregate-with-prs periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-techpreview-serial 5

1 similar comment
@camilamacedo86
Copy link
Contributor Author

/payload-aggregate-with-prs periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-techpreview-serial 5

@camilamacedo86
Copy link
Contributor Author

/payload-aggregate-with-prs periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-techpreview-serial 10

@camilamacedo86 camilamacedo86 force-pushed the add-webhook-tests-with-skip-certca branch 2 times, most recently from baa450a to 0415e67 Compare August 15, 2025 20:57
@camilamacedo86
Copy link
Contributor Author

/payload-aggregate-with-prs periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-techpreview-serial 10

@camilamacedo86
Copy link
Contributor Author

/payload-aggregate periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-techpreview-serial 10

Copy link
Contributor

openshift-ci bot commented Aug 15, 2025

@camilamacedo86: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b96934e0-7a1f-11f0-91bf-2cddbeabd8dd-0

@camilamacedo86
Copy link
Contributor Author

/payload-aggregate periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-techpreview-serial 10

Copy link
Contributor

openshift-ci bot commented Aug 16, 2025

@camilamacedo86: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/94147170-7a77-11f0-9e36-38ae811e5825-0

@camilamacedo86
Copy link
Contributor Author

/restest-required

@camilamacedo86 camilamacedo86 force-pushed the add-webhook-tests-with-skip-certca branch from 0415e67 to 3a22145 Compare August 17, 2025 17:46
@openshift-ci-robot
Copy link

openshift-ci-robot commented Aug 17, 2025

@camilamacedo86: This pull request references OPRUN-3941 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set.

In response to this:

(Improvements)

  • Add dumping of container logs and kubectl describe pods output for better diagnostics.
  • Include targeted certificate details dump (tls.crt parse) when failures occur.
  • Add additional check to verify webhook responsiveness after certificate rotation.

Add tests without problematic:

"name": "[sig-olmv1][OCPFeatureGate:NewOLMWebhookProviderOpenshiftServiceCA][Skipped:Disconnected][Serial] OLMv1 operator with webhooks should be tolerant to openshift-service-ca certificate rotation",

This change is a refactor of code from openshift/origin#30059.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@camilamacedo86 camilamacedo86 force-pushed the add-webhook-tests-with-skip-certca branch from 3a22145 to d778da1 Compare August 17, 2025 17:58
@camilamacedo86 camilamacedo86 changed the title OPRUN-3941: [OTE] Add webhook tests skipping flake test OPRUN-3941: [OTE] Add webhook tests (without should be tolerant to openshift-service-ca certificate rotation) Aug 17, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Aug 17, 2025

@camilamacedo86: This pull request references OPRUN-3941 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set.

In response to this:

(Improvements)

  • Add dumping of container logs and kubectl describe pods output for better diagnostics.
  • Include targeted certificate details dump (tls.crt parse) when failures occur.
  • Add additional check to verify webhook responsiveness after certificate rotation.

Add tests without problematic:

"name": "[sig-olmv1][OCPFeatureGate:NewOLMWebhookProviderOpenshiftServiceCA][Skipped:Disconnected][Serial] OLMv1 operator with webhooks should be tolerant to openshift-service-ca certificate rotation",

Reason: It did not pass in the latest try on the sippy, and we created an OCPBug to analyse the scenario.

This change is a refactor of code from openshift/origin#30059.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Aug 17, 2025

@camilamacedo86: This pull request references OPRUN-3941 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set.

In response to this:

This change is a refactor of code from openshift/origin#30059 with improvements and logs for better diagnostics.

Add tests without that one which faced issues, which is:

"name": "[sig-olmv1][OCPFeatureGate:NewOLMWebhookProviderOpenshiftServiceCA][Skipped:Disconnected][Serial] OLMv1 operator with webhooks should be tolerant to openshift-service-ca certificate rotation",

We created an OCPBug to analyse the scenario.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@camilamacedo86
Copy link
Contributor Author

/payload-aggregate periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-techpreview-serial 5

Copy link
Contributor

openshift-ci bot commented Aug 17, 2025

@camilamacedo86: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/af81eff0-7b94-11f0-93ec-12968849859f-0

@camilamacedo86 camilamacedo86 force-pushed the add-webhook-tests-with-skip-certca branch from d778da1 to 697e82c Compare August 17, 2025 19:25
@openshift-ci-robot
Copy link

openshift-ci-robot commented Aug 18, 2025

@camilamacedo86: This pull request references OPRUN-3941 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.0" version, but no target version was set.

In response to this:

This change is a refactor of code from openshift/origin#30059 with improvements and logs for better diagnostics.

Add tests without that one which faced issues, which is:

"name": "[sig-olmv1][OCPFeatureGate:NewOLMWebhookProviderOpenshiftServiceCA][Skipped:Disconnected][Serial] OLMv1 operator with webhooks should be tolerant to openshift-service-ca certificate rotation",
Screenshot 2025-08-18 at 06 55 56

We created an OCPBug to analyse the scenario.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@@ -85,7 +81,7 @@ var _ = Describe("[sig-olmv1][OCPFeatureGate:NewOLM][Skipped:Disconnected] OLMv1
DeferCleanup(fileCleanup)
By(fmt.Sprintf("created operator tarball %q", fileOperator))

By(fmt.Sprintf("starting the operator build with %q via RAW URL", cmdLine))
By("starting the operator build via RAW URL")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is RAW URL?

Copy link
Contributor Author

@camilamacedo86 camilamacedo86 Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A RAW URL usually means a direct link to the unprocessed (raw) content of a file, without any extra formatting, preview, or surrounding interface.

Example:

The change here is just because we are using the utils, so we don't need or have cmdLine anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requested in the PR to add the follow ups (which is in hold because we cannot add the test that fails): #430 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this particular case, it is invoking a k8s API via a URL (using --raw) rather than allowing the client (kubectl or oc) to determine the URL to use. This is needed because building an image is a specific Openshift operation not supported by kubectl. Although it is supported by oc, I wanted to make it generic enough to allow kubectl to be used. Because of this raw URL use, we can't directly use the k8sClient API.

Comment on lines 80 to 90
if CurrentSpecReport().Failed() {
By("dumping pod logs for debugging")
helpers.GetAllPodLogs(ctx, webhookOperatorInstallNamespace)
helpers.DescribePods(ctx, webhookOperatorInstallNamespace)
helpers.DescribeClusterCatalogs(ctx)
helpers.DescribeAllClusterExtensions(ctx, webhookOperatorInstallNamespace)
By("dumping webhook diagnostics")
// Additional diagnostics specific for this test
helpers.RunAndPrint(ctx, "get", "secret", "-n", webhookOperatorInstallNamespace, webhookServiceCert, "-oyaml")
helpers.RunAndPrint(ctx, "get", "mutatingwebhookconfigurations.admissionregistration.k8s.io", "-oyaml")
helpers.RunAndPrint(ctx, "get", "validatingwebhookconfigurations.admissionregistration.k8s.io", "-oyaml")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rethinking this: do we need all these? Fyi the original PR has the executil.DumpPodsLogs etc etc which I pulled in without any changes...but rethinking this now: this is over explaining things. We'll get all of these from the must gather of the test cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly — the must-gather isn’t collected before we run all the cleanups.
Yes, we need this, but it will only appear if/when a test fails, and only for that specific test (similar to how it works in ocp/origin) if we develop it this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are definitely circumstances where we need to gather this data before the must-gather, e.g. in the case he pod is deleted during the test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that secrets are not part of the must gather.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

Comment on lines +171 to +232
It("should be tolerant to tls secret deletion", func(ctx SpecContext) {
certificateSecretName := webhookServiceCert
By("ensuring secret exists before deletion attempt")
Eventually(func(g Gomega) {
secret := &corev1.Secret{}
err := k8sClient.Get(ctx, client.ObjectKey{Name: certificateSecretName, Namespace: webhookOperatorInstallNamespace}, secret)
g.Expect(err).ToNot(HaveOccurred(), fmt.Sprintf("failed to get secret %s/%s", webhookOperatorInstallNamespace, certificateSecretName))
}).WithTimeout(1 * time.Minute).WithPolling(5 * time.Second).Should(Succeed())

By("checking webhook is responsive through secret recreation after manual deletion")
tlsSecret := &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: certificateSecretName,
Namespace: webhookOperatorInstallNamespace,
},
}
err := k8sClient.Delete(ctx, tlsSecret, client.PropagationPolicy(metav1.DeletePropagationBackground))
Expect(client.IgnoreNotFound(err)).ToNot(HaveOccurred())

DeferCleanup(func() {
// Specific check for this test
if CurrentSpecReport().Failed() {
By("dumping certificate details for debugging")
secret := &corev1.Secret{}
if err := k8sClient.Get(ctx, client.ObjectKey{
Name: webhookServiceCert,
Namespace: webhookOperatorInstallNamespace,
}, secret); err == nil {
if crt, ok := secret.Data["tls.crt"]; ok && len(crt) > 0 {
printTLSCertInfo(crt)
} else {
_, _ = GinkgoWriter.Write([]byte("[diag] tls.crt key not found or empty in secret\n"))
}
} else {
fmt.Fprintf(GinkgoWriter, "[diag] failed to get secret for cert dump: %v\n", err)
}
}
})

By("waiting for the webhook operator's service certificate secret to be recreated and populated")
Eventually(func(g Gomega) {
secret := &corev1.Secret{}
err := k8sClient.Get(ctx, client.ObjectKey{Name: certificateSecretName, Namespace: webhookOperatorInstallNamespace}, secret)
if apierrors.IsNotFound(err) {
GinkgoLogr.Info(fmt.Sprintf("Secret %s/%s not found yet (still polling for recreation)", webhookOperatorInstallNamespace, certificateSecretName))
return
}
g.Expect(err).ToNot(HaveOccurred(), fmt.Sprintf("failed to get webhook service certificate secret %s/%s: %v", webhookOperatorInstallNamespace, certificateSecretName, err))
g.Expect(secret.Data).ToNot(BeEmpty(), "expected webhook service certificate secret data to not be empty after recreation")
}).WithTimeout(5*time.Minute).WithPolling(10*time.Second).Should(Succeed(), "webhook service certificate secret did not get recreated and populated within timeout")

Eventually(func(g Gomega) {
resourceName := fmt.Sprintf("tls-deletion-test-%s", rand.String(5))
resource := newWebhookTest(resourceName, webhookOperatorInstallNamespace, true)

_, err := dynamicClient.Resource(webhookTestV1).Namespace(webhookOperatorInstallNamespace).Create(ctx, resource, metav1.CreateOptions{})
g.Expect(err).ToNot(HaveOccurred(), fmt.Sprintf("failed to create test resource %s: %v", resourceName, err))

err = dynamicClient.Resource(webhookTestV1).Namespace(webhookOperatorInstallNamespace).Delete(ctx, resource.GetName(), metav1.DeleteOptions{})
g.Expect(client.IgnoreNotFound(err)).ToNot(HaveOccurred(), fmt.Sprintf("failed to delete test resource %s: %v", resourceName, err))
}).WithTimeout(5 * time.Minute).WithPolling(10 * time.Second).Should(Succeed())
})
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we stick to whatever was here only please: https://github.com/openshift/origin/pull/30059/files#diff-6dd6fa78ac85235012d3c9910f8510bc1640c830a83888681bd5922cec0dffcbR166-R188

All of these print stuff is okay when debugging but feels unclean in a prod environment to run every time. And is also a lot of code to review, which means a lot of surface area for bugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The print is required when errors occur, otherwise we don’t get the information needed for debugging.
The output will only appear if the test fails — that’s the key point to keep in mind.

See the snippet below and note the guard: if CurrentSpecReport().Failed().
The only change here compared to the previously approved PR is the code inside that if block.

By the way, you were one of the folks who originally requested additional dumps and logs 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the information is available when the test fails, and is silent otherwise, I think we are fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

@anik120 anik120 removed their assignment Aug 19, 2025
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to this file are not strictly related to the subject of this PR. It is mostly some cleanup using other APIs (i.e. to find a command-line client), to avoid duplication. The changes are fine. So, while it may not belong as part of this PR, I'm not going to push for it to be separated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this could've used some of the new helpers that were written in this PR (e.g. NewClusterCatalog), but things can be combined later as needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Admittedly, I did write this fairly independently of the rest of the code, where there wasn't that much infrastructure in-place.)

Comment on lines +24 to +27
// Verify that the tool is working by checking its version.
if err := exec.Command(t, "version", "--client").Run(); err == nil {
return t, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is strictly necessary, but it does ensure that the client can be executed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is good right to ensure that all is fine mainly because we can either run locally

// RunAndPrint runs a `kubectl/oc` command via RunK8sCommand and writes both stdout and stderr
// to the GinkgoWriter. It also prints the exact command being run.
func RunAndPrint(ctx context.Context, args ...string) {
fmt.Fprintf(GinkgoWriter, "\n[diag] running: oc %s\n", strings.Join(args, " "))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, it's not running oc, but could be running kubectl, so the statement "It also prints the exact command being run" is in correct.
A nit, some arguments might contain spaces, that might accidentally be interpreted as separate commands when printed in this manner. I wonder if it's worthwhile to add quotes around each of args?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here should be

fmt.Fprintf(GinkgoWriter, "\n[diag] running: %s\n", strings.Join(args, " "))

Only.

I think we can add that I wonder if it's worthwhile to add quotes around each of args?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

"strings"

//nolint:staticcheck // ST1001: dot-imports for readability
. "github.com/onsi/ginkgo/v2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative would be to use g as the import reference. That might improve readability more than using . (and the consequences surrounding that).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IHMO the g is hard to understand because one we need to add g other o. But we are doing it in all files (not only on those), while I agree that can be debatable can we agree taht if we change that would be in a follow up so we can change all places?

}

func subHeader(format string, a ...any) {
fmt.Fprintf(GinkgoWriter, "\n--- %s ---\n", fmt.Sprintf(format, a...))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I almost want the --- and === to be the same length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

}

// DescribeClusterCatalogs lists all ClusterCatalogs and runs `describe` on each.
func DescribeClusterCatalogs(ctx context.Context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, below "All" is used in the function name for ClusterExtensions, but not ClusterCatalogs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could rename for DescribeAllClusterCatalogs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

// printTLSCertInfo parses a PEM-encoded TLS certificate and prints useful debug info.
// It shows validity period and SANs (DNS/IP) to help debug webhook cert issues.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function feels as though it should be in the helpers package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems valid only for this specific case scenario that is why it is here

Comment on lines 366 to 370
fmt.Fprintf(GinkgoWriter, "[diag] Subject: %s\n", cert.Subject.String())
fmt.Fprintf(GinkgoWriter, "[diag] Issuer: %s\n", cert.Issuer.String())
fmt.Fprintf(GinkgoWriter, "[diag] Valid From: %s\n", cert.NotBefore.Format(time.RFC3339))
fmt.Fprintf(GinkgoWriter, "[diag] Valid Until: %s\n", cert.NotAfter.Format(time.RFC3339))
fmt.Fprintf(GinkgoWriter, "[diag] IsCA: %t\n", cert.IsCA)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: serial number might be useful here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interest yep we might could either add as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

webhookServiceCert = "webhook-operator-webhook-service-cert"
)

var _ = Describe("[sig-olmv1][OCPFeatureGate:NewOLMWebhookProviderOpenshiftServiceCA][Skipped:Disconnected][Serial] OLMv1 operator with webhooks",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 [Serial]

helpers.DescribeAllClusterExtensions(ctx, webhookOperatorInstallNamespace)
By("dumping webhook diagnostics")
// Additional diagnostics specific for this test
helpers.RunAndPrint(ctx, "get", "secret", "-n", webhookOperatorInstallNamespace, webhookServiceCert, "-oyaml")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be using the printTLSCertInfo function below? I'm a bit concerned about dumping out secret contents (although admittedly, it's only a temporary secret for the lifetime of the service in question).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is is fine for this reason, and it could help us diagnostic issues

k8sClient = env.Get().K8sClient
restCfg := env.Get().RestCfg
var err error
dynamicClient, err = dynamic.NewForConfig(restCfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this being used rather than k8sClient? We should just have to add the scheme and we'd be good to go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only used for 2 types of resources.

Copy link
Contributor Author

@camilamacedo86 camilamacedo86 Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Controller-runtime client.Client (k8sClient) -> Only works with structured, typed client. (e.g., corev1.Secret, olmv1.ClusterCatalog).
  • dynamic.Interface (dynamicClient) -> This is a generic, unstructured client. -> We need it because we are constructing the webhookTestV1

We are testing webhook behavior for a CRD (WebhookTest) defined by an operator. That is why we used it. To do:

https://github.com/openshift/operator-framework-operator-controller/pull/442/files/697e82c8242753c96b004573a908a9fd116e1cc7#diff-27e72954a06af3e527121bb5416f87b6b262bba32643109274b9d7a54fc464d9R235-R263

The dynamic client provides a way to create, get, and delete arbitrary resources using only their Group/Version/Resource identifiers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... so we can't add that API to our scheme? These are coreos APIs... But I see your point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For our APIs, we add the schema once in the test env and then use typed objects.

For the operator under test, it doesn’t seems to make sense to import and register its schema for all tests. We don’t know where its Go types are, and we don’t want to depend on them. Instead, we just use the dynamic client with unstructured.

I think we can might say

  • Our APIs (same repo) + K8s : add schema + typed k8sClient is fine.
  • Other operator’s CRDs/thridy-parties : it is easier use dynamic client.

@tmshort
Copy link
Contributor

tmshort commented Aug 19, 2025

Most of the nits, can be fixed up later, but I do question the use of dynamicClient, when the k8sClient ought to be usable with the correct scheme.
Also, since we're dumping out a secret, specifically a certificate, it would be better presented as a certificate than some random secret, which might be flagged as a security/data leak.

@tmshort
Copy link
Contributor

tmshort commented Aug 19, 2025

Dumping out the secret as a secret is not all that useful, having it printed out as a certificate will be much more useful, and since the key is not included in the dump, we won’t be potentially tagged with data leakage. If you can fix that, I’d be OK with the PR, and the nits fixed later

- Add dumping of container logs and `kubectl describe pods` output for better diagnostics.
- Include targeted certificate details dump (`tls.crt` parse) when failures occur.
- Add additional check to verify webhook responsiveness after certificate rotation.

This change is a refactor of code from openshift/origin#30059.

Assisted-by: Gemini
@camilamacedo86 camilamacedo86 force-pushed the add-webhook-tests-with-skip-certca branch from 697e82c to 1f2debf Compare August 19, 2025 15:58

fmt.Fprintf(GinkgoWriter, "[diag] Subject: %s\n", cert.Subject.String())
fmt.Fprintf(GinkgoWriter, "[diag] Issuer: %s\n", cert.Issuer.String())
fmt.Fprintf(GinkgoWriter, "[diag] Serial Number: %X\n", cert.SerialNumber)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmshort ^ added :-)

@camilamacedo86
Copy link
Contributor Author

@tmshort

Thank you a lot for your help !!!
All nits and comments is addressed we remove the dump of the secret!

Comment on lines +75 to +80
// Add quotes only if whitespace or special chars are present
if strings.ContainsAny(a, " \t") {
quoted[i] = fmt.Sprintf("%q", a)
} else {
quoted[i] = a
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@tmshort
Copy link
Contributor

tmshort commented Aug 19, 2025

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 19, 2025
Copy link
Contributor

openshift-ci bot commented Aug 19, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anik120, camilamacedo86, tmshort

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 19, 2025
@camilamacedo86
Copy link
Contributor Author

/test openshift-e2e-aws

Copy link
Contributor

openshift-ci bot commented Aug 19, 2025

@camilamacedo86: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 1f2debf link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit fbdba7b into openshift:main Aug 19, 2025
12 of 13 checks passed
@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

Distgit: ose-olm-operator-controller
This PR has been included in build ose-olm-operator-controller-container-v4.20.0-202508192316.p0.gfbdba7b.assembly.stream.el9.
All builds following this will include this PR.

@camilamacedo86 camilamacedo86 deleted the add-webhook-tests-with-skip-certca branch August 19, 2025 23:46
@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

Distgit: ose-olm-catalogd
This PR has been included in build ose-olm-catalogd-container-v4.20.0-202508192316.p0.gfbdba7b.assembly.stream.el9.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants