Skip to content

NO-ISSUE: Handle service-ca cert availability/rotation #436

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 17, 2025

Conversation

tmshort
Copy link
Contributor

@tmshort tmshort commented Aug 15, 2025

There is a problem when the service-ca certificate is not available at pod start. This is an issue because the SystemCertPool is created from SSL_CERT_DIR, which may include the empty service-ca. The SystemCertPool is never regenerated during the lifetime of the program execution, so it will never get updated when the service-ca is filled. Thus, we need to use --pull-cas-dir to reference the CAs that we want to use. This will also allow OLMv1 to reload the service-ca when it is reloaded (after 2 years, mind you). Removing the SSL_CERT_DIR setting, and adding the --pull-cas-dir flag ought to be equivalent to what we have now (i.e. SSL_CERT_DIR and no --pull-cas-dir), except that rotation will be handled better.

There is problem when the service-ca certificate is not available at pod start.
This is an issue because the SystemCertPool is created from SSL_CERT_DIR,
which may include the empty service-ca. The SystemCertPool is never regenerated
during the lifetime of the program execution, so it will never get updated when
the service-ca is filled. Thus, we need to use --pull-cas-dir to reference the
CAs that we want to use. This will also allow OLMv1 to reload the service-ca
when it is reloaded (after 2 years, mind you). Removing the SSL_CERT_DIR setting,
and adding the --pull-cas-dir flag ought to be equivalent to what we have now
(i.e. SSL_CERT_DIR and no --pull-cas-dir), except that rotation will be handled
better.

Signed-off-by: Todd Short <[email protected]>
@openshift-ci openshift-ci bot requested review from bentito and perdasilva August 15, 2025 13:21
Copy link
Contributor

openshift-ci bot commented Aug 15, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tmshort

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 15, 2025
@tmshort
Copy link
Contributor Author

tmshort commented Aug 15, 2025

/payload-aggregate openshift-e2e-aws-techpreview 5

Copy link
Contributor

openshift-ci bot commented Aug 15, 2025

@tmshort: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@tmshort
Copy link
Contributor Author

tmshort commented Aug 15, 2025

/payload-aggregate ci/prow/openshift-e2e-aws-techpreview 5

Copy link
Contributor

openshift-ci bot commented Aug 15, 2025

@tmshort: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@tmshort
Copy link
Contributor Author

tmshort commented Aug 15, 2025

/payload-aggregate periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview 10

Copy link
Contributor

openshift-ci bot commented Aug 15, 2025

@tmshort: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ba047b60-79db-11f0-821b-e294e8e344e7-0

@tmshort tmshort changed the title UPSTREAM: <carry>: Handle service-ca cert availability/rotation NO-ISSUE: Handle service-ca cert availability/rotation Aug 15, 2025
@openshift-ci-robot
Copy link

@tmshort: This pull request explicitly references no jira issue.

In response to this:

There is a problem when the service-ca certificate is not available at pod start. This is an issue because the SystemCertPool is created from SSL_CERT_DIR, which may include the empty service-ca. The SystemCertPool is never regenerated during the lifetime of the program execution, so it will never get updated when the service-ca is filled. Thus, we need to use --pull-cas-dir to reference the CAs that we want to use. This will also allow OLMv1 to reload the service-ca when it is reloaded (after 2 years, mind you). Removing the SSL_CERT_DIR setting, and adding the --pull-cas-dir flag ought to be equivalent to what we have now (i.e. SSL_CERT_DIR and no --pull-cas-dir), except that rotation will be handled better.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 15, 2025
Copy link
Contributor

openshift-ci bot commented Aug 15, 2025

@camilamacedo86: This PR was included in a payload test run from #430
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/84478cd0-79f7-11f0-868c-f17f1c5c9461-0

@tmshort
Copy link
Contributor Author

tmshort commented Aug 15, 2025

/test openshift-e2e-aws-techpreview
/test openshift-e2e-aws
Neither of the runs of openshift-e2e-aws openshift-e2e-aws-techpreview had the problem indicator (i.e. empty service-ca), so they need to be run again anyways,.

@tmshort
Copy link
Contributor Author

tmshort commented Aug 15, 2025

/test openshift-e2e-aws-techpreview
/test openshift-e2e-aws
Neither of the 2nd runs of openshift-e2e-aws openshift-e2e-aws-techpreview had the problem indicator (i.e. empty service-ca), so they need to be run again anyways.

@tmshort
Copy link
Contributor Author

tmshort commented Aug 16, 2025

/test openshift-e2e-aws-techpreview
/test openshift-e2e-aws
Neither of the 3rd runs of openshift-e2e-aws openshift-e2e-aws-techpreview had the problem indicator (i.e. empty service-ca), so they need to be run again anyways.

@tmshort
Copy link
Contributor Author

tmshort commented Aug 16, 2025

/test openshift-e2e-aws-techpreview
/test openshift-e2e-aws
Neither of the 4th runs of openshift-e2e-aws openshift-e2e-aws-techpreview had the problem indicator (i.e. empty service-ca), so they need to be run again anyways.

Copy link
Contributor

openshift-ci bot commented Aug 16, 2025

@tmshort: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 9cc13d8 link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@tmshort
Copy link
Contributor Author

tmshort commented Aug 16, 2025

/test openshift-e2e-aws-techpreview
/test openshift-e2e-aws
Neither of the 5th runs of openshift-e2e-aws openshift-e2e-aws-techpreview had the problem indicator (i.e. empty service-ca), so they need to be run again anyways.
This will be the last time.
It's obvious that this configuration works, but we've yet to encounter the reproduction scenario.

@tmshort
Copy link
Contributor Author

tmshort commented Aug 16, 2025

Neither of the 6th runs of openshift-e2e-aws openshift-e2e-aws-techpreview had the problem indicator (i.e. empty service-ca), so they need to be run again anyways.

Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that sorted out
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 17, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 1a9f810 into openshift:main Aug 17, 2025
11 of 12 checks passed
@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

Distgit: ose-olm-operator-controller
This PR has been included in build ose-olm-operator-controller-container-v4.20.0-202508171444.p0.g1a9f810.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

Distgit: ose-olm-catalogd
This PR has been included in build ose-olm-catalogd-container-v4.20.0-202508171444.p0.g1a9f810.assembly.stream.el9.
All builds following this will include this PR.

@tmshort tmshort deleted the update-cert-handling branch August 17, 2025 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants