Conversation

@jrvaldes

@jrvaldes jrvaldes commented Nov 24, 2025

This commit ensures the hybridOverlay service explicitly receives the address of the API server endpoint along with the cacert, which enables certificate rotation.

The --bootstrap-kubeconfig flag should be enough, but there is a bug
in OVN-K HybridOverlay [1] where the apiserver and cacert are not extracted
from the bootstrap information. This change therefore adds the --k8s-apiserver
and --k8s-cacert flags to the command.

[1] https://issues.redhat.com/browse/OCPBUGS-65856
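
As a rough illustration of the workaround described above, the sketch below assembles a flag list that carries the API server address and CA certificate alongside the bootstrap kubeconfig. The helper name hybridOverlayArgs, the choice to pass each flag and its value as separate arguments, and the paths in main are assumptions made for the example; only the three flag names come from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// hybridOverlayArgs is a hypothetical helper (not taken from the PR) that
// builds the argument list for the hybrid-overlay command. The explicit
// API server and CA certificate flags are appended only when a value is set.
func hybridOverlayArgs(bootstrapKubeconfig, apiServer, caCert string) []string {
	args := []string{"--bootstrap-kubeconfig", bootstrapKubeconfig}
	if apiServer != "" {
		args = append(args, "--k8s-apiserver", apiServer)
	}
	if caCert != "" {
		args = append(args, "--k8s-cacert", caCert)
	}
	return args
}

func main() {
	// Illustrative values modeled on the paths visible in the logs below.
	args := hybridOverlayArgs(
		`C:\k\kubeconfig`,
		"https://api-int.example.com:6443",
		`C:\k\bootstrap-ca.crt`,
	)
	fmt.Println(strings.Join(args, " "))
}
```

Leaving out either value simply omits the corresponding flag, so the command falls back to whatever the bootstrap kubeconfig alone provides.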

hybrid-overlay.log before:

I1124 15:56:01.961205    1104 config.go:2474] Kubernetes config: {BootstrapKubeconfig:C:\k\kubeconfig CertDir:C:\k\cni\config CertDuration:10m0s Kubeconfig: CACert: CAData:[] APIServer:  Token: TokenFile: CompatServiceCIDR: RawServiceCIDRs:172.16.1.0/24 ServiceCIDRs:[172.16.1.0/24] OVNConfigNamespace:ovn-kubernetes OVNEmptyLbEvents:false PodIP: RawNoHostSubnetNodes: NoHostSubnetNodes:<nil> HostNetworkNamespace: DisableRequestedChassis:false PlatformType: HealthzBindAddress: CompatMetricsBindAddress: CompatOVNMetricsBindAddress: CompatMetricsEnablePprof:false DNSServiceNamespace:kube-system DNSServiceName:kube-dns}

hybrid-overlay.log with the proposed implementation (note that the APIServer and CACert fields are now populated):

I1124 15:56:01.961205    1104 config.go:2474] Kubernetes config: {BootstrapKubeconfig:C:\k\kubeconfig CertDir:C:\k\cni\config CertDuration:10m0s Kubeconfig: CACert:C:\k\bootstrap-ca.crt CAData:[] APIServer:https://api-int.jvaldes.vmc.devcluster.openshift.com:6443 Token: TokenFile: CompatServiceCIDR: RawServiceCIDRs:172.16.1.0/24 ServiceCIDRs:[172.16.1.0/24] OVNConfigNamespace:ovn-kubernetes OVNEmptyLbEvents:false PodIP: RawNoHostSubnetNodes: NoHostSubnetNodes:<nil> HostNetworkNamespace: DisableRequestedChassis:false PlatformType: HealthzBindAddress: CompatMetricsBindAddress: CompatOVNMetricsBindAddress: CompatMetricsEnablePprof:false DNSServiceNamespace:kube-system DNSServiceName:kube-dns}
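
For anyone comparing the two log lines by eye, a small check like the following can confirm whether a field is populated. This is only a sketch: fieldValue is a hypothetical helper, it assumes values contain no spaces, and the sample lines are shortened copies of the "Kubernetes config" entries above.

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldValue is a hypothetical helper that extracts the value printed after
// "name:" in a "Kubernetes config: {...}" log line. It assumes values do not
// contain spaces. The second result reports whether the field was present.
func fieldValue(line, name string) (string, bool) {
	re := regexp.MustCompile(`\b` + regexp.QuoteMeta(name) + `:(\S*)`)
	m := re.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Shortened copies of the two log lines; trailing fields are omitted.
	before := `Kubernetes config: {BootstrapKubeconfig:C:\k\kubeconfig Kubeconfig: CACert: CAData:[] APIServer:  Token: TokenFile:}`
	after := `Kubernetes config: {BootstrapKubeconfig:C:\k\kubeconfig Kubeconfig: CACert:C:\k\bootstrap-ca.crt CAData:[] APIServer:https://api-int.jvaldes.vmc.devcluster.openshift.com:6443 Token: TokenFile:}`

	for _, name := range []string{"APIServer", "CACert"} {
		b, _ := fieldValue(before, name)
		a, _ := fieldValue(after, name)
		fmt.Printf("%s: before=%q after=%q\n", name, b, a)
	}
}
```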

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 24, 2025
@openshift-ci

openshift-ci bot commented Nov 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jrvaldes

hybrid-overlay.log showing successful rotation with 10min expiration:

I1124 16:01:50.993123    1104 certificate_manager.go:566] "Rotating certificates" logger="kubernetes.io/kube-apiserver-client"  
I1124 16:01:51.000616    1104 reflector.go:357] "Starting reflector" logger="kubernetes.io/kube-apiserver-client" type="*v1.CertificateSigningRequest" resyncPeriod="0s" reflector="k8s.io/client-go/tools/watch/informerwatcher.go:162"  
I1124 16:01:51.000616    1104 reflector.go:403] "Listing and watching" logger="kubernetes.io/kube-apiserver-client" type="*v1.CertificateSigningRequest" reflector="k8s.io/client-go/tools/watch/informerwatcher.go:162"  
I1124 16:01:51.001563    1104 reflector.go:430] "Caches populated" logger="kubernetes.io/kube-apiserver-client" type="*v1.CertificateSigningRequest" reflector="k8s.io/client-go/tools/watch/informerwatcher.go:162"  
I1124 16:01:51.005853    1104 csr.go:274] "Certificate signing request is approved, waiting to be issued" logger="kubernetes.io/kube-apiserver-client" csr="csr-n7l5w"  
I1124 16:01:51.013189    1104 csr.go:270] "Certificate signing request is issued" logger="kubernetes.io/kube-apiserver-client" csr="csr-n7l5w"  
I1124 16:01:51.013189    1104 reflector.go:363] "Stopping reflector" logger="kubernetes.io/kube-apiserver-client" type="*v1.CertificateSigningRequest" resyncPeriod="0s" reflector="k8s.io/client-go/tools/watch/informerwatcher.go:162"  
I1124 16:01:52.023255    1104 certificate_manager.go:715] "Certificate rotation deadline determined" logger="kubernetes.io/kube-apiserver-client" expiration="2025-11-24 16:11:51 +0000 UTC" deadline="2025-11-24 16:09:59.674187978 +0000 UTC"  
I1124 16:01:52.023360    1104 certificate_manager.go:431] "Waiting for next certificate rotation" logger="kubernetes.io/kube-apiserver-client" sleep="8m7.650827778s"  
I1124 16:01:52.970199    1104 cert_rotation.go:92] "Certificate rotation detected, shutting down client connections to start using new credentials" logger="tls-transport-cache"  
I1124 16:01:52.970199    1104 streamwatcher.go:123] "Unable to decode an event from the watch stream" err="read tcp 192.168.222.128:49691->192.168.222.31:6443: use of closed network connection"  
I1124 16:01:52.970199    1104 reflector.go:946] "Watch close" reflector="k8s.io/client-go/informers/factory.go:160" type="*v1.Node" totalItems=49  
I1124 16:01:52.971824    1104 streamwatcher.go:123] "Unable to decode an event from the watch stream" err="read tcp 192.168.222.128:49691->192.168.222.31:6443: use of closed network connection"  
I1124 16:01:52.971824    1104 reflector.go:946] "Watch close" reflector="k8s.io/client-go/informers/factory.go:160" type="*v1.Pod" totalItems=71  
I1124 16:02:07.184385    1104 informer.go:314] Successfully synced 'win-webserver/win-webserver-85d5d4469b-gfb7p'  
I1124 16:02:07.189861    1104 informer.go:314] Successfully synced 'win-webserver/win-webserver-85d5d4469b-hmcpf'

Flow:

  1. CSR watch started (lines 2-4): The reflector successfully lists and watches the CertificateSigningRequest, and the cache is populated immediately.
  2. CSR approved and issued (lines 5-6):
     "Certificate signing request is approved" (16:01:51.005853)
     "Certificate signing request is issued" (16:01:51.013189)
     The CSR csr-n7l5w was approved and issued successfully within ~7ms.
  3. Certificate rotation deadline set (line 8):
     "Certificate rotation deadline determined" expiration="2025-11-24 16:11:51" deadline="2025-11-24 16:09:59"
     The system calculated that the next rotation will occur at 16:09:59 (about 1.9 minutes before the 10-minute certificate expires).
  4. Graceful connection handling (lines 10-14): The transport cache detected the new certificate and closed the existing connections. The "use of closed network connection" errors are expected; they indicate that the client is intentionally closing old connections so that new ones use the new certificate.
  5. Successful resync (lines 15-16): The informers reconnected and synced successfully, processing queued items.
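
The deadline arithmetic in the flow above can be reproduced with a short sketch. The function name rotationMargin and the layout constant are assumptions for illustration; the two timestamps are copied from the "Certificate rotation deadline determined" log line.

```go
package main

import (
	"fmt"
	"time"
)

// logTimeLayout matches timestamps such as "2025-11-24 16:11:51 +0000 UTC".
// time.Parse also accepts a fractional-seconds field right after the seconds,
// even though the layout does not list one.
const logTimeLayout = "2006-01-02 15:04:05 -0700 MST"

// rotationMargin is a hypothetical helper that returns how long before the
// certificate expiration the rotation deadline falls.
func rotationMargin(expiration, deadline string) (time.Duration, error) {
	exp, err := time.Parse(logTimeLayout, expiration)
	if err != nil {
		return 0, fmt.Errorf("parsing expiration: %w", err)
	}
	dl, err := time.Parse(logTimeLayout, deadline)
	if err != nil {
		return 0, fmt.Errorf("parsing deadline: %w", err)
	}
	return exp.Sub(dl), nil
}

func main() {
	margin, err := rotationMargin(
		"2025-11-24 16:11:51 +0000 UTC",
		"2025-11-24 16:09:59.674187978 +0000 UTC",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("deadline precedes expiration by %s (%.1f minutes)\n", margin, margin.Minutes())
}
```

With the values from the log, the margin works out to 1m51.325812022s, which is the "about 1.9 minutes" quoted in step 3.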

@jrvaldes jrvaldes changed the title [services] Add APIServer to HybridOverlay config OCPBUGS-64719: Add APIServer to HybridOverlay config Nov 24, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Nov 24, 2025
@openshift-ci-robot

@jrvaldes: This pull request references Jira Issue OCPBUGS-64719, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 24, 2025
@jrvaldes jrvaldes force-pushed the 00-hybrid-overlay-add-apiserver branch from b35d84b to f6c28f7 Compare November 24, 2025 18:34
@jrvaldes

/test ?

@openshift-ci

openshift-ci bot commented Nov 24, 2025

@jrvaldes: The following commands are available to trigger required jobs:

/test aws-e2e-operator
/test azure-e2e-operator
/test azure-e2e-upgrade
/test ci-bundle-wmco-bundle
/test gcp-e2e-operator
/test images
/test lint
/test nutanix-e2e-operator
/test platform-none-vsphere-e2e-operator
/test security
/test unit
/test vsphere-disconnected-e2e-operator
/test vsphere-e2e-operator
/test vsphere-proxy-e2e-operator
/test wicd-unit-vsphere

Use /test all to run all jobs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jrvaldes

/remove-approve

@jrvaldes

/test images

@jrvaldes

/test lint

@jrvaldes

/test unit

@openshift-ci openshift-ci bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 24, 2025
@jrvaldes

/test vsphere-e2e-operator


@sebsoto sebsoto left a comment


Thanks, Jose!
Mainly LGTM.

Comment on lines 62 to 65
// GetAPIServerEndpoint returns the cached Kubernetes API server endpoint.
func GetAPIServerEndpoint() string {
	return nodeConfigCache.apiServerEndpoint
}

Exposing this through nodeConfig is weird. It's an internal cache for nodeconfig.

Can we just get the infrastructure object in this function:

func generateServicesManifest(ctx context.Context, client client.Client, port string, platform oconfig.PlatformType) (*servicescm.Data, error) {

And pass the value where we need it?
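
A minimal sketch of that suggestion, under loud assumptions: the types infrastructure, infrastructureGetter, and servicesData are hypothetical stand-ins (the real function takes a controller-runtime client and returns *servicescm.Data), and the field name APIServerInternalURL is assumed to mirror what the cluster Infrastructure status exposes. The point is only the shape: fetch the object inside the function and pass the value along, rather than reading it from a package-level cache.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// infrastructure is a hypothetical stand-in for the cluster Infrastructure
// object; only the field used in this sketch is modeled.
type infrastructure struct {
	APIServerInternalURL string
}

// infrastructureGetter is a hypothetical abstraction over the Kubernetes client.
type infrastructureGetter interface {
	GetInfrastructure(ctx context.Context) (*infrastructure, error)
}

// servicesData is a hypothetical stand-in for the generated services data.
type servicesData struct {
	HybridOverlayArgs []string
}

// generateServicesData looks up the infrastructure object itself and passes
// the endpoint to where it is needed, instead of exposing an internal cache.
func generateServicesData(ctx context.Context, c infrastructureGetter) (*servicesData, error) {
	infra, err := c.GetInfrastructure(ctx)
	if err != nil {
		return nil, fmt.Errorf("getting infrastructure: %w", err)
	}
	if infra.APIServerInternalURL == "" {
		return nil, errors.New("infrastructure has no API server URL")
	}
	return &servicesData{
		HybridOverlayArgs: []string{"--k8s-apiserver", infra.APIServerInternalURL},
	}, nil
}

// fakeGetter is a test double that returns a fixed URL.
type fakeGetter struct{ url string }

func (f fakeGetter) GetInfrastructure(context.Context) (*infrastructure, error) {
	return &infrastructure{APIServerInternalURL: f.url}, nil
}

func main() {
	data, err := generateServicesData(context.Background(), fakeGetter{url: "https://api-int.example.com:6443"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(data.HybridOverlayArgs)
}
```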


Addressed in e8bc82b

@jrvaldes jrvaldes force-pushed the 00-hybrid-overlay-add-apiserver branch from f6c28f7 to e8bc82b Compare November 26, 2025 14:37
@jrvaldes

/test vsphere-e2e-operator

This commit ensures the hybridOverlay service explicitly receives the
address of the API server endpoint along with the cacert, which enables
certificate rotation.

The --bootstrap-kubeconfig flag should be enough, but there is a bug
in OVN-K HybridOverlay [1] where the apiserver and cacert are not extracted
from the bootstrap information. This change therefore adds the --k8s-apiserver
and --k8s-cacert flags to the command.

[1] https://issues.redhat.com/browse/OCPBUGS-65856
@jrvaldes jrvaldes force-pushed the 00-hybrid-overlay-add-apiserver branch from e8bc82b to d89e96e Compare November 26, 2025 19:45
@jrvaldes

/test vsphere-e2e-operator

@openshift-ci

openshift-ci bot commented Nov 26, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sebsoto

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 26, 2025
@jrvaldes jrvaldes marked this pull request as ready for review November 26, 2025 22:38
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 26, 2025
@openshift-ci openshift-ci bot requested a review from sebsoto November 26, 2025 22:39
@openshift-ci

openshift-ci bot commented Nov 27, 2025

@jrvaldes: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                  Commit   Details  Required  Rerun command
ci/prow/vsphere-disconnected-e2e-operator  d89e96e  link     true      /test vsphere-disconnected-e2e-operator
ci/prow/aws-e2e-operator                   d89e96e  link     true      /test aws-e2e-operator
ci/prow/vsphere-proxy-e2e-operator         d89e96e  link     true      /test vsphere-proxy-e2e-operator

Full PR test history. Your PR dashboard.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
