
Conversation


@ehearne-redhat ehearne-redhat commented Oct 20, 2025

See https://issues.redhat.com/browse/OCPBUGS-62726 for reference.

What:

  • A CEL expression was added to enforce the name cluster on KubeDescheduler instances.

How:

  • A CEL validation expression was added to pkg/apis/descheduler/v1/types_descheduler.go to enforce metadata.name == 'cluster'. A sketch of the marker follows below.
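
A minimal sketch of what this could look like on the API type (assuming the standard kubebuilder XValidation marker; the type and field names follow the usual pattern and may differ slightly from the actual file):

    // KubeDescheduler is the Schema for the kubedeschedulers API.
    // The root-level rule below rejects any instance not named "cluster".
    // +kubebuilder:validation:XValidation:rule="self.metadata.name == 'cluster'",message="kubedescheduler is a singleton, .metadata.name must be 'cluster'"
    type KubeDescheduler struct {
        metav1.TypeMeta   `json:",inline"`
        metav1.ObjectMeta `json:"metadata,omitempty"`

        Spec   KubeDeschedulerSpec   `json:"spec,omitempty"`
        Status KubeDeschedulerStatus `json:"status,omitempty"`
    }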

Why:

  • If the name is not cluster, no error is surfaced to the user. This change ensures the user knows exactly why their KubeDescheduler instance did not start when the name was not cluster.

How to test

  1. Clone ehearne-redhat's fork of this repository.
  2. Check out the OCPBUGS-62726-cel-enforce-singleton-naming branch.
  3. Launch a 4.20 cluster.
  4. Log into the cluster via CLI and Console.

Test via Console

  1. Install the Kube Descheduler Operator from OperatorHub. You can find this in Ecosystem --> Software Catalog, then search for kube descheduler.
  2. Apply the manifest manifests/kube-descheduler-operator.crd.yaml to the cluster via the CLI --> oc apply -f manifests/kube-descheduler-operator.crd.yaml.
  3. Try to create a Kube Descheduler instance in the console with an invalid name.
    a. Go to Ecosystem --> Installed Operators.
    b. Change the project to openshift-kube-descheduler-operator.
    c. Click on the Kube Descheduler Operator.
    d. Click on the Kube Descheduler tab, then click on the blue Create KubeDescheduler button.
    e. Change the Name field to something other than cluster, e.g. not-cluster. Scroll to the bottom and click Create.
    f. You should see an error message stating: kubedescheduler is a singleton, .metadata.name must be 'cluster'.
    g. Now try to create the instance using the name cluster. The instance should be created as normal. Delete the instance and test via the CLI below.

Test via CLI

  1. Create a YAML file using this format (a minimal example follows this list).
  2. Change .metadata.name to something other than cluster.
  3. Apply the YAML file --> oc apply -f <your yaml filename>.yaml.
  4. You should receive the following error: The KubeDescheduler "not-cluster" is invalid: <nil>: Invalid value: "object": kubedescheduler is a singleton, .metadata.name must be 'cluster'
  5. Change .metadata.name to cluster and reapply using the same command as above.
  6. You should be able to create the instance and see a message similar to: kubedescheduler.operator.openshift.io/cluster created.
  7. You can now shut down the cluster.
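
For reference, a minimal manifest for step 1 might look like the following (the spec values are illustrative; the format linked in step 1 is authoritative):

    apiVersion: operator.openshift.io/v1
    kind: KubeDescheduler
    metadata:
      name: not-cluster            # invalid on purpose; change to "cluster" for step 5
      namespace: openshift-kube-descheduler-operator
    spec:
      deschedulingIntervalSeconds: 3600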

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 20, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Oct 20, 2025
@openshift-ci-robot

@ehearne-redhat: This pull request references Jira Issue OCPBUGS-62726, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @kasturinarra

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci bot commented Oct 20, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ehearne-redhat
Once this PR has been reviewed and has the lgtm label, please assign ingvagabund for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ehearne-redhat
Author

Will test the change on a 4.20 cluster through a custom-built image and update the description with how to test it tomorrow.


@everettraven everettraven left a comment

You'll also need to regenerate the CustomResourceDefinition.

Looks like make regen-crd is what you'll need [1].

  1. regen-crd:
    go build -o _output/tools/bin/controller-gen ./vendor/sigs.k8s.io/controller-tools/cmd/controller-gen
    cp manifests/kube-descheduler-operator.crd.yaml manifests/operator.openshift.io_kubedeschedulers.yaml
    ./_output/tools/bin/controller-gen crd paths=./pkg/apis/descheduler/v1/... schemapatch:manifests=./manifests output:crd:dir=./manifests
    mv manifests/operator.openshift.io_kubedeschedulers.yaml manifests/kube-descheduler-operator.crd.yaml

@ehearne-redhat ehearne-redhat force-pushed the OCPBUGS-62726-cel-enforce-singleton-naming branch from 0784906 to c3027f1 on October 21, 2025 08:14
@ehearne-redhat
Author

Had some build issues - will update tomorrow.

@ehearne-redhat
Author

/retest

@ehearne-redhat
Author

ehearne-redhat commented Oct 22, 2025

Hello - I can confirm that the CEL expression works in the console and through the CLI.

Console

(screenshot of the console validation error)

CLI

ehearne-mac:cluster-kube-descheduler-operator ehearne$ oc apply -f file.yaml 
The KubeDescheduler "not-cluster" is invalid: <nil>: Invalid value: "object": kubedescheduler is a singleton, .metadata.name must be 'cluster'

For some reason, .metadata.name is treated as an object and not a string. This results in an unclear message on both the CLI and the console. Is there a way around this?

@openshift-ci-robot

@ehearne-redhat: This pull request references Jira Issue OCPBUGS-62726, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @kasturinarra

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ehearne-redhat ehearne-redhat changed the title from "[WIP] OCPBUGS-62726: add CEL expression to enforce name cluster on singletons" to "OCPBUGS-62726: add CEL expression to enforce name cluster on singletons" on Oct 22, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 22, 2025
@ehearne-redhat
Author

@ingvagabund kindly requesting your review on this PR as the validation steps are complete :)

@ehearne-redhat
Author

@everettraven would you have any thoughts on the strange error format seen in the comment above?

@ingvagabund
Member

Generating the CRD is a semi-automatic step here. I suppose this could be improved, yet there's still something new to learn about the generators. Besides those few comments, this looks good. Thank you for improving this :)

@ehearne-redhat ehearne-redhat force-pushed the OCPBUGS-62726-cel-enforce-singleton-naming branch from c3027f1 to f3c037b on October 22, 2025 12:35
@everettraven

everettraven commented Oct 22, 2025

@everettraven would you have any thoughts on the strange error format seen in an above comment?

@ehearne-redhat It is likely because of where the validation is placed (i.e. KubeDescheduler is an OpenAPI "object"). If you want a more granular error message, I believe you can set a field path to point directly to the field that is in error.

See https://book.kubebuilder.io/reference/markers/crd-validation for more information. It is easiest to find if you search the page for XValidation.
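
For example, the marker might then become something like this sketch (assuming the fieldPath argument described in the kubebuilder docs; whether the path needs a leading dot is discussed further down):

    // +kubebuilder:validation:XValidation:rule="self.metadata.name == 'cluster'",message="kubedescheduler is a singleton, .metadata.name must be 'cluster'",fieldPath=".metadata.name"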

@ehearne-redhat
Author

So it looks like, from the discussions here, that it is not possible for a CEL expression to include the metadata.name value within the error message, although it does validate the field. That's why <nil> shows up in the error message.

Getting a cleaner error message would require a validating webhook. Otherwise the user will see this unclear message when creating a KubeDescheduler instance with an invalid name through the CLI or in the console.
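
For context, a minimal sketch of what such a webhook handler could look like in Go (the handler name and wiring are illustrative and not part of this PR):

    import (
        "encoding/json"
        "fmt"
        "net/http"

        admissionv1 "k8s.io/api/admission/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // validateName rejects KubeDescheduler objects whose name is not "cluster",
    // with an error message that can reference the offending name directly.
    func validateName(w http.ResponseWriter, r *http.Request) {
        var review admissionv1.AdmissionReview
        if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        resp := &admissionv1.AdmissionResponse{UID: review.Request.UID, Allowed: true}
        if review.Request.Name != "cluster" {
            resp.Allowed = false
            resp.Result = &metav1.Status{
                Message: fmt.Sprintf("KubeDescheduler %q is invalid: .metadata.name must be 'cluster'", review.Request.Name),
            }
        }
        review.Response = resp
        json.NewEncoder(w).Encode(&review)
    }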

Is having <nil> in the error message acceptable? @ingvagabund If so, I'm happy to re-request a review and hopefully get this merged.

@ehearne-redhat
Author

/retest

@everettraven

everettraven commented Oct 22, 2025

@ehearne-redhat Even if you specify +kubebuilder:validation:XValidation:rule="...",message="...",fieldPath=".metadata.name" (the field path may not need to start with ., I don't recall exactly), it still shows the nil field?

@ingvagabund
Member

If there's a way, it's better to replace <nil> with a more readable alternative.

@ehearne-redhat
Author

@everettraven @ingvagabund I will try this out and update you shortly.

@openshift-ci
Contributor

openshift-ci bot commented Oct 22, 2025

@ehearne-redhat: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-operator f3c037b link true /test e2e-aws-operator
ci/prow/images f3c037b link true /test images

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ehearne-redhat
Author

@everettraven so I can add fieldPath='metadata.name' and make regen-crd does regenerate the CRD, but applying it to the cluster throws the following error:

ehearne-mac:cluster-kube-descheduler-operator ehearne$ oc apply -f manifests/kube-descheduler-operator.crd.yaml
Warning: resource customresourcedefinitions/kubedeschedulers.operator.openshift.io is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by oc apply. oc apply should only be used on resources created declaratively by either oc create --save-config or oc apply. The missing annotation will be patched automatically.
The CustomResourceDefinition "kubedeschedulers.operator.openshift.io" is invalid: spec.validation.openAPIV3Schema.x-kubernetes-validations[0].fieldPath: Invalid value: "metadata.name": fieldPath must be a valid path

I have checked other operators built with similar logic to this, such as the Kueue Operator, which shows a similar error (screenshot of the Kueue Operator's equivalent <nil> error).

I was able to track down when they first implemented similar logic here. However, I could not find any comment about this behaviour there.

So it looks like this way of enforcing the name cluster on singletons is quite common within OpenShift. It still seems strange that we would allow <nil> error messages to be seen by the user.

I am happy to implement a better solution using a validating webhook if preferred, but given the above I will leave it to you @ingvagabund to decide on that. :)

@everettraven

@ehearne-redhat I wonder if it is considering that path invalid because it is missing the leading dot? Looking at the tests for path validation in https://github.com/kubernetes/apiextensions-apiserver/blob/4c7c8214a2fa680ac4f485e8ed8c52a248bafb7a/pkg/apiserver/schema/cel/validation_test.go#L3750-L3756 it looks like it wants a leading dot in the path.

Does using .metadata.name (note that this starts with .) resolve the issue?

@everettraven

If not, I don't think it is a huge deal. It is pretty standard practice for us to have this check for cluster singletons and I don't think going down the path of a validating webhook is worth it for this small of an issue.

@ehearne-redhat
Author

@everettraven I should have mentioned that I did try different combinations for fieldPath, such as name, Name, .metadata.name, etc., until metadata.name was accepted by controller-gen, but applying the CRD still failed.

I think that makes sense, because it is so commonly used anyway.
