Update to AppWrapper v0.20.2 #581

Merged

dgrove-oss
Collaborator

@ChristianZaccaria @sutaakar -- I believe this fixes the problem with a RayCluster wrapped in an AppWrapper being reset every 5 minutes (https://issues.redhat.com/browse/RHOAIENG-8834 and discussed in #521).

The problem is that the ODH fork of Kueue sets waitForPodsReady.enabled to true (https://github.com/opendatahub-io/kueue/blob/dev/config/components/manager/controller_manager_config.yaml#L25-L26), while the AppWrapper controller was assuming that waitForPodsReady.enabled was false (the default in Kueue). As a result, the AppWrapper Job Reconciler loop was not updating the PodsReady condition in the Workload object, because that write is conditional on waitForPodsReady being true.
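
To make the mismatch concrete, here is a minimal sketch of the conditional write. It uses simplified, hypothetical stand-ins (`Workload`, `Condition`, `reconcilePodsReady`) rather than the real Kueue or AppWrapper types, so treat it as an illustration of the logic described above and not as the actual controller code.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type   string
	Status bool
}

// Workload is a hypothetical, stripped-down stand-in for Kueue's Workload object.
type Workload struct {
	Conditions []Condition
}

// setCondition inserts a condition, or updates it if one of that type exists.
func setCondition(w *Workload, condType string, status bool) {
	for i := range w.Conditions {
		if w.Conditions[i].Type == condType {
			w.Conditions[i].Status = status
			return
		}
	}
	w.Conditions = append(w.Conditions, Condition{Type: condType, Status: status})
}

// hasCondition reports whether the workload carries a condition of the given type.
func hasCondition(w *Workload, condType string) bool {
	for _, c := range w.Conditions {
		if c.Type == condType {
			return true
		}
	}
	return false
}

// reconcilePodsReady mimics the conditional write: the PodsReady condition is
// only written when the reconciler believes waitForPodsReady is true.
func reconcilePodsReady(w *Workload, waitForPodsReady bool, podsReady bool) {
	if !waitForPodsReady {
		return // assumed setting is false, so the condition is never written
	}
	setCondition(w, "PodsReady", podsReady)
}

func main() {
	// Reconciler assumes the Kueue default (false) although the pods are ready.
	mismatched := &Workload{}
	reconcilePodsReady(mismatched, false, true)
	fmt.Println("assumed false, PodsReady written:", hasCondition(mismatched, "PodsReady"))

	// Reconciler's assumption matches a Kueue configured with true.
	aligned := &Workload{}
	reconcilePodsReady(aligned, true, true)
	fmt.Println("assumed true, PodsReady written:", hasCondition(aligned, "PodsReady"))
}
```

In the first case the Workload never carries a PodsReady condition, which matches the situation described above when Kueue itself is configured to wait for it.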

We did a quick fix in AppWrapper 0.20.2. The right longer-term fix is probably for the codeflare operator to read the Kueue configuration on startup so we can ensure that the two parts of the system are in sync.

Contributor

@sutaakar sutaakar left a comment

Tried locally, works well

openshift-ci bot commented Jun 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sutaakar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 6bc0bc7 into project-codeflare:main Jun 28, 2024
8 checks passed
@dgrove-oss dgrove-oss deleted the appwrapper-bump branch June 28, 2024 13:09