Closed
Description
When the cluster is out of capacity and MCAD fails to create a cluster, wait_ready() fails with:
TypeError: 'MissingModel' object is not callable
Failed to init Ray cluster, error 'MissingModel' object is not callable
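This appears to be caused by line 469 in _map_to_app_wrapper() when no state exists yet: cluster_model.status.state is presumably a MissingModel placeholder at that point, so the .lower() call on it fails.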
codeflare-sdk/src/codeflare_sdk/cluster/cluster.py
Lines 465 to 472 in 0d9b23c
def _map_to_app_wrapper(cluster) -> AppWrapper:
    cluster_model = cluster.model
    return AppWrapper(
        name=cluster.name(),
        status=AppWrapperStatus(cluster_model.status.state.lower()),
        can_run=cluster_model.status.canrun,
        job_state=cluster_model.status.queuejobstate,
    )
We should update cluster.status() to account for this case and keep the cluster status "pending" until it is resolved or times out.
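As a rough illustration of the "stay pending until resolved or timed out" behaviour, here is a minimal sketch. It is not the SDK's wait_ready(); the names wait_until_ready and get_state, the state strings "ready" and "failed", and the default timings are all assumptions made for this example.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    get_state: Callable[[], Optional[str]],
    timeout: float = 60.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it returns "ready".

    Hypothetical helper: a missing state (None) is treated as "pending"
    rather than as an error, and the wait is bounded by `timeout` seconds.
    """
    waited = 0.0
    while True:
        # No state reported yet means the cluster is still pending.
        state = get_state() or "pending"
        if state == "ready":
            return state
        if state == "failed":
            raise RuntimeError("cluster reported a failed state")
        if waited >= timeout:
            raise TimeoutError(f"cluster still '{state}' after {timeout} seconds")
        sleep(poll_interval)
        waited += poll_interval
```

With this shape, an out-of-capacity cluster would surface as a TimeoutError that names the last observed state instead of an unrelated TypeError.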
We should also add some error handling to _map_to_app_wrapper() so that the reason for failure is clear and the correct value is passed up to cluster.status().
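One possible shape for that error handling is sketched below. It is self-contained, so AppWrapperStatus and AppWrapper here are simplified stand-ins for the SDK types (only two enum members are listed), and map_to_app_wrapper_safe is a hypothetical name. It also assumes the status object exposes state, canrun and queuejobstate as attributes, as in the snippet above.

```python
from dataclasses import dataclass
from enum import Enum
from types import SimpleNamespace
from typing import Optional


class AppWrapperStatus(Enum):
    # Stand-in for the SDK enum; only two illustrative members.
    PENDING = "pending"
    RUNNING = "running"


@dataclass
class AppWrapper:
    # Stand-in for the SDK dataclass, limited to the fields used in this issue.
    name: str
    status: AppWrapperStatus
    can_run: bool
    job_state: Optional[str]


def map_to_app_wrapper_safe(name: str, status_obj) -> AppWrapper:
    """Map a raw status object to an AppWrapper, tolerating a missing state."""
    state = getattr(status_obj, "state", None)
    if not isinstance(state, str):
        # No usable state yet (missing attribute or a placeholder object):
        # report "pending" instead of calling .lower() on a non-string.
        return AppWrapper(name, AppWrapperStatus.PENDING, False, None)
    try:
        status = AppWrapperStatus(state.lower())
    except ValueError as err:
        # Make the reason for the failure explicit to the caller.
        raise ValueError(
            f"AppWrapper '{name}' reported unknown state {state!r}"
        ) from err
    return AppWrapper(
        name=name,
        status=status,
        can_run=bool(getattr(status_obj, "canrun", False)),
        job_state=getattr(status_obj, "queuejobstate", None),
    )


# Usage: an empty status maps to PENDING rather than raising a TypeError.
pending = map_to_app_wrapper_safe("demo", SimpleNamespace())
```

The isinstance check is deliberate: it covers both a missing attribute and a non-string placeholder returned by the model.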
Original request from Slack:
https://project-codeflare.slack.com/archives/C04PF8V5MB3/p1689336080924819