-
Notifications
You must be signed in to change notification settings - Fork 716
[Serve] Make controller cloud choose from replica resources if it is not up #3231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
controller_resources_in_config.cloud is None): | ||
controller_clouds = requested_clouds | ||
else: | ||
controller_clouds = {controller_resources_in_config.cloud} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If controller_exist
and controller_resources_in_config.cloud is None
, the controller_clouds
will still be set to None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, because if the controller exists, we don't want to trigger a resources not match error for the controller. For example, the controller exists and it is on AWS, and no controller resources in config is found; while now I try to sky serve up
a service w/ replica on GCP. In such case, we want to directly use the controller, instead of setting controller cloud to GCP and causing a resources not match on controller.
Co-authored-by: Ziming Mao <[email protected]>
sky/serve/core.py
Outdated
# If the controller and all replicas are from the same cloud, it should | ||
# provide better connectivity. We will let the controller choose from | ||
# any cloud in the resources if the controller does not exist. | ||
requested_clouds: Set['clouds.Cloud'] = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Respecting cloud is good, but we should also consider respecting the regions/zones specified for the resources as well, e.g. we may want to priortize the regions requested by the service.
How about we have a TODO here and file an issue for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requested_clouds: Set['clouds.Cloud'] = set() | ||
for resources in task.resources: | ||
if resources.cloud is not None: | ||
requested_clouds.add(resources.cloud) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resources.cloud
is an object and will not be able to be distinguished by set
.
if (not controller_exist and | ||
controller_resources_in_config.cloud is None): | ||
controller_clouds = requested_clouds | ||
else: | ||
controller_clouds = {controller_resources_in_config.cloud} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider moving these logics into controller_utils
and have it shared by serve and spot controller?
@Michaelvll Thanks for the suggestion! Fixed in #3363. |
Tested (run the relevant ones):
bash format.sh
Tested with the following YAML and print out controller resources to make sure two cloud is chosen
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh