-
Notifications
You must be signed in to change notification settings - Fork 632
[jobs] multi-user managed jobs #4787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[jobs] multi-user managed jobs #4787
Conversation
ca5a529
to
79883fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cg505! Mostly looks good to me. Left several comments.
sky/jobs/utils.py
Outdated
@@ -1044,11 +1072,14 @@ def generate_details(failure_reason: Optional[str]) -> str: | |||
if not managed_job_status.is_terminal(): | |||
status_str += f' (task: {current_task_id})' | |||
|
|||
job_id = job_hash[1] if tasks_have_user else job_hash | |||
user_values = generate_user_values(job_tasks[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor:
user_values = generate_user_values(job_tasks[0]) | |
user_values = get_user_columns(job_tasks[0]) |
@@ -3,7 +3,7 @@ | |||
# API server version, whenever there is a change in API server that requires a | |||
# restart of the local API server or error out when the client does not match | |||
# the server version. | |||
API_VERSION = '1' | |||
API_VERSION = '2' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: Is it possible that we only raise an update error when the user actually use -u
with sky jobs queue
or sky jobs cancel
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea, it may require some pydantic investigation etc. Do we think this is quite valuable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine for now, but once we have more user on client-server, it would be better to avoid frequent API version upgrade.
It was already bumped in an earlier commit in the PR.
/quicktest-core |
Instead of writing to an env file locally, we will just do it as part of the run commands.
e07011e
to
054d020
Compare
/quicktest-core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cg505! LGTM. We should add a test in https://github.com/skypilot-org/skypilot/blob/b55a050fd5a6a846e0d002abb6ddb22880ba871a/tests/smoke_tests/test_api_server.py
and run the smoke test for managed jobs.
btw, the We should run backward compatibility tests manually. |
Manually also broken after client server merged. Fixed in #4741. |
/quicktest-core |
9977f3d
to
74487b6
Compare
/quicktest-core |
/smoke-test -k test_multi_tenant |
/smoke-test --managed-jobs |
Changes managed jobs to work like clusters
sky jobs queue
shows only your jobs, runsky jobs queue -u
to show all jobssky jobs cancel -a
cancels only your jobs, runsky jobs cancel -u
to cancel all users' jobsCloses #4686.
Tested (run the relevant ones):
bash format.sh
conda deactivate; bash -i tests/backward_compatibility_tests.sh