-
Notifications
You must be signed in to change notification settings - Fork 633
[k8s] Support for switching k8s contexts #3913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this @romilbhardwaj! Looks mostly good to me with minor comments for an interface.
sky/adaptors/kubernetes.py
Outdated
@@ -66,7 +59,7 @@ def wrapped(*args, **kwargs): | |||
return decorated_api | |||
|
|||
|
|||
def _load_config(): | |||
def _load_config(context: str = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _load_config(context: str = None): | |
def _load_config(context: Optional[str] = None): |
sky/backends/backend_utils.py
Outdated
@@ -1559,6 +1559,7 @@ def check_owner_identity(cluster_name: str) -> None: | |||
|
|||
cloud = handle.launched_resources.cloud | |||
current_user_identity = cloud.get_current_user_identity() | |||
supported_user_identities = cloud.get_supported_identities() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we directly make get_current_user_identity
as a list, instead of adding a new API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a little confused by our implementation which made me add this new method.
Currently, we iterate on the zip(owner_identity, current_user_identity)
to check identities:
skypilot/sky/backends/backend_utils.py
Lines 1578 to 1586 in 2e545b8
# It is OK if the owner identity is shorter, which will happen when | |
# the cluster is launched before #1808. In that case, we only check | |
# the same length (zip will stop at the shorter one). | |
for i, (owner, | |
current) in enumerate(zip(owner_identity, | |
current_user_identity)): | |
# Clean up the owner identity for the backslash and newlines, caused | |
# by the cloud CLI output, e.g. gcloud. | |
owner = owner.replace('\n', '').replace('\\', '') |
Say the user had launched the cluster with just one identity configured in their kubeconfig, and thus owner_identity
is ["myid1"]
.
Say they updated their kubeconfig and now they have 3 identities in current_user_identity
, the original one being 3rd in the list: ["myid3", "myid2", "myid1"]
.
zip(owner_identity, current_user_identity)
will now be ["myid1", "myid3"]
, which will fail the check we have.
>>> x=["myid1"]
>>> y=["myid3", "myid2", "myid1"]
>>> list(zip(x,y))
[('myid1', 'myid3')]
For k8s this logic should be changed to itertools.product instead of zip
, but I think that will affect AWS implementation, which has a stricter ordering requirement to handle [user_id, AccountId]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we have a list for the current_user_identity
and owner_identity
is that it can be fine for a cloud owner with a different cloud account but from the same project to operate a VM, for example, we allows the following to match
owner_identity = ['myaccount1', 'myproject']
current_identity = ['myaccount2', 'myproject']
for i, (owner,
current) in enumerate(zip(owner_identity,
current_user_identity)):
In order to support multiple current_identity
, we can do something like the following:
owner_identity = ['myaccount1', 'myproject']
current_identity = [['myaccount2', 'myproject']]
for available_identity in current_identities:
for i, (owner,
current) in enumerate(zip(owner_identity, available_identity)):
And for k8s, it can be:
owner_identity = ['mycontext']
current_identity = [['mycontext1'], ['mycontext2']]
for available_identity in current_identities:
for i, (owner,
current) in enumerate(zip(owner_identity, available_identity)):
Thanks @Michaelvll! I've updated our identity logic:
We can roll these into one method as well (i.e., just have |
Tested:
|
This PR now also fixes a bug introduced by #3897 where if a namespace was passed but no context was set (e.g., when happen inside a controller pod when using incluster auth) to |
Backward compatibility is now fixed. TODO:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix @romilbhardwaj! Mostly looks good to me.
# Setup service for SSH jump pod. We create the SSH jump service here | ||
# because we need to know the service IP address and port to set the | ||
# ssh_proxy_command in the autoscaler config. | ||
kubernetes_utils.setup_ssh_jump_svc(ssh_jump_name, namespace, | ||
kubernetes_utils.setup_ssh_jump_svc(ssh_jump_name, namespace, context, | ||
service_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need a jump pod for k8s? I thought we have removed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is still used when kubernetes.networking
is nodeport
(we will deprecate that soon)
sky/clouds/cloud.py
Outdated
@classmethod | ||
def get_user_identities(cls) -> Optional[List[List[str]]]: | ||
"""Returns all available user identities of this cloud. | ||
|
||
See get_active_user_identity for definition of user identity. | ||
|
||
This method returns a list of user identities, with the current active | ||
identity being the first element. Most clouds have only one identity | ||
available, so the returned list will only have one element: the current | ||
active identity. | ||
|
||
However, some clouds (e.g., Kubernetes) can have multiple current | ||
identities (e.g., multiple contexts configured in kubeconfig) that | ||
can be dynamically switched, so the list can have multiple elements. | ||
|
||
Example return values: | ||
- AWS: [[UserId, AccountId]] | ||
- GCP: [[email address + project ID]] | ||
- Azure: [[email address + subscription ID]] | ||
- Kubernetes: [[current active context], [context 2], ...] | ||
|
||
Returns: | ||
None if the cloud does not have a concept of user identity; | ||
otherwise all the user identities. | ||
""" | ||
active_identity = cls.get_active_user_identity() | ||
return [active_identity] if active_identity is not None else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found calling get_active_user_identity
within get_user_identities
a bit weird to me. Should we instead make get_user_identities
always return a list of identities with the active one being the first one, and having get_active_user_identity
to index the returned list of identities?
With that, we can also add TODO
in AWS and GCP's implementation: Return a list of identities in the profile, when we support automatically switching context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean, fixed now!
sky/provision/kubernetes/utils.py
Outdated
# We add a version suffix to the port-forward proxy command to ensure backward | ||
# compatibility and avoid overwriting the older version. | ||
PORT_FORWARD_PROXY_CMD_PATH = ('~/.sky/' | ||
'kubernetes-port-forward-proxy-command-v2.sh') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can save the version of the proxy command in constants, if we expect it can change in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh yeah we don't have a constants for kubernetes yet, but I've moved it to it's own variable for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @romilbhardwaj for the great effort! LGTM.
…o k8s_multik8s_state2 # Conflicts: # sky/adaptors/kubernetes.py
Verified it works with the job controller as well and |
Previously, if the user switched kube contexts and tried
exec
/down
or other operations on a cluster, they would fail with a ClusterIdentityMismatch error.This PR changes the behavior to allow the user to operate on clusters even if the
current-context
in the kubeconfig is different from the context that was used to launch the cluster.This is implemented by storing kubecontext information as a part of the cluster config and making all our Kubernetes API calls context-sensitive. Since this effectively requires switching "identities", we also add a
cloud.get_supported_identities
method to get a list of all identities SkyPilot can switch to.I considered using a less invasive approach (which does not require changing all API calls to be context sensitive), one which used a python contextmanager to temporarily set the context globally in
adaptors.kubernetes._load_config()
, but that was quickly discarded since many SkyPilot operations (e.g., down) are run in parallel and thread safety becomes hard.TODO:
Tested (run the relevant ones):
bash format.sh
test_kubernetes_context_switch