Removed create-cert init containers #509

Merged: 5 commits, Apr 19, 2024
28 changes: 0 additions & 28 deletions docs/cluster-configuration.md
@@ -22,39 +22,11 @@ cluster = Cluster(ClusterConfiguration(
image="quay.io/project-codeflare/ray:latest-py39-cu118", # Mandatory Field
instascale=False, # Default False
machine_types=["m5.xlarge", "g4dn.xlarge"],
ingress_domain="example.com" # Default None, Mandatory for Vanilla Kubernetes Clusters - ingress_domain is ignored on OpenShift Clusters as a route is created.
local_interactive=False, # Default False
))
```
Note: On OpenShift, the `ingress_domain` is only required when `local_interactive` is enabled. - This may change soon.

Upon creating a cluster configuration with `mcad=True` an appwrapper will be created featuring the Ray Cluster and any Routes, Ingresses or Secrets that are needed to be created along side it.<br>
From there a user can call `cluster.up()` and `cluster.down()` to create and remove the appwrapper thus creating and removing the Ray Cluster.

In cases where `mcad=False` a yaml file will be created with the individual Ray Cluster, Route/Ingress and Secret included.<br>
The Ray Cluster and service will be created by KubeRay directly and the other components will be individually created.

## Ray Cluster Configuration in a Vanilla Kubernetes environment (Non-OpenShift)
To create a Ray Cluster using the CodeFlare SDK in a Vanilla Kubernetes environment an `ingress_domain` must be passed in the Cluster Configuration.
This is used for the creation of the Ray Dashboard and Client ingresses.

`ingress_options` can be passed to create a custom Ray Dashboard ingress, `ingress_domain` is still a required variable for the Client route/ingress.
An example of `ingress_options` would look like this.

```
ingress_options = {
"ingresses": [
{
"ingressName": "<ingress_name>",
"port": <port_number>,
"pathType": "<path_type>",
"path": "<path>",
"host":"<host>",
"annotations": {
"foo": "bar",
"foo": "bar",
}
}
]
}
```
2 changes: 0 additions & 2 deletions src/codeflare_sdk.egg-info/SOURCES.txt
@@ -13,11 +13,9 @@ src/codeflare_sdk/cluster/cluster.py
src/codeflare_sdk/cluster/config.py
src/codeflare_sdk/cluster/model.py
src/codeflare_sdk/job/__init__.py
src/codeflare_sdk/job/jobs.py
src/codeflare_sdk/job/ray_jobs.py
src/codeflare_sdk/utils/__init__.py
src/codeflare_sdk/utils/generate_cert.py
src/codeflare_sdk/utils/generate_yaml.py
src/codeflare_sdk/utils/kube_api_helpers.py
src/codeflare_sdk/utils/openshift_oauth.py
src/codeflare_sdk/utils/pretty_print.py
30 changes: 2 additions & 28 deletions src/codeflare_sdk/cluster/cluster.py
@@ -179,7 +179,6 @@ def create_app_wrapper(self):
mcad = self.config.mcad
instance_types = self.config.machine_types
env = self.config.envs
local_interactive = self.config.local_interactive
image_pull_secrets = self.config.image_pull_secrets
dispatch_priority = self.config.dispatch_priority
write_to_file = self.config.write_to_file
@@ -203,7 +202,6 @@ def create_app_wrapper(self):
mcad=mcad,
instance_types=instance_types,
env=env,
local_interactive=local_interactive,
image_pull_secrets=image_pull_secrets,
dispatch_priority=dispatch_priority,
priority_val=priority_val,
@@ -479,13 +477,6 @@ def from_k8_cluster_object(
verify_tls=True,
):
config_check()
if (
rc["metadata"]["annotations"]["sdk.codeflare.dev/local_interactive"]
== "True"
):
local_interactive = True
else:
local_interactive = False
machine_types = (
rc["metadata"]["labels"]["orderedinstance"].split("_")
if "orderedinstance" in rc["metadata"]["labels"]
@@ -526,19 +517,15 @@ def from_k8_cluster_object(
image=rc["spec"]["workerGroupSpecs"][0]["template"]["spec"]["containers"][
0
]["image"],
local_interactive=local_interactive,
mcad=mcad,
write_to_file=write_to_file,
verify_tls=verify_tls,
)
return Cluster(cluster_config)

def local_client_url(self):
if self.config.local_interactive == True:
ingress_domain = _get_ingress_domain(self)
return f"ray://{ingress_domain}"
else:
return "None"
ingress_domain = _get_ingress_domain(self)
return f"ray://{ingress_domain}"

def _component_resources_up(
self, namespace: str, api_instance: client.CustomObjectsApi
@@ -678,13 +665,6 @@ def _delete_resources(
plural="rayclusters",
name=name,
)
elif resource["kind"] == "Secret":
name = resource["metadata"]["name"]
secret_instance = client.CoreV1Api(api_config_handler())
secret_instance.delete_namespaced_secret(
namespace=namespace,
name=name,
)


def _create_resources(yamls, namespace: str, api_instance: client.CustomObjectsApi):
@@ -697,12 +677,6 @@ def _create_resources(yamls, namespace: str, api_instance: client.CustomObjectsA
plural="rayclusters",
body=resource,
)
elif resource["kind"] == "Secret":
secret_instance = client.CoreV1Api(api_config_handler())
secret_instance.create_namespaced_secret(
namespace=namespace,
body=resource,
)


def _check_aw_exists(name: str, namespace: str) -> bool:
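
The behavior change to `local_client_url` in this file can be illustrated with a minimal sketch. It assumes a hypothetical `SketchCluster` stand-in whose `ingress_domain` attribute replaces the SDK's `_get_ingress_domain(self)` lookup; neither the class nor the `_before`/`_after` function names exist in the SDK.

```python
from dataclasses import dataclass


@dataclass
class SketchCluster:
    """Hypothetical stand-in for the SDK's Cluster; not the real class."""

    # Assumption: a plain attribute replaces the _get_ingress_domain(self) lookup.
    ingress_domain: str
    # Field removed from ClusterConfiguration by this PR; kept here for comparison.
    local_interactive: bool = False


def local_client_url_before(cluster: SketchCluster) -> str:
    # Pre-PR behavior: a URL only when local_interactive was enabled,
    # otherwise the literal string "None".
    if cluster.local_interactive:
        return f"ray://{cluster.ingress_domain}"
    return "None"


def local_client_url_after(cluster: SketchCluster) -> str:
    # Post-PR behavior: the URL is always built from the ingress domain.
    return f"ray://{cluster.ingress_domain}"


cluster = SketchCluster(ingress_domain="rayclient-demo.example.com")
print(local_client_url_before(cluster))
print(local_client_url_after(cluster))
```

Callers that previously compared the return value against the string `"None"` would, under this reading of the diff, now always receive a `ray://` URL.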
1 change: 0 additions & 1 deletion src/codeflare_sdk/cluster/config.py
@@ -49,7 +49,6 @@ class ClusterConfiguration:
mcad: bool = False
envs: dict = field(default_factory=dict)
image: str = ""
local_interactive: bool = False
image_pull_secrets: list = field(default_factory=list)
dispatch_priority: str = None
write_to_file: bool = False
101 changes: 1 addition & 100 deletions src/codeflare_sdk/templates/base-template.yaml
@@ -40,8 +40,6 @@ spec:
apiVersion: ray.io/v1
kind: RayCluster
metadata:
annotations:
sdk.codeflare.dev/local_interactive: "False"
labels:
workload.codeflare.dev/appwrapper: "aw-kuberay"
controller-tools.k8s.io: "1.0"
@@ -117,20 +115,7 @@ spec:
- "aw-kuberay"
containers:
# The Ray head pod
- env:
- name: MY_POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: RAY_USE_TLS
value: "0"
- name: RAY_TLS_SERVER_CERT
value: /home/ray/workspace/tls/server.crt
- name: RAY_TLS_SERVER_KEY
value: /home/ray/workspace/tls/server.key
- name: RAY_TLS_CA_CERT
value: /home/ray/workspace/tls/ca.crt
name: ray-head
- name: ray-head
image: quay.io/project-codeflare/ray:latest-py39-cu118
imagePullPolicy: Always
ports:
@@ -154,12 +139,6 @@ spec:
memory: "8G"
nvidia.com/gpu: 0
volumeMounts:
- name: ca-vol
mountPath: "/home/ray/workspace/ca"
readOnly: true
- name: server-cert
mountPath: "/home/ray/workspace/tls"
readOnly: true
- mountPath: /etc/pki/tls/certs/odh-trusted-ca-bundle.crt
name: odh-trusted-ca-cert
subPath: odh-trusted-ca-bundle.crt
@@ -172,30 +151,7 @@ spec:
- mountPath: /etc/ssl/certs/odh-ca-bundle.crt
name: odh-ca-cert
subPath: odh-ca-bundle.crt
initContainers:
- command:
- sh
- -c
- cd /home/ray/workspace/tls && openssl req -nodes -newkey rsa:2048 -keyout server.key -out server.csr -subj '/CN=ray-head' && printf "authorityKeyIdentifier=keyid,issuer\nbasicConstraints=CA:FALSE\nsubjectAltName = @alt_names\n[alt_names]\nDNS.1 = 127.0.0.1\nDNS.2 = localhost\nDNS.3 = ${FQ_RAY_IP}\nDNS.4 = $(awk 'END{print $1}' /etc/hosts)\nDNS.5 = rayclient-deployment-name-$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).server-name">./domain.ext && cp /home/ray/workspace/ca/* . && openssl x509 -req -CA ca.crt -CAkey ca.key -in server.csr -out server.crt -days 365 -CAcreateserial -extfile domain.ext
image: quay.io/project-codeflare/ray:latest-py39-cu118
name: create-cert
# securityContext:
# runAsUser: 1000
# runAsGroup: 1000
volumeMounts:
- name: ca-vol
mountPath: "/home/ray/workspace/ca"
readOnly: true
- name: server-cert
mountPath: "/home/ray/workspace/tls"
readOnly: false
volumes:
- name: ca-vol
secret:
secretName: ca-secret-deployment-name
optional: false
- name: server-cert
emptyDir: {}
- name: odh-trusted-ca-cert
configMap:
name: odh-trusted-ca-bundle
@@ -250,40 +206,9 @@ spec:
operator: In
values:
- "aw-kuberay"
initContainers:
# the env var $RAY_IP is set by the operator if missing, with the value of the head service name
- name: create-cert
image: quay.io/project-codeflare/ray:latest-py39-cu118
command:
- sh
- -c
- cd /home/ray/workspace/tls && openssl req -nodes -newkey rsa:2048 -keyout server.key -out server.csr -subj '/CN=ray-head' && printf "authorityKeyIdentifier=keyid,issuer\nbasicConstraints=CA:FALSE\nsubjectAltName = @alt_names\n[alt_names]\nDNS.1 = 127.0.0.1\nDNS.2 = localhost\nDNS.3 = ${FQ_RAY_IP}\nDNS.4 = $(awk 'END{print $1}' /etc/hosts)">./domain.ext && cp /home/ray/workspace/ca/* . && openssl x509 -req -CA ca.crt -CAkey ca.key -in server.csr -out server.crt -days 365 -CAcreateserial -extfile domain.ext
# securityContext:
# runAsUser: 1000
# runAsGroup: 1000
volumeMounts:
- name: ca-vol
mountPath: "/home/ray/workspace/ca"
readOnly: true
- name: server-cert
mountPath: "/home/ray/workspace/tls"
readOnly: false
containers:
- name: machine-learning # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc'
image: quay.io/project-codeflare/ray:latest-py39-cu118
env:
- name: MY_POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: RAY_USE_TLS
value: "0"
- name: RAY_TLS_SERVER_CERT
value: /home/ray/workspace/tls/server.crt
- name: RAY_TLS_SERVER_KEY
value: /home/ray/workspace/tls/server.key
- name: RAY_TLS_CA_CERT
value: /home/ray/workspace/tls/ca.crt
# environment variables to set in the container.Optional.
# Refer to https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
lifecycle:
@@ -300,12 +225,6 @@ spec:
memory: "12G"
nvidia.com/gpu: "1"
volumeMounts:
- name: ca-vol
mountPath: "/home/ray/workspace/ca"
readOnly: true
- name: server-cert
mountPath: "/home/ray/workspace/tls"
readOnly: true
- mountPath: /etc/pki/tls/certs/odh-trusted-ca-bundle.crt
name: odh-trusted-ca-cert
subPath: odh-trusted-ca-bundle.crt
@@ -319,12 +238,6 @@ spec:
name: odh-ca-cert
subPath: odh-ca-bundle.crt
volumes:
- name: ca-vol
secret:
secretName: ca-secret-deployment-name
optional: false
- name: server-cert
emptyDir: {}
- name: odh-trusted-ca-cert
configMap:
name: odh-trusted-ca-bundle
@@ -339,15 +252,3 @@ spec:
- key: odh-ca-bundle.crt
path: odh-ca-bundle.crt
optional: true
- replicas: 1
generictemplate:
apiVersion: v1
data:
ca.crt: generated_crt
ca.key: generated_key
kind: Secret
metadata:
name: ca-secret-deployment-name
labels:
# allows me to return name of service that Ray operator creates
odh-ray-cluster-service: deployment-name-head-svc