
Conversation

insukim1994
Contributor

@insukim1994 insukim1994 commented May 11, 2025

FILL IN THE PR DESCRIPTION HERE

FIX #101 (link existing issues this PR will resolve)

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


  • Make sure the code changes pass the pre-commit checks.
  • Sign off your commits by using -s with git commit.
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title should be prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented to ensure future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng, or ApostaC.

@insukim1994 insukim1994 marked this pull request as ready for review May 11, 2025 16:15
@insukim1994
Contributor Author

insukim1994 commented May 11, 2025

How it works

  • The KubeRay operator is used to deploy a Ray cluster using a vLLM image:

    • The Ray cluster consists of a head node and multiple worker nodes.
    • To enable vLLM with pipeline parallelism, the vllm serve ... --distributed-executor-backend ray command must be executed on the head node (or on one of the worker nodes).
    • A custom Helm chart template was developed to provision the Ray cluster.
    • Minor modifications were made to existing templates to separate the resource creation processes.
  • Additional implementation details:

    • Although the Ray head and worker nodes eventually become ready, Ray does not provide a built-in mechanism to determine when all nodes are fully initialized.
      • A Python script was implemented to verify the readiness of all Ray nodes (a sketch follows this list).
      • This script is executed via a startupProbe on the head node to detect when the cluster is fully ready.
    • The vllm serve ... --distributed-executor-backend ray command must be executed only after the entire Ray cluster is confirmed to be ready.
    • To achieve this, a background shell script periodically checks the cluster's readiness using the aforementioned Python script.
    • This shell script also contains the vLLM serve command and is integrated into the Helm chart.
    • It launches vLLM once all Ray nodes are ready.
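
The actual readiness script is not shown in this thread; below is a minimal sketch of the idea, assuming the expected node count is supplied by the chart (the script name and the EXPECTED_NODES variable are hypothetical):

# check_ray_ready.py -- illustrative sketch, not the script shipped in this PR.
# Exits 0 only when every expected Ray node has joined and is alive, so it can
# back the head node's startupProbe or the background script's wait loop.
import os
import sys

import ray

# Hypothetical knob: total node count (head + workers), e.g. 2 when
# pipelineParallelSize is 2 with one GPU per node.
EXPECTED_NODES = int(os.environ.get("EXPECTED_NODES", "2"))

try:
    ray.init(address="auto")  # attach to the cluster running in this pod
    alive = sum(1 for node in ray.nodes() if node["Alive"])
    # The PR's final version checks the exact count ("strict" per the commit log).
    sys.exit(0 if alive == EXPECTED_NODES else 1)
except Exception:
    sys.exit(1)  # Ray (or its GCS) is not reachable yet: not ready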

@insukim1994 insukim1994 marked this pull request as draft May 12, 2025 19:52
@insukim1994 insukim1994 marked this pull request as ready for review May 12, 2025 21:49
@insukim1994
Contributor Author

Example Snippet

servingEngineSpec:
  runtimeClassName: ""
  raySpec:
    headNode:
      requestCPU: 2
      requestMemory: "20Gi"
      requestGPU: 1
  modelSpec:
  - name: "distilgpt2"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "distilbert/distilgpt2"

    replicaCount: 1

    requestCPU: 2
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1
      pipelineParallelSize: 2

    shmSize: "20Gi"

    hf_token: <YOUR HF TOKEN>

kubectl exec -it vllm-distilgpt2-raycluster-head-xrcgw -- /bin/bash
root@vllm-distilgpt2-raycluster-head-xrcgw:/vllm-workspace# nvidia-smi
Mon May 12 14:51:41 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   76C    P0             40W /   72W |   20129MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        13      C   /usr/bin/python3                                0MiB |
+-----------------------------------------------------------------------------------------+

###########################################################################################

kubectl exec -it vllm-distilgpt2-raycluster-ray-worker-92zrr -- /bin/bash
root@vllm-distilgpt2-raycluster-ray-worker-92zrr:/vllm-workspace# nvidia-smi
Mon May 12 14:51:44 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:04.0 Off |                    0 |
| N/A   71C    P0             39W /   72W |   20119MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A       273      C   ray::RayWorkerWrapper                           0MiB |
+-----------------------------------------------------------------------------------------+

@insukim1994
Contributor Author

  • Added tutorial documents for:
    • Installing the KubeRay operator in a Kubernetes environment
    • Deploying the production stack using KubeRay with a pipeline parallel size of 2

@insukim1994 insukim1994 force-pushed the feat/basic-pipeline-parallelism branch from c0c7607 to a671425 May 12, 2025 22:11
@ahinsutime
Contributor

I'm going to add a tutorial for:

  • Setting up a 2-node Kubernetes cluster

I will also test pipeline parallelism on the cluster to confirm multi-node distributed inference.

@ahinsutime
Contributor

Confirmed working on a 2-node Kubernetes cluster with 2 GPUs:

NAMESPACE          NAME                                                          READY   STATUS      RESTARTS      AGE    IP                NODE                       NOMINATED NODE   READINESS GATES
calico-apiserver   calico-apiserver-cccf4bb9f-8lbc7                              1/1     Running     0             55m    192.168.190.7     instance-20250503-060921   <none>           <none>
calico-apiserver   calico-apiserver-cccf4bb9f-knn9c                              1/1     Running     0             55m    192.168.190.4     instance-20250503-060921   <none>           <none>
calico-system      calico-kube-controllers-56dfdbb787-c24gd                      1/1     Running     0             55m    192.168.190.2     instance-20250503-060921   <none>           <none>
calico-system      calico-node-dtbcq                                             1/1     Running     0             55m    10.128.0.37       instance-20250503-060921   <none>           <none>
calico-system      calico-node-vrg6s                                             1/1     Running     0             55m    10.128.15.228     insudevmachine             <none>           <none>
calico-system      calico-typha-b7d75bc58-kfr7j                                  1/1     Running     0             55m    10.128.15.228     insudevmachine             <none>           <none>
calico-system      csi-node-driver-bb7dl                                         2/2     Running     0             55m    192.168.190.1     instance-20250503-060921   <none>           <none>
calico-system      csi-node-driver-g6hmt                                         2/2     Running     0             55m    192.168.165.193   insudevmachine             <none>           <none>
calico-system      goldmane-7b5b4cd5d9-6bk5p                                     1/1     Running     0             55m    192.168.190.6     instance-20250503-060921   <none>           <none>
calico-system      whisker-5dbf545674-hnkpz                                      2/2     Running     0             55m    192.168.190.8     instance-20250503-060921   <none>           <none>
default            kuberay-operator-f89ddb644-858bw                              1/1     Running     0             14m    192.168.165.203   insudevmachine             <none>           <none>
default            vllm-deployment-router-8666bf6464-v97v8                       1/1     Running     0             6m7s   192.168.165.206   insudevmachine             <none>           <none>
default            vllm-distilgpt2-raycluster-head-wvqj5                         1/1     Running     0             6m7s   192.168.190.20    instance-20250503-060921   <none>           <none>
default            vllm-distilgpt2-raycluster-ray-worker-fdvnh                   1/1     Running     0             6m7s   192.168.165.207   insudevmachine             <none>           <none>
gpu-operator       gpu-feature-discovery-psvdk                                   1/1     Running     0             12m    192.168.190.17    instance-20250503-060921   <none>           <none>
gpu-operator       gpu-feature-discovery-wpv52                                   1/1     Running     0             53m    192.168.165.201   insudevmachine             <none>           <none>
gpu-operator       gpu-operator-6c8c8bb855-xw5h7                                 1/1     Running     0             54m    192.168.190.11    instance-20250503-060921   <none>           <none>
gpu-operator       gpu-operator-node-feature-discovery-gc-7f6fbc9775-6s7fm       1/1     Running     0             54m    192.168.165.194   insudevmachine             <none>           <none>
gpu-operator       gpu-operator-node-feature-discovery-master-6ccd579c8c-lt86f   1/1     Running     0             54m    192.168.190.10    instance-20250503-060921   <none>           <none>
gpu-operator       gpu-operator-node-feature-discovery-worker-7p2x6              1/1     Running     0             54m    192.168.190.9     instance-20250503-060921   <none>           <none>
gpu-operator       gpu-operator-node-feature-discovery-worker-x84mm              1/1     Running     0             54m    192.168.165.195   insudevmachine             <none>           <none>
gpu-operator       nvidia-container-toolkit-daemonset-7fwnx                      1/1     Running     0             12m    192.168.190.15    instance-20250503-060921   <none>           <none>
gpu-operator       nvidia-container-toolkit-daemonset-mxnxd                      1/1     Running     0             53m    192.168.165.197   insudevmachine             <none>           <none>
gpu-operator       nvidia-cuda-validator-dckfh                                   0/1     Completed   0             12m    192.168.190.18    instance-20250503-060921   <none>           <none>
gpu-operator       nvidia-cuda-validator-fv2vr                                   0/1     Completed   0             53m    192.168.165.202   insudevmachine             <none>           <none>
gpu-operator       nvidia-dcgm-exporter-2srrd                                    1/1     Running     0             53m    192.168.165.200   insudevmachine             <none>           <none>
gpu-operator       nvidia-dcgm-exporter-2txh5                                    1/1     Running     0             12m    192.168.190.13    instance-20250503-060921   <none>           <none>
gpu-operator       nvidia-device-plugin-daemonset-575nq                          1/1     Running     0             53m    192.168.165.199   insudevmachine             <none>           <none>
gpu-operator       nvidia-device-plugin-daemonset-f2lqw                          1/1     Running     1 (12m ago)   12m    192.168.190.16    instance-20250503-060921   <none>           <none>
gpu-operator       nvidia-operator-validator-dthhx                               1/1     Running     0             53m    192.168.165.198   insudevmachine             <none>           <none>
gpu-operator       nvidia-operator-validator-kcpsf                               1/1     Running     0             12m    192.168.190.12    instance-20250503-060921   <none>           <none>
kube-system        coredns-668d6bf9bc-5hvx7                                      1/1     Running     0             59m    192.168.190.3     instance-20250503-060921   <none>           <none>
kube-system        coredns-668d6bf9bc-wb7qq                                      1/1     Running     0             59m    192.168.190.5     instance-20250503-060921   <none>           <none>
kube-system        etcd-instance-20250503-060921                                 1/1     Running     3             60m    10.128.0.37       instance-20250503-060921   <none>           <none>
kube-system        kube-apiserver-instance-20250503-060921                       1/1     Running     2             60m    10.128.0.37       instance-20250503-060921   <none>           <none>
kube-system        kube-controller-manager-instance-20250503-060921              1/1     Running     1             60m    10.128.0.37       instance-20250503-060921   <none>           <none>
kube-system        kube-proxy-bk7sk                                              1/1     Running     0             59m    10.128.0.37       instance-20250503-060921   <none>           <none>
kube-system        kube-proxy-nm8xn                                              1/1     Running     0             58m    10.128.15.228     insudevmachine             <none>           <none>
kube-system        kube-scheduler-instance-20250503-060921                       1/1     Running     3             60m    10.128.0.37       instance-20250503-060921   <none>           <none>
tigera-operator    tigera-operator-844669ff44-5775m                              1/1     Running     0             56m    10.128.15.228     insudevmachine             <none>           <none>
root@vllm-distilgpt2-raycluster-head-wvqj5:/vllm-workspace# nvidia-smi
Fri May 16 11:22:07 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   63C    P0             31W /   72W |   20313MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L4                      Off |   00000000:00:04.0 Off |                    0 |
| N/A   61C    P0             31W /   72W |   20305MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A         8      C   /usr/bin/python3                                0MiB |
|    1   N/A  N/A      1082      C   ray::RayWorkerWrapper                           0MiB |
+-----------------------------------------------------------------------------------------+
root@vllm-distilgpt2-raycluster-ray-worker-fdvnh:/vllm-workspace# nvidia-smi
Fri May 16 11:21:48 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   69C    P0             37W /   72W |   20065MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L4                      Off |   00000000:00:04.0 Off |                    0 |
| N/A   66C    P0             37W /   72W |   20063MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A       243      C   ray::RayWorkerWrapper                           0MiB |
|    1   N/A  N/A       244      C   ray::RayWorkerWrapper                           0MiB |
+-----------------------------------------------------------------------------------------+
kubectl port-forward svc/vllm-router-service 30080:80
curl -X POST http://localhost:30080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "distilbert/distilgpt2",
    "prompt": "Once upon a time,",
    "max_tokens": 10
  }'

{
  "id": "cmpl-3346a16163fb48b7ada5e7663d27cdf8",
  "object": "text_completion",
  "created": 1747419610,
  "model": "distilbert/distilgpt2",
  "choices": [
    {
      "index": 0,
      "text": " when the education of our members at Hogwarts University was",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 15,
    "completion_tokens": 10,
    "prompt_tokens_details": null
  }
}
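
For reference, the same completion request can be issued programmatically. Here is a minimal sketch using the Python requests library, pointed at the port-forwarded router endpoint and the model shown above:

import requests

# Assumes kubectl port-forward svc/vllm-router-service 30080:80 is running.
resp = requests.post(
    "http://localhost:30080/v1/completions",
    json={
        "model": "distilbert/distilgpt2",
        "prompt": "Once upon a time,",
        "max_tokens": 10,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])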

@ahinsutime
Contributor

  • I will make some modifications to the pipeline parallelism document.
  • I will add a Kubernetes installation tutorial (2 nodes) document.
  • I will double-check my Helm chart for possible typos or misconfiguration.

@ahinsutime
Contributor

Documentation for the Kubernetes installation tutorial (2 nodes) is still in progress.

@insukim1994 insukim1994 force-pushed the feat/basic-pipeline-parallelism branch from c40bbfc to af93456 May 17, 2025 14:49
@ahinsutime
Contributor

ahinsutime commented May 17, 2025

Initial documentation is complete. What's left:

  • Applying review comments
  • Elaborating the guide and checking for typos or misleading content
  • Double-checking the Helm chart

@ahinsutime
Contributor

ahinsutime commented May 18, 2025

@YuhanLiu11 @haitwang-cloud Thanks for your comments and suggestions!
I just finished the initial implementation and documentation adding pipeline parallelism functionality to the production stack.

This PR contains multiple new files and changes, such as:

  • A new Helm chart (including ray-cluster.yaml) and a corresponding values file,
  • Updates to the pre-existing Helm chart (to branch between the ordinary deployment and the Ray cluster),
  • Scripts and tutorials (installing a multi-node K8s cluster, a container runtime, and a container network interface; a pipeline parallelism tutorial)

It took some time to test and write the tutorial documents (especially initializing a multi-node K8s cluster and installing the container runtime and container network interface).

@insukim1994 insukim1994 force-pushed the feat/basic-pipeline-parallelism branch from aa112fe to 29dcb6d May 19, 2025 13:37
@YuhanLiu11
Collaborator

That's awesome! I'll review it.

@haitwang-cloud
Contributor

@insukim1994 LGTM, with a few nice-to-have comments.

- Basic understanding of Linux shell commands.

4. **Kubernetes Installation:**
- Follow the instructions in [`00-install-kubernetes-env.md`](00-install-kubernetes-env.md) to set up your Kubernetes environment.
Collaborator

Do we need to follow tutorials/00-a-install-mulitnode-kubernetes-env.md to install the multi-node k8s cluster before running this tutorial?

Contributor

Oh, you are right. I should fix it, since what we need is a multi-node K8s cluster. I will also add a note that installation might not be needed if you already have one.

Contributor

I added more explanation of the K8s prerequisites. Thank you!

@YuhanLiu11
Collaborator

@insukim1994 This is awesome! I only left one minor comment; after you fix that, I will merge this PR. Thanks again!

@ahinsutime
Contributor

Seems like I need to resolve my merge conflicts. I will do that and let you know once it is done!

@insukim1994 insukim1994 force-pushed the feat/basic-pipeline-parallelism branch from ccd0e60 to 3c7810f May 23, 2025 08:55
@ahinsutime
Contributor

ahinsutime commented May 23, 2025

I fixed my conflicts, but uv.lock seems to be causing a problem:

uv run pre-commit run --all-files
  × No solution found when resolving dependencies:
  ╰─▶ Because there is no version of lmcache==0.2.11 and vllm-router[lmcache] depends on lmcache==0.2.11, we can conclude that vllm-router[lmcache]'s requirements are unsatisfiable.
      And because your project requires vllm-router[lmcache], we can conclude that your project's requirements are unsatisfiable.

I'm looking into it.

It seems to be a typo: lmcache==0.2.11 does not exist, but 0.2.1 does.

@ahinsutime
Contributor

ahinsutime commented May 23, 2025

Seems like a typo exists on the main branch. I will fix it and include the fix in my PR:

git diff pyproject.toml
diff --git a/pyproject.toml b/pyproject.toml
index 64559c0..e185277 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -39,7 +39,7 @@ semantic_cache = [
     "huggingface-hub==0.25.2",  # downgrade to 0.25.2 to avoid breaking changes
 ]
 lmcache = [
-    "lmcache==0.2.11",
+    "lmcache==0.2.1",
 ]
 
 [tool.pytest.ini_options]

@YuhanLiu11
Collaborator

Hey @insukim1994, sorry, I just saw this message. It would be great if you could fix this in your PR too. Thanks!

Collaborator

@YuhanLiu11 YuhanLiu11 left a comment

LGTM! Thanks for the awesome PR!

@YuhanLiu11 YuhanLiu11 merged commit dca3133 into vllm-project:main May 26, 2025
9 checks passed
@jcrock7

jcrock7 commented May 28, 2025

Great work! Does the main Chart.yaml need to be updated to run this? I've followed the instructions and the pods are created, but they come up in the standard production-stack manner rather than from the new RayCluster template.

values.yaml

servingEngineSpec:
  enabled: true
  runtimeClassName: ""
  raySpec:
    headNode:
      requestCPU: 2
      requestMemory: "20Gi"
      requestGPU: 2
  modelSpec:
    - name: "distilgpt2"
      repository: "vllm/vllm-openai"
      tag: "latest"
      modelURL: "distilbert/distilgpt2"

      replicaCount: 1

      requestCPU: 2
      requestMemory: "20Gi"
      requestGPU: 1

      vllmConfig:
        tensorParallelSize: 1
        pipelineParallelSize: 2

      shmSize: "20Gi"

kubectl get pods

NAME                                               READY   STATUS    RESTARTS   AGE
kuberay-operator-d474d489f-57znb                   1/1     Running   0          11h
vllm-deployment-router-7bbd9bf65f-rsg4b            1/1     Running   0          138m
vllm-distilgpt2-deployment-vllm-7dcb56c6fc-86tml   1/1     Running   0          75m

@ahinsutime
Contributor

@jcrock7 Thank you for letting me know about the possible issue. I will check it and leave a comment here. Thanks!

@ahinsutime
Contributor

@jcrock7 Thank you. I've identified the issue: it seems the vLLM Helm repo is not yet synced with this change.
I will check and sync it if it is not updated.
Until the repo is synced, you can run the tutorial with the following command:
helm install vllm ./helm -f tutorials/assets/values-15-minimal-pipeline-parallel-example.yaml

@ahinsutime
Contributor

@jcrock7 I should have updated the Helm chart version, packaged it, and uploaded it to the repo. I will create a separate issue for it and solve it in a corresponding PR. Thanks!

@jcrock7

jcrock7 commented May 29, 2025

This worked. Thanks again for your work on this - it really extends the capability for multi-node clusters!

@jcrock7

jcrock7 commented May 29, 2025

Just noticed that one of the pods fails to start due to a multi-attach error on the PVC. I believe the pvc.yaml template needs to be updated to ReadWriteMany. I will create a separate PR unless you want to include the change in yours.

@insukim1994
Contributor Author

@jcrock7 Thank you! You are right that a PVC with the RWO access mode cannot be attached by pods on different nodes at the same time. Yes, it would be great if you could create a PR for it!

@jcrock7

jcrock7 commented May 29, 2025

Never mind, I confirmed the existing pvc.yaml works. I had a small typo in my values.yaml file.

JustinDuy pushed a commit to JustinDuy/production-stack-1 that referenced this pull request Jun 13, 2025
* [Feat] Added kuberay installation script via helm. Initial commit.

Signed-off-by: insukim1994 <[email protected]>

* Added initial helm chart template file for ray cluster creation.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Fixed typo at ray cluster template file. Added example values for the helm chart.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Removed unused fields at the moment. Bugfixed conflicting resource creation for kuberay.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added startup probe to check if all ray cluster nodes are up.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added vllm command composition and execution logic in the background script due to kuberay operator args concatenation.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added pod-relevant settings from servingEngineSpec for both head and worker groups.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added env templates for head and worker spec.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added volumemounts template for head and worker spec.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added templates for resource, probe, port, etc.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Initial working example.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added documentation to run vllm with kuberay for pipeline parallelism.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Elaborated tutorial documentation.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Fixed typo in kuberay operator installation tutorial document.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Fixed a wording in kuberay operator installation tutorial document.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Fixed typo in kuberay operator installation tutorial document.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Removed unused value from helm chart default value.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Elaborated expression on tutorial document.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Elaborated expression on tutorial document.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Set readiness httpGet probe for ray head node. Removed unused container ports from ray worker nodes.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added VLLM_HOST_IP based on official vllm docs. Added ray installation step.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added missing dashboard related setting and a step for reinstalling ray.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Removed initContainer section that will be overwritten by the kuberay operator.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Kuberay operator version update needed.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Minor fix in tutorial.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added sample gpu usage example for each ray head and worker node.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Fixed typo in basic pipeline parallel tutorial doc.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Reverted unnecessary change.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Fixed typo in kuberay install util script.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added utility script to install kubeadm.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added cri-o container runtime installation script & a script to create a control plane node.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added script to join worker nodes. Elaborated control plane init script and cni installation.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added nvidia gpu setup script for each node.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Script modification during testing.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Elaborated k8s controlplane initialization and worker node join script.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Elaborated basic pipeline parallelism tutorial document.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added guide for setting up a kubernetes cluster with 2 nodes (control and worker).

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Elaborated K8s cluster initialization guide and applied a review comment.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Strict total number of ray node checking. Tested helm chart with helm template command.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Elaborated important note when applying pipeline parallelism (with ray).

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Elaborated basic pipeline parallelism tutorial example.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Review updates (prevent duplicated line appends & added warning message of docker restart).

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Review updates (elaborated prerequisites for kuberay operator installation).

Signed-off-by: insukim1994 <[email protected]>

* [Bugfix] Fixed version typo of lmcache from toml file.

Signed-off-by: insukim1994 <[email protected]>

---------

Signed-off-by: insukim1994 <[email protected]>
allytotheson pushed a commit to allytotheson/production-stack that referenced this pull request Jun 30, 2025