
Conversation

insukim1994
Contributor

@insukim1994 insukim1994 commented May 11, 2025

FILL IN THE PR DESCRIPTION HERE

FIX #101 (link existing issues this PR will resolve)

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


  • Make sure the code changes pass the pre-commit checks.
  • Sign off your commits by using -s with git commit.
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist

Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title should be prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented to ensure future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng, or ApostaC.

@insukim1994 insukim1994 marked this pull request as ready for review May 11, 2025 16:15
@insukim1994
Contributor Author

insukim1994 commented May 11, 2025

How it works

  • The KubeRay operator is used to deploy a Ray cluster using a vLLM image:

    • The Ray cluster consists of a head node and multiple worker nodes.
    • To enable vLLM with pipeline parallelism, the vllm serve ... --distributed-executor-backend ray command must be executed on the head node (or on one of the worker nodes).
    • A custom Helm chart template was developed to provision the Ray cluster.
    • Minor modifications were made to existing templates to separate the resource creation processes.
  • Additional implementation details:

    • Although the Ray head and worker nodes eventually become ready, Ray does not provide a built-in mechanism to determine when all nodes are fully initialized.
      • A Python script was implemented to verify the readiness of all Ray nodes (a sketch follows this list).
      • This script is executed via a startupProbe on the head node to detect when the cluster is fully ready.
    • The vllm serve ... --distributed-executor-backend ray command must be executed only after the entire Ray cluster is confirmed to be ready.
    • To achieve this, a background shell script periodically checks the cluster's readiness using the aforementioned Python script.
    • This shell script also contains the vLLM serve command and is integrated into the Helm chart.
    • It launches vLLM once all Ray nodes are ready.
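
The actual readiness script is not shown in this thread; below is a minimal sketch of the idea, assuming the expected node count is supplied by the chart (the script name and the EXPECTED_NODES variable are hypothetical):

# check_ray_ready.py -- illustrative sketch, not the script shipped in this PR.
# Exits 0 only when every expected Ray node has joined and is alive, so it can
# back the head node's startupProbe or the background script's wait loop.
import os
import sys

import ray

# Hypothetical knob: total node count (head + workers), e.g. 2 when
# pipelineParallelSize is 2 with one GPU per node.
EXPECTED_NODES = int(os.environ.get("EXPECTED_NODES", "2"))

try:
    ray.init(address="auto")  # attach to the cluster running in this pod
    alive = sum(1 for node in ray.nodes() if node["Alive"])
    # The PR's final version checks the exact count ("strict" per the commit log).
    sys.exit(0 if alive == EXPECTED_NODES else 1)
except Exception:
    sys.exit(1)  # Ray (or its GCS) is not reachable yet: not ready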

@insukim1994 insukim1994 marked this pull request as draft May 12, 2025 19:52
@insukim1994 insukim1994 marked this pull request as ready for review May 12, 2025 21:49
@insukim1994
Contributor Author

Example Snippet

servingEngineSpec:
  runtimeClassName: ""
  raySpec:
    headNode:
      requestCPU: 2
      requestMemory: "20Gi"
      requestGPU: 1
  modelSpec:
  - name: "distilgpt2"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "distilbert/distilgpt2"

    replicaCount: 1

    requestCPU: 2
    requestMemory: "20Gi"
    requestGPU: 1

    vllmConfig:
      tensorParallelSize: 1
      pipelineParallelSize: 2

    shmSize: "20Gi"

    hf_token: <YOUR HF TOKEN>

kubectl exec -it vllm-distilgpt2-raycluster-head-xrcgw -- /bin/bash
root@vllm-distilgpt2-raycluster-head-xrcgw:/vllm-workspace# nvidia-smi
Mon May 12 14:51:41 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   76C    P0             40W /   72W |   20129MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        13      C   /usr/bin/python3                                0MiB |
+-----------------------------------------------------------------------------------------+

###########################################################################################

kubectl exec -it vllm-distilgpt2-raycluster-ray-worker-92zrr -- /bin/bash
root@vllm-distilgpt2-raycluster-ray-worker-92zrr:/vllm-workspace# nvidia-smi
Mon May 12 14:51:44 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:04.0 Off |                    0 |
| N/A   71C    P0             39W /   72W |   20119MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A       273      C   ray::RayWorkerWrapper                           0MiB |
+-----------------------------------------------------------------------------------------+

@insukim1994
Contributor Author

  • Added tutorial documents for:
    • Installing the KubeRay operator in a Kubernetes environment
    • Deploying the production stack using KubeRay with a pipeline parallel size of 2

@insukim1994 insukim1994 force-pushed the feat/basic-pipeline-parallelism branch from c0c7607 to a671425 May 12, 2025 22:11
@ahinsutime
Contributor

I'm going to add a tutorial for:

  • Setting up a 2-node Kubernetes cluster

I will also test pipeline parallelism on the cluster to confirm multi-node distributed inference.

@ahinsutime
Contributor

Confirmed working on a 2-node Kubernetes cluster with 2 GPUs:

NAMESPACE          NAME                                                          READY   STATUS      RESTARTS      AGE    IP                NODE                       NOMINATED NODE   READINESS GATES
calico-apiserver   calico-apiserver-cccf4bb9f-8lbc7                              1/1     Running     0             55m    192.168.190.7     instance-20250503-060921   <none>           <none>
calico-apiserver   calico-apiserver-cccf4bb9f-knn9c                              1/1     Running     0             55m    192.168.190.4     instance-20250503-060921   <none>           <none>
calico-system      calico-kube-controllers-56dfdbb787-c24gd                      1/1     Running     0             55m    192.168.190.2     instance-20250503-060921   <none>           <none>
calico-system      calico-node-dtbcq                                             1/1     Running     0             55m    10.128.0.37       instance-20250503-060921   <none>           <none>
calico-system      calico-node-vrg6s                                             1/1     Running     0             55m    10.128.15.228     insudevmachine             <none>           <none>
calico-system      calico-typha-b7d75bc58-kfr7j                                  1/1     Running     0             55m    10.128.15.228     insudevmachine             <none>           <none>
calico-system      csi-node-driver-bb7dl                                         2/2     Running     0             55m    192.168.190.1     instance-20250503-060921   <none>           <none>
calico-system      csi-node-driver-g6hmt                                         2/2     Running     0             55m    192.168.165.193   insudevmachine             <none>           <none>
calico-system      goldmane-7b5b4cd5d9-6bk5p                                     1/1     Running     0             55m    192.168.190.6     instance-20250503-060921   <none>           <none>
calico-system      whisker-5dbf545674-hnkpz                                      2/2     Running     0             55m    192.168.190.8     instance-20250503-060921   <none>           <none>
default            kuberay-operator-f89ddb644-858bw                              1/1     Running     0             14m    192.168.165.203   insudevmachine             <none>           <none>
default            vllm-deployment-router-8666bf6464-v97v8                       1/1     Running     0             6m7s   192.168.165.206   insudevmachine             <none>           <none>
default            vllm-distilgpt2-raycluster-head-wvqj5                         1/1     Running     0             6m7s   192.168.190.20    instance-20250503-060921   <none>           <none>
default            vllm-distilgpt2-raycluster-ray-worker-fdvnh                   1/1     Running     0             6m7s   192.168.165.207   insudevmachine             <none>           <none>
gpu-operator       gpu-feature-discovery-psvdk                                   1/1     Running     0             12m    192.168.190.17    instance-20250503-060921   <none>           <none>
gpu-operator       gpu-feature-discovery-wpv52                                   1/1     Running     0             53m    192.168.165.201   insudevmachine             <none>           <none>
gpu-operator       gpu-operator-6c8c8bb855-xw5h7                                 1/1     Running     0             54m    192.168.190.11    instance-20250503-060921   <none>           <none>
gpu-operator       gpu-operator-node-feature-discovery-gc-7f6fbc9775-6s7fm       1/1     Running     0             54m    192.168.165.194   insudevmachine             <none>           <none>
gpu-operator       gpu-operator-node-feature-discovery-master-6ccd579c8c-lt86f   1/1     Running     0             54m    192.168.190.10    instance-20250503-060921   <none>           <none>
gpu-operator       gpu-operator-node-feature-discovery-worker-7p2x6              1/1     Running     0             54m    192.168.190.9     instance-20250503-060921   <none>           <none>
gpu-operator       gpu-operator-node-feature-discovery-worker-x84mm              1/1     Running     0             54m    192.168.165.195   insudevmachine             <none>           <none>
gpu-operator       nvidia-container-toolkit-daemonset-7fwnx                      1/1     Running     0             12m    192.168.190.15    instance-20250503-060921   <none>           <none>
gpu-operator       nvidia-container-toolkit-daemonset-mxnxd                      1/1     Running     0             53m    192.168.165.197   insudevmachine             <none>           <none>
gpu-operator       nvidia-cuda-validator-dckfh                                   0/1     Completed   0             12m    192.168.190.18    instance-20250503-060921   <none>           <none>
gpu-operator       nvidia-cuda-validator-fv2vr                                   0/1     Completed   0             53m    192.168.165.202   insudevmachine             <none>           <none>
gpu-operator       nvidia-dcgm-exporter-2srrd                                    1/1     Running     0             53m    192.168.165.200   insudevmachine             <none>           <none>
gpu-operator       nvidia-dcgm-exporter-2txh5                                    1/1     Running     0             12m    192.168.190.13    instance-20250503-060921   <none>           <none>
gpu-operator       nvidia-device-plugin-daemonset-575nq                          1/1     Running     0             53m    192.168.165.199   insudevmachine             <none>           <none>
gpu-operator       nvidia-device-plugin-daemonset-f2lqw                          1/1     Running     1 (12m ago)   12m    192.168.190.16    instance-20250503-060921   <none>           <none>
gpu-operator       nvidia-operator-validator-dthhx                               1/1     Running     0             53m    192.168.165.198   insudevmachine             <none>           <none>
gpu-operator       nvidia-operator-validator-kcpsf                               1/1     Running     0             12m    192.168.190.12    instance-20250503-060921   <none>           <none>
kube-system        coredns-668d6bf9bc-5hvx7                                      1/1     Running     0             59m    192.168.190.3     instance-20250503-060921   <none>           <none>
kube-system        coredns-668d6bf9bc-wb7qq                                      1/1     Running     0             59m    192.168.190.5     instance-20250503-060921   <none>           <none>
kube-system        etcd-instance-20250503-060921                                 1/1     Running     3             60m    10.128.0.37       instance-20250503-060921   <none>           <none>
kube-system        kube-apiserver-instance-20250503-060921                       1/1     Running     2             60m    10.128.0.37       instance-20250503-060921   <none>           <none>
kube-system        kube-controller-manager-instance-20250503-060921              1/1     Running     1             60m    10.128.0.37       instance-20250503-060921   <none>           <none>
kube-system        kube-proxy-bk7sk                                              1/1     Running     0             59m    10.128.0.37       instance-20250503-060921   <none>           <none>
kube-system        kube-proxy-nm8xn                                              1/1     Running     0             58m    10.128.15.228     insudevmachine             <none>           <none>
kube-system        kube-scheduler-instance-20250503-060921                       1/1     Running     3             60m    10.128.0.37       instance-20250503-060921   <none>           <none>
tigera-operator    tigera-operator-844669ff44-5775m                              1/1     Running     0             56m    10.128.15.228     insudevmachine             <none>           <none>
root@vllm-distilgpt2-raycluster-head-wvqj5:/vllm-workspace# nvidia-smi
Fri May 16 11:22:07 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   63C    P0             31W /   72W |   20313MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L4                      Off |   00000000:00:04.0 Off |                    0 |
| N/A   61C    P0             31W /   72W |   20305MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A         8      C   /usr/bin/python3                                0MiB |
|    1   N/A  N/A      1082      C   ray::RayWorkerWrapper                           0MiB |
+-----------------------------------------------------------------------------------------+
root@vllm-distilgpt2-raycluster-ray-worker-fdvnh:/vllm-workspace# nvidia-smi
Fri May 16 11:21:48 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   69C    P0             37W /   72W |   20065MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L4                      Off |   00000000:00:04.0 Off |                    0 |
| N/A   66C    P0             37W /   72W |   20063MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A       243      C   ray::RayWorkerWrapper                           0MiB |
|    1   N/A  N/A       244      C   ray::RayWorkerWrapper                           0MiB |
+-----------------------------------------------------------------------------------------+
kubectl port-forward svc/vllm-router-service 30080:80
curl -X POST http://localhost:30080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "distilbert/distilgpt2",
    "prompt": "Once upon a time,",
    "max_tokens": 10
  }'

{
  "id": "cmpl-3346a16163fb48b7ada5e7663d27cdf8",
  "object": "text_completion",
  "created": 1747419610,
  "model": "distilbert/distilgpt2",
  "choices": [
    {
      "index": 0,
      "text": " when the education of our members at Hogwarts University was",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 15,
    "completion_tokens": 10,
    "prompt_tokens_details": null
  }
}
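
For reference, the same completion request can be issued programmatically. Here is a minimal sketch using the Python requests library, pointed at the port-forwarded router endpoint and the model shown above:

import requests

# Assumes kubectl port-forward svc/vllm-router-service 30080:80 is running.
resp = requests.post(
    "http://localhost:30080/v1/completions",
    json={
        "model": "distilbert/distilgpt2",
        "prompt": "Once upon a time,",
        "max_tokens": 10,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])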

@ahinsutime
Contributor

  • I will make some modifications to the pipeline parallelism document.
  • I will add a Kubernetes installation tutorial (2 nodes) document.
  • I will double-check my Helm chart for possible typos or misconfiguration.

@ahinsutime
Contributor

Documentation for the Kubernetes installation tutorial (2 nodes) is still in progress.

@insukim1994 insukim1994 force-pushed the feat/basic-pipeline-parallelism branch from c40bbfc to af93456 May 17, 2025 14:49
@ahinsutime
Contributor

ahinsutime commented May 17, 2025

Initial documentation is complete. What's left:

  • Applying review comments
  • Elaborating the guide and checking for typos or misleading content
  • Double-checking the Helm chart

@ahinsutime
Contributor

ahinsutime commented May 18, 2025

@YuhanLiu11 @haitwang-cloud Thanks for your comments and suggestions!
I just finished the initial implementation and documentation adding pipeline parallelism functionality to the production stack.

This PR contains multiple new files and changes, such as:

  • A new Helm chart (including ray-cluster.yaml) and a corresponding values file,
  • Updates to the pre-existing Helm chart (to branch between the ordinary deployment and the Ray cluster),
  • Scripts and tutorials (installing a multi-node K8s cluster, a container runtime, and a container network interface; a pipeline parallelism tutorial)

It took some time to test and write the tutorial documents (especially initializing a multi-node K8s cluster and installing the container runtime and container network interface).

@insukim1994 insukim1994 force-pushed the feat/basic-pipeline-parallelism branch from aa112fe to 29dcb6d May 19, 2025 13:37
@YuhanLiu11
Collaborator

That's awesome! I'll review it.

@haitwang-cloud
Contributor

@insukim1994 LGTM, with a few nice-to-have comments.

- Basic understanding of Linux shell commands.

4. **Kubernetes Installation:**
- Follow the instructions in [`00-install-kubernetes-env.md`](00-install-kubernetes-env.md) to set up your Kubernetes environment.
Collaborator

Do we need to follow tutorials/00-a-install-mulitnode-kubernetes-env.md to install the multi-node k8s cluster before running this tutorial?

Contributor

Oh, you are right. I should fix it, since what we need is a multi-node K8s cluster. I will also add a note that installation might not be needed if you already have one.

Contributor

I added more explanation of the K8s prerequisites. Thank you!

@YuhanLiu11
Collaborator

@insukim1994 This is awesome! I only left one minor comment; after you fix that, I will merge this PR. Thanks again!

@ahinsutime
Contributor

Seems like I need to resolve my merge conflicts. I will do that and let you know once it is done!

@insukim1994 insukim1994 force-pushed the feat/basic-pipeline-parallelism branch from ccd0e60 to 3c7810f May 23, 2025 08:55
@ahinsutime
Contributor

ahinsutime commented May 23, 2025

I fixed my conflicts, but uv.lock seems to be causing a problem:

uv run pre-commit run --all-files
  × No solution found when resolving dependencies:
  ╰─▶ Because there is no version of lmcache==0.2.11 and vllm-router[lmcache] depends on lmcache==0.2.11, we can conclude that vllm-router[lmcache]'s requirements are unsatisfiable.
      And because your project requires vllm-router[lmcache], we can conclude that your project's requirements are unsatisfiable.

I'm looking into it.

It seems to be a typo: lmcache==0.2.11 does not exist, but 0.2.1 does.

@ahinsutime
Contributor

ahinsutime commented May 23, 2025

Seems like a typo exists on the main branch. I will fix it and include the fix in my PR:

git diff pyproject.toml
diff --git a/pyproject.toml b/pyproject.toml
index 64559c0..e185277 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -39,7 +39,7 @@ semantic_cache = [
     "huggingface-hub==0.25.2",  # downgrade to 0.25.2 to avoid breaking changes
 ]
 lmcache = [
-    "lmcache==0.2.11",
+    "lmcache==0.2.1",
 ]
 
 [tool.pytest.ini_options]

@YuhanLiu11
Collaborator

Hey @insukim1994, sorry, I just saw this message. It would be great if you could fix this in your PR too. Thanks!

Collaborator

@YuhanLiu11 YuhanLiu11 left a comment

LGTM! Thanks for the awesome PR!

@YuhanLiu11 YuhanLiu11 merged commit dca3133 into vllm-project:main May 26, 2025
9 checks passed
@jcrock7

jcrock7 commented May 28, 2025

Great work! Does the main Chart.yaml need to be updated to run this? I've followed the instructions and the pods are created, but they come up in the standard production-stack manner rather than from the new RayCluster template.

values.yaml

servingEngineSpec:
  enabled: true
  runtimeClassName: ""
  raySpec:
    headNode:
      requestCPU: 2
      requestMemory: "20Gi"
      requestGPU: 2
  modelSpec:
    - name: "distilgpt2"
      repository: "vllm/vllm-openai"
      tag: "latest"
      modelURL: "distilbert/distilgpt2"

      replicaCount: 1

      requestCPU: 2
      requestMemory: "20Gi"
      requestGPU: 1

      vllmConfig:
        tensorParallelSize: 1
        pipelineParallelSize: 2

      shmSize: "20Gi"

kubectl get pods

NAME                                               READY   STATUS    RESTARTS   AGE
kuberay-operator-d474d489f-57znb                   1/1     Running   0          11h
vllm-deployment-router-7bbd9bf65f-rsg4b            1/1     Running   0          138m
vllm-distilgpt2-deployment-vllm-7dcb56c6fc-86tml   1/1     Running   0          75m

@ahinsutime
Contributor

@jcrock7 Thank you for letting me know about the possible issue. I will check it and leave a comment here. Thanks!

@ahinsutime
Contributor

@jcrock7 Thank you. I've identified the issue: it seems the vLLM Helm repo is not yet synced with this change.
I will check and sync it if it is not updated.
Until the repo is synced, you can run the tutorial with the following command:
helm install vllm ./helm -f tutorials/assets/values-15-minimal-pipeline-parallel-example.yaml

@ahinsutime
Contributor

@jcrock7 I should have updated the Helm chart version, packaged it, and uploaded it to the repo. I will create a separate issue for it and solve it in a corresponding PR. Thanks!

@jcrock7

jcrock7 commented May 29, 2025

This worked. Thanks again for your work on this - it really extends the capability for multi-node clusters!

@jcrock7

jcrock7 commented May 29, 2025

Just noticed that one of the pods fails to start due to a multi-attach error on the PVC. I believe the pvc.yaml template needs to be updated to ReadWriteMany. I will create a separate PR unless you want to include the change in yours.

@insukim1994
Contributor Author

@jcrock7 Thank you! You are right that a PVC with the RWO access mode cannot be attached by pods on different nodes at the same time. Yes, it would be great if you could create a PR for it!

@jcrock7

jcrock7 commented May 29, 2025

Never mind, I confirmed the existing pvc.yaml works. I had a small typo in my values.yaml file.

JustinDuy pushed a commit to JustinDuy/production-stack-1 that referenced this pull request Jun 13, 2025
* [Feat] Added kuberay installation script via helm. Initial commit.

Signed-off-by: insukim1994 <[email protected]>

* Added initial helm chart template file for ray cluster creation.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Fixed typo at ray cluster template file. Added example values for the helm chart.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Removed unused fields at the moment. Bugfixed conflicting resource creation for kuberay.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added startup probe to check if all ray cluster nodes are up.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added vllm command composition and execution logic in the background script due to kuberay operator args concatenation.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added pod-relevant settings from servingEngineSpec for both head and worker groups.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added env templates for head and worker spec.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added volumemounts template for head and worker spec.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added templates for resource, probe, port, etc.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Initial working example.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added documentation to run vllm with kuberay for pipeline parallelism.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Elaborated tutorial documentation.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Fixed typo in kuberay operator installation tutorial document.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Fixed a wording in kuberay operator installation tutorial document.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Fixed typo in kuberay operator installation tutorial document.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Removed unused value from helm chart default value.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Elaborated expression on tutorial document.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Elaborated expression on tutorial document.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Set readiness httpGet probe for ray head node. Removed unused container ports from ray worker nodes.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added VLLM_HOST_IP based on official vllm docs. Added ray installation step.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Added missing dashboard related setting and a step for reinstalling ray.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Removed initContainer section that will be overwritten by the kuberay operator.

Signed-off-by: insukim1994 <[email protected]>

* [Feat] Kuberay operator version update needed.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Minor fix in tutorial.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added sample gpu usage example for each ray head and worker node.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Fixed typo in basic pipeline parallel tutorial doc.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Reverted unnecessary change.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Fixed typo in kuberay install util script.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added utility script to install kubeadm.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added cri-o container runtime installation script & a script to create a control plane node.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added script to join worker nodes. Elaborated control plane init script and cni installation.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added nvidia gpu setup script for each node.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Script modification during testing.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Elaborated k8s controlplane initialization and worker node join script.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Elaborated basic pipeline parallelism tutorial document.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Added guide for setting up a kubernetes cluster with 2 nodes (control and worker).

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Elaborated K8s cluster initialization guide and applied a review comment.

Signed-off-by: insukim1994 <[email protected]>

* [Chore] Strict total number of ray node checking. Tested helm chart with helm template command.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Elaborated important note when applying pipeline parallelism (with ray).

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Elaborated basic pipeline parallelism tutorial example.

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Review updates (prevent duplicated line appends & added warning message of docker restart).

Signed-off-by: insukim1994 <[email protected]>

* [Doc] Review updates (elaborated prerequisites for kuberay operator installation).

Signed-off-by: insukim1994 <[email protected]>

* [Bugfix] Fixed version typo of lmcache from toml file.

Signed-off-by: insukim1994 <[email protected]>

---------

Signed-off-by: insukim1994 <[email protected]>
allytotheson pushed a commit to allytotheson/production-stack that referenced this pull request Jun 30, 2025