Feat/basic pipeline parallelism #422
How it works
Example Snippet

    servingEngineSpec:
      runtimeClassName: ""
      raySpec:
        headNode:
          requestCPU: 2
          requestMemory: "20Gi"
          requestGPU: 1
      modelSpec:
      - name: "distilgpt2"
        repository: "vllm/vllm-openai"
        tag: "latest"
        modelURL: "distilbert/distilgpt2"
        replicaCount: 1
        requestCPU: 2
        requestMemory: "20Gi"
        requestGPU: 1
        vllmConfig:
          tensorParallelSize: 1
          pipelineParallelSize: 2
        shmSize: "20Gi"
        hf_token: <YOUR HF TOKEN>
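The vllmConfig values in the example snippet determine how many GPU workers the deployment needs. The following is a minimal Python sketch of that arithmetic, not code from this PR. It relies on vLLM's world size being tensorParallelSize × pipelineParallelSize, and it assumes that every Ray node (head or worker pod) is given the same requestGPU count. The function name `required_ray_nodes` is hypothetical and does not exist in the Helm chart.

```python
import math


def required_ray_nodes(tensor_parallel_size: int,
                       pipeline_parallel_size: int,
                       gpus_per_node: int) -> tuple:
    """Return (world_size, ray_nodes) for a given parallelism config.

    Assumption: each Ray node (head or worker) gets `gpus_per_node` GPUs,
    mirroring `requestGPU` in the example values.
    """
    if min(tensor_parallel_size, pipeline_parallel_size, gpus_per_node) < 1:
        raise ValueError("all sizes must be positive integers")
    # One GPU worker per (tensor-parallel rank, pipeline stage) pair.
    world_size = tensor_parallel_size * pipeline_parallel_size
    # Round up: a partially used node still has to be scheduled.
    ray_nodes = math.ceil(world_size / gpus_per_node)
    return world_size, ray_nodes


# Values from the example snippet: TP=1, PP=2, requestGPU=1.
print(required_ray_nodes(1, 2, 1))  # (2, 2)
```

With the snippet's values this gives two GPU workers on two Ray nodes, which would correspond to one head node and one worker node under the assumption above.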
I'm going to add tutorials for:
Also, I will test pipeline parallelism on the nodes to confirm multi-node distributed inference.
Confirmed working on a Kubernetes cluster of 2 nodes with 2 GPUs:
Documentation is still in progress.
Initial documentation complete.
@YuhanLiu11 @haitwang-cloud Thanks for your comments and suggestions! This PR contains multiple new files and changes such as:
It took me some time to test and write the tutorial documents (especially initializing a multi-node K8s cluster and installing the container runtime and container network interface).
That's awesome! I'll review it.
@insukim1994 LGTM with a few nice-to-have comments.
- Basic understanding of Linux shell commands.

4. **Kubernetes Installation:**
   - Follow the instructions in [`00-install-kubernetes-env.md`](00-install-kubernetes-env.md) to set up your Kubernetes environment.
Do we need to follow tutorials/00-a-install-mulitnode-kubernetes-env.md to install the multi-node k8s cluster before running this tutorial?
Oh you are right. I should fix it since what we need is a multi-node K8s cluster. I will also add a comment that installation might not be needed if someone already has it.
I added more explanation of the K8s prerequisites for it. Thank you!
@insukim1994 This is awesome! Thanks again for this PR. I only left one minor comment. After you fix that, I will merge this PR. Thanks again!
It seems I need to resolve my conflicts. I will do it and let you know once it is done!
It seems a typo exists on the main branch. I will fix it and include it in my PR:
Hey @insukim1994, sorry, I just saw this message. It would be great if you could fix this in your PR too. Thanks!
LGTM! Thanks for the awesome PR!
Great work! Does the main Chart.yaml need to be updated to run this? I've followed the instructions and the pods are created, but in the standard production-stack manner rather than according to the new RayCluster template.
values.yaml
kubectl get pods
@jcrock7 Thank you for letting me know about the possible issue. I will check it and leave a comment here. Thanks!
@jcrock7 Thank you. I've identified the issue. Seems like vllm
@jcrock7 I should have updated the Helm chart version, packaged it, and uploaded it to the repo. I will create a separate issue for it and resolve it in the corresponding PR. Thanks!
This worked. Thanks again for your work on this. It really extends the capability for multi-node clusters!
Just noticed that one of the pods fails to start due to a multi-attach error on the PVC. I believe the pvc.yaml template needs to be updated to ReadWriteMany. I will create a separate PR unless you want to include it in yours.
@jcrock7 Thank you! You are right that a PVC with the RWO option cannot handle the case where it is shared between pods on different nodes. Yes, it would be very nice if you created a PR for it!
Never mind, I confirmed the existing pvc.yaml works. I had a small typo in my values.yaml file.
* [Feat] Added kuberay installation script via helm. Initial commit.
* Added initial helm chart template file for ray cluster creation.
* [Feat] Fixed typo at ray cluster template file. Added example values for the helm chart.
* [Feat] Removed unused fields at the moment. Bugfixed conflicting resource creation for kuberay.
* [Feat] Added startup probe to check if all ray cluster nodes are up.
* [Feat] Added vllm command composing and execute logic in the background script due to kuberay operator args concatenation.
* [Feat] Added pod relevant settings from servingEngineSpec for both head and worker groups.
* [Feat] Added env templates for head and worker spec.
* [Feat] Added volumemounts template for head and worker spec.
* [Feat] Added templates for resource, probe, port, etc.
* [Feat] Initial working example.
* [Doc] Added documentation to run vllm with kuberay for pipeline parallelism.
* [Doc] Elaborated tutorial documentation.
* [Chore] Fixed typo in kuberay operator installation tutorial document.
* [Chore] Fixed a wording in kuberay operator installation tutorial document.
* [Chore] Fixed typo in kuberay operator installation tutorial document.
* [Chore] Removed unused value from helm chart default value.
* [Chore] Elaborated expression on tutorial document.
* [Chore] Elaborated expression on tutorial document.
* [Feat] Set readiness httpGet probe for ray head node. Removed unused container ports from ray worker nodes.
* [Feat] Added VLLM_HOST_IP based on official vllm docs. Added ray installation step.
* [Feat] Added missing dashboard related setting and a step for reinstalling ray.
* [Feat] Removed initContainer section that will be overwritten by kuberay operator.
* [Feat] Kuberay operator version update needed.
* [Doc] Minor fix in tutorial.
* [Doc] Added sample gpu usage example for each ray head and worker node.
* [Chore] Fixed typo in basic pipeline parallel tutorial doc.
* [Chore] Reverted unnecessary change.
* [Chore] Fixed typo in kuberay install util script.
* [Doc] Added utility script to install kubeadm.
* [Doc] Added cri-o container runtime installation script & a script to create a control plane node.
* [Doc] Added script to join worker nodes. Elaborated control plane init script and cni installation.
* [Doc] Added nvidia gpu setup script for each node.
* [Doc] Script modification during testing.
* [Doc] Elaborated k8s controlplane initialization and worker node join script.
* [Doc] Elaborated basic pipeline parallelism tutorial document.
* [Doc] Added guide for setting up kubernetes cluster with 2 nodes (control and worker).
* [Doc] Elaborated K8s cluster initialization guide and applied a review comment.
* [Chore] Strict total number of ray node checking. Tested helm chart with helm template command.
* [Doc] Elaborated important note when applying pipeline parallelism (with ray).
* [Doc] Elaborated basic pipeline parallelism tutorial example.
* [Doc] Review updates (prevent duplicated line appends & added warning message of docker restart).
* [Doc] Review updates (elaborated prerequisites for kuberay operator installation).
* [Bugfix] Fixed version typo of lmcache from toml file.

Signed-off-by: insukim1994 <[email protected]>
FILL IN THE PR DESCRIPTION HERE
FIX #101 (link existing issues this PR will resolve)
BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE
- Sign off your commits by using -s when doing git commit.
- Classify the PR with a title prefix such as [Bugfix], [Feat], and [CI].
Detailed Checklist
Thank you for your contribution to production-stack! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.
PR Title and Classification
Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
- [Bugfix] for bug fixes.
- [CI/Build] for build or continuous integration improvements.
- [Doc] for documentation fixes and improvements.
- [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
- [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
- [Misc] for PRs that do not fit the above categories. Please use this sparingly.
Note: If the PR spans more than one category, please include all relevant prefixes.
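To illustrate the title convention described above, here is a small Python sketch that checks whether a PR title starts with one or more of the listed prefixes. It is only an illustration: the project may or may not enforce this automatically, and the names `ALLOWED_PREFIXES`, `title_prefixes`, and `is_valid_title` are hypothetical.

```python
import re
from typing import List

# Prefixes taken from the checklist above.
ALLOWED_PREFIXES = {"Bugfix", "CI/Build", "Doc", "Feat", "Router", "Misc"}

# Matches one bracketed tag such as "[Feat]" at the current position.
_PREFIX_RE = re.compile(r"\[([^\]]+)\]")


def title_prefixes(title: str) -> List[str]:
    """Return the leading [Prefix] tags of a PR title, in order."""
    prefixes = []
    rest = title.lstrip()
    while True:
        match = _PREFIX_RE.match(rest)
        if match is None:
            break
        prefixes.append(match.group(1))
        rest = rest[match.end():].lstrip()
    return prefixes


def is_valid_title(title: str) -> bool:
    """True if the title has at least one prefix and all are allowed."""
    prefixes = title_prefixes(title)
    return bool(prefixes) and all(p in ALLOWED_PREFIXES for p in prefixes)


print(is_valid_title("[Feat] Basic pipeline parallelism"))       # True
print(is_valid_title("[Feat][Doc] Basic pipeline parallelism"))  # True
print(is_valid_title("Feat/basic pipeline parallelism"))         # False
```

Accepting several leading tags follows the note that a PR spanning more than one category should include all relevant prefixes.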
Code Quality
The PR needs to meet the following code quality standards:
- Use pre-commit to format your code. See README.md for installation.
DCO and Signed-off-by
When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header, which certifies agreement with the terms of the DCO.
Using -s with git commit will automatically add this header.
What to Expect for the Reviews
We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of YuhanLiu11, Shaoting-Feng or ApostaC.
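As a rough illustration of the DCO requirement described above, the following Python sketch checks whether a commit message carries a Signed-off-by: line of the kind that git commit -s appends. It is a simplified check under stated assumptions, not the project's actual DCO tooling: the function name `has_signoff` is hypothetical, and the name and address in the example are placeholders.

```python
import re

# One "Signed-off-by: Name <address>" line anywhere in the message.
# Simplification: the address is any run of non-space characters in <...>.
_SIGNOFF_RE = re.compile(r"^Signed-off-by: .+ <[^<>\s]+>$", re.MULTILINE)


def has_signoff(commit_message: str) -> bool:
    """True if the commit message contains a Signed-off-by line."""
    return _SIGNOFF_RE.search(commit_message) is not None


signed = (
    "[Doc] Minor fix in tutorial.\n"
    "\n"
    "Signed-off-by: Jane Doe <jane@example.com>"  # placeholder identity
)
unsigned = "[Doc] Minor fix in tutorial."

print(has_signoff(signed))    # True
print(has_signoff(unsigned))  # False
```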