[Bugfix] Improve GPU validation logging in Ray fallback scenarios #25775
Conversation
Adds early GPU count validation and clearer Ray placement error messages when tensor_parallel_size exceeds available GPUs to address poor logging and help users diagnose K8s deployment failures. Signed-off-by: Sairam Pillai <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default; instead, they only run fastcheck CI, a small and essential subset of tests that quickly catches errors. You can ask your reviewers to trigger select CI tests on top of fastcheck. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Instead of improving the log, can we just not allow vLLM to start? I don't quite understand why the behavior of falling back to Ray is something that is needed.
I also think that this silent fallback behavior is not only confusing but also quite dangerous, in the sense that a small typo in the model server maintainer's deployment configuration results in a complete failure only after a long delay. And since the distribution logic and the user-facing deployment workflow are quite different, users are already well aware of which distribution backend they intend to use, I think. While it wouldn't be a BC change, I'd +1 explicit declaration of the distribution backend. (I'm not claiming that this should be handled in this PR; just agreeing with @robertgshaw2-redhat's comment above.)
I agree @robertgshaw2-redhat @cjackal. Do you think we should close/merge this PR and then discuss the fallback scenario with a wider forum? Or should I go ahead and create a new PR for explicit declaration and early stopping?
Hi @sairampillai, fantastic work tracking this bug down to the silent Ray fallback. Your detailed analysis in the PR description is a great example for the community. I've been following the conversation and strongly agree with @robertgshaw2-redhat's suggestion to 'fail fast' by raising an error instead of just issuing a warning. This would prevent confusing timeouts and make the behavior much more robust, especially for users in constrained environments like K8s. The implementation could be a direct change in vllm/config/parallel.py, something along these lines:

```python
# In vllm/config/parallel.py, inside __post_init__
if (current_platform.is_cuda()
        and cuda_device_count_stateless() < self.world_size):
    gpu_count = cuda_device_count_stateless()
    raise ValueError(
        f"Tensor parallel size ({self.world_size}) cannot be larger than "
        f"the number of available GPUs ({gpu_count})."
    )
```

Let me know your thoughts. Happy to help in any way to get this important fix finalized and merged!
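As a quick illustration of how such a check would surface to a user (a sketch based on the snippet above, not code from this PR; the model name is an arbitrary placeholder):

```python
# Sketch: with only 1 visible GPU, requesting tensor_parallel_size=4 would
# now fail immediately at engine construction instead of silently falling
# back to Ray and timing out on placement groups.
from vllm import LLM

llm = LLM(model="facebook/opt-125m", tensor_parallel_size=4)
# Expected, per the snippet above:
# ValueError: Tensor parallel size (4) cannot be larger than the number of
# available GPUs (1).
```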
Let's move forward with the fail-fast approach.
@hmellor got it! I will implement the fix and push the changes.
Signed-off-by: Sairam Pillai <[email protected]>
@hmellor Updated to follow the discussion and fail fast.
This pull request has merge conflicts that must be resolved before it can be merged.
LGTM, please fix the conflicts and we should be able to merge.
Signed-off-by: Sairam Pillai <[email protected]>
@hmellor Fixed conflicts and ready to merge.
[Bugfix] Improve GPU validation logging in Ray fallback scenarios
Adds early GPU count validation and clearer Ray placement error messages when tensor_parallel_size exceeds available GPUs to address poor logging and help users diagnose K8s deployment failures.
Related Issues
Fixes #25263
Purpose
Fixes poor logging when tensor_parallel_size exceeds available GPUs in Ray fallback scenarios.
When `tensor_parallel_size` is set higher than the available GPU count (e.g., `tensor_parallel_size=4` with only 1 GPU), vLLM silently falls back to the Ray executor without adequate warning. This causes confusing error messages in K8s deployments, where users see Ray placement group timeout errors without understanding the root cause.
Changes Made
- `vllm/config/parallel.py`: Added a warning when tensor parallel size exceeds available GPUs during backend selection
- `vllm/executor/ray_utils.py`: Improved error messages in the `_wait_until_pg_ready()` and `initialize_ray_cluster()` functions to provide context about GPU resource mismatches (see the sketch below)
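To make the second bullet concrete, here is a hedged sketch of the kind of diagnostic the Ray placement-group path could emit; the helper name and message wording are assumptions for illustration, not the code merged into `vllm/executor/ray_utils.py`:

```python
# Illustrative sketch only (assumed helper, not the merged implementation):
# build a hint that explains why a Ray placement group never becomes ready.
def explain_placement_group_timeout(tp_size: int, visible_gpus: int) -> str:
    """Return a diagnostic message for a pending Ray placement group."""
    return (
        f"Ray placement group is still pending: tensor_parallel_size={tp_size} "
        f"requires {tp_size} GPU(s), but only {visible_gpus} GPU(s) are "
        "available to the cluster. Reduce tensor_parallel_size or add GPU "
        "capacity to the Ray cluster."
    )

# Example: explain_placement_group_timeout(4, 1)
```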
Files Modified
- `vllm/config/parallel.py` - Added GPU count validation with clear warnings
- `vllm/executor/ray_utils.py` - Enhanced Ray placement group error handling
Test Plan
Scenario Testing
- Run vLLM with `--tensor-parallel-size 4` on a system with only 1 available GPU
Test Commands
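The exact commands are not reproduced here; as an assumed illustration (the check below is a placeholder, not the PR's test command), the precondition can be verified by comparing the visible GPU count with the requested tensor parallel size:

```python
# Assumed illustration, not the PR's actual test command: confirm the host
# has fewer GPUs than the requested tensor parallel size before launching.
import torch

requested_tp = 4
visible = torch.cuda.device_count()
print(f"visible GPUs: {visible}, requested tensor_parallel_size: {requested_tp}")
if visible < requested_tp:
    print("Mismatch reproduced: vLLM should now fail fast instead of "
          "waiting on Ray placement groups.")
```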
Functional Testing
Test Result
Before Fix
After Fix
Validation Results
Essential Elements of an Effective PR Description Checklist
- (Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model.