Revert "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" #27714
Conversation
Code Review
This pull request reverts a previous change that used a pre-built xformers wheel, opting instead to build xformers from source within the Docker image. This is done by adding a RUN command to the Dockerfile for compilation and removing the xformers dependency from requirements/cuda.txt. However, the implementation has a critical flaw: the list of CUDA architectures for the xformers build is too restrictive. My review provides a specific suggestion to expand this list to ensure broader GPU compatibility (notably for Volta GPUs like V100) and improve performance on modern GPUs by including them for ahead-of-time compilation.
```dockerfile
# TODO (huydhn): Remove this once xformers is released for 2.9.0
RUN --mount=type=cache,target=/root/.cache/uv bash - <<'BASH'
    . /etc/environment
    export TORCH_CUDA_ARCH_LIST='7.5 8.0+PTX 9.0a'
```
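For context on what these entries mean, here is a hedged sketch (the function name and exact flag format are illustrative, not vLLM or PyTorch code) of how each `TORCH_CUDA_ARCH_LIST` entry maps to nvcc code generation: native SASS per listed architecture, plus embedded forward-compatible PTX when a `+PTX` suffix is present.

```python
def gencode_flags(arch_list: str) -> list[str]:
    """Illustrative: expand a TORCH_CUDA_ARCH_LIST value into nvcc
    -gencode flags. Each entry yields native SASS (code=sm_XY); a
    '+PTX' suffix additionally embeds PTX (code=compute_XY) so newer,
    unlisted GPUs can JIT-compile the kernels at runtime."""
    flags = []
    for entry in arch_list.split():
        base = entry.removesuffix("+PTX")
        num = base.replace(".", "")  # '8.0' -> '80', '9.0a' -> '90a'
        flags.append(f"-gencode arch=compute_{num},code=sm_{num}")
        if entry.endswith("+PTX"):
            flags.append(f"-gencode arch=compute_{num},code=compute_{num}")
    return flags

print(gencode_flags("7.5 8.0+PTX 9.0a"))
```

Note that in this list only the `8.0` entry carries PTX, so everything newer than `8.0` that is not explicitly listed must fall back to JIT compilation.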
The hardcoded `TORCH_CUDA_ARCH_LIST` is overly restrictive and may cause compatibility and performance issues for users of the Docker image.

- **Dropped architectures:** This list removes support for Volta (`7.0`), which is used by V100 GPUs. This is a significant regression, as V100s are still widely used in cloud environments and research.
- **Performance on modern GPUs:** It relies on just-in-time (JIT) compilation from PTX for modern architectures like Ada Lovelace (`8.9`) and Hopper (`9.0`), as they are not explicitly listed. This can lead to significant startup delays when vLLM is first run on these GPUs.
- **Inconsistency:** The default `torch_cuda_arch_list` defined as a build argument earlier in this Dockerfile (line 144) is much more comprehensive. While that argument is not available in this build stage, its value serves as a good reference for what architectures are generally supported.

To ensure broad compatibility and optimal performance, I recommend using a more inclusive list of architectures. This suggested list restores Volta support, provides ahead-of-time (AOT) compilation for common modern GPUs, and maintains forward compatibility for future architectures via PTX.

```dockerfile
export TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0 8.9 9.0a+PTX'
```
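To make the AOT-versus-JIT distinction concrete, here is a simplified, hypothetical sketch (not vLLM or PyTorch code; the real nvcc/PyTorch behavior has more nuance, e.g. the `a` suffix emits arch-specific SASS) that classifies how a GPU of a given compute capability is served by an arch list:

```python
def coverage(arch_list: str, cc: str) -> str:
    """Classify how a GPU with compute capability `cc` is served by a
    TORCH_CUDA_ARCH_LIST value: 'AOT' (native SASS was compiled for it),
    'JIT' (only forward-compatible PTX, compiled on first run), or
    'none'. Simplified model for illustration only."""
    ptx_bases = []
    for entry in arch_list.split():
        has_ptx = entry.endswith("+PTX")
        base = entry.removesuffix("+PTX").rstrip("a")
        if base == cc:
            return "AOT"  # an exact SASS target exists for this GPU
        if has_ptx:
            ptx_bases.append(float(base))
    # PTX embedded for an older arch can be JIT-compiled on a newer GPU
    return "JIT" if any(b <= float(cc) for b in ptx_bases) else "none"

# With the original list, a V100 (7.0) gets nothing and an 8.9 GPU pays
# a JIT cost; the suggested list serves both ahead of time.
print(coverage("7.5 8.0+PTX 9.0a", "7.0"))          # → none
print(coverage("7.5 8.0+PTX 9.0a", "8.9"))          # → JIT
print(coverage("7.0 7.5 8.0 8.9 9.0a+PTX", "8.9"))  # → AOT
```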
…llm-project#27714) Signed-off-by: Bhagyashri <[email protected]>
…vllm-project#27714) This reverts commit 9007bf5. Signed-off-by: Huy Do <[email protected]>
Reverts #27598, which broke the CUDA 13 build: https://buildkite.com/vllm/release/builds/9637/steps/canvas?sid=019a2dfc-911c-4783-b421-9d3acc153e1b