[Build] Add OpenAI triton_kernels #28788
Conversation
Code Review
This pull request adds the build infrastructure to fetch and install OpenAI Triton kernels as a third-party dependency for CUDA builds. The changes are similar to how other external projects are handled.
My review identified two critical issues that need to be addressed:
- In `cmake/external_projects/triton_kernels.cmake`, an incorrect path with a trailing slash will cause the kernel files to be installed in the wrong directory, potentially overwriting other files.
- In `setup.py`, an unconditional `shutil.copytree` call will cause non-CUDA builds to fail because the source directory for Triton kernels will not exist (see the guarded-copy sketch below).
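A minimal sketch of what a guarded copy in `setup.py` could look like (the source path and integration point are illustrative, not the PR's actual code):

```python
import os
import shutil

# Illustrative paths: the fetched sources only exist for CUDA builds that
# ran the triton_kernels FetchContent step.
src = os.path.join("build", "third_party", "triton_kernels")  # hypothetical
dst = os.path.join("vllm", "third_party", "triton_kernels")

if os.path.isdir(src):
    shutil.copytree(src, dst, dirs_exist_ok=True)
# Non-CUDA builds skip the copy instead of failing on a missing directory.
```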
Additionally, there appears to be a discrepancy in how the Triton kernels are being installed versus how they are imported in the codebase. The current setup installs them as vllm.third_party.triton_kernels, but existing code seems to expect a top-level triton_kernels package. This should be clarified and aligned to prevent import errors at runtime.
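A minimal sketch of one way to align the two (assuming the packaged copy lives at `vllm.third_party.triton_kernels`; the helper name is illustrative):

```python
import importlib
import sys


def alias_packaged_triton_kernels() -> None:
    """Make `import triton_kernels` resolve to the packaged copy."""
    if "triton_kernels" in sys.modules:
        return  # an already-imported copy (e.g. a system install) wins
    module = importlib.import_module("vllm.third_party.triton_kernels")
    sys.modules["triton_kernels"] = module
```

With an alias like this registered early, existing `import triton_kernels` call sites would keep working unchanged.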
💡 Codex Review
Here are some automated review suggestions for this pull request.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
@codex review
cc @zyongye @robertgshaw2-redhat @simon-mo PTAL! Thanks 🙌
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
@cache
def import_triton_kernels():
```
This is the only place where we decide which `triton_kernels` module to use. Perhaps we could add a `VLLM_FORCE_USE_LOCAL_TRITON_KERNELS` environment variable to pick the `triton_kernels` from `vllm.third_party.triton_kernels`, but I don't see the need for it now.
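A hedged sketch of how that could look (the `VLLM_FORCE_USE_LOCAL_TRITON_KERNELS` variable and the fallback order are hypothetical, following the comment above):

```python
import importlib
import os
import sys
from functools import cache


@cache
def import_triton_kernels():
    """Resolve triton_kernels once, optionally forcing the vendored copy."""
    force_local = os.environ.get("VLLM_FORCE_USE_LOCAL_TRITON_KERNELS") == "1"
    if not force_local:
        try:
            # Prefer a system-wide install when one is available.
            return importlib.import_module("triton_kernels")
        except ImportError:
            pass
    module = importlib.import_module("vllm.third_party.triton_kernels")
    sys.modules["triton_kernels"] = module  # expose under the expected name
    return module
```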
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```cmake
# TODO (varun) : Fetch just the triton_kernels directory from Triton
GIT_REPOSITORY https://github.com/triton-lang/triton.git
GIT_TAG ${DEFAULT_TRITON_KERNELS_TAG}
GIT_PROGRESS TRUE
SOURCE_SUBDIR python/triton_kernels/triton_kernels
```
FetchContent triton_kernels lacks CMakeLists guard
The new FetchContent declaration points CMake at `python/triton_kernels/triton_kernels` without overriding `CONFIGURE_COMMAND`/`BUILD_COMMAND`, so `FetchContent_MakeAvailable` will try to run `add_subdirectory` on that directory. The upstream `triton_kernels` package is pure Python and has no `CMakeLists.txt` in that path, so any CUDA/HIP build will fail during CMake configure before the Python files are installed, preventing `triton_kernels` from being packaged at all.
Let's wait for the CI. I had the configure and build commands set to empty strings, but CMake emitted warnings, so I removed them. I was also able to build it locally.
Based on a prior test run - https://buildkite.com/vllm/ci/builds/39432#019a94c7-7b73-420b-8d94-25da33d57f4f , I could see
mgoin left a comment
LGTM reading through. Let's make sure we see the CI tests that depend on the install running.
Failing tests:
I believe this is good to land - and we can verify the nightly wheels tomorrow. Thanks.
Tested nightly on H100. I could see vllm using the packaged `triton_kernels`, and the gpt-oss eval works fine. I then installed `triton_kernels` separately and could see vllm using the system `triton_kernels`.
Purpose
This PR adds https://github.com/triton-lang/triton/tree/main/python/triton_kernels to vLLM.
We can't install this package via pip; please take a look at #27659. As a result, this PR injects the `triton_kernels` package directly into vLLM at build time, similar to the approach we take with `vllm_flash_attn`. Concretely, we just copy the entire `triton_kernels` folder from `<triton-root>/python/triton_kernels/triton_kernels` to `vllm/third_party/triton_kernels` during build time and register the module as `sys.modules["triton_kernels"]` at run time.

Fixes: #27672
Test Plan
- local-build testing on H100: `python3 setup.py build_ext --inplace` / `uv pip install -vvv -e . --no-build-isolation`
- package-build testing on H100: `TORCH_CUDA_ARCH_LIST="9.0" python3 setup.py bdist_wheel --dist-dir=dist`
- gpt-oss serve command: `vllm serve openai/gpt-oss-20b --tensor-parallel-size 2 --no-enable-prefix-caching --port 9010`
- gpt-oss eval command: `OPENAI_API_KEY=empty python -m gpt_oss.evals --model openai/gpt-oss-20b --eval gpqa --n-threads 128 --reasoning-effort low --base-url http://localhost:9010/v1`

Test Result

- local-build: the build correctly copies `triton_kernels` into `vllm/third_party/triton_kernels`.
- package-build: the installed wheel has `triton_kernels` in `<site-packages>/vllm/third_party/triton_kernels`.

Both local-build and package-build obtain the expected eval score of about 0.57. Need to perform further testing with CI wheels.
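One quick way to confirm which copy a given environment resolves (a hypothetical check, assuming vLLM has already registered the alias; not part of the PR):

```python
import triton_kernels

# A packaged copy should resolve under .../vllm/third_party/triton_kernels/;
# a system install resolves under its own site-packages path.
print(triton_kernels.__file__)
```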
Thanks @zyongye @daniel-fahey for the insights.