Conversation

@varun-sundar-rabindranath (Contributor) commented Nov 15, 2025

Purpose

This PR adds https://github.com/triton-lang/triton/tree/main/python/triton_kernels to vLLM.
We can't install this package via pip; please take a look at #27659. As a result, this PR injects the triton_kernels package directly into vLLM at build time, similar to the approach we take with vllm_flash_attn. Concretely, we copy the entire triton_kernels folder from <triton-root>/python/triton_kernels/triton_kernels to vllm/third_party/triton_kernels during the build, and register the module as sys.modules["triton_kernels"] at run time.
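
For illustration, a minimal sketch of what that run-time registration can look like; the helper name and exact wiring here are hypothetical, not the PR's actual code:

```python
# Hypothetical sketch: expose the vendored copy under the top-level name
# "triton_kernels" so that a plain `import triton_kernels` keeps working.
import importlib
import sys


def _alias_vendored_triton_kernels() -> None:
    if "triton_kernels" in sys.modules:
        return  # a system-installed triton_kernels takes precedence
    vendored = importlib.import_module("vllm.third_party.triton_kernels")
    sys.modules["triton_kernels"] = vendored
```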

Fixes: #27672

Test Plan

local-build testing on H100: python3 setup.py build_ext --inplace / uv pip install -vvv -e . --no-build-isolation
package-build testing on H100: TORCH_CUDA_ARCH_LIST="9.0" python3 setup.py bdist_wheel --dist-dir=dist

gpt-oss serve command: vllm serve openai/gpt-oss-20b --tensor-parallel-size 2 --no-enable-prefix-caching --port 9010
gpt-oss eval command: OPENAI_API_KEY=empty python -m gpt_oss.evals --model openai/gpt-oss-20b --eval gpqa --n-threads 128 --reasoning-effort low --base-url http://localhost:9010/v1

Test Result

local-build: the build correctly copies triton_kernels into vllm/third_party/triton_kernels.
package-build: the installed wheel has triton_kernels in <site-packages>/vllm/third_party/triton_kernels.
Both the local build and the package build obtain the expected eval score of about 0.57.

Need to perform further testing with CI wheels.

Thanks @zyongye @daniel-fahey for the insights.

@gemini-code-assist bot left a comment

Code Review

This pull request adds the build infrastructure to fetch and install OpenAI Triton kernels as a third-party dependency for CUDA builds. The changes are similar to how other external projects are handled.

My review identified two critical issues that need to be addressed:

  1. In cmake/external_projects/triton_kernels.cmake, an incorrect path with a trailing slash will cause the kernel files to be installed in the wrong directory, potentially overwriting other files.
  2. In setup.py, an unconditional shutil.copytree call will cause non-CUDA builds to fail because the source directory for Triton kernels will not exist.

Additionally, there appears to be a discrepancy in how the Triton kernels are being installed versus how they are imported in the codebase. The current setup installs them as vllm.third_party.triton_kernels, but existing code seems to expect a top-level triton_kernels package. This should be clarified and aligned to prevent import errors at runtime.
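
For the setup.py issue in point 2 above, a guard of roughly the following shape would avoid the non-CUDA failure; the paths and names here are illustrative assumptions, not the PR's actual code:

```python
# Sketch: only copy the vendored kernels when the fetched source exists,
# so non-CUDA builds (which never fetch triton_kernels) don't crash.
import shutil
from pathlib import Path

# Hypothetical locations; the real build uses its own FetchContent paths.
src = Path("build/_deps/triton_kernels-src/python/triton_kernels/triton_kernels")
dst = Path("vllm/third_party/triton_kernels")

if src.is_dir():
    shutil.copytree(src, dst, dirs_exist_ok=True)
```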

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Varun Sundar Rabindranath added 4 commits November 17, 2025 20:50
Varun Sundar Rabindranath added 4 commits November 17, 2025 22:00
@varun-sundar-rabindranath (Contributor Author) commented:

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Varun Sundar Rabindranath added 3 commits November 17, 2025 23:50
@varun-sundar-rabindranath (Contributor Author) commented:

@codex review

@varun-sundar-rabindranath (Contributor Author) commented:

cc @zyongye @robertgshaw2-redhat @simon-mo PTAL! Thanks 🙌

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.



@cache
def import_triton_kernels():
@varun-sundar-rabindranath (Contributor Author) commented:

This is the only place where we decide which triton_kernels module to use. Perhaps we could add a VLLM_FORCE_USE_LOCAL_TRITON_KERNELS env var to force the triton_kernels from vllm.third_party.triton_kernels, but I don't see the need for it now.
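
For illustration, one shape such a helper could take if that override were added; VLLM_FORCE_USE_LOCAL_TRITON_KERNELS is only the idea floated above, not something this PR implements:

```python
import importlib
import os
import sys
from functools import cache


@cache
def import_triton_kernels():
    # Hypothetical override: force the vendored copy even when a system
    # install of triton_kernels is present.
    force_local = os.environ.get("VLLM_FORCE_USE_LOCAL_TRITON_KERNELS") == "1"
    if not force_local:
        try:
            return importlib.import_module("triton_kernels")  # system install
        except ImportError:
            pass
    module = importlib.import_module("vllm.third_party.triton_kernels")
    sys.modules["triton_kernels"] = module  # let later plain imports resolve
    return module
```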

Signed-off-by: Varun Sundar Rabindranath <[email protected]>
@varun-sundar-rabindranath (Contributor Author) commented:

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +19 to +23
# TODO (varun) : Fetch just the triton_kernels directory from Triton
GIT_REPOSITORY https://github.com/triton-lang/triton.git
GIT_TAG ${DEFAULT_TRITON_KERNELS_TAG}
GIT_PROGRESS TRUE
SOURCE_SUBDIR python/triton_kernels/triton_kernels

P1: FetchContent triton_kernels lacks CMakeLists guard

The new FetchContent declaration points CMake at python/triton_kernels/triton_kernels without overriding CONFIGURE_COMMAND/BUILD_COMMAND, so FetchContent_MakeAvailable will try to run add_subdirectory on that directory. The upstream triton_kernels package is pure Python and has no CMakeLists.txt in that path, so any CUDA/HIP build will fail during CMake configure before the Python files are installed, preventing triton_kernels from being packaged at all.

@varun-sundar-rabindranath (Contributor Author) replied:

Let's wait for the CI. I had the configure and build commands set to empty strings, but CMake complained (warnings), so I removed them. I was also able to build locally.

@mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 18, 2025
Signed-off-by: Varun Sundar Rabindranath <[email protected]>
@varun-sundar-rabindranath (Contributor Author) commented Nov 18, 2025

Based on a prior test run (https://buildkite.com/vllm/ci/builds/39432#019a94c7-7b73-420b-8d94-25da33d57f4f), I could see tests/kernels/moe/test_gpt_oss_triton_kernels.py run successfully for the CUDA build. (Note that the test runs only if the triton_kernels package is present.)

I still need to verify that the ROCm build passes the same test. I see the AMD package being built successfully (https://buildkite.com/vllm/ci/builds/39432#019a94c7-7c06-477c-a056-0715a47c47e5), but unfortunately it looks like no AMD tests run on PRs? @mgoin, anything we can do here?

@mgoin (Member) left a comment

LGTM reading through. Let's make sure we see CI tests dependent on the install running

@varun-sundar-rabindranath (Contributor Author) commented:

> LGTM reading through. Let's make sure we see CI tests dependent on the install running

I believe test_gpt_oss_triton_kernels.py is the only dependent test.

:~/code/vllm/tests (varun/vendor-triton-kernels) $ grep -rin has_triton_kernels ./ 
./kernels/moe/test_gpt_oss_triton_kernels.py:9:from vllm.utils.import_utils import has_triton_kernels
./kernels/moe/test_gpt_oss_triton_kernels.py:11:if not has_triton_kernels():
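
Such a guard helper can be as small as an importlib probe; this is a plausible sketch, not necessarily vLLM's actual implementation:

```python
import importlib.util


def has_triton_kernels() -> bool:
    # True when either a system install or the vendored-and-aliased copy
    # is importable under the top-level name.
    return importlib.util.find_spec("triton_kernels") is not None
```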

@varun-sundar-rabindranath (Contributor Author) commented:

Failing tests:

I believe this is good to land, and we can verify the nightly wheels tomorrow. Thanks.

cc @mgoin @simon-mo @robertgshaw2-redhat

@vllm-bot merged commit 9912b8c into vllm-project:main Nov 19, 2025 (86 of 89 checks passed)
@varun-sundar-rabindranath (Contributor Author) commented:

Tested nightly on H100, with

uv pip install vllm --extra-index-url https://wheels.vllm.ai/nightly
vllm serve openai/gpt-oss-20b --tensor-parallel-size 2  --no-enable-prefix-caching  --port 9010

I could see vLLM using the packaged triton_kernels, and the gpt-oss eval worked fine.

After that initial test, I installed triton_kernels via triton_kernels @ git+https://github.com/triton-lang/[email protected]#subdirectory=python/triton_kernels and re-ran the vllm serve command above.

This time I could see vLLM using the system triton_kernels, and the gpt-oss eval again worked fine.

Victor49152 pushed a commit to Victor49152/vllm that referenced this pull request Nov 20, 2025
bhagyashrigai pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Nov 20, 2025
bigPYJ1151 pushed a commit that referenced this pull request Nov 25, 2025
bringlein pushed a commit to bringlein/vllm that referenced this pull request Nov 26, 2025

Labels

ci/build, gpt-oss (Related to GPT-OSS models), kernel, moe, ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Feature]: Adding triton_kernels from Triton repo as a dependency

5 participants