[Build] Add OpenAI triton_kernels #28788
Conversation
Code Review
This pull request adds the build infrastructure to fetch and install OpenAI Triton kernels as a third-party dependency for CUDA builds. The changes are similar to how other external projects are handled.
My review identified two critical issues that need to be addressed:
- In `cmake/external_projects/triton_kernels.cmake`, an incorrect path with a trailing slash will cause the kernel files to be installed in the wrong directory, potentially overwriting other files.
- In `setup.py`, an unconditional `shutil.copytree` call will cause non-CUDA builds to fail because the source directory for Triton kernels will not exist (see the guarded-copy sketch below).
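A minimal sketch of what a guarded copy in `setup.py` could look like (the source path and integration point are illustrative, not the PR's actual code):

```python
import os
import shutil

# Illustrative paths: the fetched sources only exist for CUDA builds that
# ran the triton_kernels FetchContent step.
src = os.path.join("build", "third_party", "triton_kernels")  # hypothetical
dst = os.path.join("vllm", "third_party", "triton_kernels")

if os.path.isdir(src):
    shutil.copytree(src, dst, dirs_exist_ok=True)
# Non-CUDA builds skip the copy instead of failing on a missing directory.
```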
Additionally, there appears to be a discrepancy in how the Triton kernels are being installed versus how they are imported in the codebase. The current setup installs them as vllm.third_party.triton_kernels, but existing code seems to expect a top-level triton_kernels package. This should be clarified and aligned to prevent import errors at runtime.
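A minimal sketch of one way to align the two (assuming the packaged copy lives at `vllm.third_party.triton_kernels`; the helper name is illustrative):

```python
import importlib
import sys


def alias_packaged_triton_kernels() -> None:
    """Make `import triton_kernels` resolve to the packaged copy."""
    if "triton_kernels" in sys.modules:
        return  # an already-imported copy (e.g. a system install) wins
    module = importlib.import_module("vllm.third_party.triton_kernels")
    sys.modules["triton_kernels"] = module
```

With an alias like this registered early, existing `import triton_kernels` call sites would keep working unchanged.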
💡 Codex Review
Here are some automated review suggestions for this pull request.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
@codex review
cc @zyongye @robertgshaw2-redhat @simon-mo PTAL! Thanks 🙌
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
@cache
def import_triton_kernels():
```
This is the only place where we decide which `triton_kernels` module to use. Perhaps we could add a `VLLM_FORCE_USE_LOCAL_TRITON_KERNELS` environment variable to pick the `triton_kernels` from `vllm.third_party.triton_kernels`, but I don't see the need for it now.
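A hedged sketch of how that could look (the `VLLM_FORCE_USE_LOCAL_TRITON_KERNELS` variable and the fallback order are hypothetical, following the comment above):

```python
import importlib
import os
import sys
from functools import cache


@cache
def import_triton_kernels():
    """Resolve triton_kernels once, optionally forcing the vendored copy."""
    force_local = os.environ.get("VLLM_FORCE_USE_LOCAL_TRITON_KERNELS") == "1"
    if not force_local:
        try:
            # Prefer a system-wide install when one is available.
            return importlib.import_module("triton_kernels")
        except ImportError:
            pass
    module = importlib.import_module("vllm.third_party.triton_kernels")
    sys.modules["triton_kernels"] = module  # expose under the expected name
    return module
```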
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```cmake
# TODO (varun) : Fetch just the triton_kernels directory from Triton
GIT_REPOSITORY https://github.com/triton-lang/triton.git
GIT_TAG ${DEFAULT_TRITON_KERNELS_TAG}
GIT_PROGRESS TRUE
SOURCE_SUBDIR python/triton_kernels/triton_kernels
```
FetchContent triton_kernels lacks CMakeLists guard
The new FetchContent declaration points CMake at `python/triton_kernels/triton_kernels` without overriding `CONFIGURE_COMMAND`/`BUILD_COMMAND`, so `FetchContent_MakeAvailable` will try to run `add_subdirectory` on that directory. The upstream `triton_kernels` package is pure Python and has no `CMakeLists.txt` in that path, so any CUDA/HIP build will fail during CMake configure before the Python files are installed, preventing `triton_kernels` from being packaged at all.
Let's wait for the CI. I had the configure and build commands set to empty strings, but CMake emitted warnings, so I removed them. I was also able to build it locally.
Based on a prior test run - https://buildkite.com/vllm/ci/builds/39432#019a94c7-7b73-420b-8d94-25da33d57f4f , I could see
mgoin left a comment
LGTM reading through. Let's make sure we see the CI tests that depend on the install running.
Failing tests:
I believe this is good to land - and we can verify the nightly wheels tomorrow. Thanks.
Tested nightly on H100. I could see vllm using the packaged `triton_kernels`, and the gpt-oss eval works fine. I then installed `triton_kernels` separately and could see vllm using the system `triton_kernels`.
Purpose
This PR adds https://github.com/triton-lang/triton/tree/main/python/triton_kernels to vLLM.
We can't install this package via pip; please take a look at #27659. As a result, this PR injects the `triton_kernels` package directly into vLLM at build time, similar to the approach we take with `vllm_flash_attn`. Concretely, we just copy the entire `triton_kernels` folder from `<triton-root>/python/triton_kernels/triton_kernels` to `vllm/third_party/triton_kernels` during build time and register the module as `sys.modules["triton_kernels"]` at run time.

Fixes: #27672
Test Plan
- local-build testing on H100: `python3 setup.py build_ext --inplace` / `uv pip install -vvv -e . --no-build-isolation`
- package-build testing on H100: `TORCH_CUDA_ARCH_LIST="9.0" python3 setup.py bdist_wheel --dist-dir=dist`
- gpt-oss serve command: `vllm serve openai/gpt-oss-20b --tensor-parallel-size 2 --no-enable-prefix-caching --port 9010`
- gpt-oss eval command: `OPENAI_API_KEY=empty python -m gpt_oss.evals --model openai/gpt-oss-20b --eval gpqa --n-threads 128 --reasoning-effort low --base-url http://localhost:9010/v1`

Test Result

- local-build: the build correctly copies `triton_kernels` into `vllm/third_party/triton_kernels`.
- package-build: the installed wheel has `triton_kernels` in `<site-packages>/vllm/third_party/triton_kernels`.

Both local-build and package-build obtain the expected eval score of about 0.57. Need to perform further testing with CI wheels.
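One quick way to confirm which copy a given environment resolves (a hypothetical check, assuming vLLM has already registered the alias; not part of the PR):

```python
import triton_kernels

# A packaged copy should resolve under .../vllm/third_party/triton_kernels/;
# a system install resolves under its own site-packages path.
print(triton_kernels.__file__)
```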
Thanks @zyongye @daniel-fahey for the insights.