
Conversation

vllmellm
Contributor

@vllmellm vllmellm commented Jul 22, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

The use of environment variables, especially for Aiter kernels on ROCm, has been a pain point for some users, as mentioned in #21138.

This PR introduces:

  • Selection of attention backends for ROCm based on priority (performance) and support, instead of environment variables.
  • Graceful handling of unsupported attention backends.

Additionally, the attention selection logic in this PR maintains the ability to force a backend through the VLLM_ATTENTION_BACKEND environment variable, allowing users to easily switch backends.

Although the selection is implemented for ROCm hardware only, it can be extended to other hardware platforms in future PRs.
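
As a rough illustration of the intended selection flow, a minimal sketch follows; the backend names and the is_supported helper are placeholders for this sketch, not the actual vLLM implementation.

import os

# Illustrative priority order: fastest backend first. These names are
# placeholders, not real vLLM backend identifiers.
ROCM_BACKEND_PRIORITY = ["AITER_FA", "TRITON_SPLIT_PREFILL_DECODE", "TRITON_UNIFIED"]

def is_supported(backend: str) -> bool:
    # Placeholder check; a real check would inspect the GPU architecture
    # and whether the required kernel packages are installed.
    return backend.startswith("TRITON")

def choose_backend() -> str:
    # A backend forced through VLLM_ATTENTION_BACKEND still takes priority.
    forced = os.environ.get("VLLM_ATTENTION_BACKEND")
    if forced:
        if not is_supported(forced):
            # Graceful handling: fail with a clear message instead of
            # crashing deep inside the kernels.
            raise ValueError(f"Forced backend {forced!r} is not supported on this device.")
        return forced
    # Otherwise walk the priority list and pick the first supported backend.
    for backend in ROCM_BACKEND_PRIORITY:
        if is_supported(backend):
            return backend
    raise RuntimeError("No supported attention backend found.")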

Test Plan

Implement unit tests for the backend selection function. To run them, use the following command:

pytest tests/attention/test_attention_selector.py

Test Result

tests/attention/test_attention_selector.py ................                                                               [100%]

====================================================== 16 passed in 2.85s =======================================================

@mergify mergify bot added the rocm (Related to AMD ROCm) and v1 labels Jul 22, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request refactors the ROCm paged attention code. The changes include adding new parameters to the _IsSupported dataclass and the is_attn_backend_supported function in vllm/attention/selector.py, modifying environment variables in vllm/envs.py, and updating the attention backend selection logic in vllm/platforms/rocm.py. Additionally, new files and modifications are introduced to handle attention backends in vllm/v1/attention/backends/.

def choose_attention_backend(
Member


This should be inside selector.py imo. Also it is rather confusing that not all existing backends are considered in this function.

Contributor Author


For now, we are only considering attention backends used with ROCm on V1. We plan to support all attention backends in future PRs; in the meantime, we will add comments to clarify this for other developers.

vllm/envs.py Outdated
# and performance comparisons. Currently only affects attention backends
# that run on ROCm (backends: AiterFlashAttentionBackend,
# TritonSplitPrefillDecodeAttentionBackend, TritonUnifiedAttentionBackend)
"VLLM_DISABLED_BACKENDS":
Member


I think that it is more straightforward to set VLLM_ATTENTION_BACKEND directly. If this is only used to help test the attention selector, we can directly patch global variables.
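
For reference, a minimal sketch of patching the module-level value in a test rather than the process environment, assuming pytest's monkeypatch fixture; whether vllm.envs exposes the value as a plain attribute (it is resolved lazily) and the backend name used here are assumptions of this sketch.

import pytest
import vllm.envs as envs

def test_forced_backend_selection(monkeypatch: pytest.MonkeyPatch):
    # Patch the module-level variable directly instead of os.environ;
    # raising=False because the attribute may not exist until first access.
    monkeypatch.setattr(envs, "VLLM_ATTENTION_BACKEND", "TRITON_ATTN_VLLM_V1", raising=False)
    assert envs.VLLM_ATTENTION_BACKEND == "TRITON_ATTN_VLLM_V1"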

Contributor

@tjtanaa tjtanaa Jul 22, 2025


One of the motivations is to follow the kernels abstraction, where developers can define which attention backend can run on which hardware based on the dependencies available in the environment. This way, we can always pick the fastest backend for a given hardware platform instead of always defaulting to the Triton implementation.

The abstraction also makes it clear to developers and users where to find the custom logic that defines the default behavior of vLLM's attention backend selection.
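
One way to picture that abstraction in code (all class and function names below are purely illustrative, not taken from the PR):

import importlib.util
from dataclasses import dataclass

@dataclass
class BackendSpec:
    # Illustrative record of where a backend can run and what it depends on.
    name: str
    supported_archs: tuple
    required_modules: tuple = ()

    def can_run(self, arch: str) -> bool:
        if arch not in self.supported_archs:
            return False
        # Usable only if every required dependency is importable.
        return all(importlib.util.find_spec(m) is not None for m in self.required_modules)

# Fastest first, so Triton becomes the fallback rather than the unconditional default.
SPECS = [
    BackendSpec("AITER_FA", ("gfx942",), ("aiter",)),
    BackendSpec("TRITON_UNIFIED", ("gfx90a", "gfx942"), ("triton",)),
]

def pick_backend(arch: str) -> str:
    for spec in SPECS:
        if spec.can_run(arch):
            return spec.name
    raise RuntimeError(f"No attention backend available for {arch}")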

@mergify mergify bot added the ci/build label Jul 25, 2025

mergify bot commented Jul 25, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @vllmellm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 25, 2025
@mergify mergify bot removed the needs-rebase label Jul 25, 2025
@vllmellm vllmellm marked this pull request as ready for review July 25, 2025 11:37

mergify bot commented Jul 25, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @vllmellm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 25, 2025
@vllmellm vllmellm marked this pull request as draft July 25, 2025 17:11
@mergify mergify bot added the documentation (Improvements or additions to documentation), deepseek (Related to DeepSeek models), frontend, llama (Related to Llama models), multi-modality (Related to multi-modality (#4194)), new-model (Requests to new models), performance (Performance-related issues), qwen (Related to Qwen models), structured-output, speculative-decoding, and tpu (Related to Google TPUs) labels Aug 4, 2025
@mergify mergify bot added the tool-calling label Aug 4, 2025
@mergify mergify bot removed the tpu Related to Google TPUs label Aug 4, 2025
@mergify mergify bot removed the needs-rebase label Aug 5, 2025
@vllmellm vllmellm marked this pull request as ready for review August 5, 2025 09:00
@vllmellm
Contributor Author

vllmellm commented Aug 5, 2025

pass

@classmethod
def validate_device_capabality(cls) -> None:
Member


IMO this part should be handled by the platform

Member


Or at least, it needs to accept the platform being used

Contributor


Or at least, it needs to accept the platform being used
@DarkLight1337 @vllmellm
I think the second approach is better: it delegates the responsibility to the Attention backend class itself. All checks for whether a backend is supported should be centralized in that class.
The platform should just be a place to retrieve platform information; it should not determine whether an attention backend can run or not.

This can improve readability and maintainability.
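
A rough sketch of that second approach, with the backend's own classmethod accepting the platform as an argument (all names here are illustrative, not the actual vLLM interfaces):

class Platform:
    # Minimal stand-in: the platform only carries device information.
    def __init__(self, device_name: str, capability: tuple):
        self.device_name = device_name
        self.capability = capability

class SomeAttentionBackend:
    MIN_CAPABILITY = (9, 4)  # illustrative requirement

    @classmethod
    def validate_device_capability(cls, platform: Platform) -> None:
        # The backend owns its support rules; the platform just supplies data.
        if platform.capability < cls.MIN_CAPABILITY:
            raise ValueError(
                f"{cls.__name__} requires device capability >= {cls.MIN_CAPABILITY}, "
                f"got {platform.capability} on {platform.device_name}")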


mergify bot commented Aug 7, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @vllmellm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


Labels

ci/build, deepseek (Related to DeepSeek models), documentation (Improvements or additions to documentation), frontend, llama (Related to Llama models), multi-modality (Related to multi-modality (#4194)), needs-rebase, new-model (Requests to new models), performance (Performance-related issues), qwen (Related to Qwen models), rocm (Related to AMD ROCm), speculative-decoding, structured-output, tool-calling, v1

Projects

Status: No status


3 participants