[BugFix] Potential Fix for FA3 full-cudagraph IMA #25490

LucasWilkinson · 2025-09-23T16:46:42Z

@WoosukKwon reported an IMA with FA3 full CUDA graphs (full-CG) that was fixed by the changes in https://github.com/vllm-project/vllm/compare/woosuk/fa3-ima?expand=1

The theory is that `get_scheduler_metadata` was being called with a different `max_num_splits` than the one passed to `FlashAttentionMetadata`.

This is an alternative solution that keeps the logic of using `max_num_splits=0` (i.e., use the heuristic) for batches larger than `max_cudagraph_size`.

We do not currently have a repro, so we cannot confirm this resolves @WoosukKwon's IMA, but the mismatch should be fixed regardless: the arguments to `get_scheduler_metadata` and `FlashAttentionMetadata` should always be kept in sync.
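A minimal sketch of the invariant described above, assuming simplified stand-ins: `SchedulerPlan`, `AttnMetadata`, `fake_get_scheduler_metadata`, and `build_metadata` are hypothetical names, not vLLM's or FlashAttention's real API. The split count is chosen once per batch (0, meaning "use the heuristic", for batches larger than the largest captured CUDA graph), and that single value is handed to both consumers.

```python
from dataclasses import dataclass


@dataclass
class SchedulerPlan:
    # Hypothetical stand-in for the scheduler metadata FA3 precomputes.
    num_splits: int


@dataclass
class AttnMetadata:
    # Hypothetical stand-in for FlashAttentionMetadata.
    max_num_splits: int
    scheduler_plan: SchedulerPlan


def fake_get_scheduler_metadata(num_splits: int) -> SchedulerPlan:
    # Stand-in for get_scheduler_metadata: only records the split count it got.
    return SchedulerPlan(num_splits=num_splits)


def build_metadata(num_tokens: int,
                   use_full_cuda_graph: bool,
                   max_cudagraph_size: int,
                   cudagraph_num_splits: int) -> AttnMetadata:
    # Default to 0 (use the heuristic) so the name is bound on every path.
    max_num_splits = 0
    if use_full_cuda_graph and num_tokens <= max_cudagraph_size:
        # Batch fits a captured graph: use the fixed split count.
        max_num_splits = cudagraph_num_splits

    # Both consumers read the same local variable, so they cannot disagree.
    plan = fake_get_scheduler_metadata(num_splits=max_num_splits)
    return AttnMetadata(max_num_splits=max_num_splits, scheduler_plan=plan)


meta = build_metadata(num_tokens=4096, use_full_cuda_graph=True,
                      max_cudagraph_size=512, cudagraph_num_splits=32)
print(meta.max_num_splits, meta.scheduler_plan.num_splits)  # 0 0
```

Computing the value once and passing the same variable to both calls makes a mismatch structurally impossible, rather than relying on two separate code paths happening to agree.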

vllm serve meta-llama/Meta-Llama-3-8B-Instruct -O.cudagraph_mode=FULL


lm_eval --model local-completions --model_args "base_url=http://0.0.0.0:8000/v1/completions,model=meta-llama/Meta-Llama-3-8B-Instruct,num_concurrent=256" --tasks gsm8k
...
local-completions (base_url=http://0.0.0.0:8000/v1/completions,model=meta-llama/Meta-Llama-3-8B-Instruct,num_concurrent=256), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7544|±  |0.0119|
|     |       |strict-match    |     5|exact_match|↑  |0.7559|±  |0.0118|

gemini-code-assist

Code Review

This pull request aims to fix a potential Invalid Memory Access in FlashAttention 3 with full CUDA graphs by ensuring the max_num_splits parameter is consistent. The change refactors the logic for setting max_num_splits to a common location. However, the current implementation introduces a critical flaw: it can lead to an UnboundLocalError because max_num_splits is not defined in all code paths. My review provides a fix for this issue to ensure the variable is always initialized. Addressing this will also help achieve the PR's goal of making the parameter consistent.
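The failure mode the review points out is ordinary Python scoping: a local variable assigned only inside an `if` is unbound when that branch is skipped. A minimal illustration with hypothetical function names (not the actual code in `flash_attn.py`):

```python
def pick_splits_buggy(num_tokens: int, max_cudagraph_size: int,
                      cg_splits: int) -> int:
    if num_tokens <= max_cudagraph_size:
        max_num_splits = cg_splits
    # Large batches skip the assignment, so this line raises UnboundLocalError.
    return max_num_splits


def pick_splits_fixed(num_tokens: int, max_cudagraph_size: int,
                      cg_splits: int) -> int:
    max_num_splits = 0  # bound on every path; 0 = use the heuristic
    if num_tokens <= max_cudagraph_size:
        max_num_splits = cg_splits
    return max_num_splits


try:
    pick_splits_buggy(4096, 512, 32)
except UnboundLocalError as exc:
    print("buggy:", type(exc).__name__)  # buggy: UnboundLocalError
print("fixed:", pick_splits_fixed(4096, 512, 32))  # fixed: 0
```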

vllm/v1/attention/backends/flash_attn.py


WoosukKwon

Thanks for the fix!

WoosukKwon · 2025-09-23T20:30:26Z

@LucasWilkinson Can you please check the CI again?


LucasWilkinson requested review from WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners September 23, 2025 16:46

mergify bot added the v1 label Sep 23, 2025

gemini-code-assist bot reviewed Sep 23, 2025


vllm/v1/attention/backends/flash_attn.py (resolved)

tlrmchlsmth added this to the v0.11.0 milestone Sep 23, 2025

fix

e1c19ca

Signed-off-by: Lucas Wilkinson <[email protected]> fix Signed-off-by: Lucas Wilkinson <[email protected]> fix Signed-off-by: Lucas Wilkinson <[email protected]> comment Signed-off-by: Lucas Wilkinson <[email protected]>

LucasWilkinson force-pushed the lwilkinson/potential-full-CG-ima-fix branch from 58df19e to e1c19ca on September 23, 2025 16:53

WoosukKwon added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 23, 2025

WoosukKwon approved these changes Sep 23, 2025


mgoin and others added 2 commits September 23, 2025 21:31

Merge branch 'main' into lwilkinson/potential-full-CG-ima-fix

91f28b4

fix CI

ad60fa8

Signed-off-by: Lucas Wilkinson <[email protected]>

WoosukKwon merged commit 2338daf into main Sep 24, 2025
45 checks passed

WoosukKwon deleted the lwilkinson/potential-full-CG-ima-fix branch September 24, 2025 09:04

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[BugFix] Potential Fix for FA3 full-cudagraph IMA (vllm-project#25490)

26efeb9

Signed-off-by: Lucas Wilkinson <[email protected]>

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025

[BugFix] Potential Fix for FA3 full-cudagraph IMA (#25490)

d1e2d17

Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: yewentao256 <[email protected]>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025

[BugFix] Potential Fix for FA3 full-cudagraph IMA (vllm-project#25490)

67d0ecf

Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>

choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025

[BugFix] Potential Fix for FA3 full-cudagraph IMA (vllm-project#25490)

b579140

Signed-off-by: Lucas Wilkinson <[email protected]>

Daisy-Ma-coder mentioned this pull request Oct 17, 2025

[BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 #27128

Merged

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025

[BugFix] Potential Fix for FA3 full-cudagraph IMA (vllm-project#25490)

f4c5416

Signed-off-by: Lucas Wilkinson <[email protected]>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

[BugFix] Potential Fix for FA3 full-cudagraph IMA (vllm-project#25490)

1aa1b02

Signed-off-by: Lucas Wilkinson <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

[BugFix] Potential Fix for FA3 full-cudagraph IMA (vllm-project#25490)

81c2ab2

Signed-off-by: Lucas Wilkinson <[email protected]>

5 participants
