Conversation

@LucasWilkinson (Collaborator) commented Sep 23, 2025

@WoosukKwon reported an IMA (invalid memory access) with FA3 full cudagraphs that was fixed by applying https://github.com/vllm-project/vllm/compare/woosuk/fa3-ima?expand=1

The theory is that `get_scheduler_metadata` was being called with a different `max_num_splits` than the one being passed to `FlashAttentionMetadata`.

This is an alternative solution that doesn't lose the logic of using `max_num_splits=0` (i.e. falling back to the heuristic) for batches larger than `max_cudagraph_size`.

We do not currently have a repro, so we cannot confirm this resolves @WoosukKwon's IMA, but the inconsistency should be fixed regardless; we should always make sure the arguments to `get_scheduler_metadata` and `FlashAttentionMetadata` are in line.
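
A minimal, self-contained sketch of the pattern (all names here, e.g. `MAX_SPLITS` and the stub functions, are illustrative stand-ins, not the actual vLLM/FA3 code):

```python
MAX_SPLITS = 8  # illustrative cap used when capturing full CUDA graphs


def get_scheduler_metadata_stub(max_num_splits: int) -> dict:
    # Stand-in for FA3's get_scheduler_metadata: the scheduler sizes its
    # split buffers based on the max_num_splits it is told about.
    return {"max_num_splits": max_num_splits}


def build_buggy(num_reqs: int, max_cudagraph_size: int, full_cg: bool):
    # Bug pattern: max_num_splits is derived twice from slightly different
    # conditions, so the scheduler metadata and the attention metadata can
    # disagree -- the suspected cause of the IMA under full CUDA graphs.
    sched = get_scheduler_metadata_stub(MAX_SPLITS if full_cg else 0)
    attn_splits = MAX_SPLITS if num_reqs <= max_cudagraph_size else 0
    return sched, attn_splits


def build_fixed(num_reqs: int, max_cudagraph_size: int, full_cg: bool):
    # Fix: compute max_num_splits once and feed the same value to both
    # consumers. Batches larger than max_cudagraph_size keep
    # max_num_splits=0, i.e. they fall back to the kernel's own heuristic.
    max_num_splits = 0
    if full_cg and num_reqs <= max_cudagraph_size:
        max_num_splits = MAX_SPLITS
    sched = get_scheduler_metadata_stub(max_num_splits)
    return sched, max_num_splits


if __name__ == "__main__":
    # A batch larger than the cudagraph size under full CG: the buggy path
    # hands the scheduler MAX_SPLITS while the metadata carries 0.
    print(build_buggy(num_reqs=512, max_cudagraph_size=256, full_cg=True))
    print(build_fixed(num_reqs=512, max_cudagraph_size=256, full_cg=True))
```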

```bash
vllm serve meta-llama/Meta-Llama-3-8B-Instruct -O.cudagraph_mode=FULL
```


```bash
lm_eval --model local-completions --model_args "base_url=http://0.0.0.0:8000/v1/completions,model=meta-llama/Meta-Llama-3-8B-Instruct,num_concurrent=256" --tasks gsm8k
...
local-completions (base_url=http://0.0.0.0:8000/v1/completions,model=meta-llama/Meta-Llama-3-8B-Instruct,num_concurrent=256), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
```
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7544|±  |0.0119|
|     |       |strict-match    |     5|exact_match|↑  |0.7559|±  |0.0118|

@gemini-code-assist bot (Contributor) left a comment:


Code Review

This pull request aims to fix a potential invalid memory access in FlashAttention 3 with full CUDA graphs by ensuring the `max_num_splits` parameter is consistent. The change refactors the logic for setting `max_num_splits` to a common location. However, the current implementation introduces a critical flaw: it can lead to an `UnboundLocalError` because `max_num_splits` is not defined in all code paths. My review provides a fix for this issue to ensure the variable is always initialized. Addressing this will also help achieve the PR's goal of making the parameter consistent.
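
For illustration, the failure mode flagged here in simplified form (hypothetical code, not the actual diff):

```python
def buggy(full_cg: bool, num_reqs: int, max_cudagraph_size: int) -> int:
    # max_num_splits is only bound inside the branch...
    if full_cg and num_reqs <= max_cudagraph_size:
        max_num_splits = 8
    # ...so this raises UnboundLocalError whenever the branch is skipped.
    return max_num_splits


def fixed(full_cg: bool, num_reqs: int, max_cudagraph_size: int) -> int:
    max_num_splits = 0  # always initialized: defer to the kernel heuristic
    if full_cg and num_reqs <= max_cudagraph_size:
        max_num_splits = 8
    return max_num_splits
```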

@tlrmchlsmth added this to the v0.11.0 milestone on Sep 23, 2025
Commits (all Signed-off-by: Lucas Wilkinson <[email protected]>):

- fix
- fix
- comment
@LucasWilkinson force-pushed the lwilkinson/potential-full-CG-ima-fix branch from 58df19e to e1c19ca on September 23, 2025 at 16:53
@WoosukKwon added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Sep 23, 2025
@WoosukKwon (Collaborator) left a comment:


Thanks for the fix!

@WoosukKwon commented:

@LucasWilkinson Can you please check the CI again?

@WoosukKwon merged commit 2338daf into main on Sep 24, 2025 (45 checks passed)
@WoosukKwon deleted the lwilkinson/potential-full-CG-ima-fix branch on September 24, 2025 at 09:04
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025 (Signed-off-by: Lucas Wilkinson <[email protected]>, Signed-off-by: yewentao256 <[email protected]>)
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Labels: ready (ONLY add when PR is ready to merge/full CI is needed), v1
