[Llama4] Enable attention temperature tuning by default for long context (>32k) #16439
Conversation
Signed-off-by: Ye (Charlotte) Qi <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run full CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR. 🚀
We probably need to update the blog accordingly and tell users how to disable it instead lol. Also cc: @astonzhang
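For reference, a minimal sketch of how a user could opt out once this lands. It assumes the Llama4 config field is named `attn_temperature_tuning` and uses vLLM's existing `hf_overrides` mechanism; the model name and context length are just illustrative:

```python
from vllm import LLM

# Assumption: attention temperature tuning can be switched off via an HF
# config override; the field name may differ in the released version.
llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    max_model_len=131072,
    hf_overrides={"attn_temperature_tuning": False},
)
```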
Thanks for all the evals!
Looks good, thanks!
Since we auto-enable this with max-model-len > 32k in PR vllm-project/vllm#16439, this tip can be removed to avoid confusion.
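A rough sketch of the default-enable behavior this refers to, under stated assumptions: the attribute name `attn_temperature_tuning` and the exact threshold handling are illustrative, not a copy of the vLLM code:

```python
# Illustrative: enable attention temperature tuning by default for
# long-context Llama4 runs unless the user explicitly set the field.
LONG_CONTEXT_THRESHOLD = 32_768  # ">32k" from the PR title

def maybe_enable_attn_temperature_tuning(hf_config, max_model_len: int) -> None:
    # Respect an explicit user choice (e.g. set via --hf-overrides).
    if getattr(hf_config, "attn_temperature_tuning", None) is not None:
        return
    # Otherwise, default to enabled for long-context deployments.
    hf_config.attn_temperature_tuning = max_model_len > LONG_CONTEXT_THRESHOLD
```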
Attention temperature tuning (on NoPE layers) improves accuracy on long-context (>32k) tasks. This PR enables it by default for long context unless explicitly disabled by the user.
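For context, a short sketch of what the tuning does on the NoPE layers, following the Llama 4 reference formulation: query vectors are scaled by a temperature that grows logarithmically with token position. The constants `ATTN_SCALE` and `FLOOR_SCALE` and the tensor shapes are assumptions here, not the exact vLLM kernel:

```python
import torch

# Assumed defaults from the Llama 4 reference config; treat as
# illustrative rather than the values vLLM ships with.
ATTN_SCALE = 0.1
FLOOR_SCALE = 8192.0

def scale_queries(q: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    """Scale queries (num_tokens, hidden) by a position-dependent temperature,
    which counteracts attention score flattening at long context."""
    attn_scales = (
        torch.log(torch.floor((positions.float() + 1.0) / FLOOR_SCALE) + 1.0)
        * ATTN_SCALE
        + 1.0
    )
    # Broadcast the per-position scale over the hidden dimension.
    return q * attn_scales.unsqueeze(-1)
```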
Eval results on RULER with and without attention temperature tuning (@luccafong):
cc: @ywang96 @simon-mo @DarkLight1337