Refactor sliding window configuration to Transformers best practice #21927
Conversation
Signed-off-by: Harry Mellor <[email protected]>
Code Review
This pull request refactors the sliding window configuration to align with Transformers' best practices by using layer_types instead of custom attributes like sliding_window_pattern and interleaved_sliding_window. This is a positive change that simplifies the codebase. However, I've identified two critical issues. Firstly, there's an inconsistency in the string used to identify sliding attention layers ('sliding_attention' vs. 'sliding_window') between the main configuration file and the model implementations, which could lead to incorrect behavior. Secondly, the refactoring seems to have removed support for per-layer sliding window sizes when provided as a list, which could be a regression and cause runtime errors.
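To make the first concern concrete, here is a minimal illustration. The `layer_types` list below is made up, but the "full_attention"/"sliding_attention" values match the strings Transformers uses in `layer_types`:

```python
# Hypothetical layer_types for an interleaved model; not taken from any real checkpoint.
layer_types = ["sliding_attention", "full_attention"] * 3

# A check against the canonical string finds the sliding layers...
sliding_mask = [t == "sliding_attention" for t in layer_types]
# ...while a check against "sliding_window" never matches, silently treating
# every layer as full attention, which is the inconsistency flagged above.
wrong_mask = [t == "sliding_window" for t in layer_types]

assert any(sliding_mask) and not any(wrong_mask)
```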
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small and essential subset of CI tests runs automatically to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the `ready` label to the PR or enable auto-merge. 🚀
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
(sorry everyone...)
Signed-off-by: Harry Mellor <[email protected]>
I have updated the description to detail the consequences this PR has on Gemma 3 models
Thanks for fixing!
Need to merge from main again
…llm-project#21927) Signed-off-by: Harry Mellor <[email protected]> Signed-off-by: Paul Pak <[email protected]>
…llm-project#21927) Signed-off-by: Harry Mellor <[email protected]> Signed-off-by: Diego-Castan <[email protected]>
…llm-project#21927) Signed-off-by: Harry Mellor <[email protected]>
…llm-project#21927) Signed-off-by: Harry Mellor <[email protected]>
…llm-project#21927) Signed-off-by: Harry Mellor <[email protected]> Signed-off-by: Xiao Yu <[email protected]>
…llm-project#21927) Signed-off-by: Harry Mellor <[email protected]>
This PR:

- Uses `layer_types` instead of `sliding_window_pattern`
- Removes `interleaved_sliding_window`, as it should not be necessary

Unlike the last time I tried to remove `sliding_window: list[int]` (#18494 (comment)), this PR correctly maps the Mistral-format sliding window configuration to the new HF-style sliding window configuration.
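For illustration, a minimal sketch of the kind of Mistral-to-HF mapping described above. The function name is hypothetical and the per-layer list semantics (a `None` entry meaning full attention) is an assumption, not the PR's actual code:

```python
# Illustrative sketch only: maps a Mistral-format sliding window setting
# (either a single int or a per-layer list) onto the HF-style pair of a
# single `sliding_window` size plus per-layer `layer_types`.
def mistral_to_hf_sliding_window(sliding_window, num_layers: int):
    if sliding_window is None:
        return None, None  # no sliding attention at all
    if isinstance(sliding_window, int):
        # Uniform window: every layer uses sliding attention.
        return sliding_window, ["sliding_attention"] * num_layers
    # Per-layer list: derive layer_types and keep the one window size used.
    layer_types = [
        "sliding_attention" if w is not None else "full_attention"
        for w in sliding_window
    ]
    sizes = {w for w in sliding_window if w is not None}
    assert len(sizes) == 1, "HF style expects a single window size"
    return sizes.pop(), layer_types

# Example: a 6-layer model that slides on every other layer with a 1024 window.
print(mistral_to_hf_sliding_window([None, 1024] * 3, 6))
```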
This PR should also fix the issues that everyone has been seeing with Gemma 3.

Prior to this PR, vLLM was unable to detect that some Gemma 3 models were interleaved sliding attention models. This is because vLLM relied on the presence of `sliding_window_pattern` in the `Gemma3Config`. In earlier versions of Transformers this field was present: it was either read directly from `config.json` or derived based on the known pattern in Gemma 3.

However, in newer versions of Transformers, `sliding_window_pattern` has been replaced with `layer_types`. This means that there is no longer a derived `sliding_window_pattern` in `Gemma3Config`; it will only appear if it is explicitly included in `config.json`. As you can see from this sample of Gemma 3 `config.json`s, it was not always included:

Since these configs contained `sliding_window` and vLLM was unable to recognise that they were interleaved, it treated Gemma 3 as an all-sliding-attention model. The lack of full attention layers can explain the decrease in task performance people saw.
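A rough sketch of the detection change, with hypothetical function names (not vLLM's actual code):

```python
def is_interleaved_old(hf_config) -> bool:
    # Old behaviour: infer interleaving from `sliding_window_pattern`.
    # If config.json (or an older Transformers release) does not supply it,
    # the model looks like an all-sliding-attention model.
    pattern = getattr(hf_config, "sliding_window_pattern", None)
    return pattern is not None and pattern > 1


def is_interleaved_new(hf_config) -> bool:
    # New behaviour: read `layer_types`, which current Transformers populates
    # for Gemma 3, so interleaving is detected even when config.json omits
    # the old pattern field.
    layer_types = getattr(hf_config, "layer_types", None) or []
    return ("sliding_attention" in layer_types
            and "full_attention" in layer_types)
```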
This PR updates vLLM to rely on the newer `layer_types` field, so it should fix the Gemma 3 models that were not working properly in vLLM. #20541 is a related PR that added `layer_types` support locally in vLLM's Gemma 3 modelling code, but this PR applies it globally to all of vLLM, so new models (and the Transformers backend) shouldn't have this issue.

Fixes huggingface/transformers#40017
Fixes #15752
Fixes #17689
Fixes #20341
Fixes #22270
Fixes #22475