
Conversation

hmellor
Member

@hmellor hmellor commented Jul 30, 2025

  • Uses layer_types instead of sliding_window_pattern
  • Removes custom interleaved_sliding_window as it should not be necessary

Unlike the last time I tried to remove sliding_window: list[int] (#18494 (comment)), this PR correctly maps the Mistral-format sliding window configuration to the new HF-style sliding window configuration.
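
For illustration, here is a minimal sketch of what such a mapping could look like. The function name and the exact handling of the Mistral fields are hypothetical, not the PR's actual code; it assumes a Mistral-format sliding_window that is either a single int, or a per-layer list in which None marks a full attention layer:

```python
def mistral_sliding_window_to_layer_types(
    sliding_window: int | list[int | None] | None,
    num_layers: int,
) -> list[str]:
    """Hypothetical sketch: map a Mistral-format sliding window field
    to HF-style layer_types ("sliding_attention" / "full_attention")."""
    if sliding_window is None:
        # No sliding window at all: every layer uses full attention.
        return ["full_attention"] * num_layers
    if isinstance(sliding_window, int):
        # A single window size applies uniformly to all layers.
        return ["sliding_attention"] * num_layers
    # A per-layer list: None marks layers that use full attention.
    return [
        "sliding_attention" if w is not None else "full_attention"
        for w in sliding_window
    ]
```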


This PR should also fix the issues that everyone has been seeing with Gemma 3.

Prior to this PR, vLLM was unable to detect that some Gemma 3 models were interleaved sliding attention models. This is because vLLM relied on the presence of sliding_window_pattern in the Gemma3Config. In earlier versions of Transformers this field was present and it was either read directly from config.json or derived based on the known pattern in Gemma 3.

However, in newer versions of Transformers sliding_window_pattern has been replaced with layer_types. This means that sliding_window_pattern is no longer derived in Gemma3Config and will only appear if it is explicitly included in config.json. As a sample of Gemma 3 config.jsons shows, it was not always included.
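
As a paraphrase of the fallback in newer Transformers versions (the exact code may differ between releases): when layer_types is absent, Gemma3Config derives it so that every sliding_window_pattern-th layer is full attention and the rest are sliding attention:

```python
# Paraphrased Gemma3Config fallback: with sliding_window_pattern = 6,
# every 6th layer uses full attention, the rest use sliding attention.
num_hidden_layers = 12
sliding_window_pattern = 6

layer_types = [
    "sliding_attention" if (i + 1) % sliding_window_pattern else "full_attention"
    for i in range(num_hidden_layers)
]
# -> 5x "sliding_attention", "full_attention", 5x "sliding_attention", "full_attention"
```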

Since these configs contained sliding_window but vLLM could not recognise that they were interleaved, it treated Gemma 3 as an all-sliding-attention model. The resulting lack of full attention layers explains the decrease in task performance people saw.

This PR updates vLLM to rely on the newer layer_types field, so it should fix the Gemma 3 models that were not working properly in vLLM. #20541 is a related PR that added layer_types support locally in vLLM's Gemma 3 modelling code, but this PR applies it globally to all of vLLM, so new models (and the Transformers backend) shouldn't have this issue.
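
As a rough illustration of what such a global check can look like (the helper name is hypothetical, not vLLM's actual code), detection only needs the model-agnostic layer_types field:

```python
from transformers import PretrainedConfig

def is_interleaved_sliding_window(config: PretrainedConfig) -> bool:
    """Hypothetical helper: detect interleaved sliding attention from the
    HF-style layer_types field rather than model-specific attributes."""
    layer_types = getattr(config, "layer_types", None)
    if layer_types is None:
        return False
    # Interleaved means the model mixes both kinds of attention layers.
    return "sliding_attention" in layer_types and "full_attention" in layer_types
```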

Fixes huggingface/transformers#40017
Fixes #15752
Fixes #17689
Fixes #20341
Fixes #22270
Fixes #22475

@mergify mergify bot added documentation Improvements or additions to documentation llama Related to Llama models labels Jul 30, 2025
@hmellor hmellor added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 30, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the sliding window configuration to align with Transformers' best practices by using layer_types instead of custom attributes like sliding_window_pattern and interleaved_sliding_window. This is a positive change that simplifies the codebase. However, I've identified two critical issues. Firstly, there's an inconsistency in the string used to identify sliding attention layers ('sliding_attention' vs. 'sliding_window') between the main configuration file and the model implementations, which could lead to incorrect behavior. Secondly, the refactoring seems to have removed support for per-layer sliding window sizes when provided as a list, which could be a regression and cause runtime errors.
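
To illustrate the first issue with a constructed example (not the PR's code): if one place writes layer_types with one string literal and another place tests for a different one, the check silently never matches:

```python
# Constructed example of the string mismatch the review warns about.
layer_types = ["sliding_attention", "sliding_attention", "full_attention"]

# Checking for the wrong literal never matches, so every layer would
# silently be treated as full attention.
uses_sliding = any(t == "sliding_window" for t in layer_types)     # False (bug)
uses_sliding = any(t == "sliding_attention" for t in layer_types)  # True
```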


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@hmellor
Member Author

hmellor commented Jul 30, 2025

(sorry everyone...)

@hmellor
Member Author

hmellor commented Aug 8, 2025

I have updated the description to detail the consequences this PR has for Gemma 3 models.

Member

@DarkLight1337 DarkLight1337 left a comment


Thanks for fixing!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) August 8, 2025 10:43
@DarkLight1337
Member

Need to merge from main again

@vllm-bot vllm-bot merged commit c498483 into vllm-project:main Aug 10, 2025
36 of 44 checks passed
@hmellor hmellor deleted the refactor-sliding-window-config branch August 10, 2025 06:40
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
yiliu30 pushed a commit to yiliu30/vllm-fork that referenced this pull request Aug 19, 2025
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
@hmellor hmellor moved this to Done in Transformers backend Sep 24, 2025

Labels

  • documentation: Improvements or additions to documentation
  • llama: Related to Llama models
  • qwen: Related to Qwen models
  • ready: ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done

4 participants