Conversation

@hmellor (Member) commented Sep 26, 2025

This PR:

  • Adds the ability to ignore unexpected suffixes to the AutoWeightLoader
  • In the Transformers backend, ignores the ".bias" suffix when the quant method is GPTQ, as is done in many of the non-auto weight loaders in models/
  • Adds a small AWQ model and a small GPTQ model to the Transformers backend quantization test

This method could be leveraged in other model implementations to reduce the need to manually implement weight loading. For now, this is left as a future task.
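The suffix-skipping idea described above can be sketched as a small filter over the (name, tensor) pairs yielded during weight loading. This is an illustrative sketch only: the function and parameter names (`filter_weights`, `ignore_unexpected_suffixes`) are hypothetical and not vLLM's actual AutoWeightLoader API.

```python
# Hypothetical sketch of suffix-based weight skipping, inspired by the PR
# description. Names here are illustrative, not vLLM's real API.
from typing import Iterable, Iterator, Tuple


def filter_weights(
    weights: Iterable[Tuple[str, object]],
    ignore_unexpected_suffixes: Tuple[str, ...] = (),
) -> Iterator[Tuple[str, object]]:
    """Yield (name, tensor) pairs, skipping names ending in an ignored suffix."""
    for name, tensor in weights:
        if any(name.endswith(s) for s in ignore_unexpected_suffixes):
            # e.g. GPTQ checkpoints can carry ".bias" entries the model
            # implementation has no parameter for; skip instead of erroring.
            continue
        yield name, tensor


# The backend would enable this only for the GPTQ quant method
# (quant_method here is a stand-in for the model's quant config):
quant_method = "gptq"
suffixes = (".bias",) if quant_method == "gptq" else ()

checkpoint = [("layers.0.qweight", 1), ("layers.0.bias", 2)]
kept = dict(filter_weights(checkpoint, suffixes))
print(kept)  # {'layers.0.qweight': 1}
```

With an empty suffix tuple the filter is a no-op, so non-GPTQ models load exactly as before.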

@gemini-code-assist bot (Contributor) left a comment:
Code Review

This pull request effectively addresses the GPTQ model loading issue in the Transformers backend by introducing a mechanism to ignore unexpected weight suffixes, specifically .bias for GPTQ models. The changes are well-implemented, extending the AutoWeightLoader to support suffix-based ignoring. Additionally, the inclusion of new quantization tests for AWQ and GPTQ models, along with refactoring the ROCm skip logic, significantly improves the test suite's coverage and robustness. Overall, this is a solid contribution that enhances model compatibility.

Signed-off-by: Harry Mellor <[email protected]>
@hmellor hmellor moved this to Todo in Transformers backend Sep 26, 2025
@hmellor hmellor moved this from Todo to In Progress in Transformers backend Sep 26, 2025
@Isotr0py (Member) left a comment:
LGTM, thanks for fixing!

@Isotr0py Isotr0py enabled auto-merge (squash) September 27, 2025 09:20
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 27, 2025
@hmellor (Member, Author) commented Sep 27, 2025

I've cancelled the build to prevent wasted CI time.

It's strange that https://buildkite.com/vllm/ci/builds/32718/steps/canvas?jid=01998a79-e1e5-495a-8ed4-750f01bf0250 failed; it did not fail locally for me.

@Isotr0py Isotr0py merged commit ec152c8 into vllm-project:main Sep 27, 2025
48 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Transformers backend Sep 27, 2025
@hmellor hmellor deleted the transformers-backend-gptq branch September 27, 2025 12:23
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Harry Mellor <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: yewentao256 <[email protected]>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Harry Mellor <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: xuebwang-amd <[email protected]>
