Fix GPTQ model loading in Transformers backend #25770
Conversation
Signed-off-by: Harry Mellor <[email protected]>
Code Review
This pull request effectively addresses the GPTQ model loading issue in the Transformers backend by introducing a mechanism to ignore unexpected weight suffixes, specifically .bias for GPTQ models. The changes are well implemented, extending the AutoWeightLoader to support suffix-based ignoring. Additionally, the new quantization tests for AWQ and GPTQ models, along with the refactored ROCm skip logic, significantly improve the test suite's coverage and robustness. Overall, this is a solid contribution that enhances model compatibility.
Signed-off-by: Harry Mellor <[email protected]>
LGTM, thanks for fixing!
I've cancelled the build to prevent wasted CI time. It's strange that https://buildkite.com/vllm/ci/builds/32718/steps/canvas?jid=01998a79-e1e5-495a-8ed4-750f01bf0250 failed; it did not fail locally for me.
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Harry Mellor <[email protected]> Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: Harry Mellor <[email protected]> Co-authored-by: Isotr0py <[email protected]> Signed-off-by: yewentao256 <[email protected]>
Signed-off-by: Harry Mellor <[email protected]> Co-authored-by: Isotr0py <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
This PR extends AutoWeightLoader to ignore the ".bias" suffix if the quant method is gptq, as is done in many of the non-auto weight loaders in models/.

This method could be leveraged in other model implementations to reduce the need to manually implement weight loading. For now, this is left as a future task.
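Conceptually, the suffix-based ignoring described above can be sketched as follows. This is a minimal illustration only: the function name `filter_weights` and its signature are hypothetical and do not reflect vLLM's actual AutoWeightLoader API; the idea is simply that unexpected weight names matching an ignored suffix (such as the unused ".bias" tensors in GPTQ checkpoints) are skipped instead of raising an error.

```python
from typing import Iterable, Iterator, Tuple


def filter_weights(
    weights: Iterable[Tuple[str, object]],
    expected: set,
    ignore_unexpected_suffixes: Tuple[str, ...] = (),
) -> Iterator[Tuple[str, object]]:
    """Yield expected weights, skipping unexpected names that end with an
    ignored suffix; raise on any other unexpected name.

    Hypothetical sketch, not vLLM's real loader.
    """
    for name, tensor in weights:
        if name in expected:
            yield name, tensor
        elif any(name.endswith(s) for s in ignore_unexpected_suffixes):
            # e.g. GPTQ checkpoints may ship ".bias" tensors the model
            # implementation does not use; silently drop them.
            continue
        else:
            raise KeyError(f"Unexpected weight: {name}")


# Usage: the ".bias" entry is dropped rather than triggering a load failure.
result = list(filter_weights(
    [("layer.qweight", 1), ("layer.bias", 2)],
    {"layer.qweight"},
    ignore_unexpected_suffixes=(".bias",),
))
print(result)
```

Running this prints only the expected weight, `[('layer.qweight', 1)]`, while a name that matched neither the expected set nor an ignored suffix would raise a KeyError.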