Fix GPTQ model loading in Transformers backend #25770
Conversation
Signed-off-by: Harry Mellor <[email protected]>
Code Review
This pull request effectively addresses the GPTQ model loading issue in the Transformers backend by introducing a mechanism to ignore unexpected weight suffixes, specifically .bias for GPTQ models. The changes are well implemented, extending the AutoWeightLoader to support suffix-based ignoring. Additionally, the new quantization tests for AWQ and GPTQ models, along with the refactored ROCm skip logic, significantly improve the test suite's coverage and robustness. Overall, this is a solid contribution that enhances model compatibility.
Signed-off-by: Harry Mellor <[email protected]>
LGTM, thanks for fixing!
I've cancelled the build to prevent wasted CI time. It's strange that https://buildkite.com/vllm/ci/builds/32718/steps/canvas?jid=01998a79-e1e5-495a-8ed4-750f01bf0250 failed; it did not fail locally for me.
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Harry Mellor <[email protected]> Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: Harry Mellor <[email protected]> Co-authored-by: Isotr0py <[email protected]> Signed-off-by: yewentao256 <[email protected]>
Signed-off-by: Harry Mellor <[email protected]> Co-authored-by: Isotr0py <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
This PR extends AutoWeightLoader to ignore the ".bias" suffix if the quant method is gptq, as is done in many of the non-auto weight loaders in models/.

This method could be leveraged in other model implementations to reduce the need to manually implement weight loading. For now, this is left as a future task.
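Conceptually, the suffix-based ignoring described above can be sketched as follows. This is a minimal illustration only: the function name `filter_weights` and its signature are hypothetical and do not reflect vLLM's actual AutoWeightLoader API; the idea is simply that unexpected weight names matching an ignored suffix (such as the unused ".bias" tensors in GPTQ checkpoints) are skipped instead of raising an error.

```python
from typing import Iterable, Iterator, Tuple


def filter_weights(
    weights: Iterable[Tuple[str, object]],
    expected: set,
    ignore_unexpected_suffixes: Tuple[str, ...] = (),
) -> Iterator[Tuple[str, object]]:
    """Yield expected weights, skipping unexpected names that end with an
    ignored suffix; raise on any other unexpected name.

    Hypothetical sketch, not vLLM's real loader.
    """
    for name, tensor in weights:
        if name in expected:
            yield name, tensor
        elif any(name.endswith(s) for s in ignore_unexpected_suffixes):
            # e.g. GPTQ checkpoints may ship ".bias" tensors the model
            # implementation does not use; silently drop them.
            continue
        else:
            raise KeyError(f"Unexpected weight: {name}")


# Usage: the ".bias" entry is dropped rather than triggering a load failure.
result = list(filter_weights(
    [("layer.qweight", 1), ("layer.bias", 2)],
    {"layer.qweight"},
    ignore_unexpected_suffixes=(".bias",),
))
print(result)
```

Running this prints only the expected weight, `[('layer.qweight', 1)]`, while a name that matched neither the expected set nor an ignored suffix would raise a KeyError.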