Support AutoAWQ in awq-py
#4701
Comments
I'm adding GGUF compatibility in casper-hansen/AutoAWQ#285. This makes …
Ok, let us know if there is anything to assist with. When merging …
There are some models that need a special ScaledActivation, so some of the modifications made should be kept. This module is mainly applied to models that use the GELU function. I may also introduce another scaling feature, specifically related to MoE, which would lower perplexity.
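For context, the scaled activation described here can be pictured as a thin PyTorch wrapper around the original activation. The sketch below is illustrative and hedged, not the exact AutoAWQ implementation; the attribute names and broadcasting details are assumptions.

```python
import torch
import torch.nn as nn

class ScaledActivation(nn.Module):
    """Wrap an activation (e.g. GELU) and divide its output by per-channel
    scales, so the AWQ scales can be absorbed without changing the model's
    output. Sketch only; the real AutoAWQ module may differ in details."""

    def __init__(self, act: nn.Module, scales: torch.Tensor):
        super().__init__()
        self.act = act
        self.scales = nn.Parameter(scales)  # one scale per hidden channel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcasts over the last (channel) dimension of x.
        return self.act(x) / self.scales.to(x.device)
```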
Yes, the scales are simply applied (by multiplying/dividing) to the FP16 model weights and then we use llama.cpp to quantize to the specified format.
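As a rough sketch of that scale application (assuming one scale per input channel of the target linear layer; this is not the exact awq-py code in llama.cpp):

```python
import torch

def scale_linear_weight(weight: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    """Fold AWQ scales into an FP16 linear weight.

    weight: [out_features, in_features] FP16 tensor
    scales: [in_features] per-input-channel AWQ scales
    The op feeding this layer is divided by the same scales elsewhere,
    so the end-to-end model output stays essentially unchanged.
    """
    return (weight * scales.view(1, -1)).to(torch.float16)
```

The scaled FP16 checkpoint can then go through the usual llama.cpp convert and quantize steps, as described above.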
From my side, it will not be needed. It’s only needed if you wish to use the original repository.
Hi @casper-hansen, I am working on the same method using your AutoAWQ, but I noticed that you have made a new PR. After it is accepted, I will change the code in …
Thanks for helping out with this, @namtranase. I am planning to release the export functionality in AutoAWQ 0.1.9.
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Feature Description
To support the AutoAWQ models, the proposal is simple: load the model through AutoModelForCausalLM.from_pretrained() and convert the WQLinear modules.
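A minimal sketch of the first half of that proposal, loading an AWQ checkpoint and locating the packed layers that would need conversion (the model id is only an example, and the WQLinear class-name check is an assumption about how AutoAWQ names its quantized linears):

```python
import torch
from transformers import AutoModelForCausalLM

# Example AWQ checkpoint from the Hub; any AutoAWQ INT4 model should work.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    torch_dtype=torch.float16,
)

# Collect the packed AWQ linear layers that would have to be converted
# back to FP16 before the usual GGUF conversion can run.
wq_linears = {
    name: module
    for name, module in model.named_modules()
    if type(module).__name__.startswith("WQLinear")  # e.g. WQLinear_GEMM
}
print(f"Found {len(wq_linears)} WQLinear modules to convert")
```

Converting each of those modules back to a plain FP16 linear layer is what the implementation options further below are about.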
Motivation
AutoAWQ is an improved version of llm-awq. We have made quantizing and working with the quantized models much easier, resulting in integrations into vLLM, transformers, OpenNMT, and other frameworks. On Hugging Face, you can currently find ~1200 INT4 models made with AutoAWQ, primarily provided by TheBloke.

AutoAWQ does not store the scales because they are redundant for running inference. Instead, we store the real quantized model weights in a one-step process. This means the process will be much easier for llama.cpp users, since they can just grab a model from the hub and export it to GGUF, resulting in lower perplexity and better models to chat with.

Possible Implementation
Solution 1: One possible implementation is to unpack the weights to FP16 and convert them to GGUF. I am unsure if this will introduce any unpacking error.
- dequantize_weights_cuda: awq_ext.dequantize_weights_cuda(qweight, scales, qzeros, 1, 0, 0, False). This is quite simple to call; just install the kernels package. A hedged sketch of this route is shown after this list.
- unpack_awq: This feature is being introduced into AutoGPTQ in order to unpack the weights of AWQ. This may be another solution for unpacking.

Other solutions include directly converting the weights to GGUF. The main problem is that the packing for AWQ models is a bit complicated, and I am not sure you can directly convert it to another format.
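For illustration, here is a hedged sketch of the dequantize_weights_cuda route referenced above, unpacking a single WQLinear module into an FP16 nn.Linear. The attribute names (qweight, qzeros, scales, in_features, out_features, bias) and the orientation of the kernel's output are assumptions about AutoAWQ's packed layout, not verified facts.

```python
import torch
import torch.nn as nn
import awq_ext  # CUDA kernels shipped with AutoAWQ

def dequantize_wqlinear(wq) -> nn.Linear:
    """Unpack one packed INT4 WQLinear module into a plain FP16 nn.Linear
    using the kernel call quoted above. Sketch only; attribute names and
    weight orientation may need adjusting against the real AutoAWQ layout."""
    fp16_weight = awq_ext.dequantize_weights_cuda(
        wq.qweight, wq.scales, wq.qzeros, 1, 0, 0, False
    )
    linear = nn.Linear(wq.in_features, wq.out_features, bias=wq.bias is not None)
    # Assuming the kernel returns [in_features, out_features]; transpose for nn.Linear.
    linear.weight.data = fp16_weight.t().contiguous().to(torch.float16)
    if wq.bias is not None:
        linear.bias.data = wq.bias.to(torch.float16)
    return linear.half()
```

Each converted layer could then be swapped back into the model in place of the original WQLinear before running the normal llama.cpp convert and quantize steps.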