
[BUG] Marlin kernel incorrectly selected in backend.AUTO code path #1092

@chplushsieh

Description


Describe the bug

When using older GPUs that Marlin does not support, Marlin still gets chosen as the backend, causing an error like this:

    model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4336, in from_pretrained
    hf_quantizer.postprocess_model(model, config=config)
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/transformers/quantizers/base.py", line 207, in postprocess_model
    return self._process_model_after_weight_loading(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/transformers/quantizers/quantizer_gptq.py", line 107, in _process_model_after_weight_loading
    model = self.optimum_quantizer.post_init_model(model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/optimum/gptq/quantizer.py", line 738, in post_init_model
    model = gptq_post_init(model, use_act_order=self.desc_act)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/gptqmodel/utils/model.py", line 494, in hf_gptqmodel_post_init
    return gptqmodel_post_init(model, use_act_order, quantize_config, max_input_length)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/gptqmodel/utils/model.py", line 614, in gptqmodel_post_init
    submodule.post_init()
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/gptqmodel/nn_modules/qlinear/marlin.py", line 339, in post_init
    replace_tensor(self, "qweight", marlin_qweight)
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/gptqmodel/nn_modules/qlinear/marlin.py", line 95, in replace_tensor
    getattr(layer, name).copy_(new_t)
RuntimeError: CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

After I commented out this line, it works fine, since Marlin is no longer used.
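For anyone hitting this before a fix lands, a possible alternative to editing the installed package might be to load the checkpoint through gptqmodel directly with an explicit non-Marlin backend. This is untested on my hardware, and the import path, `GPTQModel.load` signature, and `BACKEND` enum member are assumptions based on the gptqmodel README:

```python
from gptqmodel import GPTQModel, BACKEND  # assumed import path

# Force a kernel that does not require compute capability >= 8.0.
# model_name_or_path is the same GPTQ checkpoint used above.
model = GPTQModel.load(model_name_or_path, backend=BACKEND.TRITON)
```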

GPU Info

I'm using an RTX 2080 Super with a compute capability of 7.5, while Marlin requires >= 8.0.

nvidia-smi
Fri Jan 17 07:46:00 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2080 ...    On  |   00000000:08:00.0  On |                  N/A |
|  0%   50C    P8              8W /  250W |     538MiB /   8192MiB |      5%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2502      G   /usr/lib/xorg/Xorg                            167MiB |
|    0   N/A  N/A      2808      G   /usr/bin/gnome-shell                           66MiB |
|    0   N/A  N/A      3869      G   ...irefox/5561/usr/lib/firefox/firefox        204MiB |
|    0   N/A  N/A      5590      G   ...erProcess --variations-seed-version         12MiB |
|    0   N/A  N/A      5985      G   ...nglingPtr --variations-seed-version         34MiB |
|    0   N/A  N/A     10966      G   ...erProcess --variations-seed-version         46MiB |
+-----------------------------------------------------------------------------------------+
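For reference, the mismatch can be confirmed directly from PyTorch; this is just a diagnostic snippet, with the 8.0 threshold taken from Marlin's stated requirement:

```python
import torch

# Marlin requires compute capability >= 8.0 (Ampere or newer);
# the RTX 2080 Super (Turing) reports (7, 5).
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
print("Marlin supported:", (major, minor) >= (8, 0))
```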

Software Info

Operation System/Version + Python Version
Ubuntu 22.04, Python 3.11.10

pip show gptqmodel torch transformers accelerate triton
Name: gptqmodel
Version: 1.7.1.dev0  # Built from source
...
---
Name: torch
Version: 2.5.1
...
---
Name: transformers
Version: 4.49.0.dev0 # Built from source
...
---
Name: accelerate
Version: 1.2.1
...
---
Name: triton
Version: 3.1.0
...

Expected behavior

Ideally this package would check the GPU's compute capability before selecting Marlin as the backend, or provide a way to specify the backend when loading GPTQ models via transformers.
AutoGPTQ has such a check: Marlin kernel can be built against any compute capability by fxmarty · Pull Request #540 · AutoGPTQ/AutoGPTQ
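A guard along these lines in the auto-selection path would be enough. This is only a sketch of the expected behavior, not the package's actual selection code; the function names and fallback backend are placeholders:

```python
import torch

def marlin_is_supported(device_index: int = 0) -> bool:
    """Marlin kernels need compute capability >= 8.0 (Ampere or newer)."""
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability(device_index) >= (8, 0)

def choose_backend(requested: str = "auto") -> str:
    # Hypothetical auto-selection: only pick Marlin when the GPU actually
    # supports it, otherwise fall back to another kernel.
    if requested == "auto":
        return "marlin" if marlin_is_supported() else "exllama_v2"
    if requested == "marlin" and not marlin_is_supported():
        raise ValueError("Marlin requires a GPU with compute capability >= 8.0")
    return requested
```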
