
[BUG] Marlin kernel incorrectly selected in backend.AUTO code path #1092

@chplushsieh

Description


Describe the bug

When using older GPUs that Marlin does not support, Marlin still gets chosen as the backend, causing an error like this:

    model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4336, in from_pretrained
    hf_quantizer.postprocess_model(model, config=config)
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/transformers/quantizers/base.py", line 207, in postprocess_model
    return self._process_model_after_weight_loading(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/transformers/quantizers/quantizer_gptq.py", line 107, in _process_model_after_weight_loading
    model = self.optimum_quantizer.post_init_model(model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/optimum/gptq/quantizer.py", line 738, in post_init_model
    model = gptq_post_init(model, use_act_order=self.desc_act)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/gptqmodel/utils/model.py", line 494, in hf_gptqmodel_post_init
    return gptqmodel_post_init(model, use_act_order, quantize_config, max_input_length)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/gptqmodel/utils/model.py", line 614, in gptqmodel_post_init
    submodule.post_init()
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/gptqmodel/nn_modules/qlinear/marlin.py", line 339, in post_init
    replace_tensor(self, "qweight", marlin_qweight)
  File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/gptqmodel/nn_modules/qlinear/marlin.py", line 95, in replace_tensor
    getattr(layer, name).copy_(new_t)
RuntimeError: CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

After I commented out this line, it works fine, since Marlin is no longer used.
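For anyone hitting this before a fix lands, a possible alternative to editing the installed package might be to load the checkpoint through gptqmodel directly with an explicit non-Marlin backend. This is untested on my hardware, and the import path, `GPTQModel.load` signature, and `BACKEND` enum member are assumptions based on the gptqmodel README:

```python
from gptqmodel import GPTQModel, BACKEND  # assumed import path

# Force a kernel that does not require compute capability >= 8.0.
# model_name_or_path is the same GPTQ checkpoint used above.
model = GPTQModel.load(model_name_or_path, backend=BACKEND.TRITON)
```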

GPU Info

I'm using an RTX 2080 Super with a compute capability of 7.5, while Marlin requires >= 8.0.

nvidia-smi
Fri Jan 17 07:46:00 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2080 ...    On  |   00000000:08:00.0  On |                  N/A |
|  0%   50C    P8              8W /  250W |     538MiB /   8192MiB |      5%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2502      G   /usr/lib/xorg/Xorg                            167MiB |
|    0   N/A  N/A      2808      G   /usr/bin/gnome-shell                           66MiB |
|    0   N/A  N/A      3869      G   ...irefox/5561/usr/lib/firefox/firefox        204MiB |
|    0   N/A  N/A      5590      G   ...erProcess --variations-seed-version         12MiB |
|    0   N/A  N/A      5985      G   ...nglingPtr --variations-seed-version         34MiB |
|    0   N/A  N/A     10966      G   ...erProcess --variations-seed-version         46MiB |
+-----------------------------------------------------------------------------------------+
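For reference, the mismatch can be confirmed directly from PyTorch; this is just a diagnostic snippet, with the 8.0 threshold taken from Marlin's stated requirement:

```python
import torch

# Marlin requires compute capability >= 8.0 (Ampere or newer);
# the RTX 2080 Super (Turing) reports (7, 5).
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
print("Marlin supported:", (major, minor) >= (8, 0))
```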

Software Info

Operation System/Version + Python Version
Ubuntu 22.04, Python 3.11.10

pip show gptqmodel torch transformers accelerate triton
Name: gptqmodel
Version: 1.7.1.dev0  # Built from source
...
---
Name: torch
Version: 2.5.1
...
---
Name: transformers
Version: 4.49.0.dev0 # Built from source
...
---
Name: accelerate
Version: 1.2.1
...
---
Name: triton
Version: 3.1.0
...

Expected behavior

Ideally this package would check the GPU's compute capability before selecting Marlin as the backend, or provide a way to specify the backend when loading GPTQ models via transformers.
AutoGPTQ has such a check: Marlin kernel can be built against any compute capability by fxmarty · Pull Request #540 · AutoGPTQ/AutoGPTQ
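A guard along these lines in the auto-selection path would be enough. This is only a sketch of the expected behavior, not the package's actual selection code; the function names and fallback backend are placeholders:

```python
import torch

def marlin_is_supported(device_index: int = 0) -> bool:
    """Marlin kernels need compute capability >= 8.0 (Ampere or newer)."""
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability(device_index) >= (8, 0)

def choose_backend(requested: str = "auto") -> str:
    # Hypothetical auto-selection: only pick Marlin when the GPU actually
    # supports it, otherwise fall back to another kernel.
    if requested == "auto":
        return "marlin" if marlin_is_supported() else "exllama_v2"
    if requested == "marlin" and not marlin_is_supported():
        raise ValueError("Marlin requires a GPU with compute capability >= 8.0")
    return requested
```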
