Describe the bug
When using older GPUs that Marlin does not support, Marlin still gets chosen as the backend, causing an error like this:
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4336, in from_pretrained
hf_quantizer.postprocess_model(model, config=config)
File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/transformers/quantizers/base.py", line 207, in postprocess_model
return self._process_model_after_weight_loading(model, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/transformers/quantizers/quantizer_gptq.py", line 107, in _process_model_after_weight_loading
model = self.optimum_quantizer.post_init_model(model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/optimum/gptq/quantizer.py", line 738, in post_init_model
model = gptq_post_init(model, use_act_order=self.desc_act)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/gptqmodel/utils/model.py", line 494, in hf_gptqmodel_post_init
return gptqmodel_post_init(model, use_act_order, quantize_config, max_input_length)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/gptqmodel/utils/model.py", line 614, in gptqmodel_post_init
submodule.post_init()
File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/gptqmodel/nn_modules/qlinear/marlin.py", line 339, in post_init
replace_tensor(self, "qweight", marlin_qweight)
File "/home/hao/.pyenv/versions/3.11.10/envs/ai-fr/lib/python3.11/site-packages/gptqmodel/nn_modules/qlinear/marlin.py", line 95, in replace_tensor
getattr(layer, name).copy_(new_t)
RuntimeError: CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
After I commented out this line, it works fine, since Marlin is no longer used.
GPU Info
I'm using an RTX 2080 Super, which has a compute capability of 7.5, while Marlin requires >= 8.0.
nvidia-smi
Fri Jan 17 07:46:00 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 2080 ... On | 00000000:08:00.0 On | N/A |
| 0% 50C P8 8W / 250W | 538MiB / 8192MiB | 5% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2502 G /usr/lib/xorg/Xorg 167MiB |
| 0 N/A N/A 2808 G /usr/bin/gnome-shell 66MiB |
| 0 N/A N/A 3869 G ...irefox/5561/usr/lib/firefox/firefox 204MiB |
| 0 N/A N/A 5590 G ...erProcess --variations-seed-version 12MiB |
| 0 N/A N/A 5985 G ...nglingPtr --variations-seed-version 34MiB |
| 0 N/A N/A 10966 G ...erProcess --variations-seed-version 46MiB |
+-----------------------------------------------------------------------------------------+
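For reference, the compute capability reported above can also be confirmed from Python with the standard torch API (nothing gptqmodel-specific):

import torch

# Query the compute capability of GPU 0; on this RTX 2080 Super it returns (7, 5).
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")  # Marlin needs >= 8.0 per this report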
Software Info
Operating System/Version + Python Version
Ubuntu 22.04, Python 3.11.10
pip show gptqmodel torch transformers accelerate triton
Name: gptqmodel
Version: 1.7.1.dev0 # Built from source
...
---
Name: torch
Version: 2.5.1
...
---
Name: transformers
Version: 4.49.0.dev0 # Built from source
...
---
Name: accelerate
Version: 1.2.1
...
---
Name: triton
Version: 3.1.0
...
Expected behavior
Hopefully this package can check the GPU's compute capability before selecting Marlin as the backend, or provide some way to specify the backend when loading GPTQ models via transformers.
AutoGPTQ has such a check: Marlin kernel can be built against any compute capability by fxmarty · Pull Request #540 · AutoGPTQ/AutoGPTQ
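To illustrate the kind of check I mean, here is a rough sketch; the helper name and its placement are hypothetical, only torch.cuda.get_device_capability is an existing API:

import torch

# Hypothetical guard sketch: only pick the Marlin kernel when the GPU meets
# Marlin's minimum compute capability (8.0 per this issue); otherwise the
# loader would have to fall back to a non-Marlin backend.
MARLIN_MIN_CAPABILITY = (8, 0)

def marlin_is_supported(device_index: int = 0) -> bool:
    if not torch.cuda.is_available():
        return False
    return torch.cuda.get_device_capability(device_index) >= MARLIN_MIN_CAPABILITY

A check like this at backend-selection time would avoid the "invalid device function" error above on pre-Ampere GPUs such as this 2080 Super (7.5).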