Merged

Eora #1302

Commits (381)
dba585e
fix override
Qubitium Feb 13, 2025
38880f4
simplify
Qubitium Feb 13, 2025
437c939
fix missing `modules` item
Qubitium Feb 13, 2025
9321b5b
breaking: fix module.state update
Qubitium Feb 13, 2025
5556f87
fix state should contain both W and WQ
Qubitium Feb 13, 2025
879b464
fix no super() for class obj
CSY-ModelCloud Feb 14, 2025
47840e4
remove get attr
CSY-ModelCloud Feb 14, 2025
89bf739
call LoopProcessor.post_process()
ZX-ModelCloud Feb 14, 2025
0930b52
Merge remote-tracking branch 'origin/eora' into eora
ZX-ModelCloud Feb 14, 2025
d01b6fb
call processor.finalize
Qubitium Feb 14, 2025
e8ede3a
Correctly call methods from self.gptq_model
ZX-ModelCloud Feb 14, 2025
07f2918
Merge remote-tracking branch 'origin/eora' into eora
ZX-ModelCloud Feb 14, 2025
ed7496d
rename to calibration_data
Qubitium Feb 14, 2025
503b753
cleanup pack()..no need to clone weights..use T instead of t()
Qubitium Feb 14, 2025
238b2d3
LoopProcessor add model_finalize()
ZX-ModelCloud Feb 14, 2025
51a4945
Merge remote-tracking branch 'origin/eora' into eora
ZX-ModelCloud Feb 14, 2025
aa59e4b
cleanup pack()..rename var for clarity
Qubitium Feb 14, 2025
c322b95
pop wq from state
Qubitium Feb 14, 2025
74fd176
clean code..de-indent logic
Qubitium Feb 14, 2025
cf2fef1
add safety code to store original in/out features of W in NamedModule…
Qubitium Feb 14, 2025
9d0273c
add stats() api and stats fields to processor
Qubitium Feb 14, 2025
e38c9ed
ruff
Qubitium Feb 14, 2025
fb42630
Fix circular import
ZX-ModelCloud Feb 14, 2025
17ee762
add license
Qubitium Feb 14, 2025
b9d2f63
Merge remote-tracking branch 'origin/eora' into eora
ZX-ModelCloud Feb 14, 2025
f4221c1
Merge remote-tracking branch 'origin/eora' into eora
ZX-ModelCloud Feb 14, 2025
8bbdf47
add clearml back
CSY-ModelCloud Feb 14, 2025
4d98b3b
fix NamedModule.__getattr__() error
ZX-ModelCloud Feb 14, 2025
9872e7f
add `require_fwd` property to processor
Qubitium Feb 14, 2025
5db8f02
simplify
Qubitium Feb 14, 2025
d4c0688
fix cannot set weight.data to None
Qubitium Feb 14, 2025
19d7be5
fix the error that tasks is empty
ZX-ModelCloud Feb 14, 2025
4e897a8
add todo
Qubitium Feb 14, 2025
fc4733c
fix parameter position & name
CSY-ModelCloud Feb 14, 2025
0b1dfcf
fix import
CSY-ModelCloud Feb 14, 2025
bbaadf8
fix named module override
Qubitium Feb 14, 2025
cc32b9d
fix __dict__ name error
ZX-ModelCloud Feb 14, 2025
93c0608
fix module type error
ZX-ModelCloud Feb 14, 2025
208d9c7
fix layer_inputs index out of range
ZX-ModelCloud Feb 14, 2025
4cac3d5
rename
Qubitium Feb 14, 2025
a38a029
add lm_head quantize config
ZX-ModelCloud Feb 14, 2025
9d35bf8
pop `w` at submodule finalize
Qubitium Feb 14, 2025
f479764
simplify...quantize should only be called once
Qubitium Feb 14, 2025
f216137
release quantizer for module on post_process
Qubitium Feb 14, 2025
d68933d
cleanup
ZX-ModelCloud Feb 14, 2025
3c6aef5
refactor
Qubitium Feb 14, 2025
b7a9f1d
cleanup
Qubitium Feb 14, 2025
99916ba
fix circular import
ZX-ModelCloud Feb 14, 2025
897bc25
refactor quantize() args and override
Qubitium Feb 14, 2025
aa0851d
Fix GPTQProcessor log
ZX-ModelCloud Feb 14, 2025
12a1c0d
fix wrong damp_percent returned
Qubitium Feb 14, 2025
9ae8647
return log
ZX-ModelCloud Feb 14, 2025
fa45299
fix hf api compat
CSY-ModelCloud Feb 14, 2025
febadab
use const, not str
Qubitium Feb 14, 2025
7846b15
rename to `finalize`
Qubitium Feb 14, 2025
e04a2b0
fix import
CSY-ModelCloud Feb 14, 2025
0a85e01
rename quantize() to quantize_old()
ZX-ModelCloud Feb 14, 2025
b52c782
fix import
CSY-ModelCloud Feb 14, 2025
7302e15
If calibration_dataset is None or Empty, the input_cache of the previ…
ZX-ModelCloud Feb 14, 2025
20648b5
add fixme for hf api compat of fasterquant
CSY-ModelCloud Feb 14, 2025
50596ec
add EoraConfig
ZX-ModelCloud Feb 14, 2025
b374e85
remove .module
CSY-ModelCloud Feb 14, 2025
f1453ca
add eora processor
Qubitium Feb 14, 2025
7a785c2
fix misc
Qubitium Feb 14, 2025
6cad64b
fix misc
Qubitium Feb 14, 2025
49f74a6
fix isinstance can't check subclass
CSY-ModelCloud Feb 14, 2025
4dff173
fix lora config storage
Qubitium Feb 14, 2025
d438c36
cleanup
ZX-ModelCloud Feb 14, 2025
12e6b63
change name to class method
Qubitium Feb 14, 2025
6675caa
cleanup
ZX-ModelCloud Feb 14, 2025
935cc91
format
Qubitium Feb 14, 2025
ae21520
fix adapter.name() should be classmethod
Qubitium Feb 14, 2025
dc2773b
fix eora logging
Qubitium Feb 14, 2025
8a6042e
move all eora test code into eora_test (pending removal)
Qubitium Feb 14, 2025
c269e87
move eora algorithm to nvidia licensed eora file
Qubitium Feb 15, 2025
5a97ad5
remove unused
Qubitium Feb 15, 2025
4b5348c
fix hf api compat for quantize()
CSY-ModelCloud Feb 15, 2025
8541388
use EoraProcessor()
ZX-ModelCloud Feb 15, 2025
88a61cb
fix processor.num_batches setting
ZX-ModelCloud Feb 15, 2025
c4fac1e
async move wq to cpu
Qubitium Feb 15, 2025
dd7560d
fix not a python package
CSY-ModelCloud Feb 15, 2025
d750484
fix exllama was not compiled
CSY-ModelCloud Feb 15, 2025
35ca144
add async move for gptq processor
Qubitium Feb 15, 2025
37183d7
move prepare_dataset() to LoopProcessor
ZX-ModelCloud Feb 15, 2025
dad0c68
add release_calibration_dataset()
ZX-ModelCloud Feb 15, 2025
faa501d
update error for lm_head and model with tied_weights=True
Qubitium Feb 15, 2025
149d364
consolidate dynamic skipped logic
Qubitium Feb 15, 2025
a3371ae
Fix eigen_scaling_diag_matrix not initialized
ZX-ModelCloud Feb 15, 2025
0f59410
Fix subset repeated quantization
ZX-ModelCloud Feb 15, 2025
4ea26e8
add processed_subset
ZX-ModelCloud Feb 15, 2025
0a2bee6
Fix the error that the type of wq obtained is tuple
ZX-ModelCloud Feb 15, 2025
5de0644
fix weight.data should not be moved to cpu for process code
Qubitium Feb 15, 2025
0631f96
del and overwrite are the same for gc
Qubitium Feb 15, 2025
e6372c1
Fix layer_inputs where the last layer is empty
ZX-ModelCloud Feb 15, 2025
fc3ef54
cleanup
Qubitium Feb 15, 2025
f427020
use Lora.name() class method for mapping
Qubitium Feb 15, 2025
f6bb765
fix adapter save and load
ZX-ModelCloud Feb 15, 2025
d5972e4
move `quant_result` from gptq_process to base loop_process as `_results`
Qubitium Feb 15, 2025
47ba3d7
add `stream: bool` toggle in `move_to` for Tensor types only
Qubitium Feb 15, 2025
c089851
format
Qubitium Feb 15, 2025
72298d8
compat: make sure lora key can be found for all HF AutoModel api
Qubitium Feb 15, 2025
f9fa9f1
save eora and test
Qubitium Feb 15, 2025
6ba2737
fix streaming
Qubitium Feb 15, 2025
370716a
fix compat loading for hf names
Qubitium Feb 15, 2025
03a0c22
fix BitBLASQuantLinear's adapter argument error
ZX-ModelCloud Feb 16, 2025
3d34f87
fix ugly mess in lm_eval integration, vars mismatch, type mismatch
Qubitium Feb 16, 2025
cece581
remove util.eval calls.. always use GPTQModel.eval()
Qubitium Feb 16, 2025
e47c48e
rename eval backend to llm_backend and add real gptqmodel specific ba…
Qubitium Feb 16, 2025
e09c389
add gen_kwargs
CSY-ModelCloud Feb 16, 2025
a49cfbb
use ellama v2 for lm-eval and use acc_norm only
Qubitium Feb 16, 2025
f428286
use ellama v2 for lm-eval and use acc_norm only
Qubitium Feb 16, 2025
4e67c13
fix ci test
Qubitium Feb 16, 2025
b865851
comment out special kernels
Qubitium Feb 16, 2025
0e10440
fix Lora.apply() error when batched generate
ZX-ModelCloud Feb 16, 2025
0381c6f
fix compile
Qubitium Feb 16, 2025
763e409
cleanup
ZX-ModelCloud Feb 16, 2025
7efa1f1
fix `generate()` not applying correct pad_token_id from tokenizer
Qubitium Feb 16, 2025
d061d2d
protect against null (Optional) tokenizer
Qubitium Feb 16, 2025
03e8d01
cleanup compile
Qubitium Feb 16, 2025
27cf67f
cleanup
ZX-ModelCloud Feb 16, 2025
46502e5
fix cuda kernel
Qubitium Feb 16, 2025
a0deeef
disable eora kernels except for torch
Qubitium Feb 16, 2025
f506f76
add `adapter` control/override in `quantize()`
Qubitium Feb 16, 2025
5c694e1
remove quantize_config.eora_dataset property
Qubitium Feb 16, 2025
6ff16e3
patch evalplus to allow passing a model directly
CSY-ModelCloud Feb 16, 2025
3e7302c
change test to pass adapter on GPTQModel.load(). Since `adapter` conf…
Qubitium Feb 16, 2025
7bf0c46
Fix module.bias not being able to be assigned
ZX-ModelCloud Feb 16, 2025
e16e34d
comment
Qubitium Feb 16, 2025
c4419f3
print Adapter loaded post-init so user knows adapter is correctly loa…
Qubitium Feb 16, 2025
1dfacb6
fix evalplus oom
CSY-ModelCloud Feb 16, 2025
9406090
fix ci tests..random seed consolidated into one var
Qubitium Feb 16, 2025
7ce3fbc
fix ci tests
Qubitium Feb 16, 2025
22a3486
disable streaming and fix ci test
Qubitium Feb 16, 2025
83616bf
add base vs eora arc-challenge benchmarks to eora test
Qubitium Feb 16, 2025
11a60dc
fix module.compile overriding nn.module compile. rename to `g_compile`
Qubitium Feb 17, 2025
5d99ca7
cleanup
ZX-ModelCloud Feb 17, 2025
f851d9c
rename `g_compile` to `optimize`
Qubitium Feb 17, 2025
d58f518
cleanup
ZX-ModelCloud Feb 17, 2025
02e25b4
refactor eora_generate()
ZX-ModelCloud Feb 17, 2025
0c97aa4
fix argument error
ZX-ModelCloud Feb 17, 2025
68021ae
add `kernels()` api to show which kernels have been loaded at end o…
Qubitium Feb 17, 2025
bf3edd3
add DequantizeProcessor
Qubitium Feb 17, 2025
98b61dc
add DequantizeProcessor
Qubitium Feb 17, 2025
e52ae7d
refactor: add `retrain_w` option to GPTQProcessor
Qubitium Feb 17, 2025
145ecfb
cleanup
Qubitium Feb 17, 2025
e844f0f
comments
Qubitium Feb 17, 2025
c908654
cleanup
ZX-ModelCloud Feb 17, 2025
84f16f9
Fix Assignment Error
ZX-ModelCloud Feb 17, 2025
104f2ed
DequantizeProcessor does not perform any operations on dataset
ZX-ModelCloud Feb 17, 2025
d05ceb7
refactor: upcast w to float32 before delta calculation in case of bf…
Qubitium Feb 17, 2025
7750b6e
fix wrong assert (reversed)
Qubitium Feb 17, 2025
bd54c6f
cleanup
Qubitium Feb 17, 2025
2917d68
fix summary log
ZX-ModelCloud Feb 17, 2025
019820f
call eora_save()
ZX-ModelCloud Feb 17, 2025
34eb94c
fix argument name error
ZX-ModelCloud Feb 17, 2025
c2da02f
add code for assert eora weight
ZX-ModelCloud Feb 17, 2025
2ecc90c
cleanup
ZX-ModelCloud Feb 17, 2025
7f0e431
add test_eora_post_quant()
ZX-ModelCloud Feb 17, 2025
ce13122
clean up `test_quant_eora` so we have config at top and print config …
Qubitium Feb 17, 2025
aab3c6c
add test_eora_post_quant.py
ZX-ModelCloud Feb 17, 2025
3fdc0b2
default to group_size 128 for test. group_size 64 has strange regression
Qubitium Feb 17, 2025
ea9a9a5
rename
Qubitium Feb 17, 2025
c1f67f4
refactor api to `GPTQModel.adapter.generate`
Qubitium Feb 17, 2025
67d8482
cleanup
Qubitium Feb 17, 2025
43692af
cleanup
Qubitium Feb 17, 2025
9894b04
avoid converting to scalar via item() as torch.compile doesn't like it
Qubitium Feb 17, 2025
0ea863d
try to speed things for eora gen with compile
Qubitium Feb 17, 2025
a0cb206
increase cache and disable scalar captures
Qubitium Feb 17, 2025
b966ba6
use local model path
CSY-ModelCloud Feb 18, 2025
8a581a7
revert making adapter a module
Qubitium Feb 18, 2025
edf3056
use torch_compile helper instead of torch.compile
Qubitium Feb 18, 2025
9b90b67
use torch_compile helper instead of torch.compile
Qubitium Feb 18, 2025
b5d311d
move dequantize_weight() to PackableQuantLinear
ZX-ModelCloud Feb 18, 2025
f599394
bump intel_extension_for_pytorch to 2.6.0 & remove pack() for ipex & …
CSY-ModelCloud Feb 18, 2025
87ada81
Revert "move dequantize_weight() to PackableQuantLinear"
ZX-ModelCloud Feb 18, 2025
6eec4a5
merge main's eval() changes
CSY-ModelCloud Feb 18, 2025
ef39975
push `wf` and dequantize code into packable. refactor ipex to be bas…
Qubitium Feb 18, 2025
030e6e6
Merge branch 'main' into eora
CSY-ModelCloud Feb 18, 2025
32c5b3c
eora has been moved to eora-copy branch
CSY-ModelCloud Feb 18, 2025
fbbc1bb
fix test didn't pass any model
CSY-ModelCloud Feb 18, 2025
b66d82f
add register_buffers to init
CSY-ModelCloud Feb 18, 2025
9572f59
remove unused args
CSY-ModelCloud Feb 18, 2025
b199f5d
revert register_buffers changes
CSY-ModelCloud Feb 18, 2025
eb3d41e
revert deleting eora dir
CSY-ModelCloud Feb 18, 2025
4f96140
remove eora test code
Qubitium Feb 18, 2025
49fbef3
update eora license to apache and attribute nvidia/arxiv
Qubitium Feb 18, 2025
75c9582
Eora_main branch merge to Eora (#1301)
Qubitium Feb 19, 2025
0137749
remove unused eora kernel
Qubitium Feb 19, 2025
9e84aea
remove unused eora kernel
Qubitium Feb 19, 2025
db12235
Merge branch 'main' into eora
ZX-ModelCloud Feb 19, 2025
bfd9cc9
apply bias after eora adapter
Qubitium Feb 19, 2025
de392a7
add new bits test
CSY-ModelCloud Feb 19, 2025
4bf0d8b
revert bad commit. cannot use logic true/false on self.bias directly …
Qubitium Feb 19, 2025
5bc48f1
revert bad commit. cannot use logic true/false on self.bias directly …
Qubitium Feb 19, 2025
c42b720
do not pad
ZX-ModelCloud Feb 19, 2025
0f69938
fix var name not exists
CSY-ModelCloud Feb 19, 2025
95d0df4
missed pad code removal
Qubitium Feb 19, 2025
a0a1e53
removing padding code like torch kernel for triton
Qubitium Feb 19, 2025
82308af
fix var rename
Qubitium Feb 19, 2025
ae51d18
start deprecation of DynamicCuda kernel. Do not allow it to be auto-s…
Qubitium Feb 19, 2025
567bc1f
do not log too verbose json result on cli
Qubitium Feb 19, 2025
af93e5d
Fix `do_sample` config errors on load (also fixed config save)
Qubitium Feb 20, 2025
26ec28c
log only class simple name
Qubitium Feb 20, 2025
07fa973
fix old transformer compat
Qubitium Feb 20, 2025
80332b3
fix vllm doesn't have can_generate
CSY-ModelCloud Feb 20, 2025
d2e1884
refactor: hf auto config fix
Qubitium Feb 20, 2025
e7bb8a8
log txt changes
Qubitium Feb 20, 2025
a13e17d
disable auto-padding in exllama kernels
Qubitium Feb 20, 2025
8d81280
falcon is merged into HF, does not need trust_remote=True
Qubitium Feb 20, 2025
0259449
fix deepseek2-lite ci test, add `layer_modules_strict: bool` control …
Qubitium Feb 20, 2025
9ba6ae5
fix deepseek v2-lite again: do not process already processed module
Qubitium Feb 20, 2025
227c9b8
merge deepseek v2 possible layer_modules into single def
Qubitium Feb 20, 2025
21a51ad
revert partial looper change now that deepseek v2 layer_modules are me…
Qubitium Feb 20, 2025
ddd1fb3
set default data size to 256
CSY-ModelCloud Feb 20, 2025
73ca45a
fix self.in_features was not set
CSY-ModelCloud Feb 20, 2025
aee67f2
[CI] use latest CI docker image
CSY-ModelCloud Feb 20, 2025
4ee98ed
[CI] install colorlog
CSY-ModelCloud Feb 20, 2025
ba42f30
Correctly use torch.no_grad() to avoid OOM when quantizing VL Model
ZX-ModelCloud Feb 20, 2025
e67aec1
fix vllm doesn't have named_children()
CSY-ModelCloud Feb 20, 2025
9d55f56
[CI] pass exclusive for gpu service
CSY-ModelCloud Feb 20, 2025
b5ac4e6
revert module check for vllm
CSY-ModelCloud Feb 20, 2025
6b52116
if model is not a nn.Module, skip finding
CSY-ModelCloud Feb 20, 2025
f90eb14
fix checking
CSY-ModelCloud Feb 20, 2025
ecb9c53
fix env must be before torch imports
Qubitium Feb 20, 2025
55ce173
move PYTORCH_ENABLE_MPS_FALLBACK to top
CSY-ModelCloud Feb 20, 2025
a048815
ovis model requires transformers<=4.48.3
ZX-ModelCloud Feb 20, 2025
d04a9a3
print expected value
CSY-ModelCloud Feb 20, 2025
b470f9a
[CI] fix names
CSY-ModelCloud Feb 20, 2025
36d4a13
[CI] fix xpu env reinstalled torch
CSY-ModelCloud Feb 20, 2025
b5e4820
torch kernel will enable compile optimizations by default for torch 2…
Qubitium Feb 20, 2025
fc0c518
fix transformers compat
Qubitium Feb 20, 2025
d709924
disable exllama kernel from quantization (remove from packable)
Qubitium Feb 20, 2025
96ca366
fix evalplus trying to toString a Decoder
CSY-ModelCloud Feb 20, 2025
ac7596e
replace subprocess run by raising an error
CSY-ModelCloud Feb 20, 2025
f5ec991
fix ci test_dynamic scores
Qubitium Feb 20, 2025
d27422b
cleanup eora test
Qubitium Feb 20, 2025
59eeca5
fix sglang's transformers error
CSY-ModelCloud Feb 20, 2025
65969b3
OVIS is compatible with transformers v4.49.0
ZX-ModelCloud Feb 20, 2025
9a3b6fc
move ipex to new test files
CSY-ModelCloud Feb 20, 2025
13d7f43
Update ovis.py
Qubitium Feb 20, 2025
6f4e35d
decrease batch to 16
CSY-ModelCloud Feb 20, 2025
d00412f
Merge branch 'main' into eora
Qubitium Feb 20, 2025
94ff1b7
format
Qubitium Feb 20, 2025
83ba0ca
logs
Qubitium Feb 20, 2025
762cf4e
fix ci lora config test
Qubitium Feb 20, 2025
e52c356
fix ci: dynamic
Qubitium Feb 21, 2025
2b30708
fix ci: opt expects exllama when triton is used for quant
Qubitium Feb 21, 2025
d36a645
fix ci: transformers test oom
Qubitium Feb 21, 2025
5d2e5c0
Add some comments to eora.py
nbasyl Feb 21, 2025
406037c
add comments to eora.py
nbasyl Feb 21, 2025
92 changes: 61 additions & 31 deletions .github/workflows/unit_tests.yml
@@ -61,8 +61,7 @@ env:
PYTORCH_CUDA_ALLOC_CONF: 'expandable_segments:True'
MAX_JOBS: 8
RUNNER: 10.0.13.31
TRANSFORMERS_DIFF_TESTS: "models/test_internlm.py,models/test_internlm2_5.py,models/test_xverse.py"
TORCH_2_5_TESTS: "test_evalplus.py,test_perplexity.py,test_q4_ipex.py,test_ipex_xpu.py,test_save_loaded_quantized_model.py,test_quant_formats.py,models/test_hymba.py"
LEGACY_TESTS: "models/test_internlm.py,models/test_internlm2_5.py,models/test_xverse.py"
IGNORED_TEST_FILES: "test_tgi.py,test_gptneox.py,models/test_mixtral.py,models/test_phi_3_moe.py"
GPTQMODEL_FORCE_BUILD: 1
repo: ${{ github.event.inputs.repo || github.repository }}
@@ -139,15 +138,15 @@ jobs:
import os
import re

TRANSFORMERS_DIFF_TESTS = '${TRANSFORMERS_DIFF_TESTS}'
LEGACY_TESTS = '${LEGACY_TESTS}'
IGNORED_TEST_FILES = '${IGNORED_TEST_FILES}'

TEST_NAMES='${{ github.event.inputs.test_names }}'
TEST_REGEX='${{ github.event.inputs.test_regex }}'

input_test_files_list = [f.strip().removesuffix('.py') for f in TEST_NAMES.split(',') if f.strip()]

transformers_test_files = [f.strip().removesuffix('.py') for f in f'{TRANSFORMERS_DIFF_TESTS}'.split(',') if f.strip()]
transformers_test_files = [f.strip().removesuffix('.py') for f in f'{LEGACY_TESTS}'.split(',') if f.strip()]
transformers_test_files = [f for f in transformers_test_files if not input_test_files_list or f in input_test_files_list]

all_tests = [f.removesuffix('.py') for f in os.listdir('tests/') if f.startswith('test_') and f.endswith('.py') and f.strip().removesuffix('py') not in f'{IGNORED_TEST_FILES}']
@@ -190,8 +189,8 @@ jobs:

echo "Conditions:"
echo "will build run: ${{ github.event.inputs.m4-only != 'true' && needs.list-test-files.outputs.torch-files != '[]' && needs.list-test-files.outputs.transformers-files != '[]' && !(needs.list-test-files.outputs.m4-files == '[]' && needs.list-test-files.outputs.m4-files == '[]') }}"
echo "will transformers_diff run: ${{ (needs.build.result == 'success' || github.event.inputs.artifact_id != '') && github.event.inputs.m4-only != 'true' && needs.list-test-files.outputs.transformers-files != '[]' }}"
echo "will torch2_5 run: ${{ (needs.build.result == 'success' || github.event.inputs.artifact_id != '') && github.event.inputs.m4-only != 'true' && needs.list-test-files.outputs.torch-files != '[]' }}"
echo "will legacy run: ${{ (needs.build.result == 'success' || github.event.inputs.artifact_id != '') && github.event.inputs.m4-only != 'true' && needs.list-test-files.outputs.transformers-files != '[]' }}"
echo "will torch run: ${{ (needs.build.result == 'success' || github.event.inputs.artifact_id != '') && github.event.inputs.m4-only != 'true' && needs.list-test-files.outputs.torch-files != '[]' }}"
echo "will m4 run: ${{ (github.event.inputs.test_names == '' || contains(github.event.inputs.test_names, 'apple') || contains(github.event.inputs.test_names, 'mlx') ) && (needs.list-test-files.outputs.m4-files != '' || needs.list-test-files.outputs.m4-files != '[]') }}"

build:
@@ -201,7 +200,13 @@ jobs:
- list-test-files
if: github.event.inputs.m4-only != 'true' && (needs.list-test-files.outputs.torch-files != '[]' || needs.list-test-files.outputs.transformers-files != '[]')
container:
image: ${{ needs.check-vm.outputs.ip }}:5000/modelcloud/gptqmodel:github-ci-v5
image: ${{ needs.check-vm.outputs.ip }}:5000/modelcloud/gptqmodel:github-ci-v7
options: --device /dev/dri --ipc=host --runtime=nvidia --gpus all
volumes:
- /dev/dri/by-path:/dev/dri/by-path
- /home/ci/models:/monster/data/model
- /home/ci/models/huggingface:/github/home/.cache/huggingface

steps:
- name: Checkout Codes
uses: actions/checkout@v4
@@ -286,15 +291,15 @@ jobs:
if: always()
run: pip cache purge && uv cache clean && rm -rf ./* ./.*

transformers_diff:
legacy:
needs:
- build
- list-test-files
- check-vm
runs-on: [ self-hosted, xeon5 ]
if: always() && !cancelled() && (needs.build.result == 'success' || github.event.inputs.artifact_id != '') && github.event.inputs.m4-only != 'true' && needs.list-test-files.outputs.transformers-files != '[]'
container:
image: ${{ needs.check-vm.outputs.ip }}:5000/modelcloud/gptqmodel:github-ci-v5
image: ${{ needs.check-vm.outputs.ip }}:5000/modelcloud/gptqmodel:github-ci-v7
volumes:
- /home/ci/models:/monster/data/model
- /home/ci/models/huggingface:/github/home/.cache/huggingface
@@ -383,7 +388,7 @@ jobs:

- name: Install wheel
run: |
uv pip install git+https://github.com/ModelCloud/Tokenicer -U
uv pip install colorlog git+https://github.com/ModelCloud/Tokenicer -U
echo "===== install optimum bitblas parameterized uvicorn ====="
uv pip install optimum bitblas==0.0.1.dev13 parameterized uvicorn -i http://${{ needs.check-vm.outputs.ip }}/simple/ --trusted-host ${{ needs.check-vm.outputs.ip }} --extra-index-url https://pypi.org/simple
echo "===== install dist/whl ====="
@@ -407,10 +412,10 @@ jobs:
gpu_id=-1

while [ "$gpu_id" -lt 0 ]; do
gpu_id=$(curl -s "http://${{ needs.check-vm.outputs.ip }}/gpu/get?id=${{ github.run_id }}&timestamp=$timestamp&test=${{ matrix.test_script }}&runner=${RUNNER_NAME}")
gpu_id=$(curl -s "http://${{ needs.check-vm.outputs.ip }}/gpu/get?id=${{ github.run_id }}&timestamp=$timestamp&test=${{ matrix.test_script }}&runner=${RUNNER_NAME}&exclusive=${{ github.event.inputs.exclusive-gpu }}")

if [ "$gpu_id" -lt 0 ]; then
echo "http://${{ needs.check-vm.outputs.ip }}/gpu/get?id=${{ github.run_id }}&timestamp=$timestamp&test=${{ matrix.test_script }}&runner=${RUNNER_NAME} returned $gpu_id"
echo "http://${{ needs.check-vm.outputs.ip }}/gpu/get?id=${{ github.run_id }}&timestamp=$timestamp&test=${{ matrix.test_script }}&runner=${RUNNER_NAME}&exclusive=${{ github.event.inputs.exclusive-gpu }} returned $gpu_id"
echo "No available GPU, waiting 5 seconds..."
sleep 5
else
@@ -441,15 +446,15 @@ jobs:
if: always()
run: pip cache purge && uv cache clean && rm -rf ./* ./.*

torch2_5:
torch:
needs:
- build
- list-test-files
- check-vm
runs-on: [ self-hosted, xeon5 ]
if: always() && !cancelled() && (needs.build.result == 'success' || github.event.inputs.artifact_id != '') && github.event.inputs.m4-only != 'true' && needs.list-test-files.outputs.torch-files != '[]'
container:
image: ${{ needs.check-vm.outputs.ip }}:5000/modelcloud/gptqmodel:github-ci-v5
image: ${{ needs.check-vm.outputs.ip }}:5000/modelcloud/gptqmodel:github-ci-v7
options: --device /dev/dri --ipc=host --runtime=nvidia --gpus all
volumes:
- /dev/dri/by-path:/dev/dri/by-path
@@ -541,52 +546,75 @@ jobs:

- name: Install wheel
run: |
if [ "${{ matrix.test_script }}" == "test_quant_formats" ] || [ "${{ matrix.test_script }}" == "test_perplexity" ]; then
echo "===== install auto_round ====="
uv pip install auto_round -i http://${{ needs.check-vm.outputs.ip }}/simple/ --trusted-host ${{ needs.check-vm.outputs.ip }} --extra-index-url https://pypi.org/simple
uv pip install -U transformers colorlog
if [ "${{ matrix.test_script }}" == "test_quant_formats" ] || [ "${{ matrix.test_script }}" == "test_perplexity" ] || [ "${{ matrix.test_script }}" == "test_q4_bitblas" ]; then
echo "===== install auto_round bitblas==0.0.1.dev13 ====="
uv pip install auto_round bitblas==0.0.1.dev13 -i http://${{ needs.check-vm.outputs.ip }}/simple/ --trusted-host ${{ needs.check-vm.outputs.ip }} --extra-index-url https://pypi.org/simple
fi

if [ "${{ matrix.test_script }}" == "models/test_cohere2" ] || [ "${{ matrix.test_script }}" == "models/test_gemma" ]; then
echo "===== install transformers from git ====="
uv pip install -U git+https://github.com/huggingface/transformers.git -i http://${{ needs.check-vm.outputs.ip }}/simple/ --trusted-host ${{ needs.check-vm.outputs.ip }} --extra-index-url https://pypi.org/simple
uv pip install -U transformers -i http://${{ needs.check-vm.outputs.ip }}/simple/ --trusted-host ${{ needs.check-vm.outputs.ip }} --extra-index-url https://pypi.org/simple
fi

if [[ "${{ matrix.test_script }}" == *xpu* ]]; then
echo "===== switching to xpu env ====="
source /etc/profile.d/pyenv.sh && pyenv activate xpu
uv pip install colorlog
fi

if [[ "${{ matrix.test_script }}" == "test_sglang.py" ]]; then
uv pip install transformers==4.48.3
fi

if [[ "${{ matrix.test_script }}" == *ipex* ]] && [[ "${{ matrix.test_script }}" != *xpu* ]]; then
uv pip uninstall torchvision torch flash_attn # fix ipex can't be used with torch+cu126
uv pip install torchvision torch
uv pip install -U intel_extension_for_pytorch -i http://${{ needs.check-vm.outputs.ip }}/simple/ --trusted-host ${{ needs.check-vm.outputs.ip }} --extra-index-url https://pypi.org/simple
fi

if [[ "${{ matrix.test_script }}" == *"mlx"* ]]; then
uv pip install mlx_lm --no-build-isolation -i http://${{ needs.check-vm.outputs.ip }}/simple/ --trusted-host ${{ needs.check-vm.outputs.ip }} --extra-index-url https://pypi.org/simple
fi

if [[ "${{ matrix.test_script }}" == "test_modelscope" ]]; then
echo "===== installing modelscope ====="
uv pip install modelscope --no-build-isolation -i http://${{ needs.check-vm.outputs.ip }}/simple/ --trusted-host ${{ needs.check-vm.outputs.ip }} --extra-index-url https://pypi.org/simple
fi

echo "===== install dist/whl ====="
uv pip install git+https://github.com/ModelCloud/Tokenicer -U
uv pip install dist/*.whl -i http://${{ needs.check-vm.outputs.ip }}/simple/ --trusted-host ${{ needs.check-vm.outputs.ip }} --extra-index-url https://pypi.org/simple

# ipex doesn't need to compile kernels. xpu can't install cuda package
if [[ "${{ matrix.test_script }}" != *ipex* && "${{ matrix.test_script }}" != *xpu* ]]; then
echo "===== install dist/whl ====="
uv pip install dist/*.whl -i http://${{ needs.check-vm.outputs.ip }}/simple/ --trusted-host ${{ needs.check-vm.outputs.ip }} --extra-index-url https://pypi.org/simple
else
echo "===== install with local files for xpu env ====="
export CUDA_VISIBLE_DEVICES=""
unset TORCH_CUDA_ARCH_LIST
uv pip install . --no-build-isolation
fi

if [ "${{ matrix.test_script }}" == "test_transformers" ]; then
echo "===== install optimum from git ====="
uv pip install -U git+https://github.com/huggingface/optimum.git -i http://${{ needs.check-vm.outputs.ip }}/simple/ --trusted-host ${{ needs.check-vm.outputs.ip }}
echo "===== install transformers from git ====="
uv pip install -U git+https://github.com/huggingface/transformers.git -i http://${{ needs.check-vm.outputs.ip }}/simple/ --trusted-host ${{ needs.check-vm.outputs.ip }}
uv pip install torch==2.5.1 # fix optimum will install torch 2.6.0
fi

if [[ "${{ matrix.test_script }}" == "test_sglang" ]]; then
uv pip install numpy==1.26.3
fi

- name: Find suitable GPU
if: ${{ !contains(matrix.test_script, 'ipex') && !cancelled() }}
if: ${{ !contains(matrix.test_script, 'ipex') && !contains(matrix.test_script, 'xpu') && !cancelled() }}
run: |
timestamp=$(date +%s%3N)
gpu_id=-1

while [ "$gpu_id" -lt 0 ]; do
gpu_id=$(curl -s "http://${{ needs.check-vm.outputs.ip }}/gpu/get?id=${{ github.run_id }}&timestamp=$timestamp&test=${{ matrix.test_script }}&runner=${RUNNER_NAME}")
gpu_id=$(curl -s "http://${{ needs.check-vm.outputs.ip }}/gpu/get?id=${{ github.run_id }}&timestamp=$timestamp&test=${{ matrix.test_script }}&runner=${RUNNER_NAME}&exclusive=${{ github.event.inputs.exclusive-gpu }}")

if [ "$gpu_id" -lt 0 ]; then
echo "http://${{ needs.check-vm.outputs.ip }}/gpu/get?id=${{ github.run_id }}&timestamp=$timestamp&test=${{ matrix.test_script }}&runner=${RUNNER_NAME} returned $gpu_id"
echo "http://${{ needs.check-vm.outputs.ip }}/gpu/get?id=${{ github.run_id }}&timestamp=$timestamp&test=${{ matrix.test_script }}&runner=${RUNNER_NAME}&exclusive=${{ github.event.inputs.exclusive-gpu }} returned $gpu_id"
echo "No available GPU, waiting 5 seconds..."
sleep 5
else
@@ -617,21 +645,23 @@ jobs:
curl "http://${{ needs.check-vm.outputs.ip }}/gpu/log_test_vram?id=${{ github.run_id }}&gpu=${{ env.CUDA_VISIBLE_DEVICES }}&range=$execution_time&unit=second&test=${{ matrix.test_script }}"

- name: Release GPU
if: always() && !contains(matrix.test_script, 'ipex')
if: always() && !contains(matrix.test_script, 'ipex') && !contains(matrix.test_script, 'xpu')
run: curl -X GET "http://${{ needs.check-vm.outputs.ip }}/gpu/release?id=${{ github.run_id }}&gpu=${{ env.CUDA_VISIBLE_DEVICES }}&timestamp=${{ env.STEP_TIMESTAMP }}&test=${{ matrix.test_script }}&runner=${RUNNER_NAME}"

- name: Clean cache
if: always()
run: pip cache purge && uv cache clean && rm -rf ./* ./.*
run: |
rm ~/.cache/evalplus/*pkl || true
pip cache purge && uv cache clean && rm -rf ./* ./.*

show-statistics:
runs-on: [ self-hosted, xeon5 ]
if: github.event.inputs.exclusive-gpu != 'true'
container:
image: modelcloud/gptqmodel:alpine-ci-v1
needs:
- transformers_diff
- torch2_5
- legacy
- torch
steps:
- name: Print statistics
run: curl "http://10.0.14.248/gpu/get_vram_logs?id=${{ github.run_id }}"
2 changes: 1 addition & 1 deletion README.md
@@ -13,7 +13,7 @@

## News
* 02/12/2025 [1.9.0](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.9.0): ⚡ Offload `tokenizer` fixes to [Toke(n)icer](https://github.com/modelcloud/tokenicer) pkg. Optimized `lm_head` quant time and vram usage.
Optimized `DeepSeek v3/R1` model quant vram usage. Fixed `Optimum` compat regression in `v1.8.1`. 3x speed-up for `Torch` kernel when using Pytorch >= 2.5.0 with `model.compile()`. New `calibration_dataset_concat_size` option to enable calibration data `concat` mode to mimic original GPTQ data packing strategy which may improve quant speed and accuracy for datasets like `wikitext2`.
Optimized `DeepSeek v3/R1` model quant vram usage. Fixed `Optimum` compat regression in `v1.8.1`. 3x speed-up for `Torch` kernel when using Pytorch >= 2.5.0 with `model.optimize()`. New `calibration_dataset_concat_size` option to enable calibration data `concat` mode to mimic original GPTQ data packing strategy which may improve quant speed and accuracy for datasets like `wikitext2`.
* 02/08/2025 [1.8.1](https://github.com/ModelCloud/GPTQModel/releases/tag/v1.8.1): ⚡ `DeepSeek v3/R1` model support. New flexible weight `packing`: allow quantized weights to be packed to `[int32, int16, int8]` dtypes.
`Triton` and `Torch` kernels supports full range of new `QuantizeConfig.pack_dtype`.
New `auto_gc: bool` control in `quantize()` which can reduce quantization time for small model with no chance of oom.
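
The 1.9.0 entry above names the renamed model.optimize() call plus the calibration_dataset_concat_size and auto_gc options. The following is a minimal, hedged usage sketch based only on the option names in these notes, with a placeholder model id; exact signatures should be checked against the release docs.

from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B"  # placeholder; any causal LM works

model = GPTQModel.load(model_id, QuantizeConfig(bits=4, group_size=128))

calibration = ["gptqmodel is an easy-to-use llm quantization toolkit."] * 256

# concat mode mimics the original GPTQ data packing strategy (1.9.0 notes);
# auto_gc=False skips the per-step gc for small models (1.8.1 notes).
model.quantize(
    calibration,
    calibration_dataset_concat_size=2048,
    auto_gc=False,
)
model.save("Llama-3.2-1B-gptq-4bit")

# At inference time, the renamed optimize() enables the Torch kernel's
# compile path on Pytorch >= 2.5.0:
# model = GPTQModel.load("Llama-3.2-1B-gptq-4bit"); model.optimize()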
6 changes: 3 additions & 3 deletions examples/benchmark/generation_speed.py
@@ -195,8 +195,8 @@ def load_model_tokenizer(
def benchmark_generation_speed(model, tokenizer, examples, generation_config):
generation_time_list = []
num_generated_tokens_list = []
progress_bar = ProgressBar(examples)
for example in progress_bar:
pb = ProgressBar(examples)
for example in pb:
input_ids = example["input_ids"].to(model.device)

start = time.time()
@@ -217,7 +217,7 @@ def benchmark_generation_speed(model, tokenizer, examples, generation_config):
)
num_generated_tokens_list.append(num_generated_tokens)

progress_bar.set_postfix(
pb.set_postfix(
num_tokens=num_generated_tokens_list[-1],
time=generation_time_list[-1],
speed=f"{num_generated_tokens_list[-1] / generation_time_list[-1]:.3f} tokens/s",
2 changes: 1 addition & 1 deletion format/format.sh
@@ -3,7 +3,7 @@
cd "$(dirname "$0")" || exit

# force ruff/isort to be same version as setup.py
pip install -U ruff==0.9.5 isort==6.0.0
pip install -U gptqmodel["quality"]

ruff check ../gptqmodel/models ../gptqmodel/nn_modules ../gptqmodel/quantization ../gptqmodel/utils ../gptqmodel/__init__.py ../examples ../tests ../setup.py --fix --unsafe-fixes
ruff_status=$?
3 changes: 2 additions & 1 deletion gptqmodel/__init__.py
@@ -14,13 +14,14 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import os

from .models import GPTQModel, get_best_device
from .quantization import BaseQuantizeConfig, QuantizeConfig
from .utils import BACKEND
from .utils.exllama import exllama_set_max_input_length
from .version import __version__

import os
if os.getenv('GPTQMODEL_USE_MODELSCOPE', 'False').lower() in ['true', '1']:
try:
from modelscope.utils.hf_util.patcher import patch_hub
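
The __init__.py change above gates ModelScope hub patching behind an environment variable read at import time. As a hedged sketch of typical use (the flag name comes straight from the diff; the surrounding usage is an assumption):

import os

# Must be set before gptqmodel is imported, since the check runs at import time.
os.environ["GPTQMODEL_USE_MODELSCOPE"] = "1"

from gptqmodel import GPTQModel  # hub calls now patched via modelscope, if installed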
Empty file added gptqmodel/adapter/__init__.py