Conversation

@ZX-ModelCloud
Collaborator

No description provided.

@ZX-ModelCloud ZX-ModelCloud changed the title Save model and config files with empty state dict [SAVE] Save model and config files with empty state dict Feb 18, 2025
@ZX-ModelCloud ZX-ModelCloud changed the title [SAVE] Save model and config files with empty state dict [SAVE] Save config files with empty state dict Feb 18, 2025
@Qubitium Qubitium merged commit 0f8269a into main Feb 18, 2025
3 checks passed
@Qubitium Qubitium deleted the zx_save_more_config_files branch February 18, 2025 10:33
Qubitium added a commit that referenced this pull request Feb 19, 2025
* fix type hint

* update warning msg

* update eora license to apache and attribute nvidia/arxiv

* remove early eora test files

* ipex doesn't need to pass register_buffers to Torch

* refactor ipex

* refactor ipex2

* fix typo

* make ipex packable & add missing register_buffers

* cleanup ipex, add lora + bias check

* remove duplicated code

* ignore two folders for pytest

* fix test lora. fix wrong tokenizer type

* compile adapter

* Fix `generation_config.json` not auto-saved (#1292)

* Fix `generation_config.json` not auto-saved

* Update writer.py

* update transformers 4.49.0

* [CI] update ci for requirements installation

* [CI] don't update intel_extension_for_pytorch for now

* [CI] remove ipex

* correct name backend to exllama_eora

* use hf save hack to fix config saves

* fix param name changed

* [SAVE] Save config files with empty state dict (#1293)

* Save model and config files with empty state dict

* cleanup

* cleanup

* print lora adapter loaded count vs total number of quantized modules

* print lora adapter loaded count vs total number of quantized modules

* fix wrong model.save

* Test GSM8K

* patch __repr__ for evalplus

* Save processor related config files. For example: preprocessor_config.json, chat_template.json (#1295)

* Fix adapter/eora for ipex kernel

* Fix eora for ipex/marlin

* Clean eora for exllama v1/v2

* fix shape mismatch in Backend.Marlin

* add comment

* type hint use torch.dtype instead of torch.float32

* get _supports_flash_attn_2 from transformers

* fix prepare_dataset() error

* add color to logs

* fix ci: lm_head test

* fix pb and logging conflicting on output

* refactor logging/pb

* move wf_ buffer to post_init

* fix logger + pb compat

* rename pb.set_description to pb.info

* fix progressbar padding so cli ui width is stable

* add progressbar test

* fix progressbar display at close()/end

* todo fixme for pb

* fix pb display at end of iterable

* fix pb: reserve 1 char for cursor and remove external dependency

* fix pb: render end

* fix minicpm layer_modules error

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix sharded models were deleted

* fix wrong order of config save causing sharded tensors to be removed (#1297)

* fix wrong order of config save causing zero tensors

* add processor to config block

* check for ProcessorMixin before calling save

* sync with main..fix save

* clean logs

* [CI] install color log

* fix: hf does config validation on save, which causes model save failure

* [FIX] not pack when group_size=-1 (#1298)

* Fix skipping pack() when group_size = -1

* assert len(qModules) > 0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Qubitium-ModelCloud <[email protected]>

* disable eora kernel until validated

* [CI] clean evalplus cache

* [CI] fix colorlog for xpu

* fix merge error

* ruff

---------

Signed-off-by: ZX-ModelCloud <[email protected]>
Co-authored-by: CSY <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Qubitium added a commit that referenced this pull request Feb 21, 2025
* fix override

* simplify

* fix missing `modules` item

* breaking: fix module.state update

* fix state should contain both W and WQ

* fix no super() for class obj

* remove get attr

* call LoopProcessor.post_process()

Signed-off-by: ZX-ModelCloud <[email protected]>

* call processor.finalize

* Correctly call methods from self.gptq_model

Signed-off-by: ZX-ModelCloud <[email protected]>

* rename to calibration_data

* cleanup pack()..no need to clone weights..use T instead of t()

* LoopProcessor add model_finalize()

Signed-off-by: ZX-ModelCloud <[email protected]>

* cleanup pack()..rename var for clarity

* pop wq from state

* clean code..de-indent logic

* add safety code to store original in/out features of W in NamedModule state since the weight will be heavily changed during quant

* add stats() api and stats fields to processor

* ruff

* Fix circular import

Signed-off-by: ZX-ModelCloud <[email protected]>

* add license

* add clearml back

* fix NamedModule.__getattr__() error

Signed-off-by: ZX-ModelCloud <[email protected]>

* add `require_fwd` property to processor

* simplify

* fix cannot set weight.data to None

* fix error when tasks is empty

Signed-off-by: ZX-ModelCloud <[email protected]>

* add todo

* fix parameter position & name

* fix import

* fix named module override

* fix __dict__ name error

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix module type error

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix layer_inputs index out of range

Signed-off-by: ZX-ModelCloud <[email protected]>

* rename

* add lm_head quantize config

Signed-off-by: ZX-ModelCloud <[email protected]>

* pop `w` at submodule finalize

* simplify...quantize should only be called once

* release quantizer for module on post_process

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* refactor

* cleanup

* fix circular import

Signed-off-by: ZX-ModelCloud <[email protected]>

* refactor quantize() args and override

* Fix GPTQProcessor log

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix wrong damp_percent returned

* return log

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix hf api compat

* use const, not str

* rename to `finalize`

* fix import

* rename quantize() to quantize_old()

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix import

* If calibration_dataset is None or empty, the input_cache of the previous processor is used

Signed-off-by: ZX-ModelCloud <[email protected]>

* add fixme for hf api compat of fasterquant

* add EoraConfig

Signed-off-by: ZX-ModelCloud <[email protected]>

* remove .module

* add eora processor

* fix misc

* fix misc

* fix isinstance can't check subclass

* fix lora config storage

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* change name to class method

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* format

* fix adapter.name() should be classmethod

* fix eora logging

* move all eora test code into eora_test (pending removal)

* move eora algorithm to nvidia licensed eora file

* remove unused

* fix hf api compat for quantize()

* use EoraProcessor()

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix processor.num_batches setting

Signed-off-by: ZX-ModelCloud <[email protected]>

* async move wq to cpu

* fix not a python package

* fix exllama was not compiled

* add async move for gptq processor

* move prepare_dataset() to LoopProcessor

Signed-off-by: ZX-ModelCloud <[email protected]>

* add release_calibration_dataset()

Signed-off-by: ZX-ModelCloud <[email protected]>

* update error for lm_head and model with tied_weights=True

* consolidate dynamic skipped logic

* Fix eigen_scaling_diag_matrix not initialized

Signed-off-by: ZX-ModelCloud <[email protected]>

* Fix subset repeated quantization

Signed-off-by: ZX-ModelCloud <[email protected]>

* add processed_subset

Signed-off-by: ZX-ModelCloud <[email protected]>

* Fix the error where the returned wq is a tuple

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix: weight.data should not be moved to cpu during processing

* del and overwrite are the same for gc

* Fix layer_inputs where the last layer is empty

Signed-off-by: ZX-ModelCloud <[email protected]>

* cleanup

* use Lora.name() class method for mapping

* fix adapter save and load

Signed-off-by: ZX-ModelCloud <[email protected]>

* move `quant_result` from gptq_process to base loop_process as `_results`

* add `stream: bool` toggle in `move_to` for Tensor types only

* format

* compat: make sure lora key can be found for all HF AutoModel apis

* save eora and test

* fix streaming

* fix compat loading for hf names

* fix BitBLASQuantLinear's adapter argument error

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix ugly mess in lm_eval integration: vars mismatch, type mismatch

* remove util.eval calls.. always use GPTQModel.eval()

* rename eval backend to llm_backend and add real gptqmodel specific backend var

* add gen_kwargs

* use exllama v2 for lm-eval and use acc_norm only

* use exllama v2 for lm-eval and use acc_norm only

* fix ci test

* comment out special kernels

* fix Lora.apply() error when batched generate

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix compile

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix `generate()` not applying correct pad_token_id from tokenizer

* protect against null (Optional) tokenizer

* cleanup compile

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix cuda kernel

* disable eora kernels except for torch

* add `adapter` control/override in `quantize()`

* remove quantize_config.eora_dataset property

* patch evalplus to allow passing a model directly

* change test to pass adapter on GPTQModel.load(). Since the `adapter` config is not saved in model config.json or quantize_config.json, we need to always pass `adapter` to enable gptq/lora/eora
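
  Since the adapter settings are not persisted in either config file, loading must re-supply them every time. A minimal sketch of what the test change implies, where the `Lora` import path and the `path`/`rank` arguments are assumptions based on the commits above, not confirmed API:

  ```python
  from gptqmodel import GPTQModel
  from gptqmodel.adapter.adapter import Lora  # import path is an assumption

  # The adapter is not recorded in config.json / quantize_config.json,
  # so it has to be passed explicitly on every load.
  eora = Lora(path="./eora_adapter", rank=64)  # placeholder path and rank
  model = GPTQModel.load("./quantized-model", adapter=eora)
  ```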

* Fix module.bias not being able to be assigned

Signed-off-by: ZX-ModelCloud <[email protected]>

* comment

* print Adapter loaded post-init so user knows adapter is correctly loaded from disk

* fix evalplus oom

* fix ci tests..random seed consolidated into one var

* fix ci tests

* disable streaming and fix ci test

* add base vs eora arc-challenge benchmarks to eora test

* fix module.compile overriding nn.Module compile. rename to `g_compile`

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* rename `g_compile` to `optimize`

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* refactor eora_generate()

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix argument error

Signed-off-by: ZX-ModelCloud <[email protected]>

* add `kernels()` api so users can see which kernels have been loaded at end of model load

* add DequantizeProcessor

* add DequantizeProcessor

* refactor: add `retrain_w` option to GPTQProcessor

* cleanup

* comments

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* Fix Assignment Error

Signed-off-by: ZX-ModelCloud <[email protected]>

* DequantizeProcessor does not perform any operations on the dataset

Signed-off-by: ZX-ModelCloud <[email protected]>

* refactor: upcast w to float32 before delta calculation in case of bfloat16 and float16 mismatch
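
  A minimal sketch of that upcast; the helper name is illustrative, and the real delta is computed inside the EoRA processor:

  ```python
  import torch

  def lowrank_delta(w: torch.Tensor, wq: torch.Tensor) -> torch.Tensor:
      """Difference between original and quantized weights, computed in float32.

      A bfloat16 W and a float16 WQ do not mix cleanly, so both sides are
      upcast explicitly before the subtraction.
      """
      return w.to(torch.float32) - wq.to(torch.float32)
  ```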

* fix wrong assert (reversed)

* cleanup

* fix summary log

Signed-off-by: ZX-ModelCloud <[email protected]>

* call eora_save()

Signed-off-by: ZX-ModelCloud <[email protected]>

* fix argument name error

Signed-off-by: ZX-ModelCloud <[email protected]>

* add code for assert eora weight

Signed-off-by: ZX-ModelCloud <[email protected]>

* cleanup

Signed-off-by: ZX-ModelCloud <[email protected]>

* add test_eora_post_quant()

Signed-off-by: ZX-ModelCloud <[email protected]>

* clean up `test_quant_eora` so we have config at top and print config before lm-eval results

# Conflicts:
#	tests/test_quant_and_eora.py

* add test_eora_post_quant.py

Signed-off-by: ZX-ModelCloud <[email protected]>

* default to group_size 128 for test. group_size 64 has strange regression

* rename

* refactor api to `GPTQModel.adapter.generate`

* cleanup

* cleanup

* avoid converting to scalar via item() as torch.compile doesn't like it

* try to speed up eora gen with compile

* increase cache and disable scalar captures
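
  A sketch of the dynamo knobs the three compile commits above point at; the concrete values and where GPTQModel sets them are assumptions:

  ```python
  import torch

  # Allow more recompiles before dynamo falls back to eager (value is illustrative).
  torch._dynamo.config.cache_size_limit = 64
  # Leave scalar capture off; compiled code should avoid .item() instead.
  torch._dynamo.config.capture_scalar_outputs = False

  @torch.compile
  def scale(x: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
      # keep s as a 0-dim tensor rather than calling s.item(),
      # which would force a graph break with scalar capture disabled
      return x * s
  ```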

* use local model path

* revert making adapter a module

* use torch_compile helper instead torch.compile

* use torch_compile helper instead torch.compile

* move dequantize_weight() to PackableQuantLinear

Signed-off-by: ZX-ModelCloud <[email protected]>

* bump intel_extension_for_pytorch to 2.6.0 & remove pack() for ipex & remove xpu check for fp16

* Revert "move dequantize_weight() to PackableQuantLinear"

This reverts commit b5d311d.

* merge main's eval() changes

* push `wf` and dequantize code into packable. refactor ipex to be based on torch kernel

# Conflicts:
#	gptqmodel/nn_modules/qlinear/ipex.py

* eora has been moved to eora-copy branch

* fix test didn't pass any model

* add register_buffers to init

* remove unused args

* revert register_buffers changes

* revert deleting eora dir

* remove eora test code

* update eora license to apache and attribute nvidia/arxiv

* Eora_main branch merge to Eora (#1301)


* remove unused eora kernel

Signed-off-by: Qubitium <[email protected]>

* remove unused eora kernel

Signed-off-by: Qubitium <[email protected]>

* apply bias after eora adapter

Signed-off-by: Qubitium <[email protected]>

* add new bits test

* revert bad commit. cannot use true/false logic on self.bias directly since a boolean tensor with multiple values is not supported (ambiguous truth value)

Signed-off-by: Qubitium <[email protected]>

* revert bad commit. cannot use true/false logic on self.bias directly since a boolean tensor with multiple values is not supported (ambiguous truth value)

Signed-off-by: Qubitium <[email protected]>
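
  A minimal sketch of why the truthiness check was reverted; the helper is illustrative, not the kernel code:

  ```python
  from typing import Optional

  import torch

  def apply_bias(out: torch.Tensor, bias: Optional[torch.Tensor]) -> torch.Tensor:
      # `if bias:` raises "Boolean value of Tensor with more than one element
      # is ambiguous" for any real multi-value bias, hence the revert; the
      # presence check has to compare against None instead.
      if bias is not None:
          out = out + bias
      return out

  print(apply_bias(torch.ones(2, 3), torch.zeros(3)))
  ```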

* do not pad

* fix: var name does not exist

* missed pad code removal

Signed-off-by: Qubitium <[email protected]>

* removing padding code like torch kernel for triton

Signed-off-by: Qubitium <[email protected]>

* fix var rename

Signed-off-by: Qubitium <[email protected]>

* start deprecation of DynamicCuda kernel. Do not allow it to be auto-selected.

Signed-off-by: Qubitium <[email protected]>

* do not log too verbose json result on cli

Signed-off-by: Qubitium <[email protected]>

* Fix `do_sample` config errors on load (also fixed config save)
Fix `generation_config.json` not being loaded post-quantization

Signed-off-by: Qubitium <[email protected]>
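
  The usual trip-up is a greedy-decoding config that still carries sampling parameters. A hedged sketch of that kind of normalization (paths are illustrative, and whether GPTQModel's writer adjusts exactly these fields is an assumption):

  ```python
  from transformers import GenerationConfig

  cfg = GenerationConfig.from_pretrained("./quantized-model")  # placeholder path

  if not cfg.do_sample:
      # leftover sampling knobs with do_sample=False trip validation on save
      cfg.temperature = 1.0
      cfg.top_p = 1.0

  cfg.save_pretrained("./quantized-model")
  ```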

* log only class simple name

Signed-off-by: Qubitium <[email protected]>

* fix old transformer compat

Signed-off-by: Qubitium <[email protected]>

* fix vllm doesn't have can_generate

* refactor: hf auto config fix

Signed-off-by: Qubitium <[email protected]>

* log txt changes

Signed-off-by: Qubitium <[email protected]>

* disable auto-padding in exllama kernels

Signed-off-by: Qubitium <[email protected]>

* falcon is merged into HF, does not need trust_remote=True

Signed-off-by: Qubitium <[email protected]>

* fix deepseek2-lite ci test, add `layer_modules_strict: bool` control to model defs

Signed-off-by: Qubitium <[email protected]>

* fix deepseek v2-lite again: do not process already processed module

Signed-off-by: Qubitium <[email protected]>

* merge deepseek v2 possible layer_modules into single def

Signed-off-by: Qubitium <[email protected]>

* revert partial looper change now that deepseek v2 layer_modules are merged

Signed-off-by: Qubitium <[email protected]>

* set default data size to 256

* fix self.in_features was not set

* [CI] use latest CI docker image

* [CI] install colorlog

* Correctly use torch.no_grad() to avoid OOM when quantizing VL models
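
  A minimal sketch of the pattern; function and argument names are illustrative, the real loop lives in the quantization looper:

  ```python
  import torch

  @torch.no_grad()  # calibration only reads activations; no autograd graph needed
  def run_calibration(model: torch.nn.Module, batches) -> None:
      model.eval()
      for batch in batches:
          # without no_grad(), every intermediate activation of a large
          # vision-language model is kept for backward and memory explodes
          model(**batch)
  ```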

* fix vllm doesn't have named_children()

* [CI] pass exclusive for gpu service

* revert module check for vllm

* if model is not an nn.Module, skip finding

* fix checking

* fix env must be before torch imports

Signed-off-by: Qubitium <[email protected]>

* move PYTORCH_ENABLE_MPS_FALLBACK to top

* ovis model requires transformers<=4.48.3

* print expected value

* [CI] fix names

* [CI] fix xpu env reinstalled torch

* torch kernel will enable compile optimizations by default for torch 2.6.0

Signed-off-by: Qubitium <[email protected]>

* fix transformers compat

Signed-off-by: Qubitium <[email protected]>

* disable exllama kernel from quantization (remove from packable)

Signed-off-by: Qubitium <[email protected]>

* fix evalplus trying to toString a Decoder

* replace subprocess run with raising an error

* fix ci test_dynamic scores

Signed-off-by: Qubitium <[email protected]>

* cleanup eora test

Signed-off-by: Qubitium <[email protected]>

* fix sglang's transformers error

* OVIS is compatible with transformers v4.49.0

* move ipex to new test files

* Update ovis.py

* decrease batch to 16

* format

Signed-off-by: Qubitium <[email protected]>

* logs

Signed-off-by: Qubitium <[email protected]>

* fix ci lora config test

Signed-off-by: Qubitium <[email protected]>

* fix ci: dynamic

Signed-off-by: Qubitium <[email protected]>

* fix ci: opt expects exllama when triton is used for quant

Signed-off-by: Qubitium <[email protected]>

* fix ci: transformers test oom

Signed-off-by: Qubitium <[email protected]>

* Add some comments to eora.py

* add comments to eora.py

---------

Signed-off-by: ZX-ModelCloud <[email protected]>
Signed-off-by: Qubitium <[email protected]>
Co-authored-by: CSY <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: LIU, Shih-Yang <[email protected]>