[SAVE] Save config files with empty state dict #1293
Merged
Conversation
Qubitium added a commit that referenced this pull request on Feb 19, 2025:
* fix type hint
* update warning msg
* update eora license to apache and attribute nvidia/arxiv
* remove early eora test files
* ipex doesn't need to pass register_buffers to Torch
* refactor ipex
* refactor ipex2
* fix typo
* make ipex packable & add missing register_buffers
* cleanup ipex, add lora + bias check
* remove duplicated code
* ignore two folders for pytest
* fix test lora. fix wrong tokenizer type
* compile adapter
* Fix `generation_config.json` not auto-saved (#1292)
* Fix `generation_config.json` not auto-saved
* Update writer.py
* update transformers 4.49.0
* [CI] update ci for requirements installation
* [CI] don't update intel_extension_for_pytorch for now
* [CI] remove ipex
* correct name backend to exllama_eora
* use hf save hack to fix config saves
* fix param name changed
* [SAVE] Save config files with empty state dict (#1293) (sketched below)
* Save model and config files with empty state dict
* cleanup
* cleanup
* print lora adapter loaded count vs total number of quantized modules
* print lora adapter loaded count vs total number of quantized modules
* fix wrong model.save
* Test GSM8K
* patch __repr__ for evalplus
* Save processor related config files. For example: preprocessor_config.json, chat_template.json (#1295)
* Fix adapter/eora for ipex kernel
* Fix eora for ipex/marlin
* Clean eora for exllama v1/v2
* fix shape does not match in Backend.Marlin
* add comment
* type hint use torch.dtype instead of torch.float32
* get _supports_flash_attn_2 from transformers
* fix prepare_dataset() error
* add color to logs
* fix ci: lm_head test
* fix pb and logging conflicting on output
* refactor logging/pb
* move wf_ buffer to post_init
* fix logger + pb compat
* rename pb.set_description to pb.info
* fix progressbar padding so cli ui width is stable
* add progressbar test
* fix progressbar display at close()/end
* todo fixme for pb
* fix pb display at end of iterable
* fix pb: reserve 1 char for cursor and remove external dependency
* fix pb: render end
* fix minicpm layer_modules error Signed-off-by: ZX-ModelCloud <[email protected]>
* fix sharded models being deleted
* fix wrong order of config save causing sharded tensors to be removed (#1297)
* fix wrong order of config save causing zero tensors
* add processor to config block
* check for ProcessorMixin before calling save
* sync with main..fix save
* clean logs
* [CI] install color log
* fix hf is doing config validation on save which causes model save failure
* [FIX] not pack when group_size=-1 (#1298)
* Fix skipping pack() when group_size = -1
* assert len(qModules) > 0
* Update __init__.py
* Update __init__.py

---------

Co-authored-by: Qubitium-ModelCloud <[email protected]>

* disable eora kernel until validated
* [CI] clean evalplus cache
* [CI] fix colorlog for xpu
* fix merge error
* ruff

---------

Signed-off-by: ZX-ModelCloud <[email protected]>
Co-authored-by: CSY <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
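The headline item above, "[SAVE] Save config files with empty state dict (#1293)", reuses Hugging Face's own save path to emit config.json / generation_config.json without serializing real weights. Below is a minimal sketch of that idea using only the public transformers API; the model id and output directory are illustrative, and this is not the PR's actual writer code, so treat it as an approximation of the approach.

```python
# Hedged sketch (not the PR's code): emit config files through HF's
# save_pretrained() while passing an empty state_dict so no real weights
# need to be held or serialized. Model id / paths are illustrative.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# `state_dict` overrides what save_pretrained() would otherwise pull from the
# model; with an empty dict only the config-side files carry real content.
# The empty-dict path may behave differently across transformers versions.
model.save_pretrained("opt-125m-config-only", state_dict={})

# The narrower, always-available way to write just the config files:
model.config.save_pretrained("opt-125m-config-only")
if model.generation_config is not None:
    model.generation_config.save_pretrained("opt-125m-config-only")
```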
Qubitium added a commit that referenced this pull request on Feb 21, 2025:
* fix override
* simplify
* fix missing `modules` item
* breaking: fix module.state update
* fix state should contain both W and WQ
* fix no super() for class obj
* remove get attr
* call LoopProcessor.post_process() Signed-off-by: ZX-ModelCloud <[email protected]>
* call processor.finalize
* Correctly call methods from self.gptq_model Signed-off-by: ZX-ModelCloud <[email protected]>
* rename to calibration_data
* cleanup pack()..no need to clone weights..use T instead of t()
* LoopProcessor add model_finalize() Signed-off-by: ZX-ModelCloud <[email protected]>
* cleanup pack()..rename var for clarity
* pop wq from state
* clean code..de-indent logic
* add safety code to store original in/out features of W in NamedModule state since the weight will be heavily changed during quant
* add stats() api and stats fields to processor
* ruff
* Fix circular import Signed-off-by: ZX-ModelCloud <[email protected]>
* add license
* add clearml back
* fix NamedModule.__getattr__() error Signed-off-by: ZX-ModelCloud <[email protected]>
* add `require_fwd` property to processor
* simplify
* fix cannot set weight.data to None
* fix the error that tasks is empty Signed-off-by: ZX-ModelCloud <[email protected]>
* add todo
* fix parameter position & name
* fix import
* fix named module override
* fix __dict__ name error Signed-off-by: ZX-ModelCloud <[email protected]>
* fix module type error Signed-off-by: ZX-ModelCloud <[email protected]>
* fix layer_inputs index out of range Signed-off-by: ZX-ModelCloud <[email protected]>
* rename
* add lm_head quantize config Signed-off-by: ZX-ModelCloud <[email protected]>
* pop `w` at submodule finalize
* simplify...quantize should only be called once
* release quantizer for module on post_process
* cleanup Signed-off-by: ZX-ModelCloud <[email protected]>
* refactor
* cleanup
* fix circular import Signed-off-by: ZX-ModelCloud <[email protected]>
* refactor quantize() args and override
* Fix GPTQProcessor log Signed-off-by: ZX-ModelCloud <[email protected]>
* fix wrong damp_percent returned
* return log Signed-off-by: ZX-ModelCloud <[email protected]>
* fix hf api compat
* use const, not str
* rename to `finalize`
* fix import
* rename quantize() to quantize_old() Signed-off-by: ZX-ModelCloud <[email protected]>
* fix import
* If calibration_dataset is None or Empty, the input_cache of the previous processor is used Signed-off-by: ZX-ModelCloud <[email protected]>
* add fixme for hf api compat of fasterquant
* add EoraConfig Signed-off-by: ZX-ModelCloud <[email protected]>
* remove .module
* add eora processor
* fix misc
* fix misc
* fix isinstance can't check subclass
* fix lora config storage
* cleanup Signed-off-by: ZX-ModelCloud <[email protected]>
* change name to class method
* cleanup Signed-off-by: ZX-ModelCloud <[email protected]>
* format
* fix adapter.name() should be classmethod
* fix eora logging
* move all eora test code into eora_test (pending removal)
* move eora algorithm to nvidia licensed eora file
* remove unused
* fix hf api compat for quantize()
* use EoraProcessor() Signed-off-by: ZX-ModelCloud <[email protected]>
* fix processor.num_batches setting Signed-off-by: ZX-ModelCloud <[email protected]>
* async move wq to cpu
* fix not a python package
* fix exllama was not compiled
* add async move for gptq processor
* move prepare_dataset() to LoopProcessor Signed-off-by: ZX-ModelCloud <[email protected]>
* add release_calibration_dataset() Signed-off-by: ZX-ModelCloud <[email protected]>
* update error for lm_head and model with tied_weights=True
* consolidate dynamic skipped logic
* Fix eigen_scaling_diag_matrix not initialized Signed-off-by: ZX-ModelCloud <[email protected]>
* Fix subset repeated quantization Signed-off-by: ZX-ModelCloud <[email protected]>
* add processed_subset Signed-off-by: ZX-ModelCloud <[email protected]>
* Fix the error that the type of wq obtained is tuple Signed-off-by: ZX-ModelCloud <[email protected]>
* fix weight.data should not be moved to cpu for process code
* del and overwrite is the same for gc
* Fix layer_inputs where the last layer is empty Signed-off-by: ZX-ModelCloud <[email protected]>
* cleanup
* use Lora.name() class method for mapping
* fix adapter save and load Signed-off-by: ZX-ModelCloud <[email protected]>
* move `quant_result` from gptq_process to base loop_process as `_results`
* add `stream: bool` toggle in `move_to` for Tensors type only
* format
* compat: make sure lora key can be found for all HF AutoModel api
* save eora and test
* fix streaming
* fix compat loading for hf names
* fix BitBLASQuantLinear's adapter argument error Signed-off-by: ZX-ModelCloud <[email protected]>
* fix ugly mess in lm_eval integration, vars mismatch, type mismatch
* remove util.eval calls.. always use GPTQModel.eval()
* rename eval backend to llm_backend and add real gptqmodel specific backend var
* add gen_kwargs
* use ellama v2 for lm-eval and use acc_norm only
* use ellama v2 for lm-eval and use acc_norm only
* fix ci test
* comment out special kernels
* fix Lora.apply() error when batched generate Signed-off-by: ZX-ModelCloud <[email protected]>
* fix compile
* cleanup Signed-off-by: ZX-ModelCloud <[email protected]>
* fix `generate()` not applying correct pad_token_id from tokenizer
* protect against null (Optional) tokenizer
* cleanup compile
* cleanup Signed-off-by: ZX-ModelCloud <[email protected]>
* fix cuda kernel
* disable eora kernels except for torch
* add `adapter` control/override in `quantize()`
* remove quantize_config.eora_dataset property
* patch evalplus to allow passing a model directly
* change test to pass adapter on GPTQModel.load(). Since `adapter` config is not saved in model config.json and quantize_config.json, we need to always pass `adapter` to enable gptq/lora/eora
* Fix module.bias not being able to be assigned Signed-off-by: ZX-ModelCloud <[email protected]>
* comment
* print Adapter loaded post-init so user knows adapter is correctly loaded from disk
* fix evalplus oom
* fix ci tests..random seed consolidated into one var
* fix ci tests
* disable streaming and fix ci test
* add base vs eora arc-challenge benchmarks to eora test
* fix module.compile overriding nn.module compile. rename to `g_compile`
* cleanup Signed-off-by: ZX-ModelCloud <[email protected]>
* rename `g_compile` to `opimize`
* cleanup Signed-off-by: ZX-ModelCloud <[email protected]>
* refactor eora_generate() Signed-off-by: ZX-ModelCloud <[email protected]>
* fix argument error Signed-off-by: ZX-ModelCloud <[email protected]>
* add `kernels()` api so user knows which kernels have been loaded at end of model load
* add DequantizeProcessor
* add DequantizeProcessor
* refactor: add `retrain_w` option to GPTQProcessor
* cleanup
* comments
* cleanup Signed-off-by: ZX-ModelCloud <[email protected]>
* Fix Assignment Error Signed-off-by: ZX-ModelCloud <[email protected]>
* DequantizeProcessor does not perform any operations on dataset Signed-off-by: ZX-ModelCloud <[email protected]>
* refactor: upcast w to float32 before delta calculation in case of bfloat16 and float16 mismatch (sketched after this list)
* fix wrong assert (reversed)
* cleanup
* fix summary log Signed-off-by: ZX-ModelCloud <[email protected]>
* call eora_save() Signed-off-by: ZX-ModelCloud <[email protected]>
* fix argument name error Signed-off-by: ZX-ModelCloud <[email protected]>
* add code for assert eora weight Signed-off-by: ZX-ModelCloud <[email protected]>
* cleanup Signed-off-by: ZX-ModelCloud <[email protected]>
* add test_eora_post_quant() Signed-off-by: ZX-ModelCloud <[email protected]>
* clean up `test_quant_erao` so we have config at top and print config before lm-eval results
  # Conflicts:
  #	tests/test_quant_and_eora.py
* add test_eora_post_quant.py Signed-off-by: ZX-ModelCloud <[email protected]>
* default to group_size 128 for test. group_size 64 has strange regression
* rename
* refactor api to `GPTQModel.adapter.generate`
* cleanup
* cleanup
* avoid converting to scalar via item() as torch.compile doesn't like it
* try to speed things up for eora gen with compile
* increase cache and disable scalar captures
* use local model path
* revert making adapter a module
* use torch_compile helper instead of torch.compile
* use torch_compile helper instead of torch.compile
* move dequantize_weight() to PackableQuantLinear Signed-off-by: ZX-ModelCloud <[email protected]>
* bump intel_extension_for_pytorch to 2.6.0 & remove pack() for ipex & remove xpu check for fp16
* Revert "move dequantize_weight() to PackableQuantLinear" This reverts commit b5d311d.
* merge main's eval() changes
* push `wf` and dequantize code into packable. refactor ipex to be based on torch kernel
  # Conflicts:
  #	gptqmodel/nn_modules/qlinear/ipex.py
* eora has been moved to eora-copy branch
* fix test didn't pass any model
* add register_buffers to init
* remove unused args
* revert register_buffers changes
* revert deleting eora dir
* remove eora test code
* update eora license to apache and attribute nvidia/arxiv
* Eora_main branch merge to Eora (#1301)
* fix type hint
* update warning msg
* update eora license to apache and attribute nvidia/arxiv
* remove early eora test files
* ipex doesn't need to pass register_buffers to Torch
* refactor ipex
* refactor ipex2
* fix typo
* make ipex packable & add missing register_buffers
* cleanup ipex, add lora + bias check
* remove duplicated code
* ignore two folders for pytest
* fix test lora. fix wrong tokenizer type
* compile adapter
* Fix `generation_config.json` not auto-saved (#1292)
* Fix `generation_config.json` not auto-saved
* Update writer.py
* update transformers 4.49.0
* [CI] update ci for requirements installation
* [CI] don't update intel_extension_for_pytorch for now
* [CI] remove ipex
* correct name backend to exllama_eora
* use hf save hack to fix config saves
* fix param name changed
* [SAVE] Save config files with empty state dict (#1293)
* Save model and config files with empty state dict
* cleanup
* cleanup
* print lora adapter loaded count vs total number of quantized modules
* print lora adapter loaded count vs total number of quantized modules
* fix wrong model.save
* Test GSM8K
* patch __repr__ for evalplus
* Save processor related config files. For example: preprocessor_config.json, chat_template.json (#1295)
* Fix adapter/eora for ipex kernel
* Fix eora for ipex/marlin
* Clean eora for exllama v1/v2
* fix shape does not match in Backend.Marlin
* add comment
* type hint use torch.dtype instead of torch.float32
* get _supports_flash_attn_2 from transformers
* fix prepare_dataset() error
* add color to logs
* fix ci: lm_head test
* fix pb and logging conflicting on output
* refactor logging/pb
* move wf_ buffer to post_init
* fix logger + pb compat
* rename pb.set_description to pb.info
* fix progressbar padding so cli ui width is stable
* add progressbar test
* fix progressbar display at close()/end
* todo fixme for pb
* fix pb display at end of iterable
* fix pb: reserve 1 char for cursor and remove external dependency
* fix pb: render end
* fix minicpm layer_modules error Signed-off-by: ZX-ModelCloud <[email protected]>
* fix sharded models being deleted
* fix wrong order of config save causing sharded tensors to be removed (#1297)
* fix wrong order of config save causing zero tensors
* add processor to config block
* check for ProcessorMixin before calling save
* sync with main..fix save
* clean logs
* [CI] install color log
* fix hf is doing config validation on save which causes model save failure
* [FIX] not pack when group_size=-1 (#1298)
* Fix skipping pack() when group_size = -1
* assert len(qModules) > 0
* Update __init__.py
* Update __init__.py

---------

Co-authored-by: Qubitium-ModelCloud <[email protected]>

* disable eora kernel until validated
* [CI] clean evalplus cache
* [CI] fix colorlog for xpu
* fix merge error
* ruff

---------

Signed-off-by: ZX-ModelCloud <[email protected]>
Co-authored-by: CSY <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>

* remove unused eora kernel Signed-off-by: Qubitium <[email protected]>
* remove unused eora kernel Signed-off-by: Qubitium <[email protected]>
* apply bias after eora adapter Signed-off-by: Qubitium <[email protected]>
* add new bits test
* revert bad commit. cannot use logic true/false on self.bias directly since boolean tensor (multi-value) is not supported (conflicting) Signed-off-by: Qubitium <[email protected]>
* revert bad commit. cannot use logic true/false on self.bias directly since boolean tensor (multi-value) is not supported (conflicting) Signed-off-by: Qubitium <[email protected]>
* do not pad
* fix var name that does not exist
* missed pad code removal Signed-off-by: Qubitium <[email protected]>
* removing padding code like torch kernel for triton Signed-off-by: Qubitium <[email protected]>
* fix var rename Signed-off-by: Qubitium <[email protected]>
* start deprecation of DynamicCuda kernel. Do not allow it to be auto-selected. Signed-off-by: Qubitium <[email protected]>
* do not log too verbose json result on cli Signed-off-by: Qubitium <[email protected]>
* Fix `do_sample` config errors on load (also fixed config save). Fix `generation_config.json` is not loaded post-quantization Signed-off-by: Qubitium <[email protected]>
* log only class simple name Signed-off-by: Qubitium <[email protected]>
* fix old transformer compat Signed-off-by: Qubitium <[email protected]>
* fix vllm doesn't have can_generate
* refactor: hf auto config fix Signed-off-by: Qubitium <[email protected]>
* log txt changes Signed-off-by: Qubitium <[email protected]>
* disable auto-padding in exllama kernels Signed-off-by: Qubitium <[email protected]>
* falcon is merged into HF, does not need trust_remote=True Signed-off-by: Qubitium <[email protected]>
* fix deepseek2-lite ci test, add `layer_modules_strict: bool` control to model defs Signed-off-by: Qubitium <[email protected]>
* fix deepseek v2-lite again: do not process already processed module Signed-off-by: Qubitium <[email protected]>
* merge deepseek v2 possible layer_modules into single def Signed-off-by: Qubitium <[email protected]>
* revert partial looper change now that deepseek v2 layer_modules are merged Signed-off-by: Qubitium <[email protected]>
* set default data size to 256
* fix self.in_features was not set
* [CI] use latest CI docker image
* [CI] install colorlog
* Correctly use torch.no_grad() to avoid OOM when quantizing VL Model
* fix vllm doesn't have named_children()
* [CI] pass exclusive for gpu service
* revert module check for vllm
* if model is not a nn.Module, skip finding
* fix checking
* fix env must be before torch imports Signed-off-by: Qubitium <[email protected]>
* move PYTORCH_ENABLE_MPS_FALLBACK to top
* ovis model requires transformers<=4.48.3
* print expected value
* [CI] fix names
* [CI] fix xpu env reinstalled torch
* torch kernel will enable compile optimizations by default for torch 2.6.0 Signed-off-by: Qubitium <[email protected]>
* fix transformers compat Signed-off-by: Qubitium <[email protected]>
* disable exllama kernel from quantization (remove from packable) Signed-off-by: Qubitium <[email protected]>
* fix evalplus trying to toString a Decoder
* replace subprocess run by raising an error
* fix ci test_dynamic scores Signed-off-by: Qubitium <[email protected]>
* cleanup eora test Signed-off-by: Qubitium <[email protected]>
* fix sglang's transformers error
* OVIS is compatible with transformers v4.49.0
* move ipex to new test files
* Update ovis.py
* decrease batch to 16
* format Signed-off-by: Qubitium <[email protected]>
* logs Signed-off-by: Qubitium <[email protected]>
* fix ci lora config test Signed-off-by: Qubitium <[email protected]>
* fix ci: dynamic Signed-off-by: Qubitium <[email protected]>
* fix ci: opt expects exllama when triton is used for quant Signed-off-by: Qubitium <[email protected]>
* fix ci: transformers test oom Signed-off-by: Qubitium <[email protected]>
* Add some comments to eora.py
* add comments to eora.py

---------

Signed-off-by: ZX-ModelCloud <[email protected]>
Signed-off-by: Qubitium <[email protected]>
Co-authored-by: CSY <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: LIU, Shih-Yang <[email protected]>
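One numerics item in the list above, "upcast w to float32 before delta calculation in case of bfloat16 and float16 mismatch", is easy to show in isolation. The snippet below is a generic reconstruction of that idea, not the PR's code; `w`, `wq`, and the shapes are illustrative. Explicitly promoting both operands to float32 keeps the residual in full precision rather than relying on whatever mixed-dtype promotion the subtraction would otherwise pick.

```python
# Hedged sketch of the "upcast before delta" idea; tensors and shapes are illustrative.
import torch

w = torch.randn(4, 8, dtype=torch.bfloat16)                # original weight, e.g. bf16
wq = (w + 0.01 * torch.randn_like(w)).to(torch.float16)    # (de)quantized copy, e.g. fp16

# Promote both sides to float32 before computing the residual so the subtraction
# happens in full precision instead of a mixed bf16/fp16 promotion.
delta = w.to(torch.float32) - wq.to(torch.float32)
print(delta.dtype)  # torch.float32
```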