convert.py fails for finetuned llama2 models (via HF trl library) #4896

Closed
amygbAI opened this issue Jan 12, 2024 · 4 comments

amygbAI commented Jan 12, 2024

Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.

NVIDIA T4 GPU with an Intel(R) Xeon(R) CPU (AWS g4dn.xlarge, 4 vCPUs)

I use the plain https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py script with some custom data and llama-2-7b-hf as the base model. After training, it invokes trainer.save_model and the output directory has the following contents:

-rw-rw-r-- 1 ubuntu ubuntu 5100 Jan 12 14:04 README.md
-rw-rw-r-- 1 ubuntu ubuntu 134235048 Jan 12 14:04 adapter_model.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 576 Jan 12 14:04 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu 1092 Jan 12 14:04 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 552 Jan 12 14:04 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu 1842948 Jan 12 14:04 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu 4219 Jan 12 14:04 training_args.bin
-rw-rw-r-- 1 ubuntu ubuntu 4827151012 Jan 12 14:04 adapter_model.bin

As you can see, there is no model.safetensors as required by convert.py. I tried a number of other options for saving the model (trainer.model.save_pretrained, for example), but the output was always adapter_model.safetensors.

I tried convert-hf-to-gguf.py as well, and it also complains about the missing model.safetensors (and even then only after suppressing an earlier error about the causal-LM LLaMA architecture not being supported).

Is there any other convert script that handles such adapter safetensors (I assume all models fine-tuned via PEFT will be saved as adapter_*)? When I went through the code, I also noticed that MODEL_ARCH only accommodates "LLAMA" and not "LLAMA2"; is that why it also fails to find the parameter names from adapter_model.safetensors in the MODEL_ARCH tensor-map methods?

@ggerganov (Member) commented:

Have you tried using the convert-hf-to-gguf.py script instead?


vndee commented Jan 13, 2024

Hi, I am facing a similar issue. I tried to convert this fine-tuned model to GGUF but got the error shown in the screenshot below:
[screenshot: conversion error output, 2024-01-13 17:19]


amygbAI commented Jan 15, 2024

OK, my apologies: there is nothing wrong with convert.py or any other utils from this awesome package. The documentation on fine-tuning HF models is a little sparse and hard to follow for first-time users, so I am writing down some steps below for people who happen to hit the same error.

ASSUMPTION

You are fine-tuning a model from HF using PEFT. I have NOT tried any other mode of fine-tuning.

STEPS

  1. Fine-tune the model with PEFT.
  2. It should generate an "adapter_config.json", an "adapter_model.safetensors" and a "README.md".
  3. What HF does here is ONLY generate weights for the adapter (LoRA in this case). This alone will NOT suffice if you want to quantize the model with llama.cpp or other packages, since the base model (LLaMA-2 7B) has its own weights that are NOT part of the adapter weights.
  4. If you only want to run inference with the adapter weights, you can easily do so through the HF APIs; it will just run extremely slowly on CPUs (if at all), since the weights aren't quantized.
  5. The additional step that cost me 3 days of lost searches is merging the adapter weights with the base model weights.
  6. I found this beautiful blog: https://medium.com/@oxenai/how-to-run-llama-2-on-cpu-after-fine-tuning-with-lora-53fac38dfbca
  7. The only catch was that it requires an "adapter_model.bin", which needs to be generated explicitly using torch.save(trainer.model.state_dict(), f"{script_args.output_dir}/adapter_model.bin"), along with the adapter weights being saved via trainer.model.save_pretrained (see the first sketch after this list).
  8. Now simply follow the blog above to merge the weights and voilà, convert.py works flawlessly as advertised in the README (see the second sketch after this list for the merge itself). I am quite sure this will help @vndee as well.
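
To make steps 7 and 8 concrete, here are two minimal sketches rather than the blog's exact script. Anything not named in the steps above is a placeholder or assumption: the meta-llama/Llama-2-7b-hf base model ID and the output / llama-2-7b-merged directories are placeholders, and the second sketch takes PEFT's standard merge_and_unload() route instead of the blog's hand-rolled merge.

First, saving the adapter weights plus a full state dict at the end of the sft.py run (step 7), using the trainer and script_args objects that sft.py already defines:

```python
import torch

# after trainer.train() has finished inside trl's sft.py
trainer.model.save_pretrained(script_args.output_dir)      # writes adapter_config.json + adapter_model.safetensors
torch.save(trainer.model.state_dict(),
           f"{script_args.output_dir}/adapter_model.bin")  # full state dict, for merge scripts that expect a .bin
```

Second, merging the LoRA adapter into the base LLaMA-2 weights and writing a plain HF checkpoint that convert.py can read (step 8):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "meta-llama/Llama-2-7b-hf"  # base model used for fine-tuning (placeholder)
adapter_dir = "output"                      # directory produced by the fine-tuning run (placeholder)
merged_dir = "llama-2-7b-merged"            # where the merged full model will be written (placeholder)

# load the base model in fp16 and attach the LoRA adapter on top of it
base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)

# fold the adapter weights into the base weights and save a normal HF checkpoint
merged = model.merge_and_unload()
merged.save_pretrained(merged_dir)

# the tokenizer files are needed by convert.py as well
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.save_pretrained(merged_dir)
```

After that, pointing convert.py at the merged directory (something like python convert.py ./llama-2-7b-merged; check its --help for the exact output options) should produce a file that llama.cpp can quantize and run.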

Finally, a big vote of thanks to @ggerganov 💯 for his service to the community. I hope someone gets inspired by this and somehow figures out a way to even train the darned LLMs on CPUs 👍

amygbAI closed this as completed Jan 15, 2024

amygbAI commented Jan 15, 2024

As explained in the lengthy post above: non-issue.
