convert.py fails for finetuned llama2 models (via HF trl library) #4896

Closed
amygbAI opened this issue Jan 12, 2024 · 4 comments

amygbAI commented Jan 12, 2024

Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.

NVIDIA T4 GPU with an Intel(R) Xeon(R) CPU (AWS g4dn.xlarge, 4 vCPUs)

I use the plain https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py script with some custom data and llama-2-7b-hf as the base model. After training, it invokes trainer.save_model and the output directory has the following contents:

-rw-rw-r-- 1 ubuntu ubuntu 5100 Jan 12 14:04 README.md
-rw-rw-r-- 1 ubuntu ubuntu 134235048 Jan 12 14:04 adapter_model.safetensors
-rw-rw-r-- 1 ubuntu ubuntu 576 Jan 12 14:04 adapter_config.json
-rw-rw-r-- 1 ubuntu ubuntu 1092 Jan 12 14:04 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu 552 Jan 12 14:04 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu 1842948 Jan 12 14:04 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu 4219 Jan 12 14:04 training_args.bin
-rw-rw-r-- 1 ubuntu ubuntu 4827151012 Jan 12 14:04 adapter_model.bin

As you can see, there is no model.safetensors as required by convert.py. I tried a number of other options for saving the model (trainer.model.save_pretrained, for example), but the output was always adapter_model.safetensors.

I tried convert-hf-to-gguf.py as well, and it also complains about the missing model.safetensors (and even then only after suppressing an earlier error about the causal-LM LLaMA architecture not being supported).

Is there any other convert script that handles such adapter safetensors (I assume all models fine-tuned via PEFT will be saved as adapter_*)? When I went through the code, I also noticed that MODEL_ARCH only accommodates "LLAMA" and not "LLAMA2"; is that why it also fails to find the parameter names from adapter_model.safetensors in the MODEL_ARCH tensor-map methods?

@ggerganov (Member) commented:

Have you tried using the convert-hf-to-gguf.py script instead?


vndee commented Jan 13, 2024

Hi, I am facing a similar issue. I tried to convert this fine-tuned model to GGUF but got the error shown in the screenshot below:
[screenshot: conversion error output, 2024-01-13 17:19]


amygbAI commented Jan 15, 2024

OK, my apologies: there is nothing wrong with convert.py or any other utils from this awesome package. The documentation on fine-tuning HF models is a little sparse and hard to follow for first-time users, so I am writing down some steps below for people who happen to hit the same error.

ASSUMPTION

You are fine-tuning a model from HF using PEFT. I have NOT tried any other mode of fine-tuning.

STEPS

  1. Fine-tune the model with PEFT.
  2. It should generate an "adapter_config.json", an "adapter_model.safetensors" and a "README.md".
  3. What HF does here is ONLY generate weights for the adapter (LoRA in this case). This alone will NOT suffice if you want to quantize the model with llama.cpp or other packages, since the base model (LLaMA-2 7B) has its own weights that are NOT part of the adapter weights.
  4. If you only want to run inference with the adapter weights, you can easily do so through the HF APIs; it will just run extremely slowly on CPUs (if at all), since the weights aren't quantized.
  5. The additional step that cost me 3 days of lost searches is merging the adapter weights with the base model weights.
  6. I found this beautiful blog: https://medium.com/@oxenai/how-to-run-llama-2-on-cpu-after-fine-tuning-with-lora-53fac38dfbca
  7. The only catch was that it requires an "adapter_model.bin", which needs to be generated explicitly using torch.save(trainer.model.state_dict(), f"{script_args.output_dir}/adapter_model.bin"), along with the adapter weights being saved via trainer.model.save_pretrained (see the first sketch after this list).
  8. Now simply follow the blog above to merge the weights and voilà, convert.py works flawlessly as advertised in the README (see the second sketch after this list for the merge itself). I am quite sure this will help @vndee as well.
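
To make steps 7 and 8 concrete, here are two minimal sketches rather than the blog's exact script. Anything not named in the steps above is a placeholder or assumption: the meta-llama/Llama-2-7b-hf base model ID and the output / llama-2-7b-merged directories are placeholders, and the second sketch takes PEFT's standard merge_and_unload() route instead of the blog's hand-rolled merge.

First, saving the adapter weights plus a full state dict at the end of the sft.py run (step 7), using the trainer and script_args objects that sft.py already defines:

```python
import torch

# after trainer.train() has finished inside trl's sft.py
trainer.model.save_pretrained(script_args.output_dir)      # writes adapter_config.json + adapter_model.safetensors
torch.save(trainer.model.state_dict(),
           f"{script_args.output_dir}/adapter_model.bin")  # full state dict, for merge scripts that expect a .bin
```

Second, merging the LoRA adapter into the base LLaMA-2 weights and writing a plain HF checkpoint that convert.py can read (step 8):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "meta-llama/Llama-2-7b-hf"  # base model used for fine-tuning (placeholder)
adapter_dir = "output"                      # directory produced by the fine-tuning run (placeholder)
merged_dir = "llama-2-7b-merged"            # where the merged full model will be written (placeholder)

# load the base model in fp16 and attach the LoRA adapter on top of it
base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)

# fold the adapter weights into the base weights and save a normal HF checkpoint
merged = model.merge_and_unload()
merged.save_pretrained(merged_dir)

# the tokenizer files are needed by convert.py as well
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.save_pretrained(merged_dir)
```

After that, pointing convert.py at the merged directory (something like python convert.py ./llama-2-7b-merged; check its --help for the exact output options) should produce a file that llama.cpp can quantize and run.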

Finally, a big vote of thanks to @ggerganov 💯 for his service to the community. I hope someone gets inspired by this and somehow figures out a way to even train the darned LLMs on CPUs 👍

amygbAI closed this as completed Jan 15, 2024

amygbAI commented Jan 15, 2024

As explained in the lengthy post above: non-issue.
