Description
I was attempting to merge alpaca-lora from https://huggingface.co/tloen/alpaca-lora-7b with the original llama-7B from https://huggingface.co/decapoda-research/llama-7b-hf, then quantize the merged model and run the main file in llama.cpp.
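For context, merging a LoRA adapter into a base weight matrix amounts to computing W' = W + (alpha / r) * B @ A, where A and B are the low-rank adapter matrices. The following is a minimal sketch of that arithmetic in plain Python with hypothetical toy shapes; the actual merge script linked above does this with PyTorch tensors, so treat this only as an illustration of the math:

```python
# Sketch of the LoRA merge rule: W' = W + (alpha / r) * B @ A.
# Shapes and values below are hypothetical toy data, not real model weights.

def matmul(B, A):
    """Multiply an (m x r) matrix by an (r x n) matrix (plain nested lists)."""
    m, r, n = len(B), len(A), len(A[0])
    return [[sum(B[i][k] * A[k][j] for k in range(r)) for j in range(n)]
            for i in range(m)]

def merge_lora(W, A, B, alpha, r):
    """Fold the low-rank update (alpha / r) * B @ A into the base weight W."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy example: 2x2 base weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]          # r x n
B = [[0.5], [1.0]]        # m x r
merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)  # the adapter update is now baked into the weight
```

Once the update is folded in, the adapter is no longer needed at inference time, which is why the merged checkpoint can be quantized and loaded like a plain LLaMA model.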
The merge code is from https://github.com/clcarwin/alpaca-weight
It was almost successful until the final phase, running the main file in llama.cpp. The merge and quantization steps completed without problems.
Then it raised an error like this:
llama_model_load: llama_model_load: unknown tensor 'model.embed_tokens.weight' in model file
main: failed to load model from './models/7B/ggml-model-q4_0.bin'
I will share my logs in my repository. The code I used in Colab to merge and quantize the model is there too: https://github.com/taiyou2000/personal_experimant
I'm not a machine learning expert and I haven't read the entire llama.cpp codebase, but my theory is that the quantized model contains weights whose names main.cpp doesn't expect to see. As you can see in quantization_log.txt and pth_to_ggml_log.txt in my repository, it has names like "model.layers.0.self_attn.q_proj.weight", whereas main.cpp probably expects something like "layers.0.attention.wq.weight" instead.
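If that theory is right, one workaround is to rename the Hugging Face tensor names to the original pth-style names before converting to ggml. Below is a hedged sketch of such a mapping; the exact names llama.cpp expects can vary by version, so the table should be verified against the convert-pth-to-ggml script you are using rather than taken as authoritative:

```python
import re

# Assumed mapping from Hugging Face LLaMA tensor names to the pth-style
# names early llama.cpp converters expected. Verify against your version
# of the convert script before relying on it.
SIMPLE = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "model.norm.weight": "norm.weight",
    "lm_head.weight": "output.weight",
}
LAYER = {
    "self_attn.q_proj": "attention.wq",
    "self_attn.k_proj": "attention.wk",
    "self_attn.v_proj": "attention.wv",
    "self_attn.o_proj": "attention.wo",
    "mlp.gate_proj": "feed_forward.w1",
    "mlp.down_proj": "feed_forward.w2",
    "mlp.up_proj": "feed_forward.w3",
    "input_layernorm": "attention_norm",
    "post_attention_layernorm": "ffn_norm",
}

def hf_to_pth(name: str) -> str:
    """Translate one Hugging Face tensor name to pth-style naming."""
    if name in SIMPLE:
        return SIMPLE[name]
    m = re.match(r"model\.layers\.(\d+)\.(.+)\.weight$", name)
    if m and m.group(2) in LAYER:
        return f"layers.{m.group(1)}.{LAYER[m.group(2)]}.weight"
    return name  # pass through anything unrecognized

print(hf_to_pth("model.layers.0.self_attn.q_proj.weight"))
```

Applying a function like this to every key in the state dict before conversion would give the converter the names it expects, which is essentially what the pth-based conversion path already produces.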
I can run llama.cpp without any problems on my local computer with a model quantized from the torrent version, so I guess the Hugging Face version differs from it in some way.