llama : add support for EXAONE tied word embeddings #12451
Conversation
I was originally adding a weight-copy for EXAONE. Would it make sense to remove the copying for the other models instead?
Which model is copying the weight? AFAIK it's preferable not to copy the weight if the model uses tied word embd, otherwise it defeats the whole point of reducing memory usage 😂
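For context, here is a minimal sketch of what tying buys you, using made-up sizes (32000 × 2048, f16); this is illustrative only, not llama.cpp code:

```python
# Minimal sketch of "tied word embeddings" with made-up sizes; not llama.cpp code.
import numpy as np

vocab_size, n_embd = 32000, 2048

# The checkpoint stores a single embedding matrix ...
token_embd = np.zeros((vocab_size, n_embd), dtype=np.float16)

# ... and the LM head ("output") projection is the very same tensor (tied):
output = token_embd  # another reference, not a copy

# Writing it twice into the converted file doubles that footprint:
print(f"one copy:   {token_embd.nbytes / 1e6:.1f} MB")
print(f"two copies: {2 * token_embd.nbytes / 1e6:.1f} MB")
```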
For sure, but the following converted models have this:

- Bloom: llama.cpp/convert_hf_to_gguf.py lines 1102 to 1107 in 7dfad38
- GPT2: llama.cpp/convert_hf_to_gguf.py lines 2407 to 2409 in 7dfad38
- CodeShell: llama.cpp/convert_hf_to_gguf.py lines 2747 to 2752 in 7dfad38
- Mamba (this one is interesting, I guess it's handled in model loading): llama.cpp/convert_hf_to_gguf.py lines 3798 to 3804 in 7dfad38
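The duplication pattern those converters share looks roughly like the following hypothetical sketch; the function and parameter names are illustrative, not the actual convert_hf_to_gguf.py API or the exact code at the cited lines:

```python
# Hypothetical sketch of the duplication pattern (illustrative names).
from typing import Iterable
import torch

def convert_tensors(name: str, data: torch.Tensor,
                    has_separate_lm_head: bool) -> Iterable[tuple[str, torch.Tensor]]:
    # Every tensor is written under its mapped GGUF name.
    yield name, data

    # The questionable part: for tied models, the embedding is written a
    # second time under the output name, duplicating the data in the file.
    if name == "token_embd.weight" and not has_separate_lm_head:
        yield "output.weight", data
```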
For bloom, gpt-2 and codeshell, yes, it seems like we can remove it and update the converter.

Edit: the code for mamba does not duplicate the tensor, it only handles the case where these 2 tensors are the same; if that's the case, it does not write the same tensor twice, which is what we expect.
Yep, but that means it's actually handled already for Mamba? Edit: indeed it is, lines 2628 to 2632 in 7dfad38.
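Put together, the Mamba-style handling described above amounts to something like this hedged sketch (again with illustrative names, not the cited lines):

```python
# Hedged sketch of the Mamba-style handling (illustrative names): the second
# tensor is only skipped when it is literally the same storage as the
# embedding, so nothing genuinely distinct ever gets dropped.
import torch

def tensors_to_write(token_embd: torch.Tensor, lm_head: torch.Tensor | None):
    out = [("token_embd.weight", token_embd)]
    if lm_head is not None and lm_head.data_ptr() != token_embd.data_ptr():
        # A genuinely separate lm_head is written as the output tensor.
        out.append(("output.weight", lm_head))
    # Otherwise nothing extra is written; the loader can reuse
    # token_embd.weight as the output at load time.
    return out
```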
Please note, tested: I can however create GGUFs using EXAONE's f16.gguf (not created with llama.cpp?) without issue.
@David-AU-github Make sure you are using llama-quantize from 99aa304 (b4915) or later.
Fix #12448
Tested and confirmed to work with https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-2.4B