
Add chat_template to exist gguf file #5897


Closed
bruceunx opened this issue Mar 6, 2024 · 10 comments · Fixed by #6588
Labels
enhancement New feature or request stale

Comments

@bruceunx

bruceunx commented Mar 6, 2024

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

Add a chat template to an existing gguf file.

Motivation

I could then parse the chat template directly from the gguf file, rather than from extra info.

Possible Implementation

Maybe implement an additional function in the gguf.GGUFReader class for adding a new field to the metadata?

@bruceunx bruceunx added the enhancement New feature or request label Mar 6, 2024
@ggerganov
Member

Chat templates are already added

@bruceunx
Author

bruceunx commented Mar 6, 2024

I mean adding chat templates to an existing gguf file.

@ngxson
Collaborator

ngxson commented Mar 6, 2024

Currently there is no example that does what you ask for. In fact, your request can be rephrased as "How do I modify the KV metadata of a gguf file?"

However it's totally possible with gguf-py, something like gguf.add_string("tokenizer.chat_template", "...")

https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/gguf_writer.py#L149

@bruceunx
Author

bruceunx commented Mar 6, 2024

Currently there is no example that does what you ask for. In fact, your request can be rephrased as "How do I modify the KV metadata of a gguf file?"

However it's totally possible with gguf-py, something like gguf.add_string("tokenizer.chat_template", "...")

https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/gguf_writer.py#L149

Yes, but add_string belongs to gguf.GGUFWriter, and I only want to add chat_template to an existing gguf model, something like set_metadata with a new field name and new field data:

def set_metadata(reader: GGUFReader, args: argparse.Namespace) -> None:

https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf-set-metadata.py

@ngxson
Collaborator

ngxson commented Mar 6, 2024

I'm not sure if that's possible, because the offsets in the whole file change when you modify the metadata (correct me if I'm wrong; most of the time I use gguf from cpp code, not the python version).

The safe way is to read gguf file, add metadata, then write a new gguf file with the new metadata. You can then copy tensors one by one to the new file.
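For reference, the GGUF format lays a string KV pair out as a length-prefixed UTF-8 key, a uint32 value-type tag, and a length-prefixed UTF-8 value, all little-endian. A minimal stdlib-only sketch of that encoding (the type id 8 mirrors GGUFValueType.STRING in gguf-py; this is an illustration, not the library's API):

```python
import struct

GGUF_TYPE_STRING = 8  # value-type tag for strings, per the GGUF spec

def encode_string_kv(key: str, value: str) -> bytes:
    """Serialize one GGUF string key/value pair (little-endian)."""
    key_bytes = key.encode("utf-8")
    val_bytes = value.encode("utf-8")
    out = struct.pack("<Q", len(key_bytes)) + key_bytes   # uint64 key length, then key
    out += struct.pack("<I", GGUF_TYPE_STRING)            # uint32 value-type tag
    out += struct.pack("<Q", len(val_bytes)) + val_bytes  # uint64 value length, then value
    return out
```

Splicing such a blob into the KV section (and bumping kv_count) is effectively what "modifying the metadata" means at the byte level, which is why the safe route is a full re-write of the file.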

@bruceunx
Author

bruceunx commented Mar 6, 2024

I guess so, thanks

@bruceunx bruceunx closed this as completed Mar 6, 2024
@bruceunx
Author

bruceunx commented Mar 7, 2024

import struct
import numpy as np
from gguf import GGUFReader, GGUFValueType, GGUF_DEFAULT_ALIGNMENT

file_path = "../openhermes.gguf"
new_file_path = "../add_chat_model.gguf"
reader = GGUFReader(file_path, "r+")

# the key and value to add
CHAT_TEMPLATE = "tokenizer.chat_template"
chat_template = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"

# use the file's own alignment if present, else the default (32)
alignment = GGUF_DEFAULT_ALIGNMENT
align_field = reader.fields.get("general.alignment")
if align_field is not None:
    alignment = int(align_field.parts[-1][0])

# serialize the new KV pair: uint64 key length, key bytes, uint32 type tag
add_data = bytearray()
name_data = CHAT_TEMPLATE.encode("utf-8")
add_data += struct.pack("<Q", len(name_data))
add_data += name_data
add_data += struct.pack("<I", GGUFValueType.STRING.value)

# pad the value so the inserted bytes keep the tensor data aligned
# (-n % a is 0 when n is already a multiple of a)
raw_len = len(add_data) + 8 + len(chat_template)
chat_template += " " * (-raw_len % alignment)

raw_data = chat_template.encode("utf-8")
add_data += struct.pack("<Q", len(raw_data))
add_data += raw_data

# insert the new pair right before the last existing KV field
last_field = list(reader.fields.values())[-1]
insert_offset = last_field.offset

# copy the original bytes and splice the new pair in
new_data = reader.data.copy()
new_data = np.concatenate((
    new_data[:insert_offset],
    np.frombuffer(bytes(add_data), dtype=np.uint8),
    new_data[insert_offset:],
))

# bump kv_count by rewriting the full little-endian uint64 at the field's
# offset (parts[0][0] is the count's value, not its byte index in the file)
kv_count_field = reader.fields["GGUF.kv_count"]
struct.pack_into("<Q", new_data, kv_count_field.offset,
                 int(kv_count_field.parts[0][0]) + 1)

# save file
with open(new_file_path, "wb") as file:
    file.write(new_data.tobytes())

I just implemented adding a chat template to an existing gguf. Is it OK to add this to the scripts in gguf-py, or to GGUFReader? Or should it just stay here, in case someone like me wants to use this script?
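One detail worth calling out: GGUF aligns the tensor-data section, so any bytes inserted into the KV region must total a multiple of the alignment, or the tensors that follow end up misaligned. The padding length can be computed with Python's -n % a idiom, which is already 0 when the length is aligned (a sketch; 32 is gguf-py's GGUF_DEFAULT_ALIGNMENT):

```python
def pad_len(raw_len: int, alignment: int = 32) -> int:
    """Bytes of padding so raw_len + pad_len(raw_len) is a multiple of alignment."""
    return -raw_len % alignment
```

In the script the padding is appended to the template string itself; trailing spaces are harmless in a Jinja template, and padding the value keeps the tensor data behind it at its original alignment.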

@bruceunx bruceunx reopened this Mar 7, 2024
@ngxson
Collaborator

ngxson commented Mar 7, 2024

Thanks for the solution. Yeah, I think you can push it as a new script, gguf-modify-metadata.py. Can you add the ability to modify any key of any type (string, number, array, ...)? Maybe someone will need this script, for example to modify the chat template or the bos/eos tokens.
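Extending the byte-level approach to other scalar types mostly means swapping the struct format per type tag. A hedged sketch, with the type ids written out by hand from the GGUF spec (they mirror gguf-py's GGUFValueType, but double-check against the library before relying on them):

```python
import struct

# GGUF scalar value-type tags (subset), per the GGUF spec
SCALAR_FORMATS = {
    4: "<I",   # UINT32
    5: "<i",   # INT32
    6: "<f",   # FLOAT32
    7: "<?",   # BOOL
    10: "<Q",  # UINT64
}

def encode_scalar_kv(key: str, type_id: int, value) -> bytes:
    """Serialize one scalar GGUF key/value pair (little-endian)."""
    key_bytes = key.encode("utf-8")
    out = struct.pack("<Q", len(key_bytes)) + key_bytes  # length-prefixed key
    out += struct.pack("<I", type_id)                    # value-type tag
    out += struct.pack(SCALAR_FORMATS[type_id], value)   # fixed-width value
    return out
```

Arrays are more involved (an element-type tag and a count, then the packed elements), which is part of why a fully general metadata-editing script takes more work.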

@bruceunx
Author

bruceunx commented Mar 7, 2024

Yes, but other types like numbers and arrays are closely bound to the model. The reason I want to add the chat template is that many ggufs on HF currently have no chat template in their KV metadata, which is a little inconvenient.

@404-xianjin

Yes, but other types like numbers and arrays are closely bound to the model. The reason I want to add the chat template is that many ggufs on HF currently have no chat template in their KV metadata, which is a little inconvenient.

I tried using this code to modify the metadata in my model, but I got an error when I executed it. The script with my CHAT_TEMPLATE ran successfully; however, I encountered the following error when loading the model:

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA RTX A5000, compute capability 8.6, VMM: yes
  Device 1: NVIDIA RTX A5000, compute capability 8.6, VMM: yes
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA RTX A5000) - 23281 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA RTX A5000) - 23282 MiB free
gguf_init_from_file_impl: encountered bad_alloc error while reading key 30
gguf_init_from_file_impl: failed to read key-value pairs
llama_model_load: error loading model: llama_model_loader: failed to load model from D:\gemma-3-27b-tools-Q4_K_M.gguf

llama_model_load_from_file_impl: failed to load model

thanks


4 participants