
Commit 83e633c

llama : differentiate the KV dims in the attention (#4657)
* Add n_key_dim and n_value_dim

  Some models use values that are not derived from `n_embd`. Also remove `n_embd_head` and `n_embd_gqa` because it is not clear which "head" is referred to (key or value). Fixes issue #4648.

* Fix `llm_build_kqv` to use `n_value_gqa`

* Rebase

* Rename variables

* Fix `llm_build_kqv` to be more generic wrt `n_embd_head_k`

* Update default values for `n_embd_head_k` and `n_embd_head_v`

  Co-authored-by: Georgi Gerganov <[email protected]>

* Fix `llm_load_tensors`: the asserts were not backwards compatible

---------

Co-authored-by: Georgi Gerganov <[email protected]>
1 parent 32866c5 commit 83e633c
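
For context on what the new metadata enables: the per-head key and value sizes become independent hyperparameters instead of always being derived as `n_embd / n_head`. A minimal sketch of the resolution logic in Python, assuming a hypothetical `metadata` dict and helper name (the actual loader lives in `llama.cpp`):

```python
def resolve_head_dims(metadata: dict, arch: str, n_embd: int, n_head: int) -> tuple[int, int]:
    """Hypothetical helper: resolve per-head K/V dims from GGUF metadata.

    Falls back to the classic n_embd // n_head when the optional
    key_length / value_length keys are absent, which keeps older
    GGUF files working unchanged.
    """
    default = n_embd // n_head
    n_embd_head_k = metadata.get(f"{arch}.attention.key_length", default)
    n_embd_head_v = metadata.get(f"{arch}.attention.value_length", default)
    return n_embd_head_k, n_embd_head_v
```

For example, a LLaMA-style model with `n_embd = 4096` and `n_head = 32` resolves both dims to 128 when the keys are absent, matching the previous behavior.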

File tree

3 files changed: +202 -79 lines changed


gguf-py/gguf/constants.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -46,6 +46,8 @@ class Attention:
         HEAD_COUNT_KV     = "{arch}.attention.head_count_kv"
         MAX_ALIBI_BIAS    = "{arch}.attention.max_alibi_bias"
         CLAMP_KQV         = "{arch}.attention.clamp_kqv"
+        KEY_LENGTH        = "{arch}.attention.key_length"
+        VALUE_LENGTH      = "{arch}.attention.value_length"
         LAYERNORM_EPS     = "{arch}.attention.layer_norm_epsilon"
         LAYERNORM_RMS_EPS = "{arch}.attention.layer_norm_rms_epsilon"
```
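
The `{arch}` placeholder is filled in per model architecture when the key is written. A quick illustration, assuming the `gguf` Python package from this repo is importable:

```python
from gguf.constants import Keys

# The key templates expand per architecture at write time:
print(Keys.Attention.KEY_LENGTH.format(arch="llama"))    # -> llama.attention.key_length
print(Keys.Attention.VALUE_LENGTH.format(arch="llama"))  # -> llama.attention.value_length
```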

gguf-py/gguf/gguf_writer.py

Lines changed: 6 additions & 0 deletions
```diff
@@ -333,6 +333,12 @@ def add_head_count(self, count: int) -> None:
     def add_head_count_kv(self, count: int) -> None:
         self.add_uint32(Keys.Attention.HEAD_COUNT_KV.format(arch=self.arch), count)
 
+    def add_key_length(self, length: int) -> None:
+        self.add_uint32(Keys.Attention.KEY_LENGTH.format(arch=self.arch), length)
+
+    def add_value_length(self, length: int) -> None:
+        self.add_uint32(Keys.Attention.VALUE_LENGTH.format(arch=self.arch), length)
+
     def add_max_alibi_bias(self, bias: float) -> None:
         self.add_float32(Keys.Attention.MAX_ALIBI_BIAS.format(arch=self.arch), bias)
```
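
A conversion script can now record non-standard head dims alongside the existing head counts. A minimal sketch using the new setters (the file name and hyperparameter values here are made-up examples):

```python
from gguf import GGUFWriter

writer = GGUFWriter("model.gguf", "llama")
writer.add_head_count(32)       # n_head
writer.add_head_count_kv(8)     # grouped-query attention: fewer KV heads
writer.add_key_length(128)      # writes "llama.attention.key_length"
writer.add_value_length(128)    # writes "llama.attention.value_length"
```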
