
Commit 905e864

Complete removal of f16_kv, add offload_kqv field
This addresses two issues:
- #995, which requests adding the KV cache offloading param
- #1006, a NULL ptr exception when using embeddings (introduced by leaving f16_kv in the fields struct)
1 parent: 8e44a32 · commit: 905e864
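For illustration, here is a toy sketch (not code from this repo; the class names CSideParams and StaleParams are made up) of why leaving a stale field in a ctypes Structure causes the kind of breakage described above: a Python-side declaration that keeps a field the C library has dropped shifts the offsets of every later field, so anything read past that point is garbage (or a NULL pointer).

from ctypes import Structure, c_bool

class CSideParams(Structure):
    # layout the C library actually uses once the flag is removed
    _fields_ = [("logits_all", c_bool), ("embedding", c_bool)]

class StaleParams(Structure):
    # Python-side layout that still declares the removed flag
    _fields_ = [("f16_kv", c_bool), ("logits_all", c_bool), ("embedding", c_bool)]

print(CSideParams.embedding.offset)  # 1
print(StaleParams.embedding.offset)  # 2 -- shifted by one byte, so data past here is misread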

File tree

1 file changed: +3, −3 lines

llama_cpp/llama_cpp.py

Lines changed: 3 additions & 3 deletions
@@ -429,9 +429,9 @@ class llama_context_params(Structure):
         type_k (int): data type for K cache
         type_v (int): data type for V cache
         mul_mat_q (bool): if true, use experimental mul_mat_q kernels (DEPRECATED - always true)
-        f16_kv (bool): use fp16 for KV cache, fp32 otherwise
         logits_all (bool): the llama_eval() call computes all logits, not just the last one
-        embedding (bool): embedding mode only"""
+        embedding (bool): embedding mode only
+        offload_kqv (bool): whether to offload the KQV ops (including the KV cache) to GPU"""
     _fields_ = [
         ("seed", c_uint32),
         ("n_ctx", c_uint32),
@@ -449,9 +449,9 @@ class llama_context_params(Structure):
         ("type_k", c_int),
         ("type_v", c_int),
         ("mul_mat_q", c_bool),
-        ("f16_kv", c_bool),
         ("logits_all", c_bool),
         ("embedding", c_bool),
+        ("offload_kqv", c_bool),
     ]