Commit 7b88584

Scale buf_size linearly with n_ctx
This appears to solve ggml-org#153, where the error "ggml_new_tensor_impl: not enough space in the context's memory pool" is thrown in interactive mode. At least, the out-of-memory error comes from the `ctx0` used here, although I am not familiar enough with the code base to tell whether this is indeed the cause.
1 parent 7213110 commit 7b88584

1 file changed: +1 -3 lines changed

main.cpp

Lines changed: 1 addition & 3 deletions
@@ -549,9 +549,7 @@ bool llama_eval(
 
     const int d_key = n_embd/n_head;
 
-    // TODO: check if this size scales with n_ctx linearly and remove constant. somehow I feel it wasn't the case
-    // static size_t buf_size = hparams.n_ctx*1024*1024;
-    static size_t buf_size = 512u*1024*1024;
+    static size_t buf_size = (size_t)hparams.n_ctx*1024*1024;
     static void * buf = malloc(buf_size);
 
     if (mem_per_token > 0 && mem_per_token*N > buf_size) {
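The sizing rule in this change can be sketched in isolation. The snippet below is a minimal illustration, not code from main.cpp: `scratch_buf_size` is a hypothetical helper name, and it assumes `n_ctx` is a 32-bit signed integer. Under that assumption the `(size_t)` cast matters, because a plain `n_ctx*1024*1024` is evaluated in `int` arithmetic and overflows once `n_ctx` reaches 2048.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of main.cpp): one MiB of scratch
// buffer per context token, mirroring the expression in the diff.
// Assumption: n_ctx is a 32-bit signed integer.
static size_t scratch_buf_size(int32_t n_ctx) {
    // Cast before multiplying so the arithmetic is done in size_t.
    // Without the cast, 2048*1024*1024 == 2^31 does not fit in a
    // 32-bit signed int.
    return (size_t) n_ctx * 1024 * 1024;
}
```

With this rule, `scratch_buf_size(512)` yields the same 512 MiB as the previous constant `512u*1024*1024`, and larger contexts grow the buffer proportionally.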

0 commit comments
