
llama : optimize memory buffers #2325


Merged 1 commit on Jul 22, 2023
Conversation

ggerganov (Member)

ghost commented on Jul 22, 2023

Hi,

Has overall RAM usage decreased? Here's the output of ./main prior to this commit:

main: build = 878 (b5fe67f)
main: seed  = 1690051538
llama.cpp: loading model from /data/data/com.termux/files/home/llama-2-7b-chat-codeCherryPop.ggmlv3.q4_0.bin
...
llama_model_load_internal: mem required  = 5287.72 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB

And here's the output after this commit:

main: build = 879 (b47b8a9)
main: seed  = 1690053156
llama.cpp: loading model from /data/data/com.termux/files/home/llama-2-7b-chat-codeCherryPop.ggmlv3.q4_0.bin
...
llama_model_load_internal: mem required  = 4013.72 MB (+ 1024.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB

That's 5287.72 - 4013.72 = 1274.00 MB, about 1.2 GB freed. Wow!

ggerganov (Member, Author)

Yes, the allocated buffers should fit the actual requirements more tightly now. There is one dynamic "work" buffer whose size is not currently displayed; it is relatively small when BLAS is not used and somewhat larger when BLAS is enabled.

Anyway, all this memory management stuff will be further improved in the future.
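
For readers wondering what that dynamic "work" buffer refers to, here is a minimal sketch, not the actual llama.cpp code, of the idea: a scratch buffer sized differently depending on whether a BLAS backend was compiled in. The GGML_USE_* macros mirror real ggml build flags, but the helper functions and the sizes below are hypothetical, for illustration only.

```cpp
// Minimal sketch (hypothetical, not the actual llama.cpp implementation):
// size a dynamic "work" buffer based on whether a BLAS backend is built in.
#include <cstddef>
#include <cstdio>
#include <vector>

static bool cpu_has_blas() {
#if defined(GGML_USE_OPENBLAS) || defined(GGML_USE_ACCELERATE)
    return true;   // BLAS path: needs larger temporaries (e.g. dequantized weights)
#else
    return false;  // plain CPU path: only small per-op scratch space
#endif
}

static size_t work_buffer_size() {
    const size_t MB = 1024u * 1024u;
    // Hypothetical sizes; real values would depend on model and batch size.
    return cpu_has_blas() ? 512u * MB : 64u * MB;
}

int main() {
    std::vector<unsigned char> work(work_buffer_size());
    std::printf("work buffer: %zu MB\n", work.size() / (1024u * 1024u));
    return 0;
}
```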
