warning: failed to mlock NNNNNN-byte buffer (after previously locking 0 bytes): Cannot allocate memory #254

Description

I'm getting the following output when running the web server from the git clone:

llama.cpp: loading model from ./vendor/llama.cpp/models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4017.34 MB
llama_model_load_internal: mem required  = 5809.34 MB (+ 17592185987986.00 MB per state)
warning: failed to mlock 4212486144-byte buffer (after previously locking 0 bytes): Cannot allocate memory
Try increasing RLIMIT_MLOCK ('ulimit -l' as root).

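For reference, the locked-memory limit that the warning refers to can be checked and raised from the shell before launching the server (a rough sketch; raising it to unlimited typically needs root or a matching memlock entry in /etc/security/limits.conf):

ulimit -l              # show the current RLIMIT_MEMLOCK ceiling, in KiB
sudo -s                # the warning suggests raising the limit as root
ulimit -l unlimited    # lift the limit for this shell, then start the server from it
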
I manually built the libllama.so file and dropped it into the directory where the bindings check for it. I tried building it both by following #30 and with make libllama.so; perhaps as expected, both give the same result. Oddly enough, though, the pip install seems to work fine (I'm not sure what it does differently) and reports the same "normal" ctx size (around 70 KB) as running the model directly inside vendor/llama.cpp with the -n 128 suggested for testing. Any suggestions for how to get a working libllama.so would be greatly appreciated.
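
For completeness, this is roughly the sequence I used to build and place the shared library (the copy destination is an assumption about where the Python bindings look for it, and the Makefile target may differ between llama.cpp revisions):

cd vendor/llama.cpp
make clean
make libllama.so                 # same target mentioned above
cp libllama.so ../../llama_cpp/  # assumed location checked by the bindings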
