warning: failed to mlock NNNNNN-byte buffer (after previously locking 0 bytes): Cannot allocate memory #254
Ahh, saw the error. If you format your screen output and refer to the actual error in your issue description as per the template, it makes your issue easier for people to understand. Here's the fix, which is not directly related to …
There is a system-wide …
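On Linux, locked memory is capped per process by RLIMIT_MEMLOCK (visible via ulimit -l), which is the usual reason mlock fails with "Cannot allocate memory". A minimal sketch of checking the limit and sidestepping the warning from Python, assuming Linux, the standard resource module, the use_mlock constructor flag of llama-cpp-python, and a hypothetical model path:

```python
import resource
from llama_cpp import Llama  # assumes llama-cpp-python is installed

# RLIMIT_MEMLOCK caps how many bytes a process may mlock(); hitting it is what
# produces "failed to mlock ...: Cannot allocate memory".
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print(f"memlock limits: soft={soft}, hard={hard}")  # RLIM_INFINITY (-1) means unlimited

# If raising the limit is not an option, simply don't request mlock:
llm = Llama(
    model_path="models/7B/ggml-model.bin",  # hypothetical path
    use_mlock=False,                        # skip mlock; the OS may swap the weights
)
```

Raising the limit itself (e.g. in /etc/security/limits.conf) is the alternative if locking the full model in RAM is actually what you want.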
In the issue mentioned, they pasted an image of the output and still have a ctx of around 70 KB and a correspondingly much smaller memory requirement than "+ 17592185987986.00 MB per state". That lines up with what I observed running the llama.cpp version on its own versus the parameters that were somehow baked into the .so. If I'm interpreting that correctly, I don't think I would ever be able to get enough memory to run this even if I disabled mlock with those requirements, and I'd worry for my computer if I tried 😅.
17592185987986.00 MB (17.6 exabytes) is clearly a bug.
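For reference, the arithmetic behind that figure, plus an observation (not from the thread) that the number sits suspiciously close to 2^44, which fits the struct-mismatch explanation given further down:

```python
mb_per_state = 17_592_185_987_986        # value printed by the broken bindings
print(mb_per_state * 1e6 / 1e18)         # ~17.59 exabytes (1 MB = 1e6 B, 1 EB = 1e18 B)
print(2 ** 44)                           # 17_592_186_044_416 -- nearly the same number,
                                         # i.e. it looks like a misread 64-bit field
```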
From the llama.cpp CPU memory / disk requirements: you can adjust …
Thanks. So as I understand it, n_gpu_layers will limit the amount of swap used to the associated VRAM amount? Just trying to make sure I'm not about to blow anything out.
Actually, now that I think of it, isn't it kind of odd that swap is not needed in some cases then? I ran all testing on the same system, so my impression is that the same swap limits would be imposed.
Generally you don't want the OS to swap the model out, but it may try to given the large memory footprint. The VRAM usage is, AFAIK, independent of the mlock behaviour. In my experience the …
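To make the distinction concrete: a hedged sketch (hypothetical model path; assumes a GPU-enabled build of llama-cpp-python) showing how n_gpu_layers and use_mlock act on different things. The former moves layers into VRAM; the latter only controls whether the CPU-resident weights may be paged out.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/7B/ggml-model.bin",  # hypothetical path
    n_gpu_layers=32,    # layers offloaded to VRAM; only the remainder occupies system RAM
    use_mlock=False,    # don't pin the CPU-resident part; the OS may swap it under pressure
)
```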
The llama_context struct changed in llama.cpp; updating your llama-cpp-python to the current main should fix it.
Okay, I managed to get it working now. Thanks again!
This causes long prompts to parse very slowly.
@AnonymousAmalgrams What did you do?
I just updated the repo… but there are probably a lot of other random things that can cause this problem.
I'm getting the following output when running the web server from the git clone:
I manually built the libllama.so file and dropped it into the directory where the bindings look for it. I tried building it both by following #30 and with make libllama.so; both, perhaps as expected, give the same result. Oddly enough, though, the pip install seems to work fine (I'm not sure what it's doing differently) and gives the same "normal" ctx size (around 70 KB) as running the model directly within vendor/llama.cpp with the -n 128 suggested for testing. Any suggestions for how to get a working libllama.so would be greatly appreciated.
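A small diagnostic, not from the thread, that can help with this kind of mismatch: it just prints which copy of the bindings Python actually imports and which libllama shared objects sit next to it (assuming the library is searched for in the package directory, as the issue describes).

```python
import glob
import os

import llama_cpp

pkg_dir = os.path.dirname(llama_cpp.__file__)
print("llama_cpp package:", llama_cpp.__file__)   # pip install vs. git clone copy
print("shared libs found:",
      glob.glob(os.path.join(pkg_dir, "*llama*.so"))
      + glob.glob(os.path.join(pkg_dir, "*llama*.dylib")))
```

If the .so was built from a different llama.cpp commit than the one the Python wrapper expects (vendor/llama.cpp), the struct layouts can disagree, which is exactly the situation the llama_context comment above describes.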