Models are failing to be properly unloaded and freeing up VRAM #1442
Comments
From what I can see, the following works:

```python
from llama_cpp import Llama

llama_model = Llama(…)

# Explicitly delete the model's internal object
llama_model._model.__del__()
```

This approach has worked for me so far.
In my experience, @jkawamoto's approach is a good one, because it frees RAM/CUDA/other memory even if the Llama object is stuck. I've tried calling […]
Since calling a special method (`__del__`) directly isn't recommended, a proper `close()` method would be preferable.
I am running llama_model._model.__del__() per the above comment, and I am still seeing the process use CUDA RAM. Has there been any movement on creating a proper close method?
```python
from llama_cpp import Llama

llama_model = Llama(…)
...
llama_model.close()
```
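If your installed version includes close(), it can also be wrapped in contextlib.closing from the standard library so the model is released even when the block raises. A minimal sketch; the model path and prompt are illustrative:

```python
import contextlib

from llama_cpp import Llama

# contextlib.closing calls close() when the with-block exits,
# even if an exception is raised inside it.
with contextlib.closing(Llama(model_path="model.gguf")) as llama_model:
    output = llama_model("Q: What is 2 + 2? A:", max_tokens=8)

# close() has run at this point, so the backing model should be freed.
```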
Thank you!!!
Expected Behavior
From issue #302, I expected the model to be unloaded with the following function:
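A sketch of the kind of call meant here, assuming the low-level binding llama_cpp.llama_free_model and that the raw model handle is reachable through the wrapper (the attribute path is an assumption):

```python
import llama_cpp
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # illustrative path

# The low-level binding expects a raw llama_model pointer, not the
# high-level Llama wrapper; _model.model is an assumed attribute path.
llama_cpp.llama_free_model(llm._model.model)
```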
However, there are two problems here:
1 - Calling llama_free_model on the conventionally loaded llm object raises an error.
llm is created in the conventional way:
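A minimal sketch of the conventional load referred to here (the path and offload count are illustrative):

```python
from llama_cpp import Llama

# model_path and n_gpu_layers are illustrative values;
# n_gpu_layers=-1 offloads every layer to the GPU.
llm = Llama(model_path="model.gguf", n_gpu_layers=-1)
```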
2 - Even after deleting the object, assigning it to None, and invoking garbage collection, the VRAM is still not freed. It only clears after I kill the app along with all of its processes and threads.
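For reference, the cleanup attempts described above amount to the following, none of which released the VRAM:

```python
import gc

del llm       # drop the reference...
llm = None    # ...or rebind the name to None instead
gc.collect()  # force a full garbage-collection pass

# nvidia-smi still reports the allocation until the process exits.
```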
Current Behavior
1 - llama_free_model does not work.
2 - Garbage collection does not free up VRAM.
Environment and Context
I tried this on both an Arch Linux setup with an RTX 3090 and a Windows laptop with an eGPU; the problem was consistent across the two operating systems and hardware setups.
- CPU: AMD Ryzen 7 2700 Eight-Core Processor
- GPU: NVIDIA GeForce RTX 3090
- OS: Arch Linux 6.8.9-arch1-1 / Windows 11
Failure Information (for bugs)
Steps to Reproduce
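1. Load a model with GPU offload enabled and confirm the allocation in nvidia-smi.
2. Try llama_free_model as suggested in #302, then delete the Llama object, set the reference to None, and call gc.collect().
3. Observe that the VRAM is not released until the entire app, along with its processes and threads, is killed.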