[Question] How would you unload a model after it has been loaded? #302


Closed
real-limitless opened this issue May 30, 2023 · 3 comments
Labels
documentation (Improvements or additions to documentation), performance

Comments

@real-limitless

real-limitless commented May 30, 2023

Hello World,

I'm looking for a way to load multiple models at different times, and I need to be able to unload a model from GPU/RAM once I'm done using it in the main process.

For example, in your main process you use llama-cpp-python to load a model and interface with it. However, say you then need to switch to another model from that same process. Typically you are supposed to unload the first model from RAM before loading the next one, but I can't seem to figure out a way to gracefully tell llama-cpp to shut down and release its memory, so I don't run into OOM issues.

Anyone have any ideas on how to do this?

@gjmulder
Contributor

There's an open bug upstream in llama.cpp about it not cleaning up GPU VRAM. More details in #223.

AFAIK, the Python garbage collector should clean up a model object that resides in CPU RAM once there are no references to it. Or you can explicitly call `del llama_obj` to destroy it in the case where you need the RAM freed immediately in order to create a new instance in the current scope.

More experienced Python programmers are invited to correct me here 😄
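To make that concrete, here's a minimal sketch of the del-then-reload pattern (the model paths below are just placeholders, not files from this project):

```python
import gc

from llama_cpp import Llama

# Load the first model and run a quick completion.
llama = Llama(model_path="./models/model-a.Q4_K_M.gguf")
print(llama("Q: What is the capital of France? A:", max_tokens=16)["choices"][0]["text"])

# Drop the only reference so the underlying llama.cpp context can be freed,
# then force a collection pass before loading the next model.
del llama
gc.collect()

llama = Llama(model_path="./models/model-b.Q4_K_M.gguf")
print(llama("Q: And the capital of Germany? A:", max_tokens=16)["choices"][0]["text"])
```

Whether the GPU VRAM is actually released at that point depends on the upstream llama.cpp issue mentioned above.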

@gjmulder added the documentation (Improvements or additions to documentation) and performance labels May 31, 2023
@AmineDjeghri

Hello,
Any tips on how to unload the model from the GPU, please?
Every time I load a new model, the VRAM usage keeps increasing.

@abetlen
Owner

abetlen commented Dec 22, 2023

This should all be fixed now: once the Llama object is garbage collected, you should be able to load a new model without running out of memory. To do this, either set `llama = None` or `del llama` (assuming your reference to the model is `llama`).
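For later readers, a short example of that pattern (the model filenames and prompts are made up for illustration):

```python
from llama_cpp import Llama

# Placeholder paths; substitute your own GGUF files.
chat_path = "./models/chat-model.Q4_K_M.gguf"
code_path = "./models/code-model.Q4_K_M.gguf"

llama = Llama(model_path=chat_path, n_gpu_layers=-1)
print(llama("Q: Name a colour. A:", max_tokens=8)["choices"][0]["text"])

# Release the first model so its RAM/VRAM is freed before the second load.
llama = None  # or: del llama

llama = Llama(model_path=code_path, n_gpu_layers=-1)
print(llama("# A Python function that reverses a string\n", max_tokens=32)["choices"][0]["text"])
```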
