Add methods to explicitly free model from memory #1513
Conversation
@jkawamoto definitely agree here, thank you for the contribution. A few things I noticed:
I also think we should have all the internal classes support the context manager protocol.
This commit introduces a `close` method to both `Llama` and `_LlamaModel`, allowing users to explicitly free the model from RAM/VRAM. The previous implementation relied on the destructor of `_LlamaModel` to free the model. However, in Python, the timing of destructor calls is unclear—for instance, the `del` statement does not guarantee immediate invocation of the destructor. This commit provides an explicit method to release the model, which works immediately and allows the user to load another model without memory issues. Additionally, this commit implements a context manager in the `Llama` class, enabling the automatic closure of the `Llama` object when used with the `with` statement.
This commit enables automatic resource management by implementing the `ContextManager` protocol in `_LlamaModel`, `_LlamaContext`, and `_LlamaBatch`. This ensures that resources are properly managed and released within a `with` statement, enhancing robustness and safety in resource handling.
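For context, a minimal sketch of what supporting the context manager protocol in such an internal class can look like; the class name, attribute, and freeing logic below are placeholders, not the library's actual code:

```python
class _LlamaBatchLike:
    """Illustrative stand-in for an internal class that owns a low-level resource."""

    def __init__(self) -> None:
        self.batch = object()  # placeholder for the underlying llama.cpp object

    def close(self) -> None:
        # Release the underlying resource.
        self.batch = None

    def __enter__(self) -> "_LlamaBatchLike":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Invoked when the with block exits, even if an exception was raised.
        self.close()


with _LlamaBatchLike() as batch:
    ...  # use the batch; it is closed automatically afterwards
```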
This update uses `ExitStack` to manage and close the internal classes in `Llama`, making resource management more efficient and safer.
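A rough sketch of the kind of `ExitStack` wiring this describes, using print callbacks as stand-ins for the real `close()` methods of the internal classes:

```python
import contextlib


class _LlamaLike:
    """Illustrative wrapper: registers clean-up callbacks on an ExitStack."""

    def __init__(self) -> None:
        self._stack = contextlib.ExitStack()
        # In the real library these would be the close() methods of
        # _LlamaModel, _LlamaContext, and _LlamaBatch; lambdas stand in here.
        self._stack.callback(lambda: print("freed model"))
        self._stack.callback(lambda: print("freed context"))
        self._stack.callback(lambda: print("freed batch"))

    def close(self) -> None:
        # Runs every registered callback in reverse order of registration.
        self._stack.close()


_LlamaLike().close()  # prints: freed batch, freed context, freed model
```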
@abetlen Thanks for your comment. I implemented a context manager in the three internal classes and added an `ExitStack` to `Llama`. Regarding the runtime check of whether the model, context, and batch are still loaded, it appears that the three inner classes already verify that the low-level object isn't `None`.
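The kind of check being referred to, sketched with stand-in names (the library's real classes and attributes may differ):

```python
class _LlamaContextLike:
    """Illustrative stand-in; not the library's actual class."""

    def __init__(self) -> None:
        self.ctx = object()  # placeholder for the low-level context pointer

    def close(self) -> None:
        self.ctx = None  # mark the handle as freed

    def n_ctx(self) -> int:
        # Runtime check that the low-level object is still loaded.
        assert self.ctx is not None, "context has already been freed"
        return 0
```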
@jkawamoto I've updated the use of ExitStacks slightly; additionally, I removed the explicit context manager methods, since `contextlib.closing` can be used instead:

    from contextlib import closing
    from llama_cpp import Llama

    with closing(Llama(...)) as llama:
        ...
    # the model will be freed here
@abetlen I think it would be better if the `Llama` class supported the `with` statement directly. Additionally, it is still worthwhile to retain the `close` method, since it lets users free the model without a `with` block.
This PR introduces a `close` method to `Llama` and `_LlamaModel`, allowing users to explicitly free the model from RAM/VRAM.

The previous implementation relied on the destructor of `_LlamaModel` to free the model. However, in Python, the timing of destructor calls is unclear: for instance, the `del` statement does not guarantee immediate invocation of the destructor.

This PR offers an explicit method to release the model, which works immediately and enables the user to load another model without encountering memory issues. Here is an example:
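A minimal sketch of that usage, with placeholder model paths:

```python
from llama_cpp import Llama

llama = Llama(model_path="./first-model.gguf")  # hypothetical path
# ... run inference ...

# Free the model from RAM/VRAM immediately, without waiting for the
# garbage collector to invoke the destructor.
llama.close()

# Another model can now be loaded without running into memory issues.
llama = Llama(model_path="./second-model.gguf")  # hypothetical path
```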
Additionally, this PR allows the `Llama` class to be used as a context manager via the `contextlib.closing` interface. Here is an example:
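Likewise, a minimal sketch of the `contextlib.closing` usage, again with a placeholder model path:

```python
from contextlib import closing

from llama_cpp import Llama

# closing() calls llama.close() when the with block exits, so the model
# is freed from RAM/VRAM at that point.
with closing(Llama(model_path="./model.gguf")) as llama:  # hypothetical path
    ...  # run inference here
```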