
Add methods to explicitly free model from memory #1513


Merged
6 commits merged into abetlen:main on Jun 13, 2024

Conversation

@jkawamoto (Contributor) commented Jun 6, 2024

This PR introduces a close method to Llama and _LlamaModel, allowing users to explicitly free the model from RAM/VRAM.

The previous implementation relied on the destructor of _LlamaModel to free the model. However, in Python, the timing of destructor calls is unclear: for instance, the del statement does not guarantee immediate invocation of the destructor.
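
A quick, self-contained illustration of the problem (plain CPython, nothing from this library):

class Model:
    def __del__(self):
        print("model freed")

m = Model()
alias = m    # a second reference keeps the object alive
del m        # __del__ does NOT run here; del only removes one name binding
del alias    # CPython frees the object now; other interpreters may defer to GC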

This PR offers an explicit method to release the model, which works immediately and enables the user to load another model without encountering memory issues. Here is an example:

from llama_cpp import Llama

llama = Llama(...)
...
llama.close()

Additionally, this PR allows the Llama class to be used as a context manager via the contextlib.closing interface. Here is an example:

from contextlib import closing
from llama_cpp import Llama

with closing(Llama(...)) as llama:
    ...
    # the model will be freed here

...
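
For readers unfamiliar with contextlib.closing, the block above behaves like an explicit try/finally:

llama = Llama(...)
try:
    ...
finally:
    llama.close()  # runs even if the body raises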

@abetlen (Owner) commented Jun 6, 2024

@jkawamoto definitely agree here, thank you for the contribution.

A few things I noticed:

  • We may need to perform some runtime checks / asserts to make sure the model hasn't been closed.
  • Right now you're only closing the model explicitly, but the context and batch (which both also hold memory) still rely on __del__.

I also think we should have all the internal classes support the close / context manager approach; then we can use contextlib.ExitStack in the main Llama class to ensure they're cleaned up properly.
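
A rough sketch of that idea, using a stand-in _Resource class in place of the real _LlamaModel / _LlamaContext / _LlamaBatch (illustrative only, not the merged implementation):

from contextlib import ExitStack, closing

class _Resource:
    """Stand-in for _LlamaModel / _LlamaContext / _LlamaBatch."""
    def __init__(self, name):
        self.name = name
        self.handle = object()  # pretend low-level llama.cpp object

    def close(self):
        if self.handle is not None:  # idempotent: a second call is a no-op
            print(f"freeing {self.name}")
            self.handle = None

class Llama:
    def __init__(self):
        self._stack = ExitStack()
        self._model = self._stack.enter_context(closing(_Resource("model")))
        self._ctx = self._stack.enter_context(closing(_Resource("context")))
        self._batch = self._stack.enter_context(closing(_Resource("batch")))

    def close(self):
        self._stack.close()  # frees batch, then context, then model

Llama().close()  # prints: freeing batch / freeing context / freeing model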

jkawamoto added 3 commits June 7, 2024 00:50

Commit 1:
This commit introduces a `close` method to both `Llama` and `_LlamaModel`, allowing users to explicitly free the model from RAM/VRAM.

The previous implementation relied on the destructor of `_LlamaModel` to free the model. However, in Python, the timing of destructor calls is unclear: for instance, the `del` statement does not guarantee immediate invocation of the destructor.

This commit provides an explicit method to release the model, which works immediately and allows the user to load another model without memory issues.

Additionally, this commit implements a context manager in the `Llama` class, enabling the automatic closure of the `Llama` object when used with the `with` statement.

Commit 2 (…amaBatch):
This commit enables automatic resource management by implementing the `ContextManager` protocol in `_LlamaModel`, `_LlamaContext`, and `_LlamaBatch`. This ensures that resources are properly managed and released within a `with` statement, enhancing robustness and safety in resource handling.

Commit 3:
This update implements ExitStack to manage and close internal classes in Llama, enhancing efficient and safe resource management.
@jkawamoto (Contributor, Author) commented

@abetlen Thanks for your comment.

I implemented a context manager in three internal classes and added an ExitStack to the Llama class.

Regarding the runtime check of whether the model, context, and batch are still loaded, it appears that the three inner classes verify the low-level object isn't None before freeing the object. Thus, we can safely close these objects more than once. Would it be necessary to add any additional checks?
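
For illustration, that guard looks roughly like this (a simplified sketch, not the actual class; llama_free_model is the llama.cpp binding that releases the model, and the real class holds more state):

import llama_cpp

class _LlamaModel:
    def __init__(self, model):
        self.model = model  # low-level handle returned by the loader

    def close(self):
        if self.model is not None:
            llama_cpp.llama_free_model(self.model)  # free RAM/VRAM once
            self.model = None  # subsequent close() calls return immediately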

@abetlen (Owner) commented Jun 13, 2024

@jkawamoto I've updated the use of ExitStacks slightly; additionally, I removed the explicit __enter__ / __exit__ methods. Since we now have a .close() method, we can use contextlib.closing from the standard library to turn the Llama class into a context manager automatically.

from contextlib import closing
from llama_cpp import Llama

with closing(Llama(...)) as llama:
    ...
    # the model will be freed here
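
For reference, the standard library documents contextlib.closing as equivalent to:

from contextlib import contextmanager

@contextmanager
def closing(thing):
    try:
        yield thing
    finally:
        thing.close()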

abetlen merged commit 320a5d7 into abetlen:main on Jun 13, 2024
16 checks passed
@jkawamoto (Contributor, Author) commented

@abetlen I think it would be better if the Llama class implemented the context manager protocol itself, as this would let users write with Llama(...) directly instead of wrapping each instance in contextlib.closing, making initialization more concise.

Additionally, it is still worthwhile to retain the __del__ method in Llama to prevent memory leaks in cases where the user forgets to close the object.
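
Together, the two suggestions would look roughly like this (a sketch, not the merged code):

class Llama:
    def close(self):
        ...  # free batch, context, and model via the ExitStack

    def __enter__(self):
        return self  # enables "with Llama(...) as llama:" without closing()

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def __del__(self):
        self.close()  # safety net if the user forgets; close() is idempotent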
