
Add methods to explicitly free model from memory #1513


Merged
6 commits merged into abetlen:main on Jun 13, 2024

Conversation

@jkawamoto (Contributor) commented Jun 6, 2024

This PR introduces a close method to Llama and _LlamaModel, allowing users to explicitly free the model from RAM/VRAM.

The previous implementation relied on the destructor of _LlamaModel to free the model. However, in Python, the timing of destructor calls is unclear: for instance, the del statement does not guarantee immediate invocation of the destructor.
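
A quick, self-contained illustration of the problem (plain CPython, nothing from this library):

class Model:
    def __del__(self):
        print("model freed")

m = Model()
alias = m    # a second reference keeps the object alive
del m        # __del__ does NOT run here; del only removes one name binding
del alias    # CPython frees the object now; other interpreters may defer to GC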

This PR offers an explicit method to release the model, which works immediately and enables the user to load another model without encountering memory issues. Here is an example:

from llama_cpp import Llama

llama = Llama(...)
...
llama.close()

Additionally, this PR allows the Llama class to be used as a context manager via the contextlib.closing interface. Here is an example:

from contextlib import closing
from llama_cpp import Llama

with closing(Llama(...)) as llama:
    ...
    # the model will be freed here

...
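
For readers unfamiliar with contextlib.closing, the block above behaves like an explicit try/finally:

llama = Llama(...)
try:
    ...
finally:
    llama.close()  # runs even if the body raises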

@abetlen (Owner) commented Jun 6, 2024

@jkawamoto definitely agree here, thank you for the contribution.

A few things I noticed:

  • We may need to perform some runtime checks / asserts to make sure the model hasn't been closed.
  • Right now you're only closing the model explicitly, but the context and batch (which both also hold memory) still rely on __del__.

I also think we should have all the internal classes support the close / context manager approach; then we can use contextlib.ExitStack in the main Llama class to ensure they're cleaned up properly.
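
A rough sketch of that idea, using a stand-in _Resource class in place of the real _LlamaModel / _LlamaContext / _LlamaBatch (illustrative only, not the merged implementation):

from contextlib import ExitStack, closing

class _Resource:
    """Stand-in for _LlamaModel / _LlamaContext / _LlamaBatch."""
    def __init__(self, name):
        self.name = name
        self.handle = object()  # pretend low-level llama.cpp object

    def close(self):
        if self.handle is not None:  # idempotent: a second call is a no-op
            print(f"freeing {self.name}")
            self.handle = None

class Llama:
    def __init__(self):
        self._stack = ExitStack()
        self._model = self._stack.enter_context(closing(_Resource("model")))
        self._ctx = self._stack.enter_context(closing(_Resource("context")))
        self._batch = self._stack.enter_context(closing(_Resource("batch")))

    def close(self):
        self._stack.close()  # frees batch, then context, then model

Llama().close()  # prints: freeing batch / freeing context / freeing model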

jkawamoto added 3 commits June 7, 2024 00:50

Commit 1:
This commit introduces a `close` method to both `Llama` and `_LlamaModel`, allowing users to explicitly free the model from RAM/VRAM.

The previous implementation relied on the destructor of `_LlamaModel` to free the model. However, in Python, the timing of destructor calls is unclear: for instance, the `del` statement does not guarantee immediate invocation of the destructor.

This commit provides an explicit method to release the model, which works immediately and allows the user to load another model without memory issues.

Additionally, this commit implements a context manager in the `Llama` class, enabling the automatic closure of the `Llama` object when used with the `with` statement.

Commit 2 (…amaBatch):
This commit enables automatic resource management by implementing the `ContextManager` protocol in `_LlamaModel`, `_LlamaContext`, and `_LlamaBatch`. This ensures that resources are properly managed and released within a `with` statement, enhancing robustness and safety in resource handling.

Commit 3:
This update implements ExitStack to manage and close internal classes in Llama, enhancing efficient and safe resource management.
@jkawamoto (Contributor, Author) commented

@abetlen Thanks for your comment.

I implemented a context manager in three internal classes and added an ExitStack to the Llama class.

Regarding the runtime check of whether the model, context, and batch are still loaded, it appears that the three inner classes verify the low-level object isn't None before freeing the object. Thus, we can safely close these objects more than once. Would it be necessary to add any additional checks?
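
For illustration, that guard looks roughly like this (a simplified sketch, not the actual class; llama_free_model is the llama.cpp binding that releases the model, and the real class holds more state):

import llama_cpp

class _LlamaModel:
    def __init__(self, model):
        self.model = model  # low-level handle returned by the loader

    def close(self):
        if self.model is not None:
            llama_cpp.llama_free_model(self.model)  # free RAM/VRAM once
            self.model = None  # subsequent close() calls return immediately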

@abetlen (Owner) commented Jun 13, 2024

@jkawamoto I've updated the use of ExitStacks slightly; additionally, I removed the explicit __enter__ / __exit__ methods. Since we now have a .close() method, we can use contextlib.closing from the standard library to turn the Llama class into a context manager automatically.

from contextlib import closing
from llama_cpp import Llama

with closing(Llama(...)) as llama:
    ...
    # the model will be freed here
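
For reference, the standard library documents contextlib.closing as equivalent to:

from contextlib import contextmanager

@contextmanager
def closing(thing):
    try:
        yield thing
    finally:
        thing.close()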

abetlen merged commit 320a5d7 into abetlen:main on Jun 13, 2024
16 checks passed
@jkawamoto (Contributor, Author) commented

@abetlen I think it would be better if the Llama class implemented the context manager protocol itself, as this would let users write with Llama(...) directly instead of wrapping each instance in contextlib.closing, making initialization more concise.

Additionally, it is still worthwhile to retain the __del__ method in Llama to prevent memory leaks in cases where the user forgets to close the object.
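
Together, the two suggestions would look roughly like this (a sketch, not the merged code):

class Llama:
    def close(self):
        ...  # free batch, context, and model via the ExitStack

    def __enter__(self):
        return self  # enables "with Llama(...) as llama:" without closing()

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def __del__(self):
        self.close()  # safety net if the user forgets; close() is idempotent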
