Feature Description
As far as I can tell, there is currently no way to persist context. I would like to be able to dump the internal state of the model to a file, or some other form of persistent storage, so I can load a chat sometime later and keep the context history.
This would solve the following use cases:
- Continue a chat after a crash or restart.
- Run multiple chats on a single model.
The Solution
Having the ability to access (dump) the internal data as a stream would allow one to save it to a file and then load it from there later to continue the chat.
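For illustration, here is a rough sketch of what such a dump could look like at the llama.cpp level. It assumes the `llama_get_state_size` / `llama_copy_state_data` / `llama_set_state_data` functions that I believe llama.cpp exposes (exact names and signatures may differ between versions), and a `llama_context * ctx` that has already processed the chat so far:

```cpp
// Sketch only: serialize the context state (KV cache, RNG, logits, etc.)
// into a byte buffer, write it to disk, and restore it later.
#include <cstdint>
#include <cstdio>
#include <vector>
#include "llama.h"

// Save the current context state to a file.
static bool save_state(llama_context * ctx, const char * path) {
    std::vector<uint8_t> buf(llama_get_state_size(ctx));
    const size_t written = llama_copy_state_data(ctx, buf.data());

    FILE * f = std::fopen(path, "wb");
    if (!f) return false;
    const bool ok = std::fwrite(buf.data(), 1, written, f) == written;
    std::fclose(f);
    return ok;
}

// Load a previously saved state into a context created with the same
// model and context parameters.
static bool load_state(llama_context * ctx, const char * path) {
    FILE * f = std::fopen(path, "rb");
    if (!f) return false;
    std::fseek(f, 0, SEEK_END);
    const long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);

    std::vector<uint8_t> buf(size);
    const bool read_ok = std::fread(buf.data(), 1, size, f) == (size_t) size;
    std::fclose(f);
    if (!read_ok) return false;

    llama_set_state_data(ctx, buf.data());
    return true;
}
```

Exposing that buffer (or the file path) through this project's API would cover both use cases above: the saved state can be reloaded after a restart, or several states can be kept side by side to run multiple chats on one model.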
Considered Alternatives
It might be possible to save the entire (text) history and re-feed it to the model next time, but this seems quite inefficient and would take a long time to load.
Other solutions to solve the above use cases would be welcome.
Additional Context
Does anyone know if llama.cpp exposes such a feature, which would make it trivial to add to this project?
This project: https://github.com/kuvaus/LlamaGPTJ-chat has a feature like this, and I believe it uses llama.cpp, so I am thinking there should be a way to do this.
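If I recall correctly, llama.cpp also ships session-file helpers (used by its `main` example for prompt caching), which may be what projects like the one above build on. A hedged sketch, assuming `llama_save_session_file` / `llama_load_session_file` with the signatures I remember (they may have changed since):

```cpp
// Sketch only: persist and restore the context together with the tokens
// that have been evaluated so far in the chat.
#include <vector>
#include "llama.h"

// tokens = all tokens evaluated in this chat so far.
static bool save_session(llama_context * ctx, const char * path,
                         const std::vector<llama_token> & tokens) {
    return llama_save_session_file(ctx, path, tokens.data(), tokens.size());
}

static bool load_session(llama_context * ctx, const char * path,
                         std::vector<llama_token> & tokens_out) {
    tokens_out.resize(llama_n_ctx(ctx));  // capacity for the loaded tokens
    size_t n_loaded = 0;
    if (!llama_load_session_file(ctx, path, tokens_out.data(),
                                 tokens_out.size(), &n_loaded)) {
        return false;
    }
    tokens_out.resize(n_loaded);
    return true;
}
```

If those helpers are available through the bindings this project uses, the feature might mostly be a matter of wiring them up to the chat session object.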
Related Features to This Feature Request
- Metal support
- CUDA support
- Grammar
Are you willing to resolve this issue by submitting a Pull Request?
No, I don’t have the time and I’m okay to wait for the community / maintainers to resolve this issue.