Description
A recent paper by Meta/MIT/CMU proposed StreamingLLM, a simple yet efficient solution to enable "infinite" context. Better yet, the implementation in llama.cpp is as trivial as changing the `n_keep` value with the `--keep` option, as discussed in this issue. Unfortunately, the high-level API of llama-cpp-python does not support the `keep`/`n_keep` parameter.
It should be simple to add the parameter to the high-level API, ideally in the constructor of the `Llama` class, and to pass it along to `llama_cpp.llama_load_model_from_file` as part of the `lparams` parameter here.
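
For illustration, a minimal sketch of how the parameter might surface in the high-level API. The `LlamaWithKeep` wrapper, the `n_keep` keyword, and the model path below are hypothetical placeholders, not the actual llama-cpp-python implementation:

```python
import llama_cpp


class LlamaWithKeep(llama_cpp.Llama):
    """Illustrative wrapper: accept an `n_keep` value alongside the usual kwargs."""

    def __init__(self, model_path: str, n_keep: int = 0, **kwargs):
        super().__init__(model_path=model_path, **kwargs)
        # Number of initial prompt tokens to retain when the context window is
        # shifted (the llama.cpp `--keep` / `n_keep` behaviour). Storing it on
        # the instance is only a placeholder; a real patch would thread the
        # value into the context/generation parameters inside llama-cpp-python.
        self.n_keep = n_keep


# Desired high-level usage once the parameter is supported (hypothetical):
# llm = llama_cpp.Llama(model_path="./models/model.gguf", n_keep=64)
```

Whether the value ultimately belongs in `lparams` at model load time or in the per-context/generation settings is a design choice for the maintainers; the sketch only shows where the parameter could appear in the public API.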