
Add n_keep parameter to Llama constructor to enable StreamingLLM #954

Open
@twoletters

Description

A recent paper by Meta/MIT/CMU proposed StreamingLLM, a simple yet efficient technique for enabling effectively "infinite" context. Better yet, llama.cpp already supports it: it is as trivial as setting the n_keep value via the --keep option, as discussed in this issue. Unfortunately, the high-level API of llama-cpp-python does not expose the keep/n_keep parameter.
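For context, the idea is simple: when the context window fills up, the first n_keep tokens (the "attention sink") are always retained and older tokens in the middle are evicted, so generation can keep going. A minimal, illustrative sketch of that eviction rule (plain Python, no llama.cpp calls, and not llama.cpp's exact shifting strategy):

```python
def shift_context(tokens: list[int], n_ctx: int, n_keep: int) -> list[int]:
    """Keep the first n_keep tokens plus the most recent tokens,
    dropping the middle, whenever the window overflows."""
    if len(tokens) <= n_ctx:
        return tokens
    n_recent = n_ctx - n_keep
    return tokens[:n_keep] + tokens[-n_recent:]

# Example: a 2048-token window that always preserves the first 4 tokens.
history = list(range(5000))  # stand-in for generated token ids
history = shift_context(history, n_ctx=2048, n_keep=4)
assert len(history) == 2048 and history[:4] == [0, 1, 2, 3]
```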

It should be simple to add the parameter to the high-level API, ideally as an argument to the Llama constructor, and to pass it along to llama_cpp.llama_load_model_from_file as part of the lparams parameter here.
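If the parameter were exposed, usage could look like the following. This is only a sketch of the proposal: the n_keep keyword does not exist in the current llama-cpp-python API, and the model path is a placeholder.

```python
from llama_cpp import Llama

# Proposed: accept n_keep at construction time so the wrapper can forward
# it to llama.cpp's context-shifting logic (the keyword is hypothetical).
llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,
    n_keep=4,  # always retain the first 4 prompt tokens as the attention sink
)
```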

Labels: enhancement (New feature or request)
