Context length documentation confusion #5732

Closed
@mprudra

Description

n_keep: Specify the number of tokens from the prompt to retain when the context size is exceeded and tokens need to be discarded. By default, this value is set to 0 (meaning no tokens are kept). Use -1 to retain all tokens from the prompt.

Question: Does this mean the last n_keep tokens of the prompt are the ones kept in the context?

truncated: Boolean indicating if the context size was exceeded during generation, i.e. the number of tokens provided in the prompt (tokens_evaluated) plus tokens generated (tokens predicted) exceeded the context size (n_ctx)

examples/server/README.md#result-json

Question: With infinite-length output generation, will this return true for intermediate truncation, i.e. when the server discards some context tokens once it hits the context limit?
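
To make the question concrete, here is the quoted definition of truncated read literally, as a minimal Python sketch. It reflects only the README wording, not the server's actual code; the function name is hypothetical, and the variable names are taken from the quoted text.

```python
def is_truncated(tokens_evaluated: int, tokens_predicted: int, n_ctx: int) -> bool:
    """Literal reading of the README: the context size counts as exceeded
    when prompt tokens plus generated tokens are greater than n_ctx.
    (Hypothetical helper, not part of the server.)"""
    return tokens_evaluated + tokens_predicted > n_ctx


# A 3000-token prompt plus 2000 generated tokens against a 4096-token context.
print(is_truncated(3000, 2000, 4096))
# The same prompt with only 500 generated tokens stays within the context.
print(is_truncated(3000, 500, 4096))
```

Under this reading, the flag depends only on the final token counts, which is why it is unclear whether intermediate discards during infinite generation are reported.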

A value of -1 will enable infinite text generation, even though we have a finite context window. When the context window is full, some of the earlier tokens (half of the tokens after --n-keep) will be discarded. The context must then be re-evaluated before generation can resume. On large models and/or large context windows, this will result in significant pause in output.

examples/main#number-of-tokens-to-predict

Question: Can n_predict be set to an arbitrarily large value?
Follow-up: To support this, does the server keep generating output and, each time it reaches the context limit, discard some tokens from the start (a sliding window over the input plus the output so far), repeating this until a stop condition is triggered?
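
The discard step described in the quoted documentation ("half of the tokens after --n-keep") could be sketched roughly as follows. This is my interpretation of that sentence only, not the actual implementation; the function name is hypothetical, and it assumes n_keep counts tokens from the start of the context.

```python
def shift_context(tokens: list, n_keep: int) -> list:
    """Sketch of the documented behavior: keep the first n_keep tokens,
    then drop the older half of whatever follows them.
    (Hypothetical helper; assumes n_keep is measured from the start.)"""
    n_left = len(tokens) - n_keep   # tokens eligible for discarding
    n_discard = n_left // 2         # "half of the tokens after --n-keep"
    return tokens[:n_keep] + tokens[n_keep + n_discard:]


# Full context of 8 tokens with n_keep=2: tokens 2, 3 and 4 are dropped.
print(shift_context(list(range(8)), n_keep=2))  # [0, 1, 5, 6, 7]
```

If this reading is right, each time the context fills up the window slides forward by half of the non-kept tokens, and generation continues until a stop condition is hit.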

Context: I'm using a context length of 16k (with a Deepseek model) and n_parallel=4 (4 requests served in parallel). According to the server logs, this divides the context length among the 4 slots (4k each).
Question: Why is that? Is it due to memory constraints?
Follow-up: If I really want to support a 16k context length for each request, does setting the context length to 16k * n_parallel suffice?
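
The arithmetic behind the follow-up, as a small Python sketch. It assumes the context is split evenly across slots, which matches what the logs show for my setup, and takes 16k to mean 16384 tokens; both function names are hypothetical.

```python
def per_slot_context(n_ctx: int, n_parallel: int) -> int:
    """Context available to each slot, assuming an even split."""
    return n_ctx // n_parallel


def total_context_for(per_slot: int, n_parallel: int) -> int:
    """Total context needed so that every slot gets per_slot tokens."""
    return per_slot * n_parallel


print(per_slot_context(16384, 4))   # 4096, the 4k per slot seen in the logs
print(total_context_for(16384, 4))  # 65536, i.e. 16k * n_parallel
```

Whether the model and available memory can actually handle a total context of that size is a separate question.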
