
Commit 4026166

docs: Update completion and chat_completion parameter docstrings

1 parent 945e20f · commit 4026166

1 file changed: +53 −7 lines changed

llama_cpp/llama.py (53 additions, 7 deletions)
@@ -1863,13 +1863,27 @@ def create_completion(
             suffix: A suffix to append to the generated text. If None, no suffix is appended.
             max_tokens: The maximum number of tokens to generate. If max_tokens <= 0 or None, the maximum number of tokens to generate is unlimited and depends on n_ctx.
             temperature: The temperature to use for sampling.
-            top_p: The top-p value to use for sampling.
+            top_p: The top-p value to use for nucleus sampling. Nucleus sampling is described in the paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
+            min_p: The min-p value to use for minimum-p sampling, as described in https://github.com/ggerganov/llama.cpp/pull/3841
+            typical_p: The typical-p value to use for sampling. Locally typical sampling is described in the paper https://arxiv.org/abs/2202.00666.
             logprobs: The number of logprobs to return. If None, no logprobs are returned.
             echo: Whether to echo the prompt.
             stop: A list of strings to stop generation when encountered.
+            frequency_penalty: The penalty to apply to tokens based on their frequency in the prompt.
+            presence_penalty: The penalty to apply to tokens based on their presence in the prompt.
             repeat_penalty: The penalty to apply to repeated tokens.
-            top_k: The top-k value to use for sampling.
+            top_k: The top-k value to use for sampling. Top-k sampling is described in the paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
             stream: Whether to stream the results.
+            seed: The seed to use for sampling.
+            tfs_z: The tail-free sampling parameter. Tail-free sampling is described in https://www.trentonbricken.com/Tail-Free-Sampling/.
+            mirostat_mode: The mirostat sampling mode.
+            mirostat_tau: The target cross-entropy (or surprise) value to achieve for the generated text. A higher value corresponds to more surprising, less predictable text; a lower value to less surprising, more predictable text.
+            mirostat_eta: The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate updates `mu` more quickly; a smaller one results in slower updates.
+            model: The name to use for the model in the completion object.
+            stopping_criteria: A list of stopping criteria to use.
+            logits_processor: A list of logits processors to use.
+            grammar: A grammar to use for constrained sampling.
+            logit_bias: A logit bias to use.

         Raises:
             ValueError: If the requested tokens exceed the context window.
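For reference, a minimal sketch of a create_completion call exercising the parameters documented above. The model path and sampling values are illustrative placeholders, not taken from this commit:

from llama_cpp import Llama

# Placeholder path; substitute any local GGUF model.
llm = Llama(model_path="./models/model.gguf")

output = llm.create_completion(
    prompt="Q: Name the planets in the solar system. A:",
    max_tokens=64,       # <= 0 or None generates until n_ctx is exhausted
    temperature=0.8,
    top_p=0.95,          # nucleus sampling
    min_p=0.05,          # minimum-p sampling (llama.cpp PR #3841)
    typical_p=1.0,       # 1.0 leaves locally typical sampling effectively off
    top_k=40,
    repeat_penalty=1.1,
    seed=42,             # reproducible sampling
    stop=["Q:"],
)
print(output["choices"][0]["text"])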
@@ -1944,15 +1958,29 @@ def __call__(
         Args:
             prompt: The prompt to generate text from.
             suffix: A suffix to append to the generated text. If None, no suffix is appended.
-            max_tokens: The maximum number of tokens to generate. If max_tokens <= 0, the maximum number of tokens to generate is unlimited and depends on n_ctx.
+            max_tokens: The maximum number of tokens to generate. If max_tokens <= 0 or None, the maximum number of tokens to generate is unlimited and depends on n_ctx.
             temperature: The temperature to use for sampling.
-            top_p: The top-p value to use for sampling.
+            top_p: The top-p value to use for nucleus sampling. Nucleus sampling is described in the paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
+            min_p: The min-p value to use for minimum-p sampling, as described in https://github.com/ggerganov/llama.cpp/pull/3841
+            typical_p: The typical-p value to use for sampling. Locally typical sampling is described in the paper https://arxiv.org/abs/2202.00666.
             logprobs: The number of logprobs to return. If None, no logprobs are returned.
             echo: Whether to echo the prompt.
             stop: A list of strings to stop generation when encountered.
+            frequency_penalty: The penalty to apply to tokens based on their frequency in the prompt.
+            presence_penalty: The penalty to apply to tokens based on their presence in the prompt.
             repeat_penalty: The penalty to apply to repeated tokens.
-            top_k: The top-k value to use for sampling.
+            top_k: The top-k value to use for sampling. Top-k sampling is described in the paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
             stream: Whether to stream the results.
+            seed: The seed to use for sampling.
+            tfs_z: The tail-free sampling parameter. Tail-free sampling is described in https://www.trentonbricken.com/Tail-Free-Sampling/.
+            mirostat_mode: The mirostat sampling mode.
+            mirostat_tau: The target cross-entropy (or surprise) value to achieve for the generated text. A higher value corresponds to more surprising, less predictable text; a lower value to less surprising, more predictable text.
+            mirostat_eta: The learning rate used to update `mu` based on the error between the target and observed surprisal of the sampled word. A larger learning rate updates `mu` more quickly; a smaller one results in slower updates.
+            model: The name to use for the model in the completion object.
+            stopping_criteria: A list of stopping criteria to use.
+            logits_processor: A list of logits processors to use.
+            grammar: A grammar to use for constrained sampling.
+            logit_bias: A logit bias to use.

         Raises:
             ValueError: If the requested tokens exceed the context window.
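The mirostat_tau and mirostat_eta entries describe a feedback loop on `mu`. A sketch of that update, following the Mirostat paper (https://arxiv.org/abs/2007.14966) rather than llama.cpp's internal implementation:

import math

def mirostat_update(mu: float, token_prob: float, tau: float, eta: float) -> float:
    """One Mirostat feedback step (illustrative sketch, not library code)."""
    observed_surprise = -math.log2(token_prob)  # surprisal of the sampled token
    error = observed_surprise - tau             # deviation from the target tau
    return mu - eta * error                     # larger eta => faster mu updates

mu = 2 * 5.0  # mu is typically initialized at 2 * tau
mu = mirostat_update(mu, token_prob=0.03, tau=5.0, eta=0.1)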
@@ -2024,13 +2052,31 @@ def create_chat_completion(

         Args:
             messages: A list of messages to generate a response for.
+            functions: A list of functions to use for the chat completion.
+            function_call: A function call to use for the chat completion.
+            tools: A list of tools to use for the chat completion.
+            tool_choice: A tool choice to use for the chat completion.
             temperature: The temperature to use for sampling.
-            top_p: The top-p value to use for sampling.
-            top_k: The top-k value to use for sampling.
+            top_p: The top-p value to use for nucleus sampling. Nucleus sampling is described in the paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
+            top_k: The top-k value to use for sampling. Top-k sampling is described in the paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
+            min_p: The min-p value to use for minimum-p sampling, as described in https://github.com/ggerganov/llama.cpp/pull/3841
+            typical_p: The typical-p value to use for sampling. Locally typical sampling is described in the paper https://arxiv.org/abs/2202.00666.
             stream: Whether to stream the results.
             stop: A list of strings to stop generation when encountered.
+            seed: The seed to use for sampling.
+            response_format: The response format to use for the chat completion. Use { "type": "json_object" } to constrain output to valid JSON.
             max_tokens: The maximum number of tokens to generate. If max_tokens <= 0 or None, the maximum number of tokens to generate is unlimited and depends on n_ctx.
+            presence_penalty: The penalty to apply to tokens based on their presence in the prompt.
+            frequency_penalty: The penalty to apply to tokens based on their frequency in the prompt.
             repeat_penalty: The penalty to apply to repeated tokens.
+            tfs_z: The tail-free sampling parameter.
+            mirostat_mode: The mirostat sampling mode.
+            mirostat_tau: The mirostat sampling tau parameter.
+            mirostat_eta: The mirostat sampling eta parameter.
+            model: The name to use for the model in the completion object.
+            logits_processor: A list of logits processors to use.
+            grammar: A grammar to use for constrained sampling.
+            logit_bias: A logit bias to use.

         Returns:
             Generated chat completion or a stream of chat completion chunks.
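And a sketch of create_chat_completion using the newly documented response_format parameter; the model path and chat_format value are again placeholders:

from llama_cpp import Llama

# Placeholder path and chat format; use a chat-capable GGUF model.
llm = Llama(model_path="./models/model.gguf", chat_format="chatml")

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You answer in JSON."},
        {"role": "user", "content": "List three primary colors."},
    ],
    response_format={"type": "json_object"},  # constrain output to valid JSON
    temperature=0.7,
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])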
