
Commit 958367b

server : refactor slot input data, move tokenizer to HTTP thread (#10023)
* server : refactor slot input data, move tokenizer to HTTP thread
* move prompt_tokens.empty() check
* fix incorrect if branch
* fix infinite generation loop
* bring back infill validation
* add infill test
* try fixing format_infill
* fix test
* remove redundant code
* rename completion to inference
* update docs
* use llama_tokens everywhere
1 parent 40f2555 commit 958367b
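
The core of this change is moving prompt tokenization out of the inference loop and onto the HTTP request thread, so that slots only ever receive ready-made token lists (`llama_tokens`, per the commit message). The sketch below illustrates that pattern in TypeScript under stated assumptions; the actual implementation is C++ in `examples/server`, and every name here (`InferenceTask`, `tokenize`, `taskQueue`) is hypothetical, not the server's real API.

```ts
// Hypothetical sketch of the pattern this commit describes: the HTTP
// handler validates and tokenizes the request, and the inference loop
// only ever sees plain token arrays. Names are illustrative.

type LlamaTokens = number[];

interface InferenceTask {
  id: number;
  promptTokens: LlamaTokens; // tokenized on the HTTP thread, not in the slot
}

const taskQueue: InferenceTask[] = [];
let nextId = 0;

// Stand-in for the real tokenizer; accepts the mixed string/token
// prompt shapes documented in the README diff below.
function tokenize(prompt: (string | number)[]): LlamaTokens {
  return prompt.flatMap((p) =>
    typeof p === "number" ? [p] : Array.from(p, (c) => c.charCodeAt(0)),
  );
}

// HTTP thread: tokenize and validate up front, then hand off.
function handleCompletionRequest(prompt: (string | number)[]): number {
  const promptTokens = tokenize(prompt);
  if (promptTokens.length === 0) {
    throw new Error("the prompt must not be empty"); // checked before queueing
  }
  const id = nextId++;
  taskQueue.push({ id, promptTokens });
  return id;
}

// Inference loop: slots consume pre-tokenized input only.
function inferenceStep(): void {
  const task = taskQueue.shift();
  if (task) {
    console.log(`slot ${task.id}: ${task.promptTokens.length} prompt tokens`);
  }
}
```

Doing the tokenization on the request thread keeps slow or invalid inputs from stalling the shared inference loop, which is the design motivation the commit title points at.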

File tree

5 files changed: +468 additions, -348 deletions


examples/server/README.md

Lines changed: 12 additions & 0 deletions
@@ -319,6 +319,18 @@ node index.js
 - The prompt is a string or an array with the first element given as a string
 - The model's `tokenizer.ggml.add_bos_token` metadata is `true`
 
+These input shapes and data types are allowed for `prompt`:
+
+- Single string: `"string"`
+- Single sequence of tokens: `[12, 34, 56]`
+- Mixed tokens and strings: `[12, 34, "string", 56, 78]`
+
+Multiple prompts are also supported. In this case, the completion result will be an array.
+
+- Only strings: `["string1", "string2"]`
+- Strings and sequences of tokens: `["string1", [12, 34, 56]]`
+- Mixed types: `[[12, 34, "string", 56, 78], [12, 34, 56], "string"]`
+
 `temperature`: Adjust the randomness of the generated text. Default: `0.8`
 
 `dynatemp_range`: Dynamic temperature range. The final temperature will be in the range of `[temperature - dynatemp_range; temperature + dynatemp_range]` Default: `0.0`, which is disabled.
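
For illustration, here is a minimal client sketch exercising the prompt shapes documented above. The `/completion` endpoint and the `prompt` field come from the README; the server address, the `n_predict` value, and the response handling are assumptions, not values from this commit.

```ts
// Minimal sketch using Node 18+'s built-in fetch; assumes a llama.cpp
// server is listening on localhost:8080.
async function complete(prompt: unknown): Promise<unknown> {
  const res = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, n_predict: 16 }),
  });
  return res.json();
}

// Mixed tokens and strings in a single prompt -> one result object.
console.log(await complete([12, 34, "string", 56, 78]));

// Multiple prompts -> the completion result is an array of result objects.
console.log(await complete(["string1", [12, 34, 56]]));
```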
