Server Example Refactor and Improvements #1570
Merged
161 commits
1c3fdf8 Add all generation parameters to server.cpp and allow resetting context [digiwombat]
2071d73 Forgot to remove some testing code. [digiwombat]
421e66b Update examples/server/server.cpp [digiwombat]
add5f1b Update examples/server/server.cpp [digiwombat]
3537ad1 Merge branch 'ggerganov:master' into master [digiwombat]
8d7b28c Fixed some types in the params. [digiwombat]
c2b55cc Added LoRA Loading [digiwombat]
48cb16a Merge branch 'ggerganov:master' into master [digiwombat]
66ed19d Corrected dashes in the help lines. [digiwombat]
36c86d7 Automate Context resetting and minor fixes [digiwombat]
d20f36b Removed unnecessary last_prompt_token set [digiwombat]
fdce895 Merge branch 'ggerganov:master' into master [digiwombat]
e84b802 Change top_k type. [digiwombat]
1f40a78 Didn't see the already defined top_k var. [digiwombat]
51e0994 server rewrite [SlyEcho]
f93fe36 Add all generation parameters to server.cpp and allow resetting context [digiwombat]
df0e0d0 Forgot to remove some testing code. [digiwombat]
549291f keep processed from the beginning [SlyEcho]
177868e Changed to params/args [digiwombat]
e8efd75 Initial timeout code and expanded json return on completion. [digiwombat]
23928f2 Added generation_settings to final json object. [digiwombat]
2e5c5ee Changed JSON names to match the parameter name rather than the variab… [digiwombat]
dda915c Added capturing the stopping word and sending it along with the final… [digiwombat]
7740301 Set unspecified generation settings back to default. (Notes below) [digiwombat]
7186d65 seed and gen params [SlyEcho]
15ddc49 Merge remote-tracking branch 'slyecho/server_refactor' [digiwombat]
74c6f36 Editorconfig suggested fixes [SlyEcho]
2c9ee7a Apply suggestions from code review [digiwombat]
655899d Add ignore_eos option to generation settings. [digiwombat]
b38d41e --memory_f32 flag to --memory-f32 to match common.cpp [digiwombat]
6c58f64 --ctx_size flag to --ctx-size to match common.cpp [digiwombat]
33b6957 Fixed failing to return result on stopping token. [digiwombat]
42cf4d8 Merge branch 'master' into master [SlyEcho]
03ea8f0 Fix for the regen issue. [digiwombat]
d6fff56 add streaming via server-sent events
3292f05 Changed to single API endpoint for streaming and non. [digiwombat]
38eaf2b Removed testing fprintf calls. [digiwombat]
a25f830 Default streaming to false if it's not set in the request body. [digiwombat]
2533878 Merge branch 'master' into sse [digiwombat]
e6de69a Merge pull request #3 from anon998/sse [digiwombat]
7a853dc prevent the server from swallowing exceptions in debug mode
aa0788b add --verbose flag and request logging
9197674 Merge pull request #4 from anon998/logging [digiwombat]
b6f536d Cull to end of generated_text when encountering a stopping string in … [digiwombat]
7a8104f add missing quote when printing stopping strings
3a079d5 stop generating when the stream is closed
9f2424a Merge pull request #5 from anon998/stop-stream [digiwombat]
c1cbde8 print error when server can't bind to the interface
2c08f29 make api server use only a single thread
284bc29 reserve memory for generated_text
f1710b9 add infinite generation when n_predict is -1
aa2bbb2 fix parameter type
27911d6 fix default model alias
dd30219 buffer incomplete multi-byte characters
40e1380 print timings + build info
d58e486 default penalize_nl to false + format
3edaf6b print timings by default
96fa480 Merge pull request #6 from anon998/fix-multibyte [digiwombat]
7332b41 Simple single-line server log for requests [digiwombat]
dda4c10 Switch to the CPPHTTPLIB logger. Verbose adds body dump as well as re… [digiwombat]
86337e3 Server console logs now come in one flavor: Verbose. [digiwombat]
1b96df2 Spacing fix. Nothing to see here. [digiwombat]
276fa99 Misunderstood the instructions, I think. Back to the raw JSON output … [digiwombat]
43d295f filter empty stopping strings
1bd7cc6 reuse format_generation_settings for logging
497160a remove old log function
f2e1130 Merge pull request #7 from anon998/logging-reuse [digiwombat]
9104fe5 Change how the token buffers work. [SlyEcho]
8478e59 Merge pull request #8 from SlyEcho/server_refactor [digiwombat]
bed308c Apply suggestions from code review [SlyEcho]
342604b Added a super simple CORS header as default for all endpoints. [digiwombat]
e9b1f0b fix stopping strings
5f6e16d Merge pull request #9 from anon998/stopping-strings [digiwombat]
f7882e2 Fixed a crash caused by erasing from empty last_n_tokens [digiwombat]
5bbc030 Add Options enpoints and Access-Control-Allow-Headers to satisfy CORS… [cirk2]
8c6a5fc last tokens fixes [SlyEcho]
9531ae6 Add logit bias support [SlyEcho]
797155a Merge pull request #10 from cirk2/master [digiwombat]
af71126 Merge pull request #11 from SlyEcho/server_refactor [digiwombat]
49a18bd remove unused parameter warning
6025476 default penalize_nl back to true
8cbc4be clear logit_bias between requests + print
d29b6d5 Merge pull request #12 from anon998/clear-logit-bias [digiwombat]
0bc0477 Apply suggestions from code review [SlyEcho]
731ecc0 fix typo
ebfead6 remove unused variables
1488a0f make functions that never return false void
49dce94 make types match gpt_params exactly
a8a9f19 small fixes
2932db1 avoid creating element in logit_bias accidentally
47efbb5 use std::isinf to check if ignore_eos is active
88cc7bb Stuff with logits [SlyEcho]
abb7782 Merge branch 'master' into small-fixes
bebea65 Merge pull request #13 from anon998/small-fixes [digiwombat]
8f9e546 trim partial stopping strings when not streaming
f820740 move multibyte check to doCompletion
f5d5e70 Merge pull request #14 from anon998/do-completion-update [digiwombat]
1bd52c8 Merge branch 'ggerganov:master' into master [digiwombat]
3df0192 improve long input truncation [SlyEcho]
28cc0cd Merge pull request #15 from SlyEcho/server_refactor [digiwombat]
3ff27d3 Fixed up a few things in embedding mode. [digiwombat]
41bb71b replace invalid characters instead of crashing
4dd72fc Merge pull request #16 from anon998/fix-log-json [digiwombat]
16e1c98 Removed the embedding api endpoint and associated code. [digiwombat]
7cebe2e Merge branch 'master' of https://github.com/digiwombat/llama.cpp [digiwombat]
bcd6167 improve docs and example [SlyEcho]
de6df48 Removed embedding from README [digiwombat]
310bf61 Merge pull request #17 from SlyEcho/server_refactor [digiwombat]
5758e9f Removed embedding from flags. [digiwombat]
e1e2be2 remove --keep from help text
a6ed390 update readme
05a5a48 make help text load faster
98ae2de parse --mlock and --no-mmap + format
df2ecc9 Merge pull request #18 from anon998/update-readme [digiwombat]
64a0653 Merge remote-tracking branch 'upstream/master' [digiwombat]
61befcb Apply suggestions from code review [SlyEcho]
ccd85e0 Apply suggestions from code review [SlyEcho]
a9c3477 Spaces to 4 and other code style cleanup. Notes in README. [digiwombat]
cc2b336 Missed a pair of catch statements for formatting. [digiwombat]
23a1b18 Merge branch 'ggerganov:master' into master [digiwombat]
7580427 Resolving some review comments [digiwombat]
889d904 Merge branch 'master' of https://github.com/digiwombat/llama.cpp [digiwombat]
7cdeb08 More formatting cleanup [digiwombat]
1a9141b Remove model assign in main(). Clarified stop in README. [digiwombat]
917540c Clarify build instructions in README. [lesaun]
d6d263f Merge pull request #19 from lesaun/master [digiwombat]
bac0ddb Merge branch 'ggerganov:master' into master [digiwombat]
2c00bf8 more formatting changes [SlyEcho]
9612d12 big logging update [SlyEcho]
6518f9c build settings [SlyEcho]
eee8b28 Merge pull request #20 from SlyEcho/server_refactor [digiwombat]
4148b9b remove void [SlyEcho]
dff11a1 json parsing improvements [SlyEcho]
13cf692 more json changes and stop info [SlyEcho]
b91200a javascript chat update. [SlyEcho]
1510337 fix make flags propagation [SlyEcho]
fc4264d api url [SlyEcho]
28694f7 add a simple bash script too [SlyEcho]
429ed95 move CPPHTTPLIB settings inside server [SlyEcho]
f344d09 streaming shell script [SlyEcho]
50e7c54 Merge pull request #21 from SlyEcho/server_refactor [digiwombat]
fc78910 Merge branch 'ggerganov:master' into master [digiwombat]
6d72f0f Make chat shell script work by piping the content out of the subshell. [digiwombat]
9d564db trim response and trim trailing space in prompt
9099709 Merge pull request #22 from anon998/bash-trim [digiwombat]
b8b8a6e Add log flush [SlyEcho]
6627a02 Allow overriding the server address [SlyEcho]
1f39452 remove old verbose variable
99ef967 add static prefix to the other functions too
575cf23 remove json_indent variable
7df316b fix linter warnings + make variables const
7a48ade fix comment indentation
6075d78 Merge pull request #23 from anon998/fix-linter-warnings [digiwombat]
546f850 Update examples/server/server.cpp [SlyEcho]
bd81096 fix typo in readme + don't ignore integers
5e107c2 Merge pull request #24 from anon998/logit-bias [digiwombat]
f858cd6 Merge remote-tracking branch 'upstream/master' [digiwombat]
aee8595 Update README.md [digiwombat]
488c62a Merge remote-tracking branch 'upstream/master' [digiwombat]
fb49c05 Merge branch 'ggerganov:master' into master [digiwombat]
1b4b93a Merge branch 'ggerganov:master' into master [digiwombat]
@@ -34,6 +34,7 @@ models/*
/embedding
/benchmark-matmult
/vdot
/server
/Pipfile
/libllama.so
@@ -0,0 +1,89 @@
import * as readline from 'node:readline'
import { stdin, stdout } from 'node:process'

const API_URL = 'http://127.0.0.1:8080'

const chat = [
    {
        human: "Hello, Assistant.",
        assistant: "Hello. How may I help you today?"
    },
    {
        human: "Please tell me the largest city in Europe.",
        assistant: "Sure. The largest city in Europe is Moscow, the capital of Russia."
    },
]

const instruction = `A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.`

function format_prompt(question) {
    return `${instruction}\n${
        chat.map(m =>`### Human: ${m.human}\n### Assistant: ${m.assistant}`).join("\n")
    }\n### Human: ${question}\n### Assistant:`
}

async function tokenize(content) {
    const result = await fetch(`${API_URL}/tokenize`, {
        method: 'POST',
        body: JSON.stringify({ content })
    })

    if (!result.ok) {
        return []
    }

    return (await result.json()).tokens // json() returns a promise; await it before reading .tokens
}

const n_keep = (await tokenize(instruction)).length // await the token array, then take its length

async function chat_completion(question) {
    const result = await fetch(`${API_URL}/completion`, {
        method: 'POST',
        body: JSON.stringify({
            prompt: format_prompt(question),
            temperature: 0.2,
            top_k: 40,
            top_p: 0.9,
            n_keep: n_keep,
            n_predict: 256,
            stop: ["\n### Human:"], // stop completion after generating this
            stream: true,
        })
    })

    if (!result.ok) {
        return
    }

    let answer = ''

    for await (const chunk of result.body) {
        const t = Buffer.from(chunk).toString('utf8')
        if (t.startsWith('data: ')) {
            const message = JSON.parse(t.substring(6))
            answer += message.content
            process.stdout.write(message.content)
            if (message.stop) {
                if (message.truncated) {
                    chat.shift()
                }
                break
            }
        }
    }

    process.stdout.write('\n')
    chat.push({ human: question, assistant: answer.trimStart() })
}

const rl = readline.createInterface({ input: stdin, output: stdout });

const readlineQuestion = (rl, query, options) => new Promise((resolve, reject) => {
    rl.question(query, options, resolve)
});

while(true) {
    const question = await readlineQuestion(rl, '> ')
    await chat_completion(question)
}
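
Review note: the streaming loop in the JavaScript client treats every chunk read from `result.body` as exactly one `data: ` event. Chunk boundaries are not guaranteed to line up with event boundaries, so a more defensive client might buffer the text and parse only complete lines. The following is a minimal sketch, assuming each event is a newline-terminated `data: ` line carrying JSON, as the example's `startsWith('data: ')` check suggests. `createSseParser` and `feed` are hypothetical names and are not part of this PR.

```javascript
// Hypothetical helper (not part of the PR): accumulates raw text and returns
// one parsed JSON message per complete "data: " line seen so far.
function createSseParser() {
    let buffer = ''
    return function feed(text) {
        buffer += text
        const lines = buffer.split('\n')
        // The last element is '' when the text ended on a newline, otherwise
        // it is an incomplete line; keep it for the next call either way.
        buffer = lines.pop()
        const messages = []
        for (const line of lines) {
            if (line.startsWith('data: ')) {
                messages.push(JSON.parse(line.substring(6)))
            }
        }
        return messages
    }
}

// Usage: two events arriving split across three arbitrary chunk boundaries.
const feed = createSseParser()
const received = [
    ...feed('data: {"content":"Hel'),
    ...feed('lo","stop":false}\n\ndata: {"content":" world",'),
    ...feed('"stop":true}\n\n'),
]
console.log(received.map(m => m.content).join('')) // prints "Hello world"
```

In the chat client, each chunk would first be decoded to text (for instance with a `TextDecoder` used in streaming mode, so that multi-byte characters split across chunks survive) and then passed to `feed`.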
@@ -0,0 +1,77 @@
#!/bin/bash

API_URL="${API_URL:-http://127.0.0.1:8080}"

CHAT=(
    "Hello, Assistant."
    "Hello. How may I help you today?"
    "Please tell me the largest city in Europe."
    "Sure. The largest city in Europe is Moscow, the capital of Russia."
)

INSTRUCTION="A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions."

trim() {
    shopt -s extglob
    set -- "${1##+([[:space:]])}"
    printf "%s" "${1%%+([[:space:]])}"
}

trim_trailing() {
    shopt -s extglob
    printf "%s" "${1%%+([[:space:]])}"
}

format_prompt() {
    echo -n "${INSTRUCTION}"
    printf "\n### Human: %s\n### Assistant: %s" "${CHAT[@]}" "$1"
}

tokenize() {
    curl \
        --silent \
        --request POST \
        --url "${API_URL}/tokenize" \
        --data-raw "$(jq -ns --arg content "$1" '{content:$content}')" \
    | jq '.tokens[]'
}

N_KEEP=$(tokenize "${INSTRUCTION}" | wc -l)

chat_completion() {
    PROMPT="$(trim_trailing "$(format_prompt "$1")")"
    DATA="$(echo -n "$PROMPT" | jq -Rs --argjson n_keep $N_KEEP '{
        prompt: .,
        temperature: 0.2,
        top_k: 40,
        top_p: 0.9,
        n_keep: $n_keep,
        n_predict: 256,
        stop: ["\n### Human:"],
        stream: true
    }')"

    ANSWER=''

    while IFS= read -r LINE; do
        if [[ $LINE = data:* ]]; then
            CONTENT="$(echo "${LINE:5}" | jq -r '.content')"
            printf "%s" "${CONTENT}"
            ANSWER+="${CONTENT}"
        fi
    done < <(curl \
        --silent \
        --no-buffer \
        --request POST \
        --url "${API_URL}/completion" \
        --data-raw "${DATA}")

    printf "\n"

    CHAT+=("$1" "$(trim "$ANSWER")")
}

while true; do
    read -r -e -p "> " QUESTION
    chat_completion "${QUESTION}"
done
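
Review note: `format_prompt` in the shell script relies on `printf` reusing its format string until all arguments are consumed, with a missing final argument expanding to an empty string. `trim_trailing` then removes the space left after the last `### Assistant:`. For readers more at home in JavaScript, a rough equivalent is sketched below. `formatPromptFromFlatHistory` is a hypothetical name, and the flat `history` array is assumed to hold alternating human and assistant entries in the same way as `CHAT`.

```javascript
// Hypothetical JavaScript rendering of the script's format_prompt followed by
// trim_trailing; not part of the PR.
function formatPromptFromFlatHistory(instruction, history, question) {
    // Same argument list the script hands to printf: the history, then the question.
    const args = [...history, question]
    let prompt = instruction
    // printf reuses its format until the arguments run out; a %s with no
    // argument left expands to the empty string.
    for (let i = 0; i < args.length; i += 2) {
        const human = args[i]
        const assistant = args[i + 1] ?? ''
        prompt += `\n### Human: ${human}\n### Assistant: ${assistant}`
    }
    // trim_trailing: drop the whitespace after the final "### Assistant:".
    return prompt.trimEnd()
}

// Usage with one stored exchange and a new question.
const prompt = formatPromptFromFlatHistory(
    'Instruction.',
    ['Hi.', 'Hello.'],
    'What is 2+2?'
)
console.log(prompt)
```

The resulting prompt ends with `### Assistant:` and no trailing space, which is the state the script's `PROMPT` variable is in before it is sent to the `/completion` endpoint.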