Commit 82c1e40

server: bench: change max prompt, use pre downloaded models
1 parent 48db7da commit 82c1e40

2 files changed: 21 additions, 5 deletions

.github/workflows/bench.yml

Lines changed: 5 additions & 5 deletions
@@ -94,14 +94,14 @@ jobs:
     --port 8080 \
     --hf-repo ggml-org/models \
     --hf-file phi-2/ggml-model-q4_0.gguf \
-    --model ggml-model.gguf \
+    --model /models/phi-2/ggml-model-q4_0.gguf \
     --metrics \
     --parallel 8 \
     --batch-size 2048 \
     --ubatch-size 256 \
-    --n-predict 4096 \
+    --n-predict 2048 \
     --ctx-size 16384 \
-    --defrag-thold 0.8 \
+    --defrag-thold 0.1 \
     --log-format text \
     --log-format text \
     -ngl 33 &
@@ -117,6 +117,6 @@ jobs:
     cd examples/server/bench
     SERVER_BENCH_N_PROMPTS=1000 \
     SERVER_BENCH_MAX_PROMPT_TOKENS=1024 \
-    SERVER_BENCH_MAX_CONTEXT=4096 \
-    SERVER_BENCH_MAX_TOKENS=4096 \
+    SERVER_BENCH_MAX_CONTEXT=2048 \
+    SERVER_BENCH_MAX_TOKENS=1024 \
     ../../../k6 run script.js --duration 10m --iterations 1000 --vus 8
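
The commit does not state why the limits were lowered. One plausible reading is that the new values fit a per-slot context budget. The sketch below checks that arithmetic under two assumptions that are not confirmed by the diff: that the server splits `--ctx-size` evenly across `--parallel` slots, and that a prompt plus its generated tokens must fit within `SERVER_BENCH_MAX_CONTEXT`. The helper names `per_slot_context` and `check_bench_limits` are hypothetical and exist only for illustration.

```python
def per_slot_context(ctx_size: int, parallel: int) -> int:
    """Context tokens per slot, ASSUMING the total is split evenly across slots."""
    if parallel <= 0:
        raise ValueError("parallel must be positive")
    return ctx_size // parallel


def check_bench_limits(ctx_size: int, parallel: int, max_context: int,
                       max_prompt_tokens: int, max_tokens: int) -> list[str]:
    """Return a list of inconsistencies between server and benchmark settings.

    Both checks encode assumptions about how the limits interact; they are
    not taken from the benchmark script itself.
    """
    problems = []
    slot = per_slot_context(ctx_size, parallel)
    if max_context > slot:
        problems.append(f"max context {max_context} exceeds per-slot context {slot}")
    if max_prompt_tokens + max_tokens > max_context:
        problems.append(
            f"prompt {max_prompt_tokens} + generation {max_tokens} exceeds max context {max_context}"
        )
    return problems


# Values after this commit: --ctx-size 16384, --parallel 8
print(check_bench_limits(16384, 8, max_context=2048, max_prompt_tokens=1024, max_tokens=1024))
# Values before this commit
print(check_bench_limits(16384, 8, max_context=4096, max_prompt_tokens=1024, max_tokens=4096))
```

Under these assumptions the new settings produce no warnings (16384 / 8 = 2048 per slot, and 1024 + 1024 = 2048), while the old settings trip both checks.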

examples/server/bench/bench.py

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+import argparse
+
+
+def main(args_in: list[str] | None = None) -> None:
+    parser = argparse.ArgumentParser(description="Start a github self-hosted runner using JIT based on a repo events")
+    parser.add_argument("--token", type=str, help="GitHub token", required=True)
+    parser.add_argument("--repo", type=str, help="GitHub repository", required=True)
+    parser.add_argument("--runner-label", type=str, action="append", help="GitHub Runner group", required=True)
+
+    args = parser.parse_args(args_in)
+
+    start_mainloop(args)
+
+
+if __name__ == '__main__':
+    main()
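
Note that `start_mainloop` is neither defined nor imported in the 16 added lines, so the file as committed would raise `NameError` once argument parsing succeeds. The parser itself can still be exercised in isolation. The following is a minimal sketch that rebuilds the same three arguments in a hypothetical `build_parser` helper (not part of the commit) and parses placeholder values, to show how `action="append"` collects repeated `--runner-label` flags.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: same arguments as the committed main(), nothing else."""
    parser = argparse.ArgumentParser(
        description="Start a github self-hosted runner using JIT based on a repo events"
    )
    parser.add_argument("--token", type=str, help="GitHub token", required=True)
    parser.add_argument("--repo", type=str, help="GitHub repository", required=True)
    parser.add_argument("--runner-label", type=str, action="append",
                        help="GitHub Runner group", required=True)
    return parser


# Placeholder values for illustration only; no real token or repository is implied.
args = build_parser().parse_args([
    "--token", "dummy-token",
    "--repo", "owner/name",
    "--runner-label", "gpu",
    "--runner-label", "bench",
])

print(args.repo)          # owner/name
print(args.runner_label)  # ['gpu', 'bench']
```

Because all three options are marked `required=True`, omitting any of them makes `parse_args` exit with a usage error before `start_mainloop` would be reached.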
