server: init functional tests #5566
Merged
100 commits
157bcf2
server: init functional test
phymbert 9b63d70
server: tests: reduce number of files, all in one tests shell script
phymbert 6497755
server: tests: fix ci workflow
phymbert 4e5245e
server: tests: fix ci workflow
phymbert 30aa323
server: tests: fix ci workflow
phymbert fe9866a
server: tests: use ngxson llama_xs_q4.bin
phymbert 1680599
server: tests: build only the server
phymbert 8bb586b
server: tests: add health check and concurrent request example
phymbert 6c95ec6
server: tests: change model to: @karpathy's tinyllamas
phymbert 56583be
server: tests: refactor steps and vocabulary
phymbert 9b7ea97
server: tests: add OAI stream test, fix file end of line, fast fail b…
phymbert 11adf1d
server: tests: add OAI multi user scenario
phymbert c355f76
server: tests: slots endpoint checks
phymbert 367b59a
server: tests: check for infinite loops
phymbert b9f8390
server: tests: check for infinite loops
phymbert 0772884
server: tests: add a constant seed in completion request
phymbert 6b9dc4f
server: tests: add infinite loop
phymbert 68574c6
server: tests: add infinite loop scenario
phymbert b0b6d83
server: tests: add infinite loop scenario
phymbert 1ecda0d
server: tests: disable issue 3969 scenario
phymbert e6d4820
server: tests: add embeddings scenario
phymbert 1065f6d
server: tests: add tokenize/detokenize scenario
phymbert 19664b9
server: tests: detokenize endpoint issue reference added
phymbert 6dcbcfe
server: tests: simplify completion scenario
phymbert 672d98f
server: tests: CORS and api key checks scenario
phymbert 3322bfa
server: tests: add a small check to be sure all started threads have …
phymbert 469af4b
server: tests: change CI workflow trigger
phymbert 2a37bd6
server: tests: fix the multi users infinite loop test
phymbert f1d4138
server : fix initialization thread issues
ggerganov 600cbeb
server: test: ci change the GitHub workflow trigger
phymbert 68b8d4e
Merge remote-tracking branch 'origin/master' into test/server-add-ci-…
phymbert 6406208
server: tests:
phymbert 01cca66
server: tests: ci fix model download path
phymbert 534998d
server: tests: ci tests.sh exit code
phymbert a697cd1
minor : fix missing new line
ggerganov 41676d9
ci : actually no reason to exclude GPU code from triggers
ggerganov 016b221
server: fix health/slots endpoint slot state access available race co…
phymbert e43406e
server: tests: switch to asyncio for concurrent tests, match result c…
phymbert 597c181
server: tests: ci do not take a model anymore, fix trigger patch
phymbert 8b96bda
Merge remote-tracking branch 'origin/master' into test/server-add-ci-…
phymbert f820e10
server: tests: ci ensure the server is stopped before scenario, and d…
phymbert aa591ef
server: tests: add Multi users with total number of tokens to predict…
phymbert 26b66c5
server: tests: Fix some random behavior where the wait for busy statu…
phymbert 51f5274
server: tests: ci triggered on any changes on server example path
phymbert cba6d4e
server: tests: minor fix missing param.
phymbert 1bd07e5
server: tests: assert embeddings are actually computed, make the embe…
phymbert 14b6ede
server: tests: minor color change
phymbert b38b9e6
server: tests: minor fix server --alias param passed twice
phymbert 70e9055
server: tests: add log in server start to identify why the server doe…
phymbert 2f756f8
server: tests: allow to override the server port before launching tests
phymbert 6a215e5
server: tests: ci adding container to specify server port and allow t…
phymbert 2bb4732
server: tests: ci adding cmake as it is not present by default in ubu…
phymbert d0e0050
server: tests: ci adding python3-pip as it is not present by default …
phymbert 6e71126
server: tests: ci adding curl as it is not present by default in ubun…
phymbert 6bba3be
server: tests: ci adding psmisc as it is not present by default in ub…
phymbert 5110de0
server: tests: fix coloring console
phymbert bedf37c
server: tests: reducing n_ctx and n_predict for // prompts as it is t…
phymbert 530d3ae
server: tests: reducing sleep time during scenario
phymbert 36ddb96
server: tests: parallel fix server is started twice, add colors to he…
phymbert 0b0f056
server: tests: ci : build and run tests for all matrix defines, sanit…
phymbert 29f8833
server: tests: ci : fix wget missing
phymbert 12bb797
server: tests: ci : add git
phymbert 68cd1a4
server: tests: ci : matrix cuda
phymbert 86896aa
server: tests: ci : continue on error
phymbert 334902b
server: tests: ci : fix step id duplicated
phymbert fce2e00
server: tests: ci : fix cuda install
phymbert e4fb790
server: test: ci fix cuda build
phymbert 2edd995
server: test: ci fix cublas build
phymbert fa51bac
server: test: ci fix matrix
phymbert 606738e
server: test: ci fix clblast
phymbert d159e29
server: test: ci fix openblas build
phymbert 13863ef
server: test: ci matrix
phymbert 4d3791a
server: test: ci matrix, experimental on matrix avx512 entry which fa…
phymbert b94809b
server: test: ci cmake remove all warning as it is done by the classi…
phymbert 5a621e7
server: test: ci make arch not available pass the test
phymbert 54ea4d4
server: test: ax512 experimental
phymbert 5b2ce45
server: test: display server logs in case of failure
phymbert 6dc3af5
server: test: fix CUDA LD PATH
phymbert 83c386f
server: test: ci debug LD path
phymbert 0d380ae
server: test: ci debug CI LD path
phymbert c75e0e1
server: test: ci switch to nvidia based docker image for cuda
phymbert 2c8bf24
server: test: ci give up with nvidia as it requires the nvidia docker…
phymbert 777bdcf
server: test: ci rename step name to Test, change matrix order for be…
phymbert e10b83a
server: test: ci rename job name to Server
phymbert 4d27466
server: tests: move all requests call to asyncio
phymbert 1c1fd40
server: tests: allow to pass argument to the test file
phymbert 2109743
server: tests: print server logs only on github action
phymbert 30f802d
server: tests: check if the server has not crashed after a scenario
phymbert 6c0e6f4
server: tests: adding concurrent embedding in issue #5655
phymbert 77b8589
server: tests: linter
phymbert 7183149
server: tests: fix concurrent OAI streaming request
phymbert 2d107ba
server: tests: add a note regarding inference speed.
phymbert 124ca77
server: tests: removing debug print
phymbert 5957a2d
server: tests - allow print on debug
phymbert 482eb30
server: tests - README.md add build instruction and notice on @bug an…
phymbert 60781f0
server: tests - add explanation about KV Cache.
phymbert a779a4b
server: tests - print only in case of DEBUG
phymbert a2a928c
server: add link to tests in the README.md
phymbert 5ed4452
server: tests: improved README.md
phymbert 99163c8
github issue template: add link to the tests server framework
phymbert
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,127 @@ | ||
# Server build and tests | ||
name: Server | ||
|
||
on: | ||
  workflow_dispatch: # allows manual triggering | ||
  push: | ||
    branches: | ||
      - master | ||
      - test/server-add-ci-test # FIXME remove | ||
    paths: ['.github/workflows/**', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.swift', '**/*.m', 'examples/server/**.*'] | ||
  pull_request: | ||
    types: [opened, synchronize, reopened] | ||
    paths: ['**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.swift', '**/*.m', 'examples/server/**.*'] | ||
|
||
jobs: | ||
  server: | ||
    runs-on: ubuntu-latest | ||
|
||
    strategy: | ||
      matrix: | ||
        build: [noavx, avx2, avx, avx512, cublas, clblast, openblas, kompute, vulkan] | ||
        sanitizer: [ADDRESS, THREAD, UNDEFINED] | ||
        build_type: [Debug, Release] | ||
        include: | ||
          - build: 'noavx' | ||
            defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF' | ||
            image: ubuntu:latest | ||
          - build: 'avx2' | ||
            defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON' | ||
            image: ubuntu:latest | ||
          - build: 'avx' | ||
            defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_AVX2=OFF' | ||
            image: ubuntu:latest | ||
          - build: 'avx512' | ||
            defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_AVX512=ON' | ||
            image: ubuntu:latest | ||
            experimental: true | ||
          - build: 'cublas' | ||
            defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_CUBLAS=ON' | ||
            image: nvidia/cuda:12.3.1-devel-ubuntu22.04 | ||
            arch_not_available: true # require nvidia docker engine | ||
          - build: 'clblast' | ||
            defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_CLBLAST=ON' | ||
            image: ubuntu:latest | ||
            arch_not_available: true | ||
          - build: 'openblas' | ||
            defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS' | ||
            image: ubuntu:latest | ||
          - build: 'kompute' | ||
            defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_KOMPUTE=ON -DKOMPUTE_OPT_DISABLE_VULKAN_VERSION_CHECK=ON' | ||
            image: ubuntu:latest | ||
            arch_not_available: true | ||
          - build: 'vulkan' | ||
            defines: '-DLLAMA_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DLLAMA_VULKAN=ON' | ||
            image: ubuntu:latest | ||
            arch_not_available: true | ||
|
||
    container: | ||
      image: ${{ matrix.image }} | ||
      ports: | ||
        - 8888 | ||
      options: --cpus 4 | ||
|
||
    steps: | ||
      - name: Clone | ||
        id: checkout | ||
        uses: actions/checkout@v3 | ||
|
||
      - name: Dependencies | ||
        id: depends | ||
        run: | | ||
          apt-get update | ||
          apt-get -y install \ | ||
            build-essential \ | ||
            pkg-config \ | ||
            git \ | ||
            cmake \ | ||
            python3-pip \ | ||
            wget \ | ||
            psmisc | ||
|
||
      - name: Download CLBlast | ||
        id: get_clblast | ||
        if: ${{ matrix.build == 'clblast' }} | ||
        run: | | ||
          apt-get -y install libclblast-dev | ||
|
||
      - name: Download OpenBLAS | ||
        id: get_openblas | ||
        if: ${{ matrix.build == 'openblas' }} | ||
        run: | | ||
          apt-get -y install libopenblas-dev | ||
|
||
      - name: Install Vulkan SDK | ||
        id: get_vulkan | ||
        if: ${{ matrix.build == 'kompute' || matrix.build == 'vulkan' }} | ||
        run: | | ||
          wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | tee /etc/apt/trusted.gpg.d/lunarg.asc | ||
          wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list | ||
          apt-get update | ||
          apt-get -y install vulkan-sdk | ||
|
||
      - name: Build | ||
        id: cmake_build | ||
        run: | | ||
          mkdir build | ||
          cd build | ||
          cmake .. -DLLAMA_SANITIZE_${{ matrix.sanitizer }}=ON -DCMAKE_BUILD_TYPE=${{ matrix.build_type }} ${{ matrix.defines }} | ||
          cmake --build . --config ${{ matrix.build_type }} -j $(nproc) --target server | ||
|
||
      - name: Tests dependencies | ||
        id: test_dependencies | ||
        run: | | ||
          pip install -r examples/server/tests/requirements.txt | ||
|
||
      - name: Download models | ||
        id: download_models | ||
        run: | | ||
          cd examples/server/tests | ||
          ../../../scripts/hf.sh --repo ggml-org/models --file tinyllamas/stories260K.gguf | ||
|
||
      - name: Tests | ||
        id: server_integration_test | ||
        continue-on-error: ${{ matrix.experimental || matrix.arch_not_available }} | ||
        run: | | ||
          cd examples/server/tests | ||
          PORT=8888 ./tests.sh |
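As a rough illustration of how the matrix above fans out, the sketch below enumerates the combinations in Python. It assumes GitHub Actions' documented behavior that an `include` entry whose `build` value matches an existing combination only adds keys to that combination instead of creating a new job. The helper name `expand_matrix` and the `continue_on_error` key are hypothetical and not part of the workflow.

```python
from itertools import product

# Axis values copied from the workflow's strategy.matrix section.
BUILDS = ["noavx", "avx2", "avx", "avx512", "cublas",
          "clblast", "openblas", "kompute", "vulkan"]
SANITIZERS = ["ADDRESS", "THREAD", "UNDEFINED"]
BUILD_TYPES = ["Debug", "Release"]

# Only the include flags that drive continue-on-error are reproduced here.
INCLUDE = {
    "avx512": {"experimental": True},
    "cublas": {"arch_not_available": True},
    "clblast": {"arch_not_available": True},
    "kompute": {"arch_not_available": True},
    "vulkan": {"arch_not_available": True},
}


def expand_matrix():
    """Hypothetical helper: one dict per job the matrix would generate."""
    jobs = []
    for build, sanitizer, build_type in product(BUILDS, SANITIZERS, BUILD_TYPES):
        job = {"build": build, "sanitizer": sanitizer, "build_type": build_type}
        job.update(INCLUDE.get(build, {}))
        # Mirrors: continue-on-error: matrix.experimental || matrix.arch_not_available
        job["continue_on_error"] = bool(
            job.get("experimental") or job.get("arch_not_available"))
        jobs.append(job)
    return jobs


jobs = expand_matrix()
blocking = [j for j in jobs if not j["continue_on_error"]]
print(len(jobs), "jobs in total,", len(blocking), "with a blocking Tests step")
```

Under these assumptions, only the `noavx`, `avx2`, `avx` and `openblas` builds can fail the job through the Tests step; the other builds still compile but tolerate test failures.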
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
# Server tests | ||
|
||
Python-based server test scenarios using [BDD](https://en.wikipedia.org/wiki/Behavior-driven_development) and [behave](https://behave.readthedocs.io/en/latest/): | ||
* [issues.feature](./features/issues.feature) Scenarios for pending issues | ||
* [parallel.feature](./features/parallel.feature) Scenarios involving multiple slots and concurrent requests | ||
* [security.feature](./features/security.feature) Security, CORS and API key | ||
* [server.feature](./features/server.feature) Base server scenarios: completion, embedding, tokenization, etc. | ||
|
||
Tests target GitHub workflow job runners with 4 vCPUs. | ||
|
||
Requests are sent with [aiohttp](https://docs.aiohttp.org/en/stable/client_reference.html), an [asyncio](https://docs.python.org/fr/3/library/asyncio.html)-based HTTP client. | ||
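A minimal sketch of the concurrency pattern this implies, using only `asyncio` so that it runs without a server. `fake_completion` is a hypothetical stand-in for an aiohttp request and `run_concurrently` is an illustrative name; neither reproduces the code in `steps.py`.

```python
import asyncio
import time


async def fake_completion(prompt, delay=0.1):
    """Hypothetical stand-in for an HTTP completion request."""
    await asyncio.sleep(delay)  # simulates network and inference latency
    return {"prompt": prompt, "content": f"echo: {prompt}"}


async def run_concurrently(prompts):
    """Start one task per prompt, then wait for all of them."""
    tasks = [asyncio.create_task(fake_completion(p)) for p in prompts]
    # gather returns results in the order the tasks were passed in.
    return await asyncio.gather(*tasks)


start = time.perf_counter()
results = asyncio.run(run_concurrently(["a story", "a poem", "a joke"]))
elapsed = time.perf_counter() - start
print(f"{len(results)} responses in {elapsed:.2f}s")
```

Because the tasks overlap, the total time stays close to one request's latency rather than the sum of all three.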
|
||
Note: If inference on the host is faster than on GitHub runners, the parallel scenarios may fail randomly. To mitigate this, increase the `n_predict` and `kv_size` values. | ||
|
||
### Install dependencies | ||
`pip install -r requirements.txt` | ||
|
||
### Run tests | ||
1. Build the server | ||
```shell | ||
cd ../../.. | ||
mkdir build | ||
cd build | ||
cmake ../ | ||
cmake --build . --target server | ||
``` | ||
2. Download the required models: | ||
   1. `../../../scripts/hf.sh --repo ggml-org/models --file tinyllamas/stories260K.gguf` | ||
3. Start the tests: `./tests.sh` | ||
|
||
Some scenario step values can be overridden with environment variables: | ||
- `PORT` -> `context.server_port`, the port the server listens on during a scenario, default: `8080` | ||
- `LLAMA_SERVER_BIN_PATH` -> the path to the server binary, default: `../../../build/bin/server` | ||
- `DEBUG` -> set to "ON" to enable verbose output for the steps and for the server (`--verbose`) | ||
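A minimal sketch of how these overrides could be resolved, assuming the defaults listed above. The helper `resolve_settings` and the dictionary keys are illustrative names, not the code used by the test steps.

```python
import os


def resolve_settings(environ=None):
    """Hypothetical helper: read the overrides, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return {
        "server_port": int(env.get("PORT", "8080")),
        "server_path": env.get("LLAMA_SERVER_BIN_PATH", "../../../build/bin/server"),
        "debug": env.get("DEBUG") == "ON",
    }


# An explicit mapping is passed here so the example does not depend on the shell.
defaults = resolve_settings({})
overridden = resolve_settings({"PORT": "8888", "DEBUG": "ON"})
print(defaults)
print(overridden)
```

Passing the environment as an argument keeps the helper easy to test; calling `resolve_settings()` with no argument reads the real process environment.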
|
||
### Run @bug, @wip or @wrong_usage annotated scenarios | ||
|
||
A Feature or Scenario must be annotated with `@llama.cpp` to be included in the default scope. | ||
- `@bug` links a scenario to a GitHub issue. | ||
- `@wrong_usage` marks a scenario that reproduces a user-reported issue which is actually expected behavior. | ||
- `@wip` focuses on a scenario that is a work in progress. | ||
|
||
To run the scenarios annotated with `@bug`, run: | ||
`DEBUG=ON ./tests.sh --no-skipped --tags bug` | ||
|
||
After changing logic in `steps.py`, ensure that the `@bug` and `@wrong_usage` scenarios are updated. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
import os | ||
import socket | ||
import subprocess | ||
import time | ||
from contextlib import closing | ||
from signal import SIGKILL | ||
|
||
|
||
def before_scenario(context, scenario): | ||
    print(f"\x1b[33;42mStarting new scenario: {scenario.name}!\x1b[0m") | ||
    port = 8080 | ||
    if 'PORT' in os.environ: | ||
        port = int(os.environ['PORT']) | ||
    if is_server_listening("localhost", port): | ||
        assert False, "Server already started" | ||
|
||
|
||
def after_scenario(context, scenario): | ||
    if scenario.status == "failed": | ||
        if 'GITHUB_ACTIONS' in os.environ: | ||
            print(f"\x1b[33;101mSCENARIO FAILED: {scenario.name} server logs:\x1b[0m\n\n") | ||
            if os.path.isfile('llama.log'): | ||
                with closing(open('llama.log', 'r')) as f: | ||
                    for line in f: | ||
                        print(line) | ||
        if not is_server_listening(context.server_fqdn, context.server_port): | ||
            print("\x1b[33;101mERROR: Server stopped listening\x1b[0m") | ||
|
||
    if not pid_exists(context.server_process.pid): | ||
        assert False, f"Server not running pid={context.server_process.pid} ..." | ||
|
||
    print(f"stopping server pid={context.server_process.pid} ...") | ||
    context.server_process.kill() | ||
    # Wait briefly for the socket to free up | ||
    time.sleep(0.05) | ||
|
||
    attempts = 0 | ||
    while is_server_listening(context.server_fqdn, context.server_port): | ||
        print(f"stopping server pid={context.server_process.pid} ...") | ||
        os.kill(context.server_process.pid, SIGKILL) | ||
        time.sleep(0.1) | ||
        attempts += 1 | ||
        if attempts > 5: | ||
            print(f"Server dangling exits, killing all {context.server_path} ...") | ||
            process = subprocess.run(['killall', '-9', context.server_path], | ||
                                     stderr=subprocess.PIPE, | ||
                                     universal_newlines=True) | ||
            print(process) | ||
|
||
|
||
def is_server_listening(server_fqdn, server_port): | ||
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock: | ||
        result = sock.connect_ex((server_fqdn, server_port)) | ||
        return result == 0 | ||
|
||
|
||
def pid_exists(pid): | ||
    """Check whether pid exists in the current process table.""" | ||
    import errno | ||
    if pid < 0: | ||
        return False | ||
    try: | ||
        os.kill(pid, 0) | ||
    except OSError as e: | ||
        return e.errno == errno.EPERM | ||
    else: | ||
        return True |
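The socket probe above can be exercised without a llama.cpp server. The sketch below re-declares `is_server_listening` so that it is self-contained, binds a throwaway listener on an OS-assigned port, and adds a hypothetical `wait_until_closed` helper that bounds the number of retries. That helper is an assumption about how a shutdown loop could avoid spinning forever; it is not code from this PR.

```python
import socket
import time
from contextlib import closing


def is_server_listening(server_fqdn, server_port):
    """Return True if a TCP connection to (host, port) succeeds."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
        return sock.connect_ex((server_fqdn, server_port)) == 0


def wait_until_closed(host, port, attempts=5, delay=0.1):
    """Hypothetical helper: poll until the port stops accepting connections."""
    for _ in range(attempts):
        if not is_server_listening(host, port):
            return True
        time.sleep(delay)
    return not is_server_listening(host, port)


# Bind a throwaway listener on a port chosen by the OS.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

listening_before = is_server_listening("127.0.0.1", port)
listener.close()
closed_after = wait_until_closed("127.0.0.1", port)
print(listening_before, closed_after)
```

`connect_ex` returns an error code instead of raising, which is why the probe can compare the result to `0` without a `try` block.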
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# List of ongoing issues | ||
@bug | ||
Feature: Issues | ||
# Issue #5655 | ||
Scenario: Multi users embeddings | ||
Given a server listening on localhost:8080 | ||
And a model file stories260K.gguf | ||
And a model alias tinyllama-2 | ||
And 42 as server seed | ||
And 64 KV cache size | ||
And 2 slots | ||
And continuous batching | ||
And embeddings extraction | ||
Then the server is starting | ||
Then the server is healthy | ||
|
||
Given a prompt: | ||
""" | ||
Write a very long story about AI. | ||
""" | ||
And a prompt: | ||
""" | ||
Write another very long music lyrics. | ||
""" | ||
And a prompt: | ||
""" | ||
Write a very long poem. | ||
""" | ||
And a prompt: | ||
""" | ||
Write a very long joke. | ||
""" | ||
Given concurrent embedding requests | ||
Then the server is busy | ||
Then the server is idle | ||
Then all embeddings are generated |
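The final step, `all embeddings are generated`, implies a check on each response. A minimal sketch of such a check follows, assuming each embedding arrives as a list of floats. The function names and the non-zero criterion are illustrative assumptions, not the step implementation in `steps.py`.

```python
def assert_embeddings(embeddings):
    """Hypothetical check: every embedding is a non-empty, non-zero list of floats."""
    assert len(embeddings) > 0, "no embeddings returned"
    for i, embedding in enumerate(embeddings):
        assert isinstance(embedding, list) and embedding, f"embedding {i} is empty"
        assert all(isinstance(x, float) for x in embedding), f"embedding {i} has non-float values"
        # An all-zero vector suggests the embedding was never computed.
        assert any(x != 0.0 for x in embedding), f"embedding {i} is all zeros"


def check(embeddings):
    """Return True if the check passes and False otherwise (for demonstration only)."""
    try:
        assert_embeddings(embeddings)
        return True
    except AssertionError:
        return False


print(check([[0.1, -0.2], [0.3, 0.4]]))
print(check([[0.0, 0.0]]))
```

Rejecting all-zero vectors is a cheap way to catch a response that has the right shape but was never actually computed.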