
Commit 79ef3c0

server: tests: embeddings use a real embeddings model (#5908)

1 parent: bfb121f

6 files changed: +161 -93 lines changed

.github/workflows/server.yml

Lines changed: 2 additions & 1 deletion

@@ -58,7 +58,8 @@ jobs:
             cmake \
             python3-pip \
             wget \
-            psmisc
+            psmisc \
+            language-pack-en

       - name: Build
         id: cmake_build
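The newly installed language-pack-en package supplies the en_US.UTF-8 locale data on the minimal CI image. A small sketch of the kind of check that fails without it (the exact reason the test suite needs the locale is an assumption, not stated in this diff):

```python
import locale

# On a minimal Ubuntu image without language-pack-en, selecting an
# English UTF-8 locale raises locale.Error; the CI step above installs
# the locale data so this succeeds there.
try:
    locale.setlocale(locale.LC_ALL, "en_US.UTF-8")
    locale_available = True
except locale.Error:
    locale_available = False

print(locale_available)
```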
examples/server/tests/features/embeddings.feature

Lines changed: 97 additions & 0 deletions

@@ -0,0 +1,97 @@
+@llama.cpp
+@embeddings
+Feature: llama.cpp server
+
+  Background: Server startup
+    Given a server listening on localhost:8080
+    And a model file bert-bge-small/ggml-model-f16.gguf from HF repo ggml-org/models
+    And a model alias bert-bge-small
+    And 42 as server seed
+    And 2 slots
+    And 512 as batch size
+    And 1024 KV cache size
+    And embeddings extraction
+    Then the server is starting
+    Then the server is healthy
+
+  Scenario: Embedding
+    When embeddings are computed for:
+      """
+      What is the capital of Bulgaria ?
+      """
+    Then embeddings are generated
+
+  Scenario: OAI Embeddings compatibility
+    Given a model bert-bge-small
+    When an OAI compatible embeddings computation request for:
+      """
+      What is the capital of Spain ?
+      """
+    Then embeddings are generated
+
+  Scenario: OAI Embeddings compatibility with multiple inputs
+    Given a model bert-bge-small
+    Given a prompt:
+      """
+      In which country Paris is located ?
+      """
+    And a prompt:
+      """
+      Is Madrid the capital of Spain ?
+      """
+    When an OAI compatible embeddings computation request for multiple inputs
+    Then embeddings are generated
+
+  Scenario: Multi users embeddings
+    Given a prompt:
+      """
+      Write a very long story about AI.
+      """
+    And a prompt:
+      """
+      Write another very long music lyrics.
+      """
+    And a prompt:
+      """
+      Write a very long poem.
+      """
+    And a prompt:
+      """
+      Write a very long joke.
+      """
+    Given concurrent embedding requests
+    Then the server is busy
+    Then the server is idle
+    Then all embeddings are generated
+
+  Scenario: Multi users OAI compatibility embeddings
+    Given a prompt:
+      """
+      In which country Paris is located ?
+      """
+    And a prompt:
+      """
+      Is Madrid the capital of Spain ?
+      """
+    And a prompt:
+      """
+      What is the biggest US city ?
+      """
+    And a prompt:
+      """
+      What is the capital of Bulgaria ?
+      """
+    And a model bert-bge-small
+    Given concurrent OAI embedding requests
+    Then the server is busy
+    Then the server is idle
+    Then all embeddings are generated
+
+  @wip
+  Scenario: All embeddings should be the same
+    Given 20 fixed prompts
+    And a model bert-bge-small
+    Given concurrent OAI embedding requests
+    Then the server is busy
+    Then the server is idle
+    Then all embeddings are the same
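The scenarios above drive both the server's native embeddings route and its OpenAI-compatible one. A rough client-side sketch follows (the request payload shapes are assumptions inferred from the scenarios, not taken from this diff), together with a cosine-similarity helper of the kind the "All embeddings should be the same" check implies:

```python
import math

# Hypothetical request bodies matching the scenarios above; the exact
# JSON schema of the server's embeddings endpoints is an assumption.
native_request = {"content": "What is the capital of Bulgaria ?"}
oai_request = {
    "model": "bert-bge-small",
    "input": [
        "In which country Paris is located ?",
        "Is Madrid the capital of Spain ?",
    ],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two embedding vectors; identical prompts served by a
    deterministic model should score ~1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity([1.0, 2.0], [1.0, 2.0]), 6))  # prints 1.0
```

Fixing the server seed to 42, as the Background does, is what makes the "same embedding for the same prompt" assertion meaningful across concurrent requests.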

examples/server/tests/features/parallel.feature

Lines changed: 0 additions & 46 deletions

@@ -9,7 +9,6 @@ Feature: Parallel
     And 512 as batch size
     And 64 KV cache size
     And 2 slots
-    And embeddings extraction
     And continuous batching
     Then the server is starting
     Then the server is healthy

@@ -99,48 +98,3 @@ Feature: Parallel
     Then the server is busy
     Then the server is idle
     Then all prompts are predicted
-
-  Scenario: Multi users embeddings
-    Given a prompt:
-      """
-      Write a very long story about AI.
-      """
-    And a prompt:
-      """
-      Write another very long music lyrics.
-      """
-    And a prompt:
-      """
-      Write a very long poem.
-      """
-    And a prompt:
-      """
-      Write a very long joke.
-      """
-    Given concurrent embedding requests
-    Then the server is busy
-    Then the server is idle
-    Then all embeddings are generated
-
-  Scenario: Multi users OAI compatibility embeddings
-    Given a prompt:
-      """
-      In which country Paris is located ?
-      """
-    And a prompt:
-      """
-      Is Madrid the capital of Spain ?
-      """
-    And a prompt:
-      """
-      What is the biggest US city ?
-      """
-    And a prompt:
-      """
-      What is the capital of Bulgaria ?
-      """
-    And a model tinyllama-2
-    Given concurrent OAI embedding requests
-    Then the server is busy
-    Then the server is idle
-    Then all embeddings are generated
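The removed multi-user scenarios fan several prompts out concurrently against the server's two slots. A self-contained sketch of that client pattern (the request function is a stand-in; the test suite's real step definitions are not shown in this diff):

```python
from concurrent.futures import ThreadPoolExecutor

prompts = [
    "Write a very long story about AI.",
    "Write another very long music lyrics.",
    "Write a very long poem.",
    "Write a very long joke.",
]

def request_embedding(prompt: str) -> list[float]:
    # Stand-in for an HTTP POST to the server's embeddings endpoint;
    # returns a dummy vector so the sketch runs without a server.
    return [float(len(prompt)), 1.0]

# Two workers mirror the "2 slots" configured in the feature Background:
# at most two requests are in flight at once.
with ThreadPoolExecutor(max_workers=2) as pool:
    embeddings = list(pool.map(request_embedding, prompts))

print(len(embeddings))  # prints 4
```

With more concurrent requests than slots, the extra requests queue, which is what the "server is busy" then "server is idle" steps observe.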

examples/server/tests/features/server.feature

Lines changed: 0 additions & 28 deletions

@@ -49,34 +49,6 @@ Feature: llama.cpp server
       | llama-2       | Book                        | What is the best book                | 8          | (Mom\|what)+           | 8          | disabled |
       | codellama70b  | You are a coding assistant. | Write the fibonacci function in c++. | 64         | (thanks\|happy\|bird)+ | 32         | enabled  |

-  Scenario: Embedding
-    When embeddings are computed for:
-      """
-      What is the capital of Bulgaria ?
-      """
-    Then embeddings are generated
-
-  Scenario: OAI Embeddings compatibility
-    Given a model tinyllama-2
-    When an OAI compatible embeddings computation request for:
-      """
-      What is the capital of Spain ?
-      """
-    Then embeddings are generated
-
-  Scenario: OAI Embeddings compatibility with multiple inputs
-    Given a model tinyllama-2
-    Given a prompt:
-      """
-      In which country Paris is located ?
-      """
-    And a prompt:
-      """
-      Is Madrid the capital of Spain ?
-      """
-    When an OAI compatible embeddings computation request for multiple inputs
-    Then embeddings are generated
-
   Scenario: Tokenize / Detokenize
     When tokenizing:
       """
