This update adds long-text embedding examples and service scripts with chunked-processing support. The README now includes a quick-start guide and complete configuration instructions, and the server startup script automatically detects the optimal pooling type, improving performance and compatibility for long-text processing.
**examples/online_serving/openai_embedding_long_text/README.md** (15 additions, 13 deletions)
````diff
@@ -10,34 +10,34 @@ Use the provided script to start a vLLM server with chunked processing enabled:
 
 ```bash
 # Basic usage (supports very long texts up to ~3M tokens)
-./openai_embedding_long_text_service.sh
+./service.sh
 
 # Custom configuration with different models
 MODEL_NAME="jinaai/jina-embeddings-v3" \
 MAX_EMBED_LEN=1048576 \
-./openai_embedding_long_text_service.sh
+./service.sh
 
 # For extremely long documents
 MODEL_NAME="intfloat/multilingual-e5-large" \
 MAX_EMBED_LEN=3072000 \
-./openai_embedding_long_text_service.sh
+./service.sh
 ```
 
 ### 2. Test Long Text Embedding
 
 Run the comprehensive test client:
 
 ```bash
-python openai_embedding_long_text_client.py
+python client.py
 ```
 
 ## 📁 Files
 
 | File | Description |
 |------|-------------|
-|`openai_embedding_long_text_service.sh`| Server startup script with chunked processing enabled |
-|`openai_embedding_long_text_client.py`| Comprehensive test client for long text embedding |
-|`openai_embedding_client.py`| Basic embedding client (updated with chunked processing info) |
+|`service.sh`| Server startup script with chunked processing enabled |
+|`client.py`| Comprehensive test client for long text embedding |
+|`../openai_embedding_client.py`| Basic embedding client (updated with chunked processing info) |
 
 ## ⚙️ Configuration
 
````
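For context on what the renamed `client.py` exercises: the server started by `service.sh` exposes the standard OpenAI-compatible `/v1/embeddings` endpoint, so a long input can be embedded with the stock `openai` client. Below is a minimal sketch, assuming the server listens on `http://localhost:8000/v1` with `intfloat/multilingual-e5-large` loaded; both values are illustrative, not taken from the scripts in this diff.

```python
from openai import OpenAI

# Point the stock OpenAI client at the local vLLM server.
# base_url and model name are assumptions for this sketch; match them
# to whatever service.sh actually starts in your checkout.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Build an input far beyond a typical model's native context window;
# chunked processing should handle it transparently on the server side.
long_text = "vLLM chunked processing lets embedding models accept very long inputs. " * 5000

response = client.embeddings.create(
    model="intfloat/multilingual-e5-large",
    input=long_text,
)
print(len(response.data[0].embedding))  # dimensionality of the pooled embedding
```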
````diff
@@ -47,20 +47,22 @@ The key parameters for chunked processing are in the `--override-pooler-config`:
 
 ```json
 {
-  "pooling_type": "MEAN",
+  "pooling_type": "auto",
   "normalize": true,
   "enable_chunked_processing": true,
   "max_embed_len": 3072000
 }
 ```
 
+**Note**: `pooling_type` sets the model's own pooling strategy for processing within each chunk. The cross-chunk aggregation automatically uses MEAN strategy when input exceeds the model's native maximum length.
+
 #### Chunked Processing Behavior
 
-Chunked processing now uses **MEAN aggregation** for cross-chunk combination, regardless of the model's native pooling type:
+Chunked processing uses **MEAN aggregation** for cross-chunk combination when input exceeds the model's native maximum length:
````
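To make the cross-chunk behavior concrete, here is an illustrative sketch of MEAN aggregation, not vLLM's actual implementation: each chunk is first pooled by the model's own `pooling_type`, and the resulting per-chunk vectors are averaged element-wise, then unit-normalized when `normalize` is true. Whether vLLM weights chunks by token count is not specified in this diff, so the sketch uses a plain unweighted mean.

```python
import numpy as np

def mean_aggregate(chunk_embeddings: list[np.ndarray], normalize: bool = True) -> np.ndarray:
    """Combine per-chunk pooled vectors with an element-wise (unweighted) mean."""
    combined = np.stack(chunk_embeddings).mean(axis=0)
    if normalize:
        # Mirrors the "normalize": true pooler setting: return a unit-length vector.
        combined = combined / np.linalg.norm(combined)
    return combined

# Example: three 1024-dim chunk embeddings -> one aggregated embedding.
chunks = [np.random.rand(1024) for _ in range(3)]
print(mean_aggregate(chunks).shape)  # (1024,)
```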
0 commit comments