
Commit a835f52

This update introduces new long-text embedding examples and service scripts with chunked processing support. The README documentation has been revised to include a quick start guide and comprehensive configuration instructions, and the server startup script now automatically detects the optimal pooling type, improving performance and compatibility for long-text processing.
Signed-off-by: x22x22 <[email protected]>
1 parent 5536db0 commit a835f52

File tree

3 files changed: +15 −13 lines changed

`examples/online_serving/openai_embedding_long_text.md` renamed to `examples/online_serving/openai_embedding_long_text/README.md`

Lines changed: 15 additions & 13 deletions
````diff
@@ -10,34 +10,34 @@ Use the provided script to start a vLLM server with chunked processing enabled:
 
 ```bash
 # Basic usage (supports very long texts up to ~3M tokens)
-./openai_embedding_long_text_service.sh
+./service.sh
 
 # Custom configuration with different models
 MODEL_NAME="jinaai/jina-embeddings-v3" \
 MAX_EMBED_LEN=1048576 \
-./openai_embedding_long_text_service.sh
+./service.sh
 
 # For extremely long documents
 MODEL_NAME="intfloat/multilingual-e5-large" \
 MAX_EMBED_LEN=3072000 \
-./openai_embedding_long_text_service.sh
+./service.sh
 ```
 
 ### 2. Test Long Text Embedding
 
 Run the comprehensive test client:
 
 ```bash
-python openai_embedding_long_text_client.py
+python client.py
 ```
 
 ## 📁 Files
 
 | File | Description |
 |------|-------------|
-| `openai_embedding_long_text_service.sh` | Server startup script with chunked processing enabled |
-| `openai_embedding_long_text_client.py` | Comprehensive test client for long text embedding |
-| `openai_embedding_client.py` | Basic embedding client (updated with chunked processing info) |
+| `service.sh` | Server startup script with chunked processing enabled |
+| `client.py` | Comprehensive test client for long text embedding |
+| `../openai_embedding_client.py` | Basic embedding client (updated with chunked processing info) |
 
 ## ⚙️ Configuration
 
````
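Requests to the server started by `service.sh` follow the OpenAI embeddings schema. As a minimal sketch of the body a client such as `client.py` might send — `build_embedding_request` is a hypothetical helper, the model name is one of the README's examples, and the port and API key are the README's defaults:

```python
import json


def build_embedding_request(text: str, model: str = "intfloat/multilingual-e5-large") -> str:
    """Serialize an OpenAI-compatible /v1/embeddings request body."""
    return json.dumps({"model": model, "input": text})


# A document far beyond a typical 512-token context window.
long_text = "vLLM chunked processing " * 50_000
body = build_embedding_request(long_text)
# A real client would POST this body to http://localhost:31090/v1/embeddings
# with the header "Authorization: Bearer EMPTY".
```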
````diff
@@ -47,20 +47,22 @@ The key parameters for chunked processing are in the `--override-pooler-config`:
 
 ```json
 {
-  "pooling_type": "MEAN",
+  "pooling_type": "auto",
   "normalize": true,
   "enable_chunked_processing": true,
   "max_embed_len": 3072000
 }
 ```
 
+**Note**: `pooling_type` sets the model's own pooling strategy for processing within each chunk. The cross-chunk aggregation automatically uses the MEAN strategy when input exceeds the model's native maximum length.
+
 #### Chunked Processing Behavior
 
-Chunked processing now uses **MEAN aggregation** for cross-chunk combination, regardless of the model's native pooling type:
+Chunked processing uses **MEAN aggregation** for cross-chunk combination when input exceeds the model's native maximum length:
 
 | Component | Behavior | Description |
 |-----------|----------|-------------|
-| **Within chunks** | Native pooling (MEAN/CLS/LAST) | Uses model's original pooling strategy |
+| **Within chunks** | Model's native pooling | Uses the model's configured pooling strategy |
 | **Cross-chunk aggregation** | Always MEAN | Weighted averaging based on chunk token counts |
 | **Performance** | Optimal | All chunks processed for complete semantic coverage |
 
````
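The cross-chunk MEAN aggregation described in this hunk can be sketched as a token-count-weighted average of the per-chunk embeddings. This is an illustration of the idea, not vLLM's internal code:

```python
def mean_aggregate(chunk_embeddings: list[list[float]],
                   chunk_token_counts: list[int]) -> list[float]:
    """Combine per-chunk embeddings with a token-count-weighted average."""
    total_tokens = sum(chunk_token_counts)
    dim = len(chunk_embeddings[0])
    combined = [0.0] * dim
    for embedding, count in zip(chunk_embeddings, chunk_token_counts):
        weight = count / total_tokens  # longer chunks contribute more
        for i, value in enumerate(embedding):
            combined[i] += weight * value
    return combined


# Two chunks of 512 and 128 tokens: weights 0.8 and 0.2.
result = mean_aggregate([[1.0, 0.0], [0.0, 1.0]], [512, 128])
```

With `"normalize": true` in the pooler config, the server would additionally L2-normalize the final vector; that step is omitted here for brevity.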
````diff
@@ -72,15 +74,15 @@ Chunked processing now uses **MEAN aggregation** for cross-chunk combination, re
 | `PORT` | `31090` | Server port |
 | `GPU_COUNT` | `1` | Number of GPUs to use |
 | `MAX_EMBED_LEN` | `3072000` | Maximum embedding input length (supports very long documents) |
-| `POOLING_TYPE` | `auto` | Model's native pooling type: `auto`, `MEAN`, `CLS`, `LAST` |
+| `POOLING_TYPE` | `auto` | Model's native pooling type: `auto`, `MEAN`, `CLS`, `LAST` (only affects within-chunk pooling, not cross-chunk aggregation) |
 | `API_KEY` | `EMPTY` | API key for authentication |
 
 ## 🔧 How It Works
 
 1. **Enhanced Input Validation**: `max_embed_len` allows accepting inputs longer than `max_model_len` without environment variables
 2. **Smart Chunking**: Text is split based on `max_position_embeddings` to maintain semantic integrity
-3. **Unified Processing**: All chunks processed separately through the model using native pooling
-4. **MEAN Aggregation**: Results combined using token count-based weighted averaging across all chunks
+3. **Unified Processing**: All chunks are processed separately through the model using its configured pooling strategy
+4. **MEAN Aggregation**: When input exceeds the model's native length, results are combined using token-count-based weighted averaging across all chunks
 5. **Consistent Output**: Final embeddings maintain the same dimensionality as standard processing
 
 ### Input Length Handling
````

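Steps 1–2 of the "How It Works" list above — validating against `max_embed_len`, then splitting into windows of at most `max_position_embeddings` tokens — can be sketched as follows. Function and variable names are illustrative assumptions, not vLLM internals:

```python
def validate_and_chunk(token_ids: list[int],
                       max_embed_len: int,
                       max_position_embeddings: int) -> list[list[int]]:
    """Reject over-long inputs, then split the rest into model-sized chunks."""
    if len(token_ids) > max_embed_len:
        raise ValueError(
            f"Input of {len(token_ids)} tokens exceeds max_embed_len={max_embed_len}"
        )
    # Inputs within max_embed_len but beyond the model's native window get chunked.
    return [token_ids[i:i + max_position_embeddings]
            for i in range(0, len(token_ids), max_position_embeddings)]


# 1200 tokens against a 512-token native window -> chunks of 512, 512, and 176.
chunks = validate_and_chunk(list(range(1200)),
                            max_embed_len=3_072_000,
                            max_position_embeddings=512)
```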