
Qwen3-Embedding models: embeddings from TEI differ sharply from Sentence-Transformers reference #642

@kevinmuto

Description


System Info

image: ghcr.io/huggingface/text-embeddings-inference:hopper-sha-a69cc2e
hardware: H100/H200

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

import numpy as np
import requests
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

txt = "this is a test"

st_embedding = model.encode([txt])

tei_embedding = requests.post(
    "http://localhost:8000/embed",  # TEI server
    json={"inputs": [txt]},
)

np.dot(tei_embedding.json()[0], st_embedding[0])
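Note that a raw dot product only equals cosine similarity when both vectors are unit-normalized; TEI normalizes by default, while `SentenceTransformer.encode` may not, depending on settings. A safer comparison normalizes both sides explicitly. A minimal sketch (the vectors here are synthetic stand-ins, not actual model outputs):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity that is robust to unnormalized inputs."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-ins for tei_embedding.json()[0] and st_embedding[0]:
tei_vec = [0.6, 0.8, 0.0]  # already unit-normalized
st_vec = [3.0, 4.0, 0.0]   # same direction, different scale

print(cosine_similarity(tei_vec, st_vec))  # same direction -> 1.0
```

With this helper, a similarity near 1.0 rules out a normalization mismatch and points to a genuine embedding difference (e.g. pooling).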

Expected behavior

For Qwen3 models, the vectors returned by TEI have very low cosine similarity to vectors produced with the same model loaded via Sentence-Transformers.
Using identical text, pooling mode, and normalization settings, the cosine similarity between the TEI and Sentence-Transformers embeddings is often below 0.2.
Running the same test with another model, e.g. BAAI/bge-base-en-v1.5, yields a similarity of 1.0, as expected.
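One thing worth checking when reproducing: Qwen3-Embedding models use last-token pooling, and if TEI falls back to a different pooling mode the embeddings will diverge in exactly this way. A hedged launch sketch (the `--pooling last-token` flag follows TEI's documented CLI options; the port mapping matches the repro script above, but verify both against your TEI version):

```shell
docker run --gpus all -p 8000:80 \
  ghcr.io/huggingface/text-embeddings-inference:hopper-sha-a69cc2e \
  --model-id Qwen/Qwen3-Embedding-0.6B \
  --pooling last-token
```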
