
Qwen3-Embedding models: embeddings from TEI differ sharply from Sentence-Transformers reference #642

@kevinmuto

Description


System Info

image: ghcr.io/huggingface/text-embeddings-inference:hopper-sha-a69cc2e
hardware: H100/H200

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

import numpy as np
import requests
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

txt = "this is a test"

st_embedding = model.encode([txt])

tei_embedding = requests.post(
    "http://localhost:8000/embed",  # TEI server
    json={"inputs": [txt]},
)

np.dot(tei_embedding.json()[0], st_embedding[0])
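Note that a raw dot product only equals cosine similarity when both vectors are unit-normalized; TEI normalizes by default, while `SentenceTransformer.encode` may not, depending on settings. A safer comparison normalizes both sides explicitly. A minimal sketch (the vectors here are synthetic stand-ins, not actual model outputs):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity that is robust to unnormalized inputs."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-ins for tei_embedding.json()[0] and st_embedding[0]:
tei_vec = [0.6, 0.8, 0.0]  # already unit-normalized
st_vec = [3.0, 4.0, 0.0]   # same direction, different scale

print(cosine_similarity(tei_vec, st_vec))  # same direction -> 1.0
```

With this helper, a similarity near 1.0 rules out a normalization mismatch and points to a genuine embedding difference (e.g. pooling).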

Expected behavior

For Qwen3 models, the vectors returned by TEI have very low cosine similarity to vectors produced with the same model loaded via Sentence-Transformers.
Using identical text, pooling mode, and normalization settings, the cosine similarity between the TEI and Sentence-Transformers embeddings is often below 0.2.
Running the same test with another model, e.g. BAAI/bge-base-en-v1.5, yields a similarity of 1.0, as expected.
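One thing worth checking when reproducing: Qwen3-Embedding models use last-token pooling, and if TEI falls back to a different pooling mode the embeddings will diverge in exactly this way. A hedged launch sketch (the `--pooling last-token` flag follows TEI's documented CLI options; the port mapping matches the repro script above, but verify both against your TEI version):

```shell
docker run --gpus all -p 8000:80 \
  ghcr.io/huggingface/text-embeddings-inference:hopper-sha-a69cc2e \
  --model-id Qwen/Qwen3-Embedding-0.6B \
  --pooling last-token
```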
