System Info
I'm using the current Docker image ghcr.io/huggingface/text-embeddings-inference:turing-1.5 on Debian 11 with CUDA driver 12.2 and an NVIDIA T4 GPU.
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Launch the server:
volume="/home/user/model_zoo" && docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:turing-1.5 --model-id "/data/gte-large-en-v1.5"

Then make a request:
curl 0.0.0.0:8080/embed -X POST -d '{"inputs": ["Hello?"]}' -H 'Content-Type: application/json'
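To make the failure easier to spot, the response can be piped through jq to count the null entries in the first embedding (a minimal sketch, assuming jq is installed on the client; the /embed endpoint returns a JSON array with one embedding array per input):

# Send the same request and count null values in the first embedding.
curl -s 0.0.0.0:8080/embed -X POST -d '{"inputs": ["Hello?"]}' -H 'Content-Type: application/json' | jq '[.[0][] | select(. == null)] | length'

This prints 0 for a healthy response and the full embedding length when every value is null.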
When the input is a single short sentence, for example {"inputs": ["Hello?"]} or {"inputs": ["What is Deep Learning?"]}, I obtain an all-null result:
[[null,null,...,null,null]]
But a request with two short sentences sometimes works. Some examples:
{"inputs": ["Hello!"]}: NULL{"inputs": ["What is Deep Learning?"]}: NULL{"inputs": ["Hello!", "Hello!"]}: NULL{"inputs": ["What is Deep Learning?", "What is Deep Learning?"]}: correct results{"inputs": ["Hello!", "What is Deep Learning?"]}: correct results{"inputs": ["Today is a very beautiful day."]}: NULL{"inputs": ["Today is a very beautiful day. What do you think?"]}: correct results
This does not happen with all-MiniLM-L6-v2, for example.
Expected behavior
There should be no null values in the output.