Description
I have been experiencing this problem with different Llama 3 models, for example:
https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF
https://huggingface.co/QuantFactory/dolphin-2.9-llama3-8b-GGUF
In all cases, responses from "/chat/completions" end with the '<|im_end|>' token.
I'm using the latest CUDA Docker image: 'ghcr.io/ggerganov/llama.cpp:server-cuda'.
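
For reference, here is a minimal reproduction sketch. It assumes the container exposes the server on localhost port 8080 (adjust to your setup), and it includes a possible client-side workaround of passing '<|im_end|>' as an explicit stop sequence, which the server's chat completions endpoint should accept:

```python
import requests

# Assumption: the server-cuda container is running locally on port 8080.
URL = "http://localhost:8080/chat/completions"

payload = {
    "messages": [{"role": "user", "content": "Hello"}],
    # Possible workaround: declare the token as a stop sequence so the
    # server trims it from the generated text.
    "stop": ["<|im_end|>"],
}

response = requests.post(URL, json=payload)
content = response.json()["choices"][0]["message"]["content"]
# Without the workaround, this prints text ending in '<|im_end|>'.
print(repr(content))
```

The workaround only hides the symptom on the client side; the token still counts against generation, so it doesn't address whatever is going wrong with the model's end-of-turn handling.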
Thanks in advance.