Description
I have been experiencing this problem with different Llama 3 models, for example:
https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF
https://huggingface.co/QuantFactory/dolphin-2.9-llama3-8b-GGUF
In all cases, responses from "/chat/completions" end with the '<|im_end|>' token.
I'm using the latest CUDA Docker image: 'ghcr.io/ggerganov/llama.cpp:server-cuda'.
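
For reference, here is a minimal reproduction sketch. It assumes the container exposes the server on localhost port 8080 (adjust to your setup), and it includes a possible client-side workaround of passing '<|im_end|>' as an explicit stop sequence, which the server's chat completions endpoint should accept:

```python
import requests

# Assumption: the server-cuda container is running locally on port 8080.
URL = "http://localhost:8080/chat/completions"

payload = {
    "messages": [{"role": "user", "content": "Hello"}],
    # Possible workaround: declare the token as a stop sequence so the
    # server trims it from the generated text.
    "stop": ["<|im_end|>"],
}

response = requests.post(URL, json=payload)
content = response.json()["choices"][0]["message"]["content"]
# Without the workaround, this prints text ending in '<|im_end|>'.
print(repr(content))
```

The workaround only hides the symptom on the client side; the token still counts against generation, so it doesn't address whatever is going wrong with the model's end-of-turn handling.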
Thanks in advance.