Description
Hello,
Since updating llama.cpp via llama_cpp_python 0.2.14, I've noticed that processing a new prompt takes much longer: a context of ~50 tokens used to take under 2 seconds, and now it takes about 40 seconds! Subsequent outputs with the same prompt are fine, though.
I also noticed that when I have the model start its output right after "ASSISTANT:" (the Xwin prompt format), the output just breaks. This never happened before.
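For reference, here is roughly how I'm calling it. This is only a minimal sketch: the model path, n_ctx, and the prompt text are placeholders, but the real prompt ends with "ASSISTANT:" exactly like this.

```python
from llama_cpp import Llama

# Placeholder path and context size; the actual model is a Xwin GGUF.
llm = Llama(model_path="./models/xwin-lm-13b.Q4_K_M.gguf", n_ctx=2048)

# Xwin-style prompt that ends right after "ASSISTANT:".
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "USER: Hello, who are you? ASSISTANT:"
)

# This first call is where prompt processing now takes ~40 s instead of <2 s,
# and the generated text comes out broken.
out = llm(prompt, max_tokens=128, stop=["USER:"])
print(out["choices"][0]["text"])
```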