server: Unable to Utilize Models Outside of 'ChatML' with OpenAI Library #5921

Closed · infozzdatalabs opened this issue on Mar 7, 2024 · 4 comments

@infozzdatalabs

I'm unsure whether this is a limitation of the OpenAI library or a result of poor handling on the server side. However, after extensively testing various models with the latest server Docker image with CUDA, I've come to a conclusion: it seems impossible to run a model that uses a chat template other than ChatML together with the OpenAI library. All attempts resulted in broken responses. This includes the model at https://huggingface.co/mlabonne/AlphaMonarch-7B-GGUF, which I requested some time ago. I apologize if this isn't considered a bug, but I'm at a loss for what to do next. Thank you in advance.
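
For reference, a minimal sketch of the setup being described, assuming the llama.cpp server's default OpenAI-compatible endpoint at http://localhost:8080/v1 and the openai Python client (v1.x); the model name is a placeholder, since the local server serves whatever model it was launched with. With /chat/completions, the chat template is applied server-side, which is why template detection matters here:

```python
# Minimal sketch: point the OpenAI Python client at a local llama.cpp server.
# The host, port, and model name below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama.cpp server's OpenAI-compatible API
    api_key="sk-no-key-required",         # the local server does not validate the key
)

response = client.chat.completions.create(
    model="alphamonarch-7b",  # placeholder; the server uses whichever model it loaded
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, who are you?"},
    ],
)
print(response.choices[0].message.content)
```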

@ngxson
Collaborator

ngxson commented Mar 7, 2024

FYI, we already support a number of templates, including the AlphaMonarch one you mentioned: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

You can run the server with the --verbose option; it will show the formatted chat in the log.

If the server still uses ChatML for AlphaMonarch, the GGUF model file may not have a chat template embedded in it, or you may be using an old version of the server.
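
One way to check whether the GGUF file carries a chat template at all is to look for the tokenizer.chat_template metadata key, for example with the gguf Python package that ships with llama.cpp. This is a rough sketch; the low-level field access shown here is an assumption and may differ between gguf-py versions:

```python
# Rough sketch: check whether a GGUF file embeds a chat template, using the
# gguf Python package (gguf-py from the llama.cpp repo). The field-access
# details below are an assumption and may vary between package versions.
import sys
from gguf import GGUFReader

reader = GGUFReader(sys.argv[1])  # path to the .gguf file
field = reader.fields.get("tokenizer.chat_template")

if field is None:
    print("No chat template embedded in this GGUF; the server falls back to ChatML.")
else:
    # For string-valued metadata, the last data part holds the raw UTF-8 bytes.
    template = bytes(field.parts[field.data[-1]]).decode("utf-8")
    print("Embedded chat template:")
    print(template)
```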

@infozzdatalabs
Author

That's curious; the chat does seem to be formatted fine. I think my issue must be related to the use of special characters like {} or : in the messages. That raises another question: is there any way to use a chat template that is not "hardcoded" in llama.cpp? Newer models with better performance keep appearing, and I can't use them because of that.

@ngxson
Collaborator

ngxson commented Mar 7, 2024

On the same wiki page, I mentioned that for now you can use the /completions endpoint instead of /chat/completions, since it allows you to send a raw prompt that you format yourself.
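
A minimal sketch of that approach, assuming the server's default raw /completion endpoint on localhost:8080 and Python's requests library; the prompt format used here is purely illustrative, not AlphaMonarch's actual template:

```python
# Minimal sketch: send a self-formatted prompt to the llama.cpp server's raw
# /completion endpoint. The host, port, and prompt format are placeholders.
import requests

def format_prompt(user_message: str) -> str:
    # Illustrative format only; substitute whatever template your model expects.
    return f"### Instruction:\n{user_message}\n\n### Response:\n"

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": format_prompt("Hello, who are you?"),
        "n_predict": 128,   # maximum number of tokens to generate
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["content"])
```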

I agree that it's not easy to use, but AFAIK there are currently two widely used chat template formats, and neither is suitable for implementation in llama.cpp:

  • Jinja templates: the parser is too complicated to implement in C++
  • LM Studio format: personally, I don't like the idea of this format, because it misses many details, like where to place the BOS and EOS tokens, etc.
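
For completeness, rendering a Hugging Face-style Jinja chat template on the client side is straightforward in Python and pairs well with the /completion approach above. This is a rough sketch using the jinja2 package; the template string is a simplified ChatML-style example, not any specific model's exact template:

```python
# Rough sketch: render a Jinja chat template client-side, then send the
# resulting string as the "prompt" field of a /completion request.
from jinja2 import Template

# Simplified ChatML-style template for illustration; real models ship their own
# template in tokenizer_config.json (and, when present, in the GGUF metadata).
CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, who are you?"},
]

prompt = Template(CHAT_TEMPLATE).render(
    messages=messages,
    add_generation_prompt=True,
)
print(prompt)
```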


This issue was closed because it has been inactive for 14 days since being marked as stale.
