The current v1/completions implementation in the REST API (mlc_chat/rest.py:142) only takes the prompt and returns the generated text. However, it should consider all parameters in the request (e.g., temperature). Also, the current response only includes the generated text; it would be better to also return other information, such as logprobs (if requested).
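As a rough illustration, a fuller request body could carry the sampling parameters alongside the prompt, following the OpenAI-style v1/completions schema. This is a hedged sketch, not the actual mlc_chat implementation; `build_completion_request` and the chosen parameter names are assumptions for illustration.

```python
import json

# Hypothetical helper: assemble a request body that forwards sampling
# parameters instead of dropping everything except the prompt.
def build_completion_request(prompt: str, **sampling_params) -> dict:
    body = {"prompt": prompt}
    # Parameters the handler should honor rather than ignore
    # (names mirror the OpenAI completions API).
    for key in ("temperature", "top_p", "max_tokens", "stop", "logprobs"):
        if key in sampling_params:
            body[key] = sampling_params[key]
    return body

req = build_completion_request(
    "Hello,", temperature=0.7, top_p=0.95, max_tokens=16, logprobs=5
)
print(json.dumps(req))
```

The response could similarly be extended beyond the bare text, e.g. to include the finish reason, token counts, and per-token logprobs when `logprobs` is set in the request.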
Motivation
We are evaluating open-source LLMs and would like to compare the quantized LLaMA 7B model against the BF16 model on: