The current v1/completions implementation in the REST API (mlc_chat/rest.py:142) only takes the prompt and returns the generated text. However, it should consider all parameters in the request (e.g., temperature). Also, the current response only includes the generated text; it would be better to also return other information, such as logprobs (if requested).
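As a rough illustration, a fuller request body could carry the sampling parameters alongside the prompt, following the OpenAI-style v1/completions schema. This is a hedged sketch, not the actual mlc_chat implementation; `build_completion_request` and the chosen parameter names are assumptions for illustration.

```python
import json

# Hypothetical helper: assemble a request body that forwards sampling
# parameters instead of dropping everything except the prompt.
def build_completion_request(prompt: str, **sampling_params) -> dict:
    body = {"prompt": prompt}
    # Parameters the handler should honor rather than ignore
    # (names mirror the OpenAI completions API).
    for key in ("temperature", "top_p", "max_tokens", "stop", "logprobs"):
        if key in sampling_params:
            body[key] = sampling_params[key]
    return body

req = build_completion_request(
    "Hello,", temperature=0.7, top_p=0.95, max_tokens=16, logprobs=5
)
print(json.dumps(req))
```

The response could similarly be extended beyond the bare text, e.g. to include the finish reason, token counts, and per-token logprobs when `logprobs` is set in the request.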
Motivation
We are evaluating open-source LLMs and would like to compare the quantized LLaMA 7B model against the BF16 model on: