[Feature Request] Make v1/completions behave as OpenAI API #596

@comaniac

Description

🚀 Feature

The current v1/completions implementation in the REST API (mlc_chat/rest.py:142) only takes a prompt and returns the generated text. However, it should consider all parameters in the request (e.g., temperature, etc.). Also, the current response includes only the generated text; it would be better to also return other information, such as logprobs (if requested).
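For concreteness, the following is a minimal sketch of what an OpenAI-compatible completions request and response could look like. The parameter and field names follow the OpenAI `/v1/completions` API; the model id, prompt, and all numeric values are placeholder assumptions, not taken from the issue or from mlc_chat.

```python
import json

# Hypothetical OpenAI-style request payload. Fields like temperature,
# top_p, and logprobs are what the current implementation ignores.
request_payload = {
    "model": "llama-7b-q4f16",   # placeholder model id
    "prompt": "Hello, my name is",
    "max_tokens": 16,
    "temperature": 0.7,          # sampling temperature
    "top_p": 0.95,               # nucleus sampling cutoff
    "logprobs": 5,               # request top-5 token log probabilities
    "stop": ["\n"],
}

# Shape of an OpenAI-compatible response: besides the generated text,
# it carries finish_reason, token usage counts, and (if requested)
# per-token logprobs. All values below are made up for illustration.
example_response = {
    "object": "text_completion",
    "choices": [
        {
            "text": " Alice and I like to code.",
            "index": 0,
            "logprobs": {
                "tokens": [" Alice"],
                "token_logprobs": [-0.42],
                "top_logprobs": [{" Alice": -0.42, " Bob": -1.3}],
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 7,
              "total_tokens": 12},
}

print(json.dumps(request_payload, indent=2))
```

Returning `usage` alongside the text would also directly support the latency/memory comparison described in the Motivation below.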

Motivation

We are evaluating open source LLMs and would like to compare quantized LLaMA 7B model against the BF16 model on:

  1. The quality by HELM evaluation, and
  2. The memory and latency

Alternatives

N/A

Additional context

N/A
