Following up on #798, which added initial support for the OpenAI Chat-Completions API, I believe the following enhancements are sensible:
- In the mentioned PR, chat-completions requests are partially collapsed into the `schedulingtypes.LLMRequest::Prompt` field. While this is sensible for current use, the loss of the original fields such as `messages`, `tools` and `tool_choice` would affect scorers that require precise templating of the request, such as ones that utilize a global KV-cache index.
  - I think instead there should be a clear distinction between `prompt` from the completions API and the fields of a chat-completions request, while balancing efficiency as well. This should be postponed until such a scorer is implemented.
- Since prefix-aware routing is an attempt at estimating the locations of KV-cache blocks, it may be sufficient to some degree, but a chat-completions request is more complex: two chat-completions requests can have the same messages yet lead to entirely different KV blocks.
  - See this struct for example: https://github.com/sashabaranov/go-openai/blob/6181facea7e6e5525b6b8da42205d7cce822c249/chat.go#L95
  - And an example of how a chat-completions request is templated before tokenization in vLLM: https://github.com/vllm-project/vllm/blob/main/examples/tool_chat_template_llama3.2_json.jinja
Both issues can be resolved by an approach that utilizes the go-openai package linked above; rough sketches of both ideas follow.
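
For the first point, a minimal sketch assuming go-openai's request types are an acceptable dependency. Everything here other than `Prompt` is hypothetical and not the current `schedulingtypes.LLMRequest` definition:

```go
// A minimal sketch, not the current schedulingtypes.LLMRequest definition:
// keep the completions prompt and the full chat-completions request as
// distinct fields instead of collapsing the latter into Prompt.
package schedulingtypes

import openai "github.com/sashabaranov/go-openai"

type LLMRequest struct {
	// Prompt is set only for requests to the completions API.
	Prompt string

	// ChatCompletion preserves messages, tools and tool_choice for
	// chat-completions requests, so scorers that need precise
	// templating (e.g. a global KV-cache index) can re-render the
	// request exactly as the model server would.
	ChatCompletion *openai.ChatCompletionRequest
}

// IsChat reports whether the request targets the chat-completions API.
func (r *LLMRequest) IsChat() bool { return r.ChatCompletion != nil }
```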
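
For the second point, a toy illustration: two requests with identical `messages` but different `tools` must not share a KV-cache prefix, so whatever canonical form a scorer hashes has to cover more than `messages`. The `renderForScoring` helper below is hypothetical and only approximates real server-side chat templating:

```go
package main

import (
	"encoding/json"
	"fmt"

	openai "github.com/sashabaranov/go-openai"
)

// renderForScoring stands in for the server-side templating step by
// serializing every request field that influences the rendered prompt.
// A real implementation would apply the model's chat template instead.
func renderForScoring(req openai.ChatCompletionRequest) string {
	b, _ := json.Marshal(struct {
		Messages   []openai.ChatCompletionMessage `json:"messages"`
		Tools      []openai.Tool                  `json:"tools,omitempty"`
		ToolChoice any                            `json:"tool_choice,omitempty"`
	}{req.Messages, req.Tools, req.ToolChoice})
	return string(b)
}

func main() {
	msgs := []openai.ChatCompletionMessage{
		{Role: openai.ChatMessageRoleUser, Content: "What is the weather in Paris?"},
	}

	plain := openai.ChatCompletionRequest{Messages: msgs}
	withTool := openai.ChatCompletionRequest{
		Messages: msgs,
		Tools: []openai.Tool{{
			Type:     openai.ToolTypeFunction,
			Function: &openai.FunctionDefinition{Name: "get_weather"},
		}},
	}

	// Same messages, different rendered prefixes -> different KV blocks.
	fmt.Println(renderForScoring(plain) == renderForScoring(withTool)) // false
}
```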