Enhanced OpenAI Chat-Completions API Support #827

@vMaroon

Description

Following up on #798, which added initial support for the OpenAI Chat-Completions API, I believe the following enhancements are sensible:

  1. In the mentioned PR, chat-completions requests are partially collapsed into the schedulingtypes.LLMRequest::Prompt field. While this is sensible for current use, losing the original fields, such as messages, tools, and tool_choice, would affect scorers that require precise templating of the request - such as ones that utilize a global KV-cache index.

    • Instead, I think there should be a clear distinction between the prompt of a completions request and the structured fields of a chat-completions request, while still balancing efficiency
    • This should be postponed until such a scorer is implemented
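To make the first point concrete, here is a minimal Go sketch of such a request type - all type and constructor names here are hypothetical stand-ins, not the repository's actual schedulingtypes API - that keeps the flattened prompt alongside the original chat-completions fields:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical mirrors of the chat-completions fields that get collapsed
// away; real code would use the typed structs from go-openai instead.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type Tool struct {
	Type     string          `json:"type"`
	Function json.RawMessage `json:"function,omitempty"`
}

// LLMRequest sketch: keeps the flattened prompt for scorers that only need
// text, plus the original structured fields for scorers that must re-template
// the request precisely (e.g. one backed by a global KV-cache index).
type LLMRequest struct {
	Prompt     string    // flattened form, as in the current approach
	Messages   []Message // original chat-completions fields, preserved
	Tools      []Tool
	ToolChoice string
}

// NewLLMRequest flattens the messages into a prompt (a naive join here,
// standing in for real chat templating) without discarding the originals.
func NewLLMRequest(msgs []Message, tools []Tool, toolChoice string) LLMRequest {
	prompt := ""
	for _, m := range msgs {
		prompt += m.Role + ": " + m.Content + "\n"
	}
	return LLMRequest{Prompt: prompt, Messages: msgs, Tools: tools, ToolChoice: toolChoice}
}

func main() {
	r := NewLLMRequest([]Message{{Role: "user", Content: "hi"}}, nil, "")
	fmt.Printf("%q\n", r.Prompt)
}
```

A real implementation would decode the request body into go-openai's typed structs rather than these stand-in types, so that every original field remains available to scorers.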
  2. Since prefix-aware routing is an attempt to estimate where KV-cache blocks reside, a flattened prompt may be sufficient to some degree, but a chat-completions request is more complex: two chat-completions requests can have identical messages yet lead to entirely different KV blocks, for example when they declare different tools, which the chat template renders into the prompt.

Both issues can be resolved by an approach that utilizes the go-openai package.
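As a minimal sketch of the ambiguity in point 2 - using hypothetical stand-in types; a real implementation would decode into go-openai's typed request struct - two requests with identical messages still produce different prefix keys once the rest of the request is taken into account:

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatRequest is a toy stand-in for a full chat-completions request; tools
// are simplified to names here for brevity.
type ChatRequest struct {
	Messages []Message `json:"messages"`
	Tools    []string  `json:"tools,omitempty"`
}

// prefixKey hashes everything the chat template would consume, not just the
// messages: tool definitions are rendered into the templated prompt, so they
// change the resulting KV blocks.
func prefixKey(r ChatRequest) string {
	b, _ := json.Marshal(r)
	return fmt.Sprintf("%x", sha256.Sum256(b))
}

func main() {
	msgs := []Message{{Role: "user", Content: "What's the weather?"}}
	a := ChatRequest{Messages: msgs}
	b := ChatRequest{Messages: msgs, Tools: []string{"get_weather"}}
	// Identical messages, but different templated prompts -> different keys.
	fmt.Println(prefixKey(a) != prefixKey(b)) // prints: true
}
```

A prefix index keyed only on the flattened messages would treat these two requests as cache hits for each other, which is exactly the mismatch described above.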

Metadata

Labels

needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
