Prefix cache plugin should also add response to the cache

**What would you like to be added**:

The prefix plugin should implement the PostResponse extension point and, when a response arrives, update the cache with the response text.

**Why is this needed**:

Model servers (vLLM at least) cache the generated tokens as well as the prompt. Upon receiving the response, the prefix plugin should therefore also add the response to the prefix indexer, so that the indexer more accurately reflects what the model server has actually cached.
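To illustrate the idea, here is a minimal sketch in Go. It is not the plugin's actual code: the `prefixIndexer` type, the `PostResponse` signature, the character-based `blockSize`, and the chained SHA-256 block hashing are all assumptions made for illustration. It only shows how indexing prompt plus response extends the matched prefix for a follow-up request.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// blockSize is an assumed block length in characters, chosen small for illustration.
const blockSize = 4

// hashBlocks splits text into full blocks and returns chained hashes, so each
// hash identifies the entire prefix up to and including that block.
func hashBlocks(text string) []string {
	var hashes []string
	prev := ""
	for start := 0; start+blockSize <= len(text); start += blockSize {
		sum := sha256.Sum256([]byte(prev + text[start:start+blockSize]))
		prev = hex.EncodeToString(sum[:])
		hashes = append(hashes, prev)
	}
	return hashes
}

// prefixIndexer is a hypothetical stand-in for the plugin's indexer:
// it maps a block hash to the set of servers believed to hold that block.
type prefixIndexer struct {
	servers map[string]map[string]struct{}
}

func newPrefixIndexer() *prefixIndexer {
	return &prefixIndexer{servers: make(map[string]map[string]struct{})}
}

// add records every full block of text as cached on the given server.
func (p *prefixIndexer) add(server, text string) {
	for _, h := range hashBlocks(text) {
		if p.servers[h] == nil {
			p.servers[h] = make(map[string]struct{})
		}
		p.servers[h][server] = struct{}{}
	}
}

// matchLen returns how many leading blocks of text are indexed for the server.
func (p *prefixIndexer) matchLen(server, text string) int {
	n := 0
	for _, h := range hashBlocks(text) {
		if _, ok := p.servers[h][server]; !ok {
			break
		}
		n++
	}
	return n
}

// PostResponse is a hypothetical hook: it indexes prompt plus response,
// mirroring the fact that the model server caches generated tokens too.
func (p *prefixIndexer) PostResponse(server, prompt, response string) {
	p.add(server, prompt+response)
}
```

With this sketch, a follow-up request whose prompt contains the previous prompt and response (as in a multi-turn chat) matches more blocks on the server that produced the response than it would if only the original prompt had been indexed.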

Prefix cache plugin should also add response to the cache #971

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Prefix cache plugin should also add response to the cache #971

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions