
Optimize JSON property order for prompt caching #316

Open
@IvanLuchkin

Description

Context/motivation: the ability to reduce costs when using the APIs via this SDK, by improving prompt cache hit rates.

In order to increase the chances of prompts hitting the cache, OpenAI suggests the following:

[screenshot of OpenAI's prompt caching guidance: place static content at the beginning of the prompt and variable content at the end]

As far as I understand, and based on experimentation and monitoring, caching matches on the content passed to the LLM in its entirety, and the structure of this content is inherited from the structure of the request-body JSON.
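To make the ordering effect concrete, here is a minimal, self-contained sketch (not the SDK's actual serialization logic; the property names and the naive serializer are illustrative only). Prompt caching matches on a shared prefix, so when the only dynamic property is serialized last, consecutive requests share a much longer prefix than when it is serialized first:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixDemo {
    // Naive JSON-ish serializer: LinkedHashMap's insertion order stands in
    // for the property order of the request body.
    static String serialize(Map<String, String> body) {
        if (body.isEmpty()) return "{}";
        StringBuilder sb = new StringBuilder("{");
        body.forEach((k, v) -> sb.append('"').append(k).append("\":\"").append(v).append("\","));
        sb.setLength(sb.length() - 1); // drop trailing comma
        return sb.append('}').toString();
    }

    // Length of the longest common prefix of two serialized bodies —
    // a stand-in for how much of a request the cache can reuse.
    static int commonPrefix(String a, String b) {
        int i = 0;
        while (i < a.length() && i < b.length() && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    public static void main(String[] args) {
        // Dynamic content first: the shared prefix ends almost immediately.
        Map<String, String> r1 = new LinkedHashMap<>();
        r1.put("user_dynamic", "question A");
        r1.put("schema", "static schema");
        r1.put("developer", "static instructions");
        Map<String, String> r2 = new LinkedHashMap<>(r1);
        r2.put("user_dynamic", "question B");

        // Dynamic content last: everything before it is shared.
        Map<String, String> r3 = new LinkedHashMap<>();
        r3.put("schema", "static schema");
        r3.put("developer", "static instructions");
        r3.put("user_dynamic", "question A");
        Map<String, String> r4 = new LinkedHashMap<>(r3);
        r4.put("user_dynamic", "question B");

        System.out.println("dynamic-first shared prefix: " + commonPrefix(serialize(r1), serialize(r2)));
        System.out.println("dynamic-last shared prefix:  " + commonPrefix(serialize(r3), serialize(r4)));
    }
}
```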

While debugging the SDK, I found that there is no straightforward way to control the property order of Responses API request bodies.

Example:

  • Developer message (static)
  • User message (static)
  • User message (dynamic)
  • Structured outputs schema (static)

Alternative structure to maximize the probability of cache hits:

  • Structured outputs schema (static)
  • Developer message (static)
  • User message (static)
  • User message (dynamic)

I have tried calling the .text() method on ResponseCreateParams after everything else, but it has no effect on the property order of the resulting request body.

Is there a workaround, or could first-class support for this be added?
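One possible shape for such functionality (a hypothetical sketch only — reorder is not an existing SDK method, and the property names are illustrative): rebuild the body with top-level properties in a caller-specified order before serialization, keeping any unmentioned properties in their original relative order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BodyReorder {
    // Hypothetical hook: return a copy of the body whose top-level properties
    // follow the caller's preferred order; properties not listed are appended
    // afterwards in their original relative order.
    static LinkedHashMap<String, Object> reorder(Map<String, Object> body, List<String> order) {
        LinkedHashMap<String, Object> out = new LinkedHashMap<>();
        for (String key : order) {
            if (body.containsKey(key)) out.put(key, body.get(key));
        }
        body.forEach(out::putIfAbsent); // append anything the caller did not mention
        return out;
    }

    public static void main(String[] args) {
        // Illustrative property names only: "text" stands in for the
        // structured-outputs schema and "input" for the message list.
        LinkedHashMap<String, Object> body = new LinkedHashMap<>();
        body.put("model", "gpt-4o");
        body.put("input", "developer + user messages");
        body.put("text", "structured-outputs schema");

        Map<String, Object> reordered = reorder(body, List.of("model", "text", "input"));
        System.out.println(new ArrayList<>(reordered.keySet())); // [model, text, input]
    }
}
```

With a hook like this, the static schema could be serialized before the dynamic messages, matching the alternative structure above.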

Metadata

Labels: enhancement (New feature or request)
