Description
Context/motivation: reducing costs when using the OpenAI APIs through this SDK by taking advantage of prompt caching.
To increase the chances of prompts hitting the cache, OpenAI suggests structuring prompts with static or repeated content at the beginning and dynamic content at the end.
As far as I understand, based on experimentation and monitoring, caching applies to the entire content passed to the LLM, and the structure of that content is inherited from the structure of the request body JSON.
While debugging the SDK, I found that there is no straightforward way to control the structure (the top-level field order) of Responses API request bodies.
Example of the structure the SDK currently produces:
- Developer message (static)
- User message (static)
- User message (dynamic)
- Structured outputs schema (static)
Alternative structure to maximize the probability of cache hits (see the sketch after this list):
- Structured outputs schema (static)
- Developer message (static)
- User message (static)
- User message (dynamic)
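
For concreteness, here is a minimal sketch of the same logical body serialized in both orders, using the Responses API's real top-level field names (`model`, `input`, `text`); the exact default order emitted by the SDK is my assumption, and the values are placeholders:

```java
import java.util.LinkedHashMap;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;

// Prompt caching matches on prefixes, so moving the static "text" field
// (the structured outputs schema) ahead of the dynamic "input" keeps a
// longer static prefix stable across requests.
public class BodyOrderSketch {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        Map<String, Object> currentOrder = new LinkedHashMap<>();
        currentOrder.put("model", "gpt-4o");
        currentOrder.put("input", "dynamic user message"); // changes per call
        currentOrder.put("text", "{static JSON schema}");

        Map<String, Object> cacheFriendly = new LinkedHashMap<>();
        cacheFriendly.put("model", "gpt-4o");
        cacheFriendly.put("text", "{static JSON schema}"); // static prefix
        cacheFriendly.put("input", "dynamic user message");

        System.out.println(mapper.writeValueAsString(currentOrder));
        System.out.println(mapper.writeValueAsString(cacheFriendly));
    }
}
```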
I have tried calling the .text() function on the ResponseCreateParams builder after everything else, but it has no effect on the resulting request body.
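
My guess is that the builder only collects values, and the SDK's Jackson-based serialization emits fields in a fixed order defined by the params class, so the order of builder calls cannot matter. A toy illustration of that behavior (these are not the SDK's actual classes):

```java
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.ObjectMapper;

// Jackson derives the field order from the class definition (or from
// @JsonPropertyOrder), not from the order in which values were assigned.
@JsonPropertyOrder({"model", "input", "text"})
class ToyParams {
    public String model;
    public String input;
    public String text;
}

public class SetterOrderDemo {
    public static void main(String[] args) throws Exception {
        ToyParams p = new ToyParams();
        p.text = "{static schema}"; // assigned first...
        p.model = "gpt-4o";
        p.input = "dynamic user message";
        System.out.println(new ObjectMapper().writeValueAsString(p));
        // ...but still prints:
        // {"model":"gpt-4o","input":"dynamic user message","text":"{static schema}"}
    }
}
```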
Is there a workaround for this, or could functionality to control the field order be added to the SDK?
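
One workaround I could imagine, assuming the SDK's underlying OkHttp client can be customized with an interceptor (I have not verified that such a hook is exposed), is to reorder the top-level keys of the outgoing JSON body just before it is sent:

```java
import java.io.IOException;
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import okhttp3.Interceptor;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;
import okio.Buffer;

// Sketch of an OkHttp interceptor that rewrites the outgoing JSON body so
// that static fields (e.g. "text") precede dynamic ones. The preferred
// ordering below is an assumption chosen for illustration.
public final class ReorderBodyInterceptor implements Interceptor {
    private static final ObjectMapper MAPPER = new ObjectMapper();
    private static final List<String> PREFERRED_ORDER =
            List.of("model", "text", "instructions", "input");

    @Override
    public Response intercept(Chain chain) throws IOException {
        Request request = chain.request();
        RequestBody body = request.body();
        if (body == null || body.contentType() == null
                || !"json".equals(body.contentType().subtype())) {
            return chain.proceed(request);
        }

        // Read the body as serialized by the SDK.
        Buffer buffer = new Buffer();
        body.writeTo(buffer);
        ObjectNode original = (ObjectNode) MAPPER.readTree(buffer.readUtf8());

        // Rebuild the object with preferred keys first, everything else after.
        ObjectNode reordered = MAPPER.createObjectNode();
        for (String key : PREFERRED_ORDER) {
            JsonNode value = original.remove(key);
            if (value != null) {
                reordered.set(key, value);
            }
        }
        reordered.setAll(original);

        RequestBody newBody = RequestBody.create(
                body.contentType(), MAPPER.writeValueAsBytes(reordered));
        return chain.proceed(request.newBuilder()
                .method(request.method(), newBody)
                .build());
    }
}
```

Whether reordering the body this way actually improves cache hits would still need to be validated by monitoring cached token counts in the usage stats.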