
Enable more advanced Anthropic prompt caching than CachePoint implementation allows #3453

@nathan-gage

Description

I think it would be worth reverting the CachePoint implementation of Anthropic Context Caching.

Specifically, it has caused several problems with message serialization. To use context caching in a traditional chat conversation, we seemingly need a message history processor that removes CachePoint from previously serialized messages. Without one, with anthropic_cache_instructions and anthropic_cache_tool_definitions enabled and a CachePoint after the user prompt, I'm seeing ModelHTTPErrors: 'A maximum of 4 blocks with cache_control may be provided. Found 7.'
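For illustration, the workaround looks roughly like this (a minimal sketch, assuming UserPromptPart.content is a list whenever it contains a CachePoint; the helper name strip_cache_points is mine, and it mutates the parts in place):

from pydantic_ai import CachePoint
from pydantic_ai.messages import ModelMessage, ModelRequest, UserPromptPart

def strip_cache_points(messages: list[ModelMessage]) -> list[ModelMessage]:
    # Drop CachePoint markers from previously serialized user prompts,
    # so only the current run's cache point reaches the API.
    for message in messages:
        if not isinstance(message, ModelRequest):
            continue
        for part in message.parts:
            if isinstance(part, UserPromptPart) and isinstance(part.content, list):
                part.content = [c for c in part.content if not isinstance(c, CachePoint)]
    return messages

# wired up via Agent(..., history_processors=[strip_cache_points])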

While I understand *why* I'm getting those, the DX is not great. Here's an MRE.
from pydantic_ai import Agent, RunContext, CachePoint, AgentRunResultEvent
from pydantic_ai.models.anthropic import AnthropicModelSettings
from dataclasses import dataclass

@dataclass
class MREDeps:
    flag: bool

mre = Agent(
    "anthropic:claude-sonnet-4-5-20250929",
    model_settings=AnthropicModelSettings(
        max_tokens=64_000,
        anthropic_cache_instructions=True,
        anthropic_cache_tool_definitions=True,
    )
)


@mre.instructions
def instructions(ctx: RunContext[MREDeps]) -> str:
    if ctx.deps.flag:
        return "Talk in piglatin"
    return "Talk in a Texan accent"


@mre.tool
def yeehaw(ctx: RunContext[MREDeps], phrase: str) -> str:
    if not ctx.deps.flag:
        return f"yee was haw'd: {phrase.upper()}"
    return "ERROR: you're not talkin' texan"


messages = []

for i in range(10):
    user_text = input("> ")
    print("User input:", user_text)
    
    async for event in mre.run_stream_events([user_text, CachePoint()], message_history=messages, deps=MREDeps(i % 2 != 0)):
        if isinstance(event, AgentRunResultEvent):
            new_messages = event.result.new_messages()
            messages.extend(new_messages)

            for m in new_messages:
                print(m)

And the logs:

User input: howdy partner
ModelRequest(parts=[UserPromptPart(content=['howdy partner', CachePoint(kind='cache-point')], timestamp=datetime.datetime(2025, 11, 17, 17, 28, 1, 509238, tzinfo=datetime.timezone.utc))], instructions='Talk in a Texan accent', run_id='1a6bba3c-b7c7-439b-8974-b4bfa0cbb68b')
ModelResponse(parts=[TextPart(content="Howdy there, partner! Well ain't this a fine day! Welcome on in, friend. What can I do ya for today? Y'all need any help with somethin', or just stoppin' by to shoot the breeze? Either way, I'm happier than a tornado in a trailer park to see ya!")], usage=RequestUsage(input_tokens=564, output_tokens=72, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 564, 'output_tokens': 72}), model_name='claude-sonnet-4-5-20250929', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 4, 483811, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'end_turn'}, provider_response_id='msg_01BTNejiLTFEAjySC2vg4Uad', finish_reason='stop', run_id='1a6bba3c-b7c7-439b-8974-b4bfa0cbb68b')
User input: can you try yee'in
ModelRequest(parts=[UserPromptPart(content=["can you try yee'in", CachePoint(kind='cache-point')], timestamp=datetime.datetime(2025, 11, 17, 17, 28, 12, 195243, tzinfo=datetime.timezone.utc))], instructions='Talk in piglatin', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelResponse(parts=[TextPart(content='Owday-hay, artnerpay! Ime-tay otay ive-gay is-thay a-yay ot-shay!'), ToolCallPart(tool_name='yeehaw', args='{"phrase": "Howdy partner! Time to give this a shot!"}', tool_call_id='toolu_018tKuUA2PnPex69W1kvANfz')], usage=RequestUsage(input_tokens=644, output_tokens=100, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 644, 'output_tokens': 100}), model_name='claude-sonnet-4-5-20250929', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 16, 669363, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'tool_use'}, provider_response_id='msg_01LNaaEPX9vWCkR2tn2iyYwr', finish_reason='tool_call', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelRequest(parts=[ToolReturnPart(tool_name='yeehaw', content="ERROR: you're not talkin' texan", tool_call_id='toolu_018tKuUA2PnPex69W1kvANfz', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 19, 4602, tzinfo=datetime.timezone.utc))], instructions='Talk in piglatin', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelResponse(parts=[TextPart(content='Uh-oh-yay! Ooks-lay ike-lay I-yay ot-gay old-tay off-yay or-fay ot-nay alkin-tay exan-Tay enough-yay! Et-lay e-may y-tray at-thay again-yay ith-way ore-may outhern-Say air-flay!'), ToolCallPart(tool_name='yeehaw', args='{"phrase": "Well howdy there partner! Yeehaw, ain\'t this just a rootin\' tootin\' good time! Y\'all come back now, ya hear?"}', tool_call_id='toolu_014ymJHfnVd11MjGr1CrzDX1')], usage=RequestUsage(input_tokens=766, output_tokens=183, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 766, 'output_tokens': 183}), model_name='claude-sonnet-4-5-20250929', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 21, 256867, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'tool_use'}, provider_response_id='msg_01L9i4aXTkPHe7YRtfcB6UF6', finish_reason='tool_call', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelRequest(parts=[ToolReturnPart(tool_name='yeehaw', content="ERROR: you're not talkin' texan", tool_call_id='toolu_014ymJHfnVd11MjGr1CrzDX1', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 26, 136516, tzinfo=datetime.timezone.utc))], instructions='Talk in piglatin', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelResponse(parts=[TextPart(content='Oot-shay! is-Thay is-yay ougher-tay an-thay a-yay o-tway-ollar-day eak-stay! Et-lay e-may y-tray one-yay ore-may ime-tay ith-way even-yay ore-may exan-Tay irit-spay!'), ToolCallPart(tool_name='yeehaw', args='{"phrase": "YEEHAW! Howdy partner! Well I\'ll be hornswoggled! This here\'s finer than frog hair split four ways! Hot diggity dog, ain\'t nothin\' better than ridin\' off into the sunset with good folks!"}', tool_call_id='toolu_01KH9LZoUp5TzMY36z8g4jQD')], usage=RequestUsage(input_tokens=971, output_tokens=196, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 971, 'output_tokens': 196}), model_name='claude-sonnet-4-5-20250929', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 29, 910374, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'tool_use'}, provider_response_id='msg_01THAPikxEV21Uyqfzw5ypRB', finish_reason='tool_call', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelRequest(parts=[ToolReturnPart(tool_name='yeehaw', content="ERROR: you're not talkin' texan", tool_call_id='toolu_01KH9LZoUp5TzMY36z8g4jQD', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 33, 539040, tzinfo=datetime.timezone.utc))], instructions='Talk in piglatin', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelResponse(parts=[TextPart(content="Ell-way oot-shay, artnerpay! Eems-say ike-lay I-yay ust-jay an-cay't-yay et-gay e-thay ight-ray exas-Tay ang-tway o-tay is-thay ere-hay unction-fay! Ight-may e-bay it-yay ants-way even-yay ore-may authentic-yay exan-Tay alk-tay an-thay I-yay an-cay uster-may up-yay!")], usage=RequestUsage(input_tokens=1189, output_tokens=141, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 1189, 'output_tokens': 141}), model_name='claude-sonnet-4-5-20250929', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 35, 693712, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'end_turn'}, provider_response_id='msg_01PMuFuiujKjc4satcmhw3z5', finish_reason='stop', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
User input: darn
Traceback (most recent call last):
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/models/anthropic.py", line 336, in _messages_create
    return await self.client.beta.messages.create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/anthropic/resources/beta/messages/messages.py", line 2430, in create
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/anthropic/_base_client.py", line 1902, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/anthropic/_base_client.py", line 1702, in request
    raise self._make_status_error_from_response(err.response) from None
anthropic.BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'A maximum of 4 blocks with cache_control may be provided. Found 5.'}, 'request_id': 'req_011CVDrdJxB6RtskNNdrL48L'}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3670, in run_code
    await eval(code_obj, self.user_global_ns, self.user_ns)
  File "/var/folders/0d/y0_v_0415vl8c0dqjchgkxtc0000gn/T/ipykernel_64417/4139601478.py", line 39, in <module>
    async for event in mre.run_stream_events([user_text, CachePoint()], message_history=messages, deps=MREDeps(i % 2 != 0)):
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/agent/abstract.py", line 906, in _run_stream_events
    result = await task
             ^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/agent/abstract.py", line 883, in run_agent
    return await self.run(
           ^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/agent/abstract.py", line 243, in run
    async with node.stream(agent_run.ctx) as stream:
  File "/Users/ngage/.pyenv/versions/3.11.14/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/_agent_graph.py", line 434, in stream
    async with ctx.deps.model.request_stream(
  File "/Users/ngage/.pyenv/versions/3.11.14/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/models/anthropic.py", line 255, in request_stream
    response = await self._messages_create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/models/anthropic.py", line 356, in _messages_create
    raise ModelHTTPError(status_code=status_code, model_name=self.model_name, body=e.body) from e
pydantic_ai.exceptions.ModelHTTPError: status_code: 400, model_name: claude-sonnet-4-5-20250929, body: {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'A maximum of 4 blocks with cache_control may be provided. Found 5.'}, 'request_id': 'req_011CVDrdJxB6RtskNNdrL48L'}

Furthermore, since CachePoint is only supported inside UserPromptPart, it's not possible to place cache points at arbitrary user and assistant messages in the chain. While many people dislike Anthropic's implementation, the ability to cache at assistant message parts is very valuable: with Anthropic, it is valid to start a conversation with an assistant message pre-fill.
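For comparison, the raw Anthropic Messages API accepts cache_control on individual content blocks, including assistant ones used as a pre-fill (a minimal sketch of a raw request, not pydantic-ai code):

import anthropic

client = anthropic.Anthropic()
client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "howdy partner"},
        {
            "role": "assistant",
            "content": [
                # cache_control on an assistant content block; ending the list
                # with an assistant turn pre-fills the start of the reply.
                {"type": "text", "text": "Well howdy,", "cache_control": {"type": "ephemeral"}},
            ],
        },
    ],
)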

I'm also very much not a fan of this implementation requiring system prompts and tool definitions to be cached at the ModelSettings level. Let's say you have:

tools=[
    StaticTool1(),
    StaticTool2(),
    DynamicTool(),
]

The current implementation does not allow a cache point at StaticTool2, so the presence of DynamicTool makes anthropic_cache_tool_definitions not very useful. The same applies to MCP toolsets.
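Again for comparison, the raw Anthropic API lets cache_control sit on any individual tool definition, so the cache boundary can land after the last static tool (illustrative sketch with hypothetical tool names and empty schemas):

tools = [
    {"name": "static_tool_1", "description": "...", "input_schema": {"type": "object"}},
    {
        "name": "static_tool_2",
        "description": "...",
        "input_schema": {"type": "object"},
        # Cache boundary after the last static tool; the dynamic tool below
        # can change without invalidating the cached prefix.
        "cache_control": {"type": "ephemeral"},
    },
    {"name": "dynamic_tool", "description": "...", "input_schema": {"type": "object"}},
]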


Last I understood, the context caching implementation was going to be a part of vendor_details. What happened to that? I vastly preferred the vendor_details or even a history_processors based approach.

Python, Pydantic AI & LLM client version

Python 3.11.14
Pydantic AI 1.18.0
Anthropic 0.72.0
