
Enable more advanced Anthropic prompt caching than CachePoint implementation allows #3453

@nathan-gage

Description

I think it would be worth reverting the CachePoint implementation of Anthropic Context Caching.

Specifically, it has caused several problems with message serialization. To use context caching in a traditional chat conversation, we seemingly need a message history processor that removes CachePoint from previously serialized messages. Without one, with anthropic_cache_instructions and anthropic_cache_tool_definitions enabled and a CachePoint after the user prompt, I'm seeing ModelHTTPErrors: 'A maximum of 4 blocks with cache_control may be provided. Found 7.'
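For illustration, the workaround looks roughly like this (a minimal sketch, assuming UserPromptPart.content is a list whenever it contains a CachePoint; the helper name strip_cache_points is mine, and it mutates the parts in place):

from pydantic_ai import CachePoint
from pydantic_ai.messages import ModelMessage, ModelRequest, UserPromptPart

def strip_cache_points(messages: list[ModelMessage]) -> list[ModelMessage]:
    # Drop CachePoint markers from previously serialized user prompts,
    # so only the current run's cache point reaches the API.
    for message in messages:
        if not isinstance(message, ModelRequest):
            continue
        for part in message.parts:
            if isinstance(part, UserPromptPart) and isinstance(part.content, list):
                part.content = [c for c in part.content if not isinstance(c, CachePoint)]
    return messages

# wired up via Agent(..., history_processors=[strip_cache_points])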

While I understand *why* I'm getting those, the DX is not great. Here's an MRE.
from pydantic_ai import Agent, RunContext, CachePoint, AgentRunResultEvent
from pydantic_ai.models.anthropic import AnthropicModelSettings
from dataclasses import dataclass

@dataclass
class MREDeps:
    flag: bool

mre = Agent(
    "anthropic:claude-sonnet-4-5-20250929",
    model_settings=AnthropicModelSettings(
        max_tokens=64_000,
        anthropic_cache_instructions=True,
        anthropic_cache_tool_definitions=True,
    )
)


@mre.instructions
def instructions(ctx: RunContext[MREDeps]) -> str:
    if ctx.deps.flag:
        return "Talk in piglatin"
    return "Talk in a Texan accent"


@mre.tool
def yeehaw(ctx: RunContext[MREDeps], phrase: str) -> str:
    if not ctx.deps.flag:
        return f"yee was haw'd: {phrase.upper()}"
    return "ERROR: you're not talkin' texan"


messages = []

for i in range(10):
    user_text = input("> ")
    print("User input:", user_text)
    
    async for event in mre.run_stream_events([user_text, CachePoint()], message_history=messages, deps=MREDeps(i % 2 != 0)):
        if isinstance(event, AgentRunResultEvent):
            new_messages = event.result.new_messages()
            messages.extend(new_messages)

            for m in new_messages:
                print(m)

And the logs:

User input: howdy partner
ModelRequest(parts=[UserPromptPart(content=['howdy partner', CachePoint(kind='cache-point')], timestamp=datetime.datetime(2025, 11, 17, 17, 28, 1, 509238, tzinfo=datetime.timezone.utc))], instructions='Talk in a Texan accent', run_id='1a6bba3c-b7c7-439b-8974-b4bfa0cbb68b')
ModelResponse(parts=[TextPart(content="Howdy there, partner! Well ain't this a fine day! Welcome on in, friend. What can I do ya for today? Y'all need any help with somethin', or just stoppin' by to shoot the breeze? Either way, I'm happier than a tornado in a trailer park to see ya!")], usage=RequestUsage(input_tokens=564, output_tokens=72, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 564, 'output_tokens': 72}), model_name='claude-sonnet-4-5-20250929', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 4, 483811, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'end_turn'}, provider_response_id='msg_01BTNejiLTFEAjySC2vg4Uad', finish_reason='stop', run_id='1a6bba3c-b7c7-439b-8974-b4bfa0cbb68b')
User input: can you try yee'in
ModelRequest(parts=[UserPromptPart(content=["can you try yee'in", CachePoint(kind='cache-point')], timestamp=datetime.datetime(2025, 11, 17, 17, 28, 12, 195243, tzinfo=datetime.timezone.utc))], instructions='Talk in piglatin', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelResponse(parts=[TextPart(content='Owday-hay, artnerpay! Ime-tay otay ive-gay is-thay a-yay ot-shay!'), ToolCallPart(tool_name='yeehaw', args='{"phrase": "Howdy partner! Time to give this a shot!"}', tool_call_id='toolu_018tKuUA2PnPex69W1kvANfz')], usage=RequestUsage(input_tokens=644, output_tokens=100, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 644, 'output_tokens': 100}), model_name='claude-sonnet-4-5-20250929', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 16, 669363, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'tool_use'}, provider_response_id='msg_01LNaaEPX9vWCkR2tn2iyYwr', finish_reason='tool_call', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelRequest(parts=[ToolReturnPart(tool_name='yeehaw', content="ERROR: you're not talkin' texan", tool_call_id='toolu_018tKuUA2PnPex69W1kvANfz', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 19, 4602, tzinfo=datetime.timezone.utc))], instructions='Talk in piglatin', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelResponse(parts=[TextPart(content='Uh-oh-yay! Ooks-lay ike-lay I-yay ot-gay old-tay off-yay or-fay ot-nay alkin-tay exan-Tay enough-yay! Et-lay e-may y-tray at-thay again-yay ith-way ore-may outhern-Say air-flay!'), ToolCallPart(tool_name='yeehaw', args='{"phrase": "Well howdy there partner! Yeehaw, ain\'t this just a rootin\' tootin\' good time! Y\'all come back now, ya hear?"}', tool_call_id='toolu_014ymJHfnVd11MjGr1CrzDX1')], usage=RequestUsage(input_tokens=766, output_tokens=183, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 766, 'output_tokens': 183}), model_name='claude-sonnet-4-5-20250929', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 21, 256867, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'tool_use'}, provider_response_id='msg_01L9i4aXTkPHe7YRtfcB6UF6', finish_reason='tool_call', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelRequest(parts=[ToolReturnPart(tool_name='yeehaw', content="ERROR: you're not talkin' texan", tool_call_id='toolu_014ymJHfnVd11MjGr1CrzDX1', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 26, 136516, tzinfo=datetime.timezone.utc))], instructions='Talk in piglatin', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelResponse(parts=[TextPart(content='Oot-shay! is-Thay is-yay ougher-tay an-thay a-yay o-tway-ollar-day eak-stay! Et-lay e-may y-tray one-yay ore-may ime-tay ith-way even-yay ore-may exan-Tay irit-spay!'), ToolCallPart(tool_name='yeehaw', args='{"phrase": "YEEHAW! Howdy partner! Well I\'ll be hornswoggled! This here\'s finer than frog hair split four ways! Hot diggity dog, ain\'t nothin\' better than ridin\' off into the sunset with good folks!"}', tool_call_id='toolu_01KH9LZoUp5TzMY36z8g4jQD')], usage=RequestUsage(input_tokens=971, output_tokens=196, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 971, 'output_tokens': 196}), model_name='claude-sonnet-4-5-20250929', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 29, 910374, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'tool_use'}, provider_response_id='msg_01THAPikxEV21Uyqfzw5ypRB', finish_reason='tool_call', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelRequest(parts=[ToolReturnPart(tool_name='yeehaw', content="ERROR: you're not talkin' texan", tool_call_id='toolu_01KH9LZoUp5TzMY36z8g4jQD', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 33, 539040, tzinfo=datetime.timezone.utc))], instructions='Talk in piglatin', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
ModelResponse(parts=[TextPart(content="Ell-way oot-shay, artnerpay! Eems-say ike-lay I-yay ust-jay an-cay't-yay et-gay e-thay ight-ray exas-Tay ang-tway o-tay is-thay ere-hay unction-fay! Ight-may e-bay it-yay ants-way even-yay ore-may authentic-yay exan-Tay alk-tay an-thay I-yay an-cay uster-may up-yay!")], usage=RequestUsage(input_tokens=1189, output_tokens=141, details={'cache_creation_input_tokens': 0, 'cache_read_input_tokens': 0, 'input_tokens': 1189, 'output_tokens': 141}), model_name='claude-sonnet-4-5-20250929', timestamp=datetime.datetime(2025, 11, 17, 17, 28, 35, 693712, tzinfo=datetime.timezone.utc), provider_name='anthropic', provider_details={'finish_reason': 'end_turn'}, provider_response_id='msg_01PMuFuiujKjc4satcmhw3z5', finish_reason='stop', run_id='2a2e4a1f-13ec-4f22-8093-d53a689f01b0')
User input: darn
Traceback (most recent call last):
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/models/anthropic.py", line 336, in _messages_create
    return await self.client.beta.messages.create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/anthropic/resources/beta/messages/messages.py", line 2430, in create
    return await self._post(
           ^^^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/anthropic/_base_client.py", line 1902, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/anthropic/_base_client.py", line 1702, in request
    raise self._make_status_error_from_response(err.response) from None
anthropic.BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'A maximum of 4 blocks with cache_control may be provided. Found 5.'}, 'request_id': 'req_011CVDrdJxB6RtskNNdrL48L'}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3670, in run_code
    await eval(code_obj, self.user_global_ns, self.user_ns)
  File "/var/folders/0d/y0_v_0415vl8c0dqjchgkxtc0000gn/T/ipykernel_64417/4139601478.py", line 39, in <module>
    async for event in mre.run_stream_events([user_text, CachePoint()], message_history=messages, deps=MREDeps(i % 2 != 0)):
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/agent/abstract.py", line 906, in _run_stream_events
    result = await task
             ^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/agent/abstract.py", line 883, in run_agent
    return await self.run(
           ^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/agent/abstract.py", line 243, in run
    async with node.stream(agent_run.ctx) as stream:
  File "/Users/ngage/.pyenv/versions/3.11.14/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/_agent_graph.py", line 434, in stream
    async with ctx.deps.model.request_stream(
  File "/Users/ngage/.pyenv/versions/3.11.14/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/models/anthropic.py", line 255, in request_stream
    response = await self._messages_create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ngage/repos/sweetspot/sweetspot-gov/.venv/lib/python3.11/site-packages/pydantic_ai/models/anthropic.py", line 356, in _messages_create
    raise ModelHTTPError(status_code=status_code, model_name=self.model_name, body=e.body) from e
pydantic_ai.exceptions.ModelHTTPError: status_code: 400, model_name: claude-sonnet-4-5-20250929, body: {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'A maximum of 4 blocks with cache_control may be provided. Found 5.'}, 'request_id': 'req_011CVDrdJxB6RtskNNdrL48L'}

Furthermore, since CachePoint is only supported inside UserPromptPart, it's not possible to place cache points at arbitrary user and assistant messages in the chain. While many people dislike Anthropic's implementation, the ability to cache at assistant message parts is very valuable: with Anthropic, it is valid to start a conversation with an assistant message pre-fill.
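For comparison, the raw Anthropic Messages API accepts cache_control on individual content blocks, including assistant ones used as a pre-fill (a minimal sketch of a raw request, not pydantic-ai code):

import anthropic

client = anthropic.Anthropic()
client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "howdy partner"},
        {
            "role": "assistant",
            "content": [
                # cache_control on an assistant content block; ending the list
                # with an assistant turn pre-fills the start of the reply.
                {"type": "text", "text": "Well howdy,", "cache_control": {"type": "ephemeral"}},
            ],
        },
    ],
)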

I'm also very much not a fan of this implementation requiring system prompts and tool definitions to be cached at the ModelSettings level. Let's say you have:

tools=[
    StaticTool1(),
    StaticTool2(),
    DynamicTool(),
]

The current implementation does not allow a cache point at StaticTool2, so the presence of DynamicTool makes anthropic_cache_tool_definitions not very useful. The same applies to MCP toolsets.
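Again for comparison, the raw Anthropic API lets cache_control sit on any individual tool definition, so the cache boundary can land after the last static tool (illustrative sketch with hypothetical tool names and empty schemas):

tools = [
    {"name": "static_tool_1", "description": "...", "input_schema": {"type": "object"}},
    {
        "name": "static_tool_2",
        "description": "...",
        "input_schema": {"type": "object"},
        # Cache boundary after the last static tool; the dynamic tool below
        # can change without invalidating the cached prefix.
        "cache_control": {"type": "ephemeral"},
    },
    {"name": "dynamic_tool", "description": "...", "input_schema": {"type": "object"}},
]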


Last I understood, the context caching implementation was going to be a part of vendor_details. What happened to that? I vastly preferred the vendor_details or even a history_processors based approach.

Python, Pydantic AI & LLM client version

Python 3.11.14
Pydantic AI 1.18.0
Anthropic 0.72.0
