103 changes: 101 additions & 2 deletions docs/models/anthropic.md
@@ -80,18 +80,29 @@ agent = Agent(model)

## Prompt Caching

-Anthropic supports [prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) to reduce costs by caching parts of your prompts. Pydantic AI provides three ways to use prompt caching:
+Anthropic supports [prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) to reduce costs by caching parts of your prompts. Pydantic AI provides four ways to use prompt caching:

1. **Cache User Messages with [`CachePoint`][pydantic_ai.messages.CachePoint]**: Insert a `CachePoint` marker in your user messages to cache everything before it
2. **Cache System Instructions**: Set [`AnthropicModelSettings.anthropic_cache_instructions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_instructions] to `True` (uses 5m TTL by default) or specify `'5m'` / `'1h'` directly
3. **Cache Tool Definitions**: Set [`AnthropicModelSettings.anthropic_cache_tool_definitions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_tool_definitions] to `True` (uses 5m TTL by default) or specify `'5m'` / `'1h'` directly
4. **Cache Last Message (Convenience)**: Set [`AnthropicModelSettings.anthropic_cache_messages`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_messages] to `True` to automatically cache the last user message
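
For the first approach, here is a minimal sketch of using a `CachePoint` marker on its own (the context and question strings are illustrative):

```python {test="skip"}
from pydantic_ai import Agent, CachePoint

agent = Agent('anthropic:claude-sonnet-4-5')

async def main():
    result = await agent.run([
        'Long, reusable context here...',
        CachePoint(),  # everything before this marker is cached
        'What does the context say?',
    ])
    print(result.output)
```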

-You can combine all three strategies for maximum savings:
+You can combine multiple strategies for maximum savings:

```python {test="skip"}
from pydantic_ai import Agent, CachePoint, RunContext
from pydantic_ai.models.anthropic import AnthropicModelSettings

# Option 1: Use anthropic_cache_messages for convenience (caches last message only)
agent = Agent(
    'anthropic:claude-sonnet-4-5',
    system_prompt='Detailed instructions...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_messages=True,  # Caches the last user message
    ),
)

# Option 2: Fine-grained control with individual settings
agent = Agent(
    'anthropic:claude-sonnet-4-5',
    system_prompt='Detailed instructions...',
@@ -145,3 +156,91 @@ async def main():
print(f'Cache write tokens: {usage.cache_write_tokens}')
print(f'Cache read tokens: {usage.cache_read_tokens}')
```

### Cache Point Limits

Anthropic enforces a maximum of 4 cache points per request. Pydantic AI automatically manages this limit so your requests never exceed it.

#### How Cache Points Are Allocated

Cache points can be placed in three locations:

1. **System Prompt**: Via the `anthropic_cache_instructions` setting (adds a cache point to the last system prompt block)
2. **Tool Definitions**: Via the `anthropic_cache_tool_definitions` setting (adds a cache point to the last tool definition)
3. **Messages**: Via `CachePoint` markers or the `anthropic_cache_messages` setting (adds cache points to message content)

Each setting uses **at most 1 cache point**, but you can combine them:

```python {test="skip"}
from pydantic_ai import Agent, CachePoint
from pydantic_ai.models.anthropic import AnthropicModelSettings

# Example: Using all 3 cache point sources
agent = Agent(
    'anthropic:claude-sonnet-4-5',
    system_prompt='Detailed instructions...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,  # 1 cache point
        anthropic_cache_tool_definitions=True,  # 1 cache point
        anthropic_cache_messages=True,  # 1 cache point
    ),
)

@agent.tool_plain
def my_tool() -> str:
    return 'result'

async def main():
    # This uses 3 cache points (instructions + tools + last message)
    # You can add 1 more CachePoint marker before hitting the limit
    result = await agent.run([
        'Context', CachePoint(),  # 4th cache point - OK
        'Question'
    ])
    print(result.output)
    usage = result.usage()
    print(f'Cache write tokens: {usage.cache_write_tokens}')
    print(f'Cache read tokens: {usage.cache_read_tokens}')
```

#### Automatic Cache Point Limiting

When cache points from all sources (settings + `CachePoint` markers) exceed 4, Pydantic AI automatically removes excess cache points from **older message content** (keeping the most recent ones):

```python {test="skip"}
from pydantic_ai import Agent, CachePoint
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'anthropic:claude-sonnet-4-5',
    system_prompt='Instructions...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,  # 1 cache point
        anthropic_cache_tool_definitions=True,  # 1 cache point
    ),
)

@agent.tool_plain
def search() -> str:
    return 'data'

async def main():
    # Already using 2 cache points (instructions + tools), so the budget
    # allows 2 more CachePoint markers (4 total). We add 3, so the oldest is dropped.
    result = await agent.run([
        'Context 1', CachePoint(),  # Oldest - will be removed
        'Context 2', CachePoint(),  # Will be kept (3rd point)
        'Context 3', CachePoint(),  # Will be kept (4th point)
        'Question'
    ])
    # Final cache points: instructions + tools + Context 2 + Context 3 = 4
    print(result.output)
    usage = result.usage()
    print(f'Cache write tokens: {usage.cache_write_tokens}')
    print(f'Cache read tokens: {usage.cache_read_tokens}')
```

**Key Points**:
- System and tool cache points are **always preserved**
- Message cache points are removed from oldest to newest when the limit is exceeded
- This ensures critical caching (instructions/tools) is maintained while still benefiting from message-level caching
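
Under the hood, each cache point is a `cache_control` field on a content block in the Anthropic request. As a rough sketch of the shape (values illustrative, following Anthropic's prompt caching docs):

```python {test="skip"}
# Shape of a cached content block in the request payload (illustrative values)
cached_block = {
    'type': 'text',
    'text': 'Context 3',
    'cache_control': {'type': 'ephemeral', 'ttl': '5m'},
}
```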
105 changes: 103 additions & 2 deletions pydantic_ai_slim/pydantic_ai/models/anthropic.py
@@ -169,6 +169,19 @@ class AnthropicModelSettings(ModelSettings, total=False):
    See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.
    """

    anthropic_cache_messages: bool | Literal['5m', '1h']
    """Convenience setting to enable caching for the last user message.

    When enabled, this automatically adds a cache point to the last content block
    in the final user message, which is useful for caching conversation history
    or context in multi-turn conversations.
    If `True`, uses TTL='5m'. You can also specify '5m' or '1h' directly.

    Note: Uses 1 of Anthropic's 4 available cache points per request. Any additional CachePoint
    markers in messages will be automatically limited to respect the 4-cache-point maximum.
    See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.
    """


@dataclass(init=False)
class AnthropicModel(Model):
@@ -333,7 +346,7 @@ async def _messages_create(
        tool_choice = self._infer_tool_choice(tools, model_settings, model_request_parameters)

        system_prompt, anthropic_messages = await self._map_message(messages, model_request_parameters, model_settings)
-
+        self._limit_cache_points(system_prompt, anthropic_messages, tools)
        try:
            extra_headers = self._map_extra_headers(beta_features, model_settings)

@@ -376,7 +389,7 @@ async def _messages_count_tokens(
        tool_choice = self._infer_tool_choice(tools, model_settings, model_request_parameters)

        system_prompt, anthropic_messages = await self._map_message(messages, model_request_parameters, model_settings)
-
+        self._limit_cache_points(system_prompt, anthropic_messages, tools)
        try:
            extra_headers = self._map_extra_headers(beta_features, model_settings)

@@ -747,6 +760,25 @@ async def _map_message(  # noqa: C901
            system_prompt_parts.insert(0, instructions)
        system_prompt = '\n\n'.join(system_prompt_parts)

        # Add cache_control to the last message content if anthropic_cache_messages is enabled
        if anthropic_messages and (cache_messages := model_settings.get('anthropic_cache_messages')):
            ttl: Literal['5m', '1h'] = '5m' if cache_messages is True else cache_messages
            m = anthropic_messages[-1]
            content = m['content']
            if isinstance(content, str):
                # Convert string content to list format with cache_control
                m['content'] = [  # pragma: no cover
                    BetaTextBlockParam(
                        text=content,
                        type='text',
                        cache_control=BetaCacheControlEphemeralParam(type='ephemeral', ttl=ttl),
                    )
                ]
            else:
                # Add cache_control to the last content block
                content = cast(list[BetaContentBlockParam], content)
                self._add_cache_control_to_last_param(content, ttl)

        # If anthropic_cache_instructions is enabled, return system prompt as a list with cache_control
        if system_prompt and (cache_instructions := model_settings.get('anthropic_cache_instructions')):
            # If True, use '5m'; otherwise use the specified ttl value
@@ -762,6 +794,75 @@

        return system_prompt, anthropic_messages

    @staticmethod
    def _limit_cache_points(
        system_prompt: str | list[BetaTextBlockParam],
        anthropic_messages: list[BetaMessageParam],
        tools: list[BetaToolUnionParam],
    ) -> None:
        """Limit the number of cache points in the request to Anthropic's maximum.

        Anthropic enforces a maximum of 4 cache points per request. This method ensures
        compliance by counting existing cache points and removing excess ones from messages.

        Strategy:
        1. Count cache points in system_prompt (can be multiple if list of blocks)
        2. Count cache points in tools (can be in any position, not just last)
        3. Raise UserError if system + tools already exceed MAX_CACHE_POINTS
        4. Calculate remaining budget for message cache points
        5. Traverse messages from newest to oldest, keeping the most recent cache points
           within the remaining budget
        6. Remove excess cache points from older messages to stay within limit

        Cache point priority (always preserved):
        - System prompt cache points
        - Tool definition cache points
        - Message cache points (newest first, oldest removed if needed)

        Raises:
            UserError: If system_prompt and tools combined already exceed MAX_CACHE_POINTS (4).
                This indicates a configuration error that cannot be auto-fixed.
        """
        MAX_CACHE_POINTS = 4

        # Count existing cache points in system prompt
        used_cache_points = (
            sum(1 for block in system_prompt if 'cache_control' in cast(dict[str, Any], block))
            if isinstance(system_prompt, list)
            else 0
        )

        # Count existing cache points in tools (any tool may have cache_control)
        # Note: cache_control can be in the middle of the tools list if builtin tools are added after
        for tool in tools:
            if 'cache_control' in tool:
                used_cache_points += 1

        # Calculate the remaining cache point budget for messages
        remaining_budget = MAX_CACHE_POINTS - used_cache_points
        if remaining_budget < 0:  # pragma: no cover
            raise UserError(
                f'Too many cache points for Anthropic request. '
                f'System prompt and tool definitions already use {used_cache_points} cache points, '
                f'which exceeds the maximum of {MAX_CACHE_POINTS}.'
            )

        # Remove excess cache points from messages (newest to oldest)
        for message in reversed(anthropic_messages):
            content = message['content']
            if isinstance(content, str):  # pragma: no cover
                continue

            # Process content blocks in reverse order (newest first)
            for block in reversed(cast(list[BetaContentBlockParam], content)):
                block_dict = cast(dict[str, Any], block)

                if 'cache_control' in block_dict:
                    if remaining_budget > 0:
                        remaining_budget -= 1
                    else:
                        # Exceeded limit, remove this cache point
                        del block_dict['cache_control']

    @staticmethod
    def _add_cache_control_to_last_param(params: list[BetaContentBlockParam], ttl: Literal['5m', '1h'] = '5m') -> None:
        """Add cache control to the last content block param.