73 changes: 73 additions & 0 deletions docs/models/bedrock.md
@DouweM It's mostly the duplication of the same documentation we have for Anthropic CachePoint. What do you think, maybe we need to move it somewhere?

@@ -74,6 +74,79 @@ model = BedrockConverseModel(model_name='us.amazon.nova-pro-v1:0')
agent = Agent(model=model, model_settings=bedrock_model_settings)
```

## Prompt Caching

Bedrock supports [prompt caching](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html) on Anthropic models so you can reuse expensive context across requests. Pydantic AI exposes the same three strategies as Anthropic:

1. **Cache User Messages with [`CachePoint`][pydantic_ai.messages.CachePoint]**: Insert a `CachePoint` marker to cache everything before it in the current user message.
2. **Cache System Instructions**: Enable [`BedrockModelSettings.bedrock_cache_instructions`][pydantic_ai.models.bedrock.BedrockModelSettings.bedrock_cache_instructions] to append a cache point after the system prompt.
3. **Cache Tool Definitions**: Enable [`BedrockModelSettings.bedrock_cache_tool_definitions`][pydantic_ai.models.bedrock.BedrockModelSettings.bedrock_cache_tool_definitions] to cache your tool schemas.

> [!NOTE]
> AWS only serves cached content once a segment crosses the model-specific minimum token threshold (see the [Bedrock prompt caching docs](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html)). Short prompts or tool definitions below those limits bypass the cache, so don't expect savings for tiny payloads.

You can combine all of them:

```python {test="skip"}
from pydantic_ai import Agent, CachePoint, RunContext
from pydantic_ai.models.bedrock import BedrockConverseModel, BedrockModelSettings

model = BedrockConverseModel('us.anthropic.claude-sonnet-4-5-20250929-v1:0')
agent = Agent(
    model,
    system_prompt='Detailed instructions...',
    model_settings=BedrockModelSettings(
        bedrock_cache_instructions=True,
        bedrock_cache_tool_definitions=True,
    ),
)


@agent.tool
async def search_docs(ctx: RunContext, query: str) -> str:
    return f'Results for {query}'


async def main():
    result1 = await agent.run(
        [
            'Long cached context...',
            CachePoint(),
            'First question',
        ]
    )
    result2 = await agent.run(
        [
            'Long cached context...',
            CachePoint(),
            'Second question',
        ]
    )
    print(result1.output, result1.usage())
    print(result2.output, result2.usage())
```

Access cache usage statistics via [`RequestUsage`][pydantic_ai.usage.RequestUsage]:

```python {test="skip"}
from pydantic_ai import Agent, CachePoint

agent = Agent('bedrock:us.anthropic.claude-sonnet-4-5-20250929-v1:0')


async def main():
    result = await agent.run(
        [
            'Reference material...',
            CachePoint(),
            'What changed since last time?',
        ]
    )
    usage = result.usage()
    print(f'Cache writes: {usage.cache_write_tokens}')
    print(f'Cache reads: {usage.cache_read_tokens}')
```

## `provider` argument

You can provide a custom `BedrockProvider` via the `provider` argument. This is useful when you want to specify credentials directly or use a custom boto3 client:
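A minimal sketch of both options (the model name and credential values here are placeholders; swap in your own):

```python
import boto3

from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel
from pydantic_ai.providers.bedrock import BedrockProvider

# Pass credentials directly...
model = BedrockConverseModel(
    'us.amazon.nova-pro-v1:0',
    provider=BedrockProvider(
        region_name='us-east-1',
        aws_access_key_id='your-access-key',
        aws_secret_access_key='your-secret-key',
    ),
)

# ...or hand the provider a pre-configured boto3 client instead.
bedrock_client = boto3.client('bedrock-runtime', region_name='us-east-1')
model = BedrockConverseModel(
    'us.amazon.nova-pro-v1:0',
    provider=BedrockProvider(bedrock_client=bedrock_client),
)

agent = Agent(model)
```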
1 change: 1 addition & 0 deletions pydantic_ai_slim/pydantic_ai/messages.py
@@ -622,6 +622,7 @@ class CachePoint:
    Supported by:

    - Anthropic
    - Amazon Bedrock (Converse API)
    """

    kind: Literal['cache-point'] = 'cache-point'
62 changes: 53 additions & 9 deletions pydantic_ai_slim/pydantic_ai/models/bedrock.py
@@ -208,6 +208,21 @@ class BedrockModelSettings(ModelSettings, total=False):
    See more about it on <https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html>.
    """

    bedrock_cache_tool_definitions: bool
    """Whether to add a cache point after the last tool definition.

    When enabled, the last tool in the `tools` array will include a `cachePoint`, allowing Bedrock to cache tool
    definitions and reduce costs for compatible models.
    See https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html for more information.
    """

    bedrock_cache_instructions: bool
    """Whether to add a cache point after the system prompt blocks.

    When enabled, an extra `cachePoint` is appended to the system prompt so Bedrock can cache system instructions.
    See https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html for more information.
    """


@dataclass(init=False)
class BedrockConverseModel(Model):
@@ -299,7 +314,8 @@ async def count_tokens(
        Check the actual supported models on <https://docs.aws.amazon.com/bedrock/latest/userguide/count-tokens.html>
        """
        model_settings, model_request_parameters = self.prepare_request(model_settings, model_request_parameters)
        system_prompt, bedrock_messages = await self._map_messages(messages, model_request_parameters)
        settings = cast(BedrockModelSettings, model_settings or {})
        system_prompt, bedrock_messages = await self._map_messages(messages, model_request_parameters, settings)
        params: CountTokensRequestTypeDef = {
            'modelId': self._remove_inference_geo_prefix(self.model_name),
            'input': {
@@ -374,6 +390,8 @@ async def _process_response(self, response: ConverseResponseTypeDef) -> ModelRes
        u = usage.RequestUsage(
            input_tokens=response['usage']['inputTokens'],
            output_tokens=response['usage']['outputTokens'],
            cache_read_tokens=response['usage'].get('cacheReadInputTokens', 0),
            cache_write_tokens=response['usage'].get('cacheWriteInputTokens', 0),
        )
        response_id = response.get('ResponseMetadata', {}).get('RequestId', None)
        raw_finish_reason = response['stopReason']
@@ -417,8 +435,9 @@ async def _messages_create(
        model_settings: BedrockModelSettings | None,
        model_request_parameters: ModelRequestParameters,
    ) -> ConverseResponseTypeDef | ConverseStreamResponseTypeDef:
        system_prompt, bedrock_messages = await self._map_messages(messages, model_request_parameters)
        inference_config = self._map_inference_config(model_settings)
        settings = model_settings or BedrockModelSettings()
        system_prompt, bedrock_messages = await self._map_messages(messages, model_request_parameters, settings)
        inference_config = self._map_inference_config(settings)

        params: ConverseRequestTypeDef = {
            'modelId': self.model_name,
@@ -427,7 +446,7 @@
            'inferenceConfig': inference_config,
        }

        tool_config = self._map_tool_config(model_request_parameters)
        tool_config = self._map_tool_config(model_request_parameters, settings)
        if tool_config:
            params['toolConfig'] = tool_config

@@ -481,11 +500,18 @@ def _map_inference_config(

        return inference_config

    def _map_tool_config(self, model_request_parameters: ModelRequestParameters) -> ToolConfigurationTypeDef | None:
    def _map_tool_config(
        self,
        model_request_parameters: ModelRequestParameters,
        model_settings: BedrockModelSettings | None,
    ) -> ToolConfigurationTypeDef | None:
        tools = self._get_tools(model_request_parameters)
        if not tools:
            return None

        if model_settings and model_settings.get('bedrock_cache_tool_definitions'):
            tools.append({'cachePoint': {'type': 'default'}})

        tool_choice: ToolChoiceTypeDef
        if not model_request_parameters.allow_text_output:
            tool_choice = {'any': {}}
@@ -499,12 +525,16 @@ def _map_tool_config(self, model_request_parameters: ModelRequestParameters) ->
        return tool_config

    async def _map_messages( # noqa: C901
        self, messages: list[ModelMessage], model_request_parameters: ModelRequestParameters
        self,
        messages: list[ModelMessage],
        model_request_parameters: ModelRequestParameters,
        model_settings: BedrockModelSettings | None,
    ) -> tuple[list[SystemContentBlockTypeDef], list[MessageUnionTypeDef]]:
        """Maps a `pydantic_ai.Message` to the Bedrock `MessageUnionTypeDef`.

        Groups consecutive ToolReturnPart objects into a single user message as required by Bedrock Claude/Nova models.
        """
        settings = model_settings or BedrockModelSettings()
        profile = BedrockModelProfile.from_profile(self.profile)
        system_prompt: list[SystemContentBlockTypeDef] = []
        bedrock_messages: list[MessageUnionTypeDef] = []
@@ -613,10 +643,13 @@ async def _map_messages( # noqa: C901
        if instructions := self._get_instructions(messages, model_request_parameters):
            system_prompt.insert(0, {'text': instructions})

        if system_prompt and settings.get('bedrock_cache_instructions'):
            system_prompt.append({'cachePoint': {'type': 'default'}})

        return system_prompt, processed_messages

    @staticmethod
    async def _map_user_prompt(part: UserPromptPart, document_count: Iterator[int]) -> list[MessageUnionTypeDef]:
    async def _map_user_prompt(part: UserPromptPart, document_count: Iterator[int]) -> list[MessageUnionTypeDef]: # noqa: C901
        content: list[ContentBlockUnionTypeDef] = []
        if isinstance(part.content, str):
            content.append({'text': part.content})
@@ -674,8 +707,17 @@ async def _map_user_prompt(part: UserPromptPart, document_count: Iterator[int])
                elif isinstance(item, AudioUrl): # pragma: no cover
                    raise NotImplementedError('Audio is not supported yet.')
                elif isinstance(item, CachePoint):
                    # Bedrock support has not been implemented yet: https://github.com/pydantic/pydantic-ai/issues/3418
                    pass
                    if not content or 'cachePoint' in content[-1]:
                        raise UserError(
                            'CachePoint cannot be the first content in a user message - there must be previous content to cache when using Bedrock. '
                            'To cache system instructions or tool definitions, use the `bedrock_cache_instructions` or `bedrock_cache_tool_definitions` settings instead.'
                        )
                    if 'text' not in content[-1]:
                        # AWS currently rejects cache points that directly follow non-text content.
                        # Insert an empty text block as a workaround (see https://github.com/pydantic/pydantic-ai/issues/3418
                        # and https://github.com/pydantic/pydantic-ai/pull/2560#discussion_r2349209916).
                        content.append({'text': '\n'})
                    content.append({'cachePoint': {'type': 'default'}})
                else:
                    assert_never(item)
        return [{'role': 'user', 'content': content}]
@@ -796,6 +838,8 @@ def _map_usage(self, metadata: ConverseStreamMetadataEventTypeDef) -> usage.Requ
        return usage.RequestUsage(
            input_tokens=metadata['usage']['inputTokens'],
            output_tokens=metadata['usage']['outputTokens'],
            cache_read_tokens=metadata['usage'].get('cacheReadInputTokens', 0),
            cache_write_tokens=metadata['usage'].get('cacheWriteInputTokens', 0),
        )


@@ -0,0 +1,91 @@
interactions:
- request:
    body: '{"messages": [{"role": "user", "content": [{"text": "ONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\nONLY
SINGLE NUMBER IN RESPONSE\nONLY SINGLE NUMBER IN RESPONSE\n"}, {"cachePoint": {"type": "default"}}, {"text": "Response
only number What is 2 + 3"}]}], "system": [{"text": "YOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY
WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE
ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST
RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU
MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE
NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY
WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE
ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST
RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU
MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE
NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY
WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE
ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST
RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU
MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE
NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY
WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE
ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST RESPONSE ONLY WITH SINGLE NUMBER\nYOU MUST
RESPONSE ONLY WITH SINGLE NUMBER\n"}, {"cachePoint": {"type": "default"}}], "inferenceConfig": {}}'
    headers:
      amz-sdk-invocation-id:
      - !!binary |
        Y2RmYWJiOGYtYjM0MC00NzY4LTgwZTEtMDI5NzZiZDdiZjVm
      amz-sdk-request:
      - !!binary |
        YXR0ZW1wdD0x
      content-length:
      - '5580'
      content-type:
      - !!binary |
        YXBwbGljYXRpb24vanNvbg==
    method: POST
    uri: https://bedrock-runtime.us-east-1.amazonaws.com/model/us.anthropic.claude-sonnet-4-5-20250929-v1%3A0/converse
  response:
    headers:
      connection:
      - keep-alive
      content-length:
      - '321'
      content-type:
      - application/json
    parsed_body:
      metrics:
        latencyMs: 2015
      output:
        message:
          content:
          - text: '5'
          role: assistant
      stopReason: end_turn
      usage:
        cacheReadInputTokenCount: 0
        cacheReadInputTokens: 0
        cacheWriteInputTokenCount: 1503
        cacheWriteInputTokens: 1503
        inputTokens: 14
        outputTokens: 5
        serverToolUsage: {}
        totalTokens: 1522
    status:
      code: 200
      message: OK
version: 1