
Commit 5263d8a

Bedrock: add prompt caching support and verification

- Emit cache-point tool entries so Bedrock accepts cached tool definitions
- Document and test prompt caching (writes + reads) with cassette-body checks
- Refresh Bedrock cassettes and type annotations to align with the new flow

1 parent 359c6d2 · commit 5263d8a

File tree

7 files changed: +527 −35 lines

docs/models/bedrock.md

Lines changed: 57 additions & 0 deletions

@@ -74,6 +74,63 @@ model = BedrockConverseModel(model_name='us.amazon.nova-pro-v1:0')

## Prompt Caching

Bedrock supports [prompt caching](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html) on Anthropic models, so you can reuse expensive context across requests. Pydantic AI exposes the same three strategies as Anthropic:

1. **Cache User Messages with [`CachePoint`][pydantic_ai.messages.CachePoint]**: Insert a `CachePoint` marker to cache everything before it in the current user message.
2. **Cache System Instructions**: Enable [`BedrockModelSettings.bedrock_cache_instructions`][pydantic_ai.models.bedrock.BedrockModelSettings.bedrock_cache_instructions] to append a cache point after the system prompt.
3. **Cache Tool Definitions**: Enable [`BedrockModelSettings.bedrock_cache_tool_definitions`][pydantic_ai.models.bedrock.BedrockModelSettings.bedrock_cache_tool_definitions] to cache your tool schemas.

> [!NOTE]
> AWS only serves cached content once a segment crosses the provider-specific minimum token thresholds (see the [Bedrock prompt caching docs](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html)). Short prompts or tool definitions below those limits bypass the cache, so don't expect savings for tiny payloads.

You can combine all three strategies:

```python {test="skip"}
from pydantic_ai import Agent, CachePoint, RunContext
from pydantic_ai.models.bedrock import BedrockConverseModel, BedrockModelSettings

model = BedrockConverseModel('us.anthropic.claude-sonnet-4-5-20250929-v1:0')
agent = Agent(
    model,
    system_prompt='Detailed instructions...',
    model_settings=BedrockModelSettings(
        bedrock_cache_instructions=True,
        bedrock_cache_tool_definitions=True,
    ),
)


@agent.tool
async def search_docs(ctx: RunContext, query: str) -> str:
    return f'Results for {query}'


async def main():
    await agent.run([
        'Long cached context...',
        CachePoint(),
        'First question',
    ])
    await agent.run([
        'Long cached context...',
        CachePoint(),
        'Second question',
    ])
```

Access cache usage statistics via [`RequestUsage`][pydantic_ai.usage.RequestUsage]:

```python {test="skip"}
async def main():
    result = await agent.run([
        'Reference material...',
        CachePoint(),
        'What changed since last time?',
    ])
    usage = result.usage()
    print(f'Cache writes: {usage.cache_write_tokens}')
    print(f'Cache reads: {usage.cache_read_tokens}')
```
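As a rough sanity check against the minimum-token caveat in the NOTE above, you can estimate whether a prefix will plausibly clear the cache minimum before paying for a request. This is a sketch under stated assumptions: the ~4-characters-per-token heuristic and the 1,024-token minimum (documented for several Claude models on Bedrock) should be verified against the AWS docs for your model; the function names are illustrative, not part of Pydantic AI.

```python
# Rough pre-flight check: will this prefix plausibly clear the cache minimum?
# Assumptions (verify against AWS docs): ~4 chars/token, and a 1024-token
# minimum, which applies to several Claude models on Bedrock.

MIN_CACHEABLE_TOKENS = 1024


def estimated_tokens(text: str) -> int:
    # Crude heuristic; real tokenization varies by model.
    return len(text) // 4


def likely_cacheable(prefix: str, minimum: int = MIN_CACHEABLE_TOKENS) -> bool:
    return estimated_tokens(prefix) >= minimum


print(likely_cacheable('short prompt'))       # -> False: tiny prefix, no cache
print(likely_cacheable('long prompt' * 600))  # -> True: ~6,600 chars ≈ 1,650 tokens
```

Segments below the threshold are silently accepted but not cached, so a check like this explains "why am I seeing zero `cache_write_tokens`" more often than any configuration issue does.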
## `provider` argument

You can provide a custom `BedrockProvider` via the `provider` argument. This is useful when you want to specify credentials directly or use a custom boto3 client:

pydantic_ai_slim/pydantic_ai/messages.py

Lines changed: 1 addition & 0 deletions

```diff
@@ -622,6 +622,7 @@ class CachePoint:
     Supported by:

     - Anthropic
+    - Amazon Bedrock (Converse API)
     """

     kind: Literal['cache-point'] = 'cache-point'
```

pydantic_ai_slim/pydantic_ai/models/bedrock.py

Lines changed: 53 additions & 9 deletions

```diff
@@ -208,6 +208,21 @@ class BedrockModelSettings(ModelSettings, total=False):
     See more about it on <https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html>.
     """

+    bedrock_cache_tool_definitions: bool
+    """Whether to add a cache point after the last tool definition.
+
+    When enabled, the last tool in the `tools` array will include a `cachePoint`, allowing Bedrock to cache tool
+    definitions and reduce costs for compatible models.
+    See https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html for more information.
+    """
+
+    bedrock_cache_instructions: bool
+    """Whether to add a cache point after the system prompt blocks.
+
+    When enabled, an extra `cachePoint` is appended to the system prompt so Bedrock can cache system instructions.
+    See https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html for more information.
+    """


 @dataclass(init=False)
 class BedrockConverseModel(Model):
@@ -299,7 +314,8 @@ async def count_tokens(
         Check the actual supported models on <https://docs.aws.amazon.com/bedrock/latest/userguide/count-tokens.html>
         """
         model_settings, model_request_parameters = self.prepare_request(model_settings, model_request_parameters)
-        system_prompt, bedrock_messages = await self._map_messages(messages, model_request_parameters)
+        settings = cast(BedrockModelSettings, model_settings or {})
+        system_prompt, bedrock_messages = await self._map_messages(messages, model_request_parameters, settings)
         params: CountTokensRequestTypeDef = {
             'modelId': self._remove_inference_geo_prefix(self.model_name),
             'input': {
@@ -374,6 +390,8 @@ async def _process_response(self, response: ConverseResponseTypeDef) -> ModelRes
         u = usage.RequestUsage(
             input_tokens=response['usage']['inputTokens'],
             output_tokens=response['usage']['outputTokens'],
+            cache_read_tokens=response['usage'].get('cacheReadInputTokens', 0),
+            cache_write_tokens=response['usage'].get('cacheWriteInputTokens', 0),
         )
         response_id = response.get('ResponseMetadata', {}).get('RequestId', None)
         raw_finish_reason = response['stopReason']
@@ -417,8 +435,9 @@ async def _messages_create(
         model_settings: BedrockModelSettings | None,
         model_request_parameters: ModelRequestParameters,
     ) -> ConverseResponseTypeDef | ConverseStreamResponseTypeDef:
-        system_prompt, bedrock_messages = await self._map_messages(messages, model_request_parameters)
-        inference_config = self._map_inference_config(model_settings)
+        settings = model_settings or BedrockModelSettings()
+        system_prompt, bedrock_messages = await self._map_messages(messages, model_request_parameters, settings)
+        inference_config = self._map_inference_config(settings)

         params: ConverseRequestTypeDef = {
             'modelId': self.model_name,
@@ -427,7 +446,7 @@ async def _messages_create(
             'inferenceConfig': inference_config,
         }

-        tool_config = self._map_tool_config(model_request_parameters)
+        tool_config = self._map_tool_config(model_request_parameters, settings)
         if tool_config:
             params['toolConfig'] = tool_config

@@ -481,11 +500,18 @@ def _map_inference_config(

         return inference_config

-    def _map_tool_config(self, model_request_parameters: ModelRequestParameters) -> ToolConfigurationTypeDef | None:
+    def _map_tool_config(
+        self,
+        model_request_parameters: ModelRequestParameters,
+        model_settings: BedrockModelSettings | None,
+    ) -> ToolConfigurationTypeDef | None:
         tools = self._get_tools(model_request_parameters)
         if not tools:
             return None

+        if model_settings and model_settings.get('bedrock_cache_tool_definitions'):
+            tools.append({'cachePoint': {'type': 'default'}})
+
         tool_choice: ToolChoiceTypeDef
         if not model_request_parameters.allow_text_output:
             tool_choice = {'any': {}}
@@ -499,12 +525,16 @@ def _map_tool_config(self, model_request_parameters: ModelRequestParameters) ->
         return tool_config

     async def _map_messages(  # noqa: C901
-        self, messages: list[ModelMessage], model_request_parameters: ModelRequestParameters
+        self,
+        messages: list[ModelMessage],
+        model_request_parameters: ModelRequestParameters,
+        model_settings: BedrockModelSettings | None,
     ) -> tuple[list[SystemContentBlockTypeDef], list[MessageUnionTypeDef]]:
         """Maps a `pydantic_ai.Message` to the Bedrock `MessageUnionTypeDef`.

         Groups consecutive ToolReturnPart objects into a single user message as required by Bedrock Claude/Nova models.
         """
+        settings = model_settings or BedrockModelSettings()
         profile = BedrockModelProfile.from_profile(self.profile)
         system_prompt: list[SystemContentBlockTypeDef] = []
         bedrock_messages: list[MessageUnionTypeDef] = []
@@ -613,10 +643,13 @@ async def _map_messages(  # noqa: C901
         if instructions := self._get_instructions(messages, model_request_parameters):
             system_prompt.insert(0, {'text': instructions})

+        if system_prompt and settings.get('bedrock_cache_instructions'):
+            system_prompt.append({'cachePoint': {'type': 'default'}})
+
         return system_prompt, processed_messages

     @staticmethod
-    async def _map_user_prompt(part: UserPromptPart, document_count: Iterator[int]) -> list[MessageUnionTypeDef]:
+    async def _map_user_prompt(part: UserPromptPart, document_count: Iterator[int]) -> list[MessageUnionTypeDef]:  # noqa: C901
         content: list[ContentBlockUnionTypeDef] = []
         if isinstance(part.content, str):
             content.append({'text': part.content})
@@ -674,8 +707,17 @@ async def _map_user_prompt(part: UserPromptPart, document_count: Iterator[int])
             elif isinstance(item, AudioUrl):  # pragma: no cover
                 raise NotImplementedError('Audio is not supported yet.')
             elif isinstance(item, CachePoint):
-                # Bedrock support has not been implemented yet: https://github.com/pydantic/pydantic-ai/issues/3418
-                pass
+                if not content or 'cachePoint' in content[-1]:
+                    raise UserError(
+                        'CachePoint cannot be the first content in a user message - there must be previous content to cache when using Bedrock. '
+                        'To cache system instructions or tool definitions, use the `bedrock_cache_instructions` or `bedrock_cache_tool_definitions` settings instead.'
+                    )
+                if 'text' not in content[-1]:
+                    # AWS currently rejects cache points that directly follow non-text content.
+                    # Insert an empty text block as a workaround (see https://github.com/pydantic/pydantic-ai/issues/3418
+                    # and https://github.com/pydantic/pydantic-ai/pull/2560#discussion_r2349209916).
+                    content.append({'text': '\n'})
+                content.append({'cachePoint': {'type': 'default'}})
             else:
                 assert_never(item)
         return [{'role': 'user', 'content': content}]
@@ -796,6 +838,8 @@ def _map_usage(self, metadata: ConverseStreamMetadataEventTypeDef) -> usage.Requ
         return usage.RequestUsage(
             input_tokens=metadata['usage']['inputTokens'],
             output_tokens=metadata['usage']['outputTokens'],
+            cache_read_tokens=metadata['usage'].get('cacheReadInputTokens', 0),
+            cache_write_tokens=metadata['usage'].get('cacheWriteInputTokens', 0),
         )
```
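Taken together, the three cache-point paths in this diff reduce to a small amount of dict surgery on the Converse payload. The following is a minimal standalone sketch of those rules using plain dicts in place of the boto3 TypedDicts; the helper names are illustrative and not part of the library:

```python
# Sketch of the three cache-point insertion rules from the diff above.
# Plain dicts stand in for boto3's Converse TypedDicts; helper names are
# illustrative, not Pydantic AI API.


def add_tool_cache_point(tools: list[dict]) -> list[dict]:
    """bedrock_cache_tool_definitions: cachePoint entry after the last tool."""
    if tools:
        tools.append({'cachePoint': {'type': 'default'}})
    return tools


def add_instruction_cache_point(system_prompt: list[dict]) -> list[dict]:
    """bedrock_cache_instructions: cachePoint after the system prompt blocks."""
    if system_prompt:
        system_prompt.append({'cachePoint': {'type': 'default'}})
    return system_prompt


def insert_user_cache_point(content: list[dict]) -> list[dict]:
    """CachePoint marker in a user message: must follow cacheable content."""
    if not content or 'cachePoint' in content[-1]:
        raise ValueError('CachePoint must follow some content to cache')
    if 'text' not in content[-1]:
        # Bedrock rejects a cachePoint directly after non-text content,
        # so pad with a near-empty text block first (the diff's workaround).
        content.append({'text': '\n'})
    content.append({'cachePoint': {'type': 'default'}})
    return content


tools = add_tool_cache_point([{'toolSpec': {'name': 'search_docs'}}])
system = add_instruction_cache_point([{'text': 'Detailed instructions...'}])
user = insert_user_cache_point([{'text': 'Long cached context...'}])
```

Note the asymmetry: the settings-driven paths silently no-op on empty lists, while a user-message `CachePoint` with nothing before it is a hard error, since there is nothing to cache.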

Lines changed: 103 additions & 0 deletions

@@ -0,0 +1,103 @@

New VCR cassette. The request body's user text is the phrase `long prompt` repeated enough times (~6.7 KB, per the `content-length` header) to clear the minimum cacheable token threshold; it is truncated below:

```yaml
interactions:
- request:
    body: '{"messages": [{"role": "user", "content": [{"text": "long promptlong promptlong prompt [...truncated...] long prompt"},
      {"cachePoint": {"type": "default"}}, {"text": "Response only number What is 2 + 3"}]}], "system": [], "inferenceConfig":
      {}}'
    headers:
      amz-sdk-invocation-id:
      - !!binary |
        MWQ3YjUzZDItNTI1NS00NDJhLWE5ZjAtZDM0YTMzMzcxOTI5
      amz-sdk-request:
      - !!binary |
        YXR0ZW1wdD0x
      content-length:
      - '6781'
      content-type:
      - !!binary |
        YXBwbGljYXRpb24vanNvbg==
    method: POST
    uri: https://bedrock-runtime.us-east-1.amazonaws.com/model/us.anthropic.claude-sonnet-4-5-20250929-v1%3A0/converse
  response:
    headers:
      connection:
      - keep-alive
      content-length:
      - '321'
      content-type:
      - application/json
    parsed_body:
      metrics:
        latencyMs: 1867
      output:
        message:
          content:
          - text: '5'
          role: assistant
      stopReason: end_turn
      usage:
        cacheReadInputTokenCount: 0
        cacheReadInputTokens: 0
        cacheWriteInputTokenCount: 1203
        cacheWriteInputTokens: 1203
        inputTokens: 15
        outputTokens: 5
        serverToolUsage: {}
        totalTokens: 1223
    status:
      code: 200
      message: OK
version: 1
```
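The `usage` block recorded in this cassette is exactly what the `_process_response` change maps into `RequestUsage`. A minimal sketch of that mapping over a plain response dict (field names follow the Converse API; the function name and returned plain dict are illustrative, not the library's types):

```python
# Sketch: mapping a Converse `usage` block (as recorded in the cassette above)
# into token counts, mirroring the diff's use of .get() for the optional
# cache fields. `map_usage` here is illustrative, not Pydantic AI's method.


def map_usage(usage: dict) -> dict:
    return {
        'input_tokens': usage['inputTokens'],
        'output_tokens': usage['outputTokens'],
        # Cache fields are absent for models/requests without prompt caching,
        # hence the .get() defaults.
        'cache_read_tokens': usage.get('cacheReadInputTokens', 0),
        'cache_write_tokens': usage.get('cacheWriteInputTokens', 0),
    }


# First request: nothing is cached yet, so the long prefix is written to the
# cache (cacheWriteInputTokens) and only the tail counts as plain input.
first = map_usage({'inputTokens': 15, 'outputTokens': 5,
                   'cacheReadInputTokens': 0, 'cacheWriteInputTokens': 1203})
print(first['cache_write_tokens'])  # -> 1203
```

A repeated request with the same prefix would report those ~1,200 tokens as `cache_read_tokens` instead, which is what the commit's cassette-body checks verify.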
