Commit 6440d29

Provide more details about Bedrock cache metrics (#247)
* Provide more details about Bedrock cache metrics
* move content to correct place and add more print lines
1 parent 80977ae commit 6440d29

File tree

1 file changed (+24, -0 lines changed)


docs/user-guide/concepts/model-providers/amazon-bedrock.md

Lines changed: 24 additions & 0 deletions
@@ -312,6 +312,22 @@ When you enable prompt caching, Amazon Bedrock creates a cache composed of **cac
 
 The cache has a five-minute Time To Live (TTL), which resets with each successful cache hit. During this period, the context in the cache is preserved. If no cache hits occur within the TTL window, your cache expires.
 
+When using prompt caching, Amazon Bedrock provides cache statistics including `CacheReadInputTokens` and `CacheWriteInputTokens`.
+
+- `CacheWriteInputTokens`: Number of input tokens written to the cache (occurs on the first request with new content).
+
+- `CacheReadInputTokens`: Number of input tokens read from the cache (occurs on subsequent requests with cached content).
+
+Strands automatically captures these metrics and makes them available through multiple methods:
+
+- Method 1: AgentResult metrics (recommended)
+
+  Cache statistics are automatically included in `AgentResult.metrics.accumulated_usage`.
+
+- Method 2: OpenTelemetry traces
+
+  Cache metrics are automatically recorded in OpenTelemetry traces when telemetry is enabled.
+
 For detailed information about supported models, minimum token requirements, and other limitations, see the [Amazon Bedrock documentation on prompt caching](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html).
 
 #### System Prompt Caching
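The `accumulated_usage` lookup that the added print lines rely on can be sketched with a stand-in dict. Everything below is illustrative: the token counts are sample data, not real output, standing in for what `AgentResult.metrics.accumulated_usage` would hold after a request.

```python
# Hypothetical sketch of reading Bedrock cache metrics from an
# accumulated_usage dict, mirroring the print lines in the diff.
# The values are sample data, not real AgentResult output.
accumulated_usage = {
    "inputTokens": 1250,
    "outputTokens": 180,
    "totalTokens": 1430,
    "cacheWriteInputTokens": 1024,  # tokens written on the first request
    "cacheReadInputTokens": 0,      # nothing read yet on a cold cache
}

# .get() returns None instead of raising KeyError if a provider or model
# does not report cache metrics for a given request.
write_tokens = accumulated_usage.get("cacheWriteInputTokens")
read_tokens = accumulated_usage.get("cacheReadInputTokens")
print(f"Cache write tokens: {write_tokens}")
print(f"Cache read tokens: {read_tokens}")
```

Using `.get()` rather than `[...]` is why the examples below print `None` gracefully when a request produces no cache activity.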
@@ -338,9 +354,13 @@ agent = Agent(
 
 # First request will cache the system prompt
 response1 = agent("Tell me about Python")
+print(f"Cache write tokens: {response1.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
+print(f"Cache read tokens: {response1.metrics.accumulated_usage.get('cacheReadInputTokens')}")
 
 # Second request will reuse the cached system prompt
 response2 = agent("Tell me about JavaScript")
+print(f"Cache write tokens: {response2.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
+print(f"Cache read tokens: {response2.metrics.accumulated_usage.get('cacheReadInputTokens')}")
 ```
 
 #### Tool Caching
@@ -365,9 +385,13 @@ agent = Agent(
 )
 # First request will cache the tools
 response1 = agent("What time is it?")
+print(f"Cache write tokens: {response1.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
+print(f"Cache read tokens: {response1.metrics.accumulated_usage.get('cacheReadInputTokens')}")
 
 # Second request will reuse the cached tools
 response2 = agent("What is the square root of 1764?")
+print(f"Cache write tokens: {response2.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
+print(f"Cache read tokens: {response2.metrics.accumulated_usage.get('cacheReadInputTokens')}")
 ```
 
 #### Messages Caching
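The expected pattern in both examples above is the same: the first request writes to the cache, the second reads from it. That check can be sketched with sample usage dicts (hypothetical values standing in for `response1.metrics.accumulated_usage` and `response2.metrics.accumulated_usage`):

```python
# Hypothetical sketch: detecting a cache hit from the two metrics
# discussed above. The dicts are sample data, not real responses.
usage_first = {"cacheWriteInputTokens": 900, "cacheReadInputTokens": 0}
usage_second = {"cacheWriteInputTokens": 0, "cacheReadInputTokens": 900}

def cache_hit(usage):
    """A request hit the cache if any input tokens were read from it.

    `or 0` covers providers that omit the key entirely (None from .get()).
    """
    return (usage.get("cacheReadInputTokens") or 0) > 0

print(f"First request hit cache: {cache_hit(usage_first)}")
print(f"Second request hit cache: {cache_hit(usage_second)}")
```

Note the read count only stays nonzero while the five-minute TTL described earlier is alive; after it expires, the next request writes again instead of reading.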
