Commit 6440d29

Provide more details about Bedrock cache metrics (#247)
* Provide more details about Bedrock cache metrics
* move content to correct place and add more print lines
1 parent 80977ae commit 6440d29

File tree

1 file changed (+24, -0 lines changed)


docs/user-guide/concepts/model-providers/amazon-bedrock.md

Lines changed: 24 additions & 0 deletions
@@ -312,6 +312,22 @@ When you enable prompt caching, Amazon Bedrock creates a cache composed of **cac
 
 The cache has a five-minute Time To Live (TTL), which resets with each successful cache hit. During this period, the context in the cache is preserved. If no cache hits occur within the TTL window, your cache expires.
 
+When using prompt caching, Amazon Bedrock provides cache statistics including `CacheReadInputTokens` and `CacheWriteInputTokens`.
+
+- `CacheWriteInputTokens`: Number of input tokens written to the cache (occurs on the first request with new content).
+
+- `CacheReadInputTokens`: Number of input tokens read from the cache (occurs on subsequent requests with cached content).
+
+Strands automatically captures these metrics and makes them available through multiple methods:
+
+- Method 1: AgentResult metrics (recommended)
+
+  Cache statistics are automatically included in `AgentResult.metrics.accumulated_usage`.
+
+- Method 2: OpenTelemetry traces
+
+  Cache metrics are automatically recorded in OpenTelemetry traces when telemetry is enabled.
+
 For detailed information about supported models, minimum token requirements, and other limitations, see the [Amazon Bedrock documentation on prompt caching](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html).
 
 #### System Prompt Caching
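The `accumulated_usage` lookup that the added print lines rely on can be sketched with a stand-in dict. Everything below is illustrative: the token counts are sample data, not real output, standing in for what `AgentResult.metrics.accumulated_usage` would hold after a request.

```python
# Hypothetical sketch of reading Bedrock cache metrics from an
# accumulated_usage dict, mirroring the print lines in the diff.
# The values are sample data, not real AgentResult output.
accumulated_usage = {
    "inputTokens": 1250,
    "outputTokens": 180,
    "totalTokens": 1430,
    "cacheWriteInputTokens": 1024,  # tokens written on the first request
    "cacheReadInputTokens": 0,      # nothing read yet on a cold cache
}

# .get() returns None instead of raising KeyError if a provider or model
# does not report cache metrics for a given request.
write_tokens = accumulated_usage.get("cacheWriteInputTokens")
read_tokens = accumulated_usage.get("cacheReadInputTokens")
print(f"Cache write tokens: {write_tokens}")
print(f"Cache read tokens: {read_tokens}")
```

Using `.get()` rather than `[...]` is why the examples below print `None` gracefully when a request produces no cache activity.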
@@ -338,9 +354,13 @@ agent = Agent(
 
 # First request will cache the system prompt
 response1 = agent("Tell me about Python")
+print(f"Cache write tokens: {response1.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
+print(f"Cache read tokens: {response1.metrics.accumulated_usage.get('cacheReadInputTokens')}")
 
 # Second request will reuse the cached system prompt
 response2 = agent("Tell me about JavaScript")
+print(f"Cache write tokens: {response2.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
+print(f"Cache read tokens: {response2.metrics.accumulated_usage.get('cacheReadInputTokens')}")
 ```
 
 #### Tool Caching
@@ -365,9 +385,13 @@ agent = Agent(
 )
 # First request will cache the tools
 response1 = agent("What time is it?")
+print(f"Cache write tokens: {response1.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
+print(f"Cache read tokens: {response1.metrics.accumulated_usage.get('cacheReadInputTokens')}")
 
 # Second request will reuse the cached tools
 response2 = agent("What is the square root of 1764?")
+print(f"Cache write tokens: {response2.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
+print(f"Cache read tokens: {response2.metrics.accumulated_usage.get('cacheReadInputTokens')}")
 ```
 
 #### Messages Caching
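The expected pattern in both examples above is the same: the first request writes to the cache, the second reads from it. That check can be sketched with sample usage dicts (hypothetical values standing in for `response1.metrics.accumulated_usage` and `response2.metrics.accumulated_usage`):

```python
# Hypothetical sketch: detecting a cache hit from the two metrics
# discussed above. The dicts are sample data, not real responses.
usage_first = {"cacheWriteInputTokens": 900, "cacheReadInputTokens": 0}
usage_second = {"cacheWriteInputTokens": 0, "cacheReadInputTokens": 900}

def cache_hit(usage):
    """A request hit the cache if any input tokens were read from it.

    `or 0` covers providers that omit the key entirely (None from .get()).
    """
    return (usage.get("cacheReadInputTokens") or 0) > 0

print(f"First request hit cache: {cache_hit(usage_first)}")
print(f"Second request hit cache: {cache_hit(usage_second)}")
```

Note the read count only stays nonzero while the five-minute TTL described earlier is alive; after it expires, the next request writes again instead of reading.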
