24 changes: 24 additions & 0 deletions docs/user-guide/concepts/model-providers/amazon-bedrock.md
@@ -308,6 +308,22 @@ When you enable prompt caching, Amazon Bedrock creates a cache composed of **cache checkpoints**

The cache has a five-minute Time To Live (TTL), which resets with each successful cache hit. During this period, the context in the cache is preserved. If no cache hits occur within the TTL window, your cache expires.

When using prompt caching, Amazon Bedrock reports cache statistics in the response usage, including `cacheReadInputTokens` and `cacheWriteInputTokens`:

- `cacheWriteInputTokens`: Number of input tokens written to the cache (reported on the first request that caches new content).

- `cacheReadInputTokens`: Number of input tokens read from the cache (reported on subsequent requests that reuse the cached content).

Strands automatically captures these metrics and makes them available in two ways:

- Method 1: AgentResult Metrics (Recommended)

  Cache statistics are automatically included in `AgentResult.metrics.accumulated_usage`, as shown in the examples below.

- Method 2: OpenTelemetry Traces

  Cache metrics are automatically recorded in OpenTelemetry traces when telemetry is enabled; a minimal setup sketch follows this list.
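
As a rough illustration of Method 2, the sketch below turns on console trace export before invoking an agent. The `StrandsTelemetry` class and its `setup_console_exporter()` method are assumptions based on the Strands observability tooling; check the observability documentation for the exact setup in your version.

```python
from strands import Agent
from strands.telemetry import StrandsTelemetry

# Assumed telemetry setup: export spans to stdout for local inspection.
# Verify these names against the Strands observability docs.
telemetry = StrandsTelemetry()
telemetry.setup_console_exporter()

# Default Bedrock model; enable prompt caching as in the examples below.
agent = Agent(system_prompt="You are a helpful assistant.")

response = agent("Tell me about Python")
# The model-invocation span printed to the console should carry the cache
# token counts (cacheReadInputTokens / cacheWriteInputTokens) alongside
# the usual input/output usage attributes.
```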

For detailed information about supported models, minimum token requirements, and other limitations, see the [Amazon Bedrock documentation on prompt caching](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html).

#### System Prompt Caching
@@ -334,9 +350,13 @@ agent = Agent(

# First request will cache the system prompt
response1 = agent("Tell me about Python")
print(f"Cache write tokens: {response1.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
print(f"Cache read tokens: {response1.metrics.accumulated_usage.get('cacheReadInputTokens')}")

# Second request will reuse the cached system prompt
response2 = agent("Tell me about JavaScript")
print(f"Cache write tokens: {response2.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
print(f"Cache read tokens: {response2.metrics.accumulated_usage.get('cacheReadInputTokens')}")
```
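
On the first call, `cacheWriteInputTokens` should be nonzero as the system prompt is written to the cache; on the second call, `cacheReadInputTokens` should be nonzero while the write count drops to zero, confirming the cached prompt was reused.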

#### Tool Caching
@@ -361,9 +381,13 @@ agent = Agent(
)
# First request will cache the tools
response1 = agent("What time is it?")
print(f"Cache write tokens: {response1.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
print(f"Cache read tokens: {response1.metrics.accumulated_usage.get('cacheReadInputTokens')}")

# Second request will reuse the cached tools
response2 = agent("What is the square root of 1764?")
print(f"Cache write tokens: {response2.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
print(f"Cache read tokens: {response2.metrics.accumulated_usage.get('cacheReadInputTokens')}")
```
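
The same pattern applies to tools: the first call writes the tool definitions to the cache and the second reads them back. Keep in mind that a small tool set may fall below the model's minimum token requirement for a cache checkpoint, in which case both counters remain zero.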

#### Messages Caching