Agent Result and Metrics
Implement comprehensive metrics collection and reporting as part of the AgentResult interface. Provide detailed execution metrics including event loop cycles, model invocations, and tool execution statistics.
Implementation Requirements
Based on analysis of the Python SDK implementation and clarification discussion, this feature will provide comprehensive metrics collection with OpenTelemetry integration support for real-time streaming to backends like Langfuse.
Design Decisions
- Include Traces - Execution tree structure for detailed debugging
- Per-invocation model metrics - Both aggregated totals and per-invocation details
- Optional collection - Configurable via AgentConfig.enableMetrics (default: true)
- Metrics in AgentResult - Added as an optional metrics property
- OpenTelemetry real-time streaming - Critical requirement for live trace backends (Langfuse, etc.)
Metrics Interface Structure
interface Metrics {
eventLoop: EventLoopMetrics
model: ModelMetrics
tools: ToolMetrics
traces: Trace[]
}
interface EventLoopMetrics {
cycleCount: number
totalDurationMs: number
cycleDurationsMs: number[]
}
interface ModelMetrics {
invocationCount: number
totalLatencyMs: number
aggregatedUsage: Usage
invocations: ModelInvocationMetrics[]
}
interface ModelInvocationMetrics {
latencyMs: number
usage: Usage
timeToFirstByteMs?: number
}
interface Usage {
inputTokens: number
outputTokens: number
totalTokens: number
cacheReadInputTokens?: number
cacheWriteInputTokens?: number
}
interface ToolMetrics {
[toolName: string]: ToolExecutionMetrics
}
interface ToolExecutionMetrics {
callCount: number
successCount: number
errorCount: number
totalDurationMs: number
averageDurationMs: number
}
interface Trace {
id: string
name: string
startTime: number
endTime?: number
durationMs?: number
parentId?: string
children: Trace[]
metadata?: Record<string, unknown>
}
Updated AgentResult Interface
interface AgentResult {
stopReason: string
lastMessage: Message
metrics?: Metrics // Optional - present when enableMetrics is true
}
Agent Configuration
interface AgentConfig {
// ... existing fields
enableMetrics?: boolean // Default: true
otelMeterProvider?: MeterProvider // Optional: for real-time OTel streaming
}
Technical Approach
1. Metrics Interfaces (src/types/metrics.ts)
Create new file with all metrics-related interfaces:
- Metrics (top-level container)
- EventLoopMetrics (cycle tracking)
- ModelMetrics (invocation tracking)
- ModelInvocationMetrics (per-invocation details)
- ToolMetrics (tool execution tracking)
- ToolExecutionMetrics (per-tool stats)
- Trace (execution tree node)
Note: Extend the existing Usage and Metrics types from src/models/streaming.ts rather than duplicating them.
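A minimal sketch of that re-use, assuming src/models/streaming.ts exports a Usage type under that name and is reachable via the relative path '../models/streaming':
// src/types/metrics.ts (sketch)
import type { Usage } from '../models/streaming'
export type { Usage }

export interface ModelInvocationMetrics {
  latencyMs: number
  usage: Usage
  timeToFirstByteMs?: number
}

// ...the remaining interfaces from "Metrics Interface Structure" above are defined here as well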
2. Metrics Collector (src/agent/metrics-collector.ts)
Create new class to handle metrics collection with optional OpenTelemetry real-time emission:
export class MetricsCollector {
private _eventLoopMetrics: EventLoopMetrics
private _modelMetrics: ModelMetrics
private _toolMetrics: ToolMetrics
private _traces: Trace[]
private _currentCycleTrace?: Trace
// OpenTelemetry instruments (optional)
private _otelMeter?: Meter
private _otelInstruments?: {
eventLoopCycleCount: Counter
eventLoopCycleDuration: Histogram
modelInvocationCount: Counter
modelLatency: Histogram
modelInputTokens: Histogram
modelOutputTokens: Histogram
modelCacheReadTokens: Histogram
modelCacheWriteTokens: Histogram
toolCallCount: Counter
toolSuccessCount: Counter
toolErrorCount: Counter
toolDuration: Histogram
}
constructor(otelMeterProvider?: MeterProvider) {
this._eventLoopMetrics = { cycleCount: 0, totalDurationMs: 0, cycleDurationsMs: [] }
this._modelMetrics = { invocationCount: 0, totalLatencyMs: 0, aggregatedUsage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 }, invocations: [] }
this._toolMetrics = {}
this._traces = []
// Initialize OTel instruments if provider given
if (otelMeterProvider) {
this._otelMeter = otelMeterProvider.getMeter('strands-agents-sdk')
this._initializeOTelInstruments()
}
}
private _initializeOTelInstruments(): void {
// Create all OTel instruments (the meter is always set before this is called)
if (!this._otelMeter) return
this._otelInstruments = {
eventLoopCycleCount: this._otelMeter.createCounter('strands.event_loop.cycle.count'),
eventLoopCycleDuration: this._otelMeter.createHistogram('strands.event_loop.cycle.duration'),
modelInvocationCount: this._otelMeter.createCounter('strands.model.invocation.count'),
modelLatency: this._otelMeter.createHistogram('strands.model.latency'),
modelInputTokens: this._otelMeter.createHistogram('strands.model.input_tokens'),
modelOutputTokens: this._otelMeter.createHistogram('strands.model.output_tokens'),
modelCacheReadTokens: this._otelMeter.createHistogram('strands.model.cache_read_tokens'),
modelCacheWriteTokens: this._otelMeter.createHistogram('strands.model.cache_write_tokens'),
toolCallCount: this._otelMeter.createCounter('strands.tool.call.count'),
toolSuccessCount: this._otelMeter.createCounter('strands.tool.success.count'),
toolErrorCount: this._otelMeter.createCounter('strands.tool.error.count'),
toolDuration: this._otelMeter.createHistogram('strands.tool.duration'),
}
}
startCycle(): { startTime: number; trace: Trace } {
const startTime = performance.now()
const trace: Trace = {
id: crypto.randomUUID(),
name: `Cycle ${this._eventLoopMetrics.cycleCount + 1}`,
startTime,
children: []
}
this._traces.push(trace)
this._currentCycleTrace = trace
// Emit to OTel in real-time
this._otelInstruments?.eventLoopCycleCount.add(1)
return { startTime, trace }
}
endCycle(startTime: number, trace: Trace): void {
const endTime = performance.now()
const durationMs = endTime - startTime
trace.endTime = endTime
trace.durationMs = durationMs
this._eventLoopMetrics.cycleCount++
this._eventLoopMetrics.totalDurationMs += durationMs
this._eventLoopMetrics.cycleDurationsMs.push(durationMs)
// Emit to OTel in real-time
this._otelInstruments?.eventLoopCycleDuration.record(durationMs)
}
recordModelInvocation(latency: number, usage: Usage, timeToFirstByte?: number): void {
// Store per-invocation metrics
const invocation: ModelInvocationMetrics = {
latencyMs: latency,
usage: { ...usage },
timeToFirstByteMs: timeToFirstByte
}
this._modelMetrics.invocations.push(invocation)
// Update aggregated metrics
this._modelMetrics.invocationCount++
this._modelMetrics.totalLatencyMs += latency
this._modelMetrics.aggregatedUsage.inputTokens += usage.inputTokens
this._modelMetrics.aggregatedUsage.outputTokens += usage.outputTokens
this._modelMetrics.aggregatedUsage.totalTokens += usage.totalTokens
if (usage.cacheReadInputTokens) {
this._modelMetrics.aggregatedUsage.cacheReadInputTokens =
(this._modelMetrics.aggregatedUsage.cacheReadInputTokens || 0) + usage.cacheReadInputTokens
}
if (usage.cacheWriteInputTokens) {
this._modelMetrics.aggregatedUsage.cacheWriteInputTokens =
(this._modelMetrics.aggregatedUsage.cacheWriteInputTokens || 0) + usage.cacheWriteInputTokens
}
// Emit to OTel in real-time
if (this._otelInstruments) {
this._otelInstruments.modelInvocationCount.add(1)
this._otelInstruments.modelLatency.record(latency)
this._otelInstruments.modelInputTokens.record(usage.inputTokens)
this._otelInstruments.modelOutputTokens.record(usage.outputTokens)
if (usage.cacheReadInputTokens) {
this._otelInstruments.modelCacheReadTokens.record(usage.cacheReadInputTokens)
}
if (usage.cacheWriteInputTokens) {
this._otelInstruments.modelCacheWriteTokens.record(usage.cacheWriteInputTokens)
}
}
}
startToolExecution(toolName: string, parentTrace: Trace): { startTime: number; trace: Trace } {
const startTime = performance.now()
const trace: Trace = {
id: crypto.randomUUID(),
name: toolName,
startTime,
parentId: parentTrace.id,
children: [],
metadata: { toolName }
}
parentTrace.children.push(trace)
return { startTime, trace }
}
endToolExecution(toolName: string, startTime: number, success: boolean, trace: Trace): void {
const endTime = performance.now()
const durationMs = endTime - startTime
trace.endTime = endTime
trace.durationMs = durationMs
trace.metadata = { ...trace.metadata, success }
// Initialize or update tool metrics
if (!this._toolMetrics[toolName]) {
this._toolMetrics[toolName] = {
callCount: 0,
successCount: 0,
errorCount: 0,
totalDurationMs: 0,
averageDurationMs: 0
}
}
const toolMetric = this._toolMetrics[toolName]
toolMetric.callCount++
toolMetric.totalDurationMs += durationMs
toolMetric.averageDurationMs = toolMetric.totalDurationMs / toolMetric.callCount
if (success) {
toolMetric.successCount++
} else {
toolMetric.errorCount++
}
// Emit to OTel in real-time
if (this._otelInstruments) {
const attributes = { tool_name: toolName }
this._otelInstruments.toolCallCount.add(1, attributes)
this._otelInstruments.toolDuration.record(durationMs, attributes)
if (success) {
this._otelInstruments.toolSuccessCount.add(1, attributes)
} else {
this._otelInstruments.toolErrorCount.add(1, attributes)
}
}
}
getMetrics(): Metrics {
return {
eventLoop: { ...this._eventLoopMetrics },
model: {
...this._modelMetrics,
invocations: [...this._modelMetrics.invocations]
},
tools: { ...this._toolMetrics },
traces: this._traces.map(t => this._cloneTrace(t))
}
}
private _cloneTrace(trace: Trace): Trace {
return {
...trace,
children: trace.children.map(c => this._cloneTrace(c)),
metadata: trace.metadata ? { ...trace.metadata } : undefined
}
}
}
Key Design Points:
- Dual Purpose: Builds in-memory Metrics object AND emits to OTel in real-time
- Optional OTel: Only creates OTel instruments if MeterProvider is provided
- Real-time Emission: Metrics are emitted to OTel as they're collected, not batched
- Attributes: Tool metrics include tool name as OTel attribute for filtering/grouping
3. Agent Integration (src/agent/agent.ts)
Configuration:
constructor(config?: AgentConfig) {
// ... existing code
this._metricsCollector = config?.enableMetrics !== false
? new MetricsCollector(config?.otelMeterProvider)
: undefined
}
Event Loop Integration:
In _stream() method, add metrics collection:
private async *_stream(args: InvokeArgs): AsyncGenerator<AgentStreamEvent, AgentResult, undefined> {
const cycleState = this._metricsCollector?.startCycle()
try {
// Main loop with metrics collection
while (true) {
const modelResult = yield* this.invokeModel(currentArgs)
// ... existing logic
if (modelResult.stopReason !== 'toolUse') {
if (cycleState) this._metricsCollector?.endCycle(cycleState.startTime, cycleState.trace)
return {
stopReason: modelResult.stopReason,
lastMessage: modelResult.message,
metrics: this._metricsCollector?.getMetrics()
}
}
// Tool execution with metrics
const toolResultMessage = yield* this.executeTools(modelResult.message, this._toolRegistry)
// ...
}
} finally {
// Cleanup
}
}
4. Model Metadata Handling (src/models/model.ts)
Update streamAggregated() to handle ModelMetadataEvent:
case 'modelMetadataEvent':
// Capture usage and metrics from event
if (event.usage) {
usage = event.usage
}
if (event.metrics) {
metrics = event.metrics
}
break
Pass the captured usage and latency to the collector in the agent loop, right after the model invocation, as sketched below.
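A sketch of that hand-off in the agent loop; the shape of the aggregated model result (modelResult.usage, modelResult.latencyMs, modelResult.timeToFirstByteMs) is an assumption, not the SDK's confirmed return type:
// After streamAggregated() resolves inside the event loop (sketch)
const modelResult = yield* this.invokeModel(currentArgs)
if (modelResult.usage) {
  this._metricsCollector?.recordModelInvocation(
    modelResult.latencyMs ?? 0,     // latency surfaced via the ModelMetadataEvent metrics
    modelResult.usage,              // token usage surfaced via the ModelMetadataEvent
    modelResult.timeToFirstByteMs   // optional, when the provider reports it
  )
}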
5. Tool Execution Metrics (src/agent/agent.ts)
In executeTool() method, add timing and success tracking:
private async *executeTool(
toolUseBlock: ToolUseBlock,
toolRegistry: ToolRegistry
): AsyncGenerator<AgentStreamEvent, ToolResultBlock, undefined> {
const toolState = this._metricsCollector?.startToolExecution(
toolUseBlock.name,
this._currentCycleTrace
)
let success = false
try {
// toolGenerator is produced by the existing tool-dispatch logic (via toolRegistry)
const toolResult = yield* toolGenerator
success = toolResult.status === 'success'
return toolResult
} finally {
if (toolState) {
this._metricsCollector?.endToolExecution(
toolUseBlock.name,
toolState.startTime,
success,
toolState.trace
)
}
}
}
OpenTelemetry Integration
Approach: Integrated directly into MetricsCollector for real-time streaming.
Configuration:
// Note: the constructible MeterProvider class lives in the SDK package; @opentelemetry/api only defines the interface
import { MeterProvider } from '@opentelemetry/sdk-metrics'
// Initialize OTel (using Langfuse or other backend)
const meterProvider = new MeterProvider({
// Configure exporters, processors, etc.
})
// Create agent with OTel enabled
const agent = new Agent({
enableMetrics: true, // Enable metrics collection
otelMeterProvider: meterProvider // Enable real-time OTel streaming
})
// Metrics are streamed to OTel backend during execution
const result = await agent.invoke('prompt')
// Metrics also available in result for programmatic access
console.log(result.metrics?.eventLoop.cycleCount)
OTel Instruments Created:
- strands.event_loop.cycle.count (Counter)
- strands.event_loop.cycle.duration (Histogram, milliseconds)
- strands.model.invocation.count (Counter)
- strands.model.latency (Histogram, milliseconds)
- strands.model.input_tokens (Histogram)
- strands.model.output_tokens (Histogram)
- strands.model.cache_read_tokens (Histogram)
- strands.model.cache_write_tokens (Histogram)
- strands.tool.call.count (Counter, with tool_name attribute)
- strands.tool.success.count (Counter, with tool_name attribute)
- strands.tool.error.count (Counter, with tool_name attribute)
- strands.tool.duration (Histogram, milliseconds, with tool_name attribute)
Benefits:
- ✅ Real-time streaming to OTel backends (Langfuse, Jaeger, etc.)
- ✅ No batch export needed - metrics emitted as they're collected
- ✅ Works with any OTel-compatible backend
- ✅ Still provides in-memory metrics in AgentResult
- ✅ Optional - only enabled if MeterProvider is provided
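For reference, a MeterProvider wired to an OTLP-compatible metrics backend might look like the sketch below. The exporter package, endpoint URL, and the readers constructor option are assumptions about the OpenTelemetry JS SDK version in use (older SDK versions register the reader via meterProvider.addMetricReader(reader) instead):
import { MeterProvider, PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics'
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http'

// Periodically push collected metrics to an OTLP endpoint
const reader = new PeriodicExportingMetricReader({
  exporter: new OTLPMetricExporter({ url: 'http://localhost:4318/v1/metrics' }),
  exportIntervalMillis: 1000,
})
const meterProvider = new MeterProvider({ readers: [reader] })

const agent = new Agent({ otelMeterProvider: meterProvider })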
Files to Create/Modify
New Files:
- src/types/metrics.ts - Metrics interface definitions (~150 lines)
- src/agent/metrics-collector.ts - Collection logic with OTel integration (~400 lines)
- src/agent/__tests__/metrics-collector.test.ts - Unit tests (~500 lines)
- tests_integ/metrics.test.ts - Integration test (~200 lines)
Modified Files:
- src/types/agent.ts - Update the AgentResult interface, add Metrics exports (~10 lines)
- src/agent/agent.ts - Add the config option, integrate the collector (~100 lines changed)
- src/models/model.ts - Handle ModelMetadataEvent (~20 lines)
- src/index.ts - Export the new metrics types (~5 lines)
Total Estimated Changes:
- New code: ~1,250 lines
- Modified code: ~135 lines
- Test code: ~700 lines
Testing Strategy
Unit Tests (src/agent/__tests__/metrics-collector.test.ts); a brief sketch follows the list:
- Test MetricsCollector initialization (with and without OTel)
- Test cycle tracking (start, end, durations)
- Test model invocation recording (aggregated + per-invocation)
- Test tool execution tracking (success/error, timing)
- Test trace tree structure
- Test metrics aggregation
- Test OTel instrument emission (mock MeterProvider)
- Test with metrics disabled
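A minimal sketch of one such test, assuming a Vitest/Jest-style API and the MetricsCollector shape proposed above (not the repository's actual test setup):
import { describe, expect, it } from 'vitest'
import { MetricsCollector } from '../metrics-collector'

describe('MetricsCollector', () => {
  it('tracks a cycle with one successful tool call (no OTel provider)', () => {
    const collector = new MetricsCollector()

    const cycle = collector.startCycle()
    const tool = collector.startToolExecution('calculator', cycle.trace)
    collector.endToolExecution('calculator', tool.startTime, true, tool.trace)
    collector.endCycle(cycle.startTime, cycle.trace)

    const metrics = collector.getMetrics()
    expect(metrics.eventLoop.cycleCount).toBe(1)
    expect(metrics.tools['calculator'].successCount).toBe(1)
    expect(metrics.traces).toHaveLength(1)
    expect(metrics.traces[0].children).toHaveLength(1)
  })
})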
Integration Test (tests_integ/metrics.test.ts):
- Full agent execution with metrics enabled
- Verify all metrics are collected accurately
- Verify trace tree structure
- Test multiple cycles
- Test tool invocations
- Test model metadata capture
- Test with metrics disabled
- Test OTel integration with mock backend
Exit Criteria
- ✅ Metrics interfaces defined and exported
- ✅ MetricsCollector class implemented with all required methods
- ✅ OpenTelemetry real-time emission integrated in MetricsCollector
- ✅ Agent loop integrated with metrics collection
- ✅ Model metadata (usage, latency) captured correctly
- ✅ Tool execution metrics tracked (success/error, timing)
- ✅ Trace tree built correctly for execution flow
- ✅ AgentResult includes metrics when enabled
- ✅ Metrics collection can be disabled via config
- ✅ OTel integration works with real-time streaming
- ✅ Unit tests pass with 80%+ coverage
- ✅ Integration test validates end-to-end metrics collection
- ✅ Documentation updated (TSDoc on all interfaces and methods)
Implementation Notes
- Trace IDs: Use crypto.randomUUID() for trace IDs (available in Node 14.17+)
- Timing: Use performance.now() for high-resolution timing
- Memory: Keep the trace tree shallow (don't nest too deeply)
- Serialization: Ensure all metrics structures are JSON-serializable (see the example after this list)
- Backwards Compatibility: metrics is optional on AgentResult, so existing code is unaffected
- Type Safety: Use TypeScript strict mode, no any types
- Testing: Test with metrics both enabled and disabled, and with/without OTel
- OTel Optional: OpenTelemetry integration is opt-in via the otelMeterProvider config
- Real-time: Metrics are emitted to OTel instruments immediately as they are collected
- Dual Purpose: MetricsCollector both streams to OTel and builds the in-memory metrics object
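Because the metrics object is plain, JSON-serializable data, it can be logged or persisted directly. A usage sketch, continuing from the configuration example above:
import { writeFileSync } from 'node:fs'

const result = await agent.invoke('prompt')
if (result.metrics) {
  // Persist the full metrics object (event loop, model, tools, traces) for later analysis
  writeFileSync('metrics.json', JSON.stringify(result.metrics, null, 2))
}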
Related Documentation
- Python SDK Metrics: src/strands/telemetry/metrics.py
- Python SDK AgentResult: src/strands/agent/agent_result.py
- Existing ModelMetadataEvent: src/models/streaming.ts (lines 209-267)
- Usage and Metrics types: src/models/streaming.ts (lines 365-402)
- OpenTelemetry Metrics API: https://opentelemetry.io/docs/instrumentation/js/instrumentation/
Future Enhancements (Out of Scope)
- Metrics dashboard/visualization
- Metrics persistence/storage
- Metrics comparison across invocations
- Custom metrics via hooks
- Trace sampling/filtering for large executions
- Span/trace integration (beyond metrics)
- Automatic context propagation