Agent Result and Metrics #70

@Unshure

Description

Implement comprehensive metrics collection and reporting as part of the AgentResult interface. Provide detailed execution metrics including event loop cycles, model invocations, and tool execution statistics.

Implementation Requirements

Based on analysis of the Python SDK implementation and clarification discussion, this feature will provide comprehensive metrics collection with OpenTelemetry integration support for real-time streaming to backends like Langfuse.

Design Decisions

  1. Include Traces - Execution tree structure for detailed debugging
  2. Per-invocation model metrics - Both aggregated totals and per-invocation details
  3. Optional collection - Configurable via AgentConfig.enableMetrics (default: true)
  4. Metrics in AgentResult - Added as optional property
  5. OpenTelemetry real-time streaming - Critical requirement for live trace backends (Langfuse, etc.)

Metrics Interface Structure

interface Metrics {
  eventLoop: EventLoopMetrics
  model: ModelMetrics
  tools: ToolMetrics
  traces: Trace[]
}

interface EventLoopMetrics {
  cycleCount: number
  totalDurationMs: number
  cycleDurationsMs: number[]
}

interface ModelMetrics {
  invocationCount: number
  totalLatencyMs: number
  aggregatedUsage: Usage
  invocations: ModelInvocationMetrics[]
}

interface ModelInvocationMetrics {
  latencyMs: number
  usage: Usage
  timeToFirstByteMs?: number
}

interface Usage {
  inputTokens: number
  outputTokens: number
  totalTokens: number
  cacheReadInputTokens?: number
  cacheWriteInputTokens?: number
}

interface ToolMetrics {
  [toolName: string]: ToolExecutionMetrics
}

interface ToolExecutionMetrics {
  callCount: number
  successCount: number
  errorCount: number
  totalDurationMs: number
  averageDurationMs: number
}

interface Trace {
  id: string
  name: string
  startTime: number
  endTime?: number
  durationMs?: number
  parentId?: string
  children: Trace[]
  metadata?: Record<string, unknown>
}
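To make the tree shape concrete, here is a hypothetical populated trace for one cycle that executed a single tool. All ids and timings are invented for illustration; real ids come from `crypto.randomUUID()` and times from `performance.now()`.

```typescript
// Hypothetical Trace snapshot: one event-loop cycle that ran one tool.
// All ids and timings are invented for illustration.
interface Trace {
  id: string
  name: string
  startTime: number
  endTime?: number
  durationMs?: number
  parentId?: string
  children: Trace[]
  metadata?: Record<string, unknown>
}

const toolTrace: Trace = {
  id: 'trace-tool-1',
  name: 'calculator',
  startTime: 112.5,
  endTime: 141,
  durationMs: 28.5,
  parentId: 'trace-cycle-1',
  children: [],
  metadata: { toolName: 'calculator', success: true },
}

const cycleTrace: Trace = {
  id: 'trace-cycle-1',
  name: 'Cycle 1',
  startTime: 110,
  endTime: 155,
  durationMs: 45,
  children: [toolTrace],
}
```

Note that `parentId` on the child mirrors the parent's `id`, so the tree can be flattened and rebuilt if needed.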

Updated AgentResult Interface

interface AgentResult {
  stopReason: string
  lastMessage: Message
  metrics?: Metrics  // Optional - present when enableMetrics is true
}

Agent Configuration

interface AgentConfig {
  // ... existing fields
  enableMetrics?: boolean  // Default: true
  otelMeterProvider?: MeterProvider  // Optional: for real-time OTel streaming
}

Technical Approach

1. Metrics Interfaces (src/types/metrics.ts)

Create a new file with all metrics-related interfaces:

  • Metrics (top-level container)
  • EventLoopMetrics (cycle tracking)
  • ModelMetrics (invocation tracking)
  • ModelInvocationMetrics (per-invocation details)
  • ToolMetrics (tool execution tracking)
  • ToolExecutionMetrics (per-tool stats)
  • Trace (execution tree node)

Note: Extend the existing Usage and Metrics types from src/models/streaming.ts rather than duplicating them.
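A sketch of that reuse pattern. The import path and the base field names are assumptions drawn from this plan; a local stand-in replaces the real import so the example is self-contained.

```typescript
// Sketch: reuse the streaming Usage type instead of redefining it.
// In the real code this would be:
//   import { Usage as StreamingUsage } from '../models/streaming'
// A local stand-in keeps this example self-contained.
interface StreamingUsage {
  inputTokens: number
  outputTokens: number
  totalTokens: number
}

// src/types/metrics.ts extends it with the optional cache fields
interface Usage extends StreamingUsage {
  cacheReadInputTokens?: number
  cacheWriteInputTokens?: number
}

const usage: Usage = {
  inputTokens: 10,
  outputTokens: 5,
  totalTokens: 15,
  cacheReadInputTokens: 4,
}
```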

2. Metrics Collector (src/agent/metrics-collector.ts)

Create a new class to handle metrics collection, with optional real-time OpenTelemetry emission:

export class MetricsCollector {
  private _eventLoopMetrics: EventLoopMetrics
  private _modelMetrics: ModelMetrics
  private _toolMetrics: ToolMetrics
  private _traces: Trace[]
  private _currentCycleTrace?: Trace
  
  // OpenTelemetry instruments (optional)
  private _otelMeter?: Meter
  private _otelInstruments?: {
    eventLoopCycleCount: Counter
    eventLoopCycleDuration: Histogram
    modelInvocationCount: Counter
    modelLatency: Histogram
    modelInputTokens: Histogram
    modelOutputTokens: Histogram
    modelCacheReadTokens: Histogram
    modelCacheWriteTokens: Histogram
    toolCallCount: Counter
    toolSuccessCount: Counter
    toolErrorCount: Counter
    toolDuration: Histogram
  }
  
  constructor(otelMeterProvider?: MeterProvider) {
    this._eventLoopMetrics = { cycleCount: 0, totalDurationMs: 0, cycleDurationsMs: [] }
    this._modelMetrics = { invocationCount: 0, totalLatencyMs: 0, aggregatedUsage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 }, invocations: [] }
    this._toolMetrics = {}
    this._traces = []
    
    // Initialize OTel instruments if provider given
    if (otelMeterProvider) {
      this._otelMeter = otelMeterProvider.getMeter('strands-agents-sdk')
      this._initializeOTelInstruments()
    }
  }
  
  private _initializeOTelInstruments(): void {
    // _otelMeter is assigned by the constructor before this is called,
    // so the non-null assertion is safe under strict mode
    const meter = this._otelMeter!
    this._otelInstruments = {
      eventLoopCycleCount: meter.createCounter('strands.event_loop.cycle.count'),
      eventLoopCycleDuration: meter.createHistogram('strands.event_loop.cycle.duration'),
      modelInvocationCount: meter.createCounter('strands.model.invocation.count'),
      modelLatency: meter.createHistogram('strands.model.latency'),
      modelInputTokens: meter.createHistogram('strands.model.input_tokens'),
      modelOutputTokens: meter.createHistogram('strands.model.output_tokens'),
      modelCacheReadTokens: meter.createHistogram('strands.model.cache_read_tokens'),
      modelCacheWriteTokens: meter.createHistogram('strands.model.cache_write_tokens'),
      toolCallCount: meter.createCounter('strands.tool.call.count'),
      toolSuccessCount: meter.createCounter('strands.tool.success.count'),
      toolErrorCount: meter.createCounter('strands.tool.error.count'),
      toolDuration: meter.createHistogram('strands.tool.duration'),
    }
  }
  
  startCycle(): { startTime: number; trace: Trace } {
    const startTime = performance.now()
    const trace: Trace = {
      id: crypto.randomUUID(),
      name: `Cycle ${this._eventLoopMetrics.cycleCount + 1}`,
      startTime,
      children: []
    }
    this._traces.push(trace)
    this._currentCycleTrace = trace
    
    // Emit to OTel in real-time
    this._otelInstruments?.eventLoopCycleCount.add(1)
    
    return { startTime, trace }
  }
  
  endCycle(startTime: number, trace: Trace): void {
    const endTime = performance.now()
    const durationMs = endTime - startTime
    
    trace.endTime = endTime
    trace.durationMs = durationMs
    
    this._eventLoopMetrics.cycleCount++
    this._eventLoopMetrics.totalDurationMs += durationMs
    this._eventLoopMetrics.cycleDurationsMs.push(durationMs)
    
    // Emit to OTel in real-time
    this._otelInstruments?.eventLoopCycleDuration.record(durationMs)
  }
  
  recordModelInvocation(latency: number, usage: Usage, timeToFirstByte?: number): void {
    // Store per-invocation metrics
    const invocation: ModelInvocationMetrics = {
      latencyMs: latency,
      usage: { ...usage },
      timeToFirstByteMs: timeToFirstByte
    }
    this._modelMetrics.invocations.push(invocation)
    
    // Update aggregated metrics
    this._modelMetrics.invocationCount++
    this._modelMetrics.totalLatencyMs += latency
    this._modelMetrics.aggregatedUsage.inputTokens += usage.inputTokens
    this._modelMetrics.aggregatedUsage.outputTokens += usage.outputTokens
    this._modelMetrics.aggregatedUsage.totalTokens += usage.totalTokens
    if (usage.cacheReadInputTokens) {
      this._modelMetrics.aggregatedUsage.cacheReadInputTokens = 
        (this._modelMetrics.aggregatedUsage.cacheReadInputTokens || 0) + usage.cacheReadInputTokens
    }
    if (usage.cacheWriteInputTokens) {
      this._modelMetrics.aggregatedUsage.cacheWriteInputTokens = 
        (this._modelMetrics.aggregatedUsage.cacheWriteInputTokens || 0) + usage.cacheWriteInputTokens
    }
    
    // Emit to OTel in real-time
    if (this._otelInstruments) {
      this._otelInstruments.modelInvocationCount.add(1)
      this._otelInstruments.modelLatency.record(latency)
      this._otelInstruments.modelInputTokens.record(usage.inputTokens)
      this._otelInstruments.modelOutputTokens.record(usage.outputTokens)
      if (usage.cacheReadInputTokens) {
        this._otelInstruments.modelCacheReadTokens.record(usage.cacheReadInputTokens)
      }
      if (usage.cacheWriteInputTokens) {
        this._otelInstruments.modelCacheWriteTokens.record(usage.cacheWriteInputTokens)
      }
    }
  }
  
  startToolExecution(toolName: string, parentTrace: Trace): { startTime: number; trace: Trace } {
    const startTime = performance.now()
    const trace: Trace = {
      id: crypto.randomUUID(),
      name: toolName,
      startTime,
      parentId: parentTrace.id,
      children: [],
      metadata: { toolName }
    }
    parentTrace.children.push(trace)
    return { startTime, trace }
  }
  
  endToolExecution(toolName: string, startTime: number, success: boolean, trace: Trace): void {
    const endTime = performance.now()
    const durationMs = endTime - startTime
    
    trace.endTime = endTime
    trace.durationMs = durationMs
    trace.metadata = { ...trace.metadata, success }
    
    // Initialize or update tool metrics
    if (!this._toolMetrics[toolName]) {
      this._toolMetrics[toolName] = {
        callCount: 0,
        successCount: 0,
        errorCount: 0,
        totalDurationMs: 0,
        averageDurationMs: 0
      }
    }
    
    const toolMetric = this._toolMetrics[toolName]
    toolMetric.callCount++
    toolMetric.totalDurationMs += durationMs
    toolMetric.averageDurationMs = toolMetric.totalDurationMs / toolMetric.callCount
    
    if (success) {
      toolMetric.successCount++
    } else {
      toolMetric.errorCount++
    }
    
    // Emit to OTel in real-time
    if (this._otelInstruments) {
      const attributes = { tool_name: toolName }
      this._otelInstruments.toolCallCount.add(1, attributes)
      this._otelInstruments.toolDuration.record(durationMs, attributes)
      if (success) {
        this._otelInstruments.toolSuccessCount.add(1, attributes)
      } else {
        this._otelInstruments.toolErrorCount.add(1, attributes)
      }
    }
  }
  
  getMetrics(): Metrics {
    // Return defensive copies so the snapshot is not mutated by later collection
    return {
      eventLoop: {
        ...this._eventLoopMetrics,
        cycleDurationsMs: [...this._eventLoopMetrics.cycleDurationsMs]
      },
      model: {
        ...this._modelMetrics,
        aggregatedUsage: { ...this._modelMetrics.aggregatedUsage },
        invocations: this._modelMetrics.invocations.map(i => ({ ...i, usage: { ...i.usage } }))
      },
      tools: Object.fromEntries(
        Object.entries(this._toolMetrics).map(([name, m]) => [name, { ...m }])
      ),
      traces: this._traces.map(t => this._cloneTrace(t))
    }
  }
  
  private _cloneTrace(trace: Trace): Trace {
    return {
      ...trace,
      children: trace.children.map(c => this._cloneTrace(c)),
      metadata: trace.metadata ? { ...trace.metadata } : undefined
    }
  }
}

Key Design Points:

  • Dual Purpose: Builds in-memory Metrics object AND emits to OTel in real-time
  • Optional OTel: Only creates OTel instruments if MeterProvider is provided
  • Real-time Emission: Metrics are emitted to OTel as they're collected, not batched
  • Attributes: Tool metrics include tool name as OTel attribute for filtering/grouping
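The per-tool aggregation in endToolExecution can be exercised in isolation; this stripped-down version (no OTel, no traces) mirrors the update math:

```typescript
// Minimal, OTel-free version of the per-tool aggregation performed by
// endToolExecution: running average plus success/error tallies.
interface ToolExecutionMetrics {
  callCount: number
  successCount: number
  errorCount: number
  totalDurationMs: number
  averageDurationMs: number
}

function recordToolRun(
  tools: Record<string, ToolExecutionMetrics>,
  toolName: string,
  durationMs: number,
  success: boolean
): void {
  const metric = (tools[toolName] ??= {
    callCount: 0,
    successCount: 0,
    errorCount: 0,
    totalDurationMs: 0,
    averageDurationMs: 0,
  })
  metric.callCount++
  metric.totalDurationMs += durationMs
  metric.averageDurationMs = metric.totalDurationMs / metric.callCount
  if (success) {
    metric.successCount++
  } else {
    metric.errorCount++
  }
}

const tools: Record<string, ToolExecutionMetrics> = {}
recordToolRun(tools, 'search', 100, true)
recordToolRun(tools, 'search', 300, false)
```

After these two calls, `tools['search']` holds a call count of 2, an average duration of 200 ms, and one success and one error.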

3. Agent Integration (src/agent/agent.ts)

Configuration:

constructor(config?: AgentConfig) {
  // ... existing code
  this._metricsCollector = config?.enableMetrics !== false 
    ? new MetricsCollector(config?.otelMeterProvider) 
    : undefined
}

Event Loop Integration:
In the _stream() method, add metrics collection:

private async *_stream(args: InvokeArgs): AsyncGenerator<AgentStreamEvent, AgentResult, undefined> {
  const cycleState = this._metricsCollector?.startCycle()
  
  try {
    // Main loop with metrics collection
    while (true) {
      const modelResult = yield* this.invokeModel(currentArgs)
      // ... existing logic
      
      if (modelResult.stopReason !== 'toolUse') {
        // cycleState is undefined when metrics are disabled
        if (cycleState) {
          this._metricsCollector?.endCycle(cycleState.startTime, cycleState.trace)
        }
        return {
          stopReason: modelResult.stopReason,
          lastMessage: modelResult.message,
          metrics: this._metricsCollector?.getMetrics()
        }
      }
      
      // Tool execution with metrics
      const toolResultMessage = yield* this.executeTools(modelResult.message, this._toolRegistry)
      // ...
    }
  } finally {
    // Cleanup
  }
}

4. Model Metadata Handling (src/models/model.ts)

Update streamAggregated() to handle ModelMetadataEvent:

case 'modelMetadataEvent':
  // Capture usage and metrics from event
  if (event.usage) {
    usage = event.usage
  }
  if (event.metrics) {
    metrics = event.metrics
  }
  break

Pass metrics to collector in agent loop after model invocation.
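A sketch of that hand-off; the event shape here is an assumption based on the ModelMetadataEvent handling above, and `latencyMs` is a placeholder field name:

```typescript
// Sketch: folding metadata events from streamAggregated() into the values
// recordModelInvocation needs. The event shape is an assumption based on
// ModelMetadataEvent in src/models/streaming.ts.
interface Usage {
  inputTokens: number
  outputTokens: number
  totalTokens: number
}

interface ModelMetadataEvent {
  type: 'modelMetadataEvent'
  usage?: Usage
  metrics?: { latencyMs: number }
}

function extractInvocation(
  events: ModelMetadataEvent[]
): { usage?: Usage; latencyMs?: number } {
  let usage: Usage | undefined
  let latencyMs: number | undefined
  for (const event of events) {
    // Later events win, matching the switch-case behavior above
    if (event.usage) usage = event.usage
    if (event.metrics) latencyMs = event.metrics.latencyMs
  }
  return { usage, latencyMs }
}

// The agent loop would then call:
//   collector.recordModelInvocation(latencyMs, usage)
const { usage, latencyMs } = extractInvocation([
  {
    type: 'modelMetadataEvent',
    usage: { inputTokens: 8, outputTokens: 4, totalTokens: 12 },
    metrics: { latencyMs: 950 },
  },
])
```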

5. Tool Execution Metrics (src/agent/agent.ts)

In the executeTool() method, add timing and success tracking:

private async *executeTool(
  toolUseBlock: ToolUseBlock,
  toolRegistry: ToolRegistry
): AsyncGenerator<AgentStreamEvent, ToolResultBlock, undefined> {
  // startToolExecution returns undefined when metrics are disabled
  const toolState = this._metricsCollector?.startToolExecution(
    toolUseBlock.name,
    this._currentCycleTrace
  )
  
  let success = false
  try {
    const toolResult = yield* toolGenerator
    success = toolResult.status === 'success'
    return toolResult
  } finally {
    if (toolState) {
      this._metricsCollector?.endToolExecution(
        toolUseBlock.name,
        toolState.startTime,
        success,
        toolState.trace
      )
    }
  }
}

OpenTelemetry Integration

Approach: Integrated directly into MetricsCollector for real-time streaming.

Configuration:

import { MeterProvider } from '@opentelemetry/sdk-metrics'

// Initialize OTel (using Langfuse or another backend). Note the concrete
// MeterProvider class lives in the SDK package; '@opentelemetry/api'
// exports only the interface.
const meterProvider = new MeterProvider({
  // Configure exporters, readers, etc.
})

// Create agent with OTel enabled
const agent = new Agent({ 
  enableMetrics: true,  // Enable metrics collection
  otelMeterProvider: meterProvider  // Enable real-time OTel streaming
})

// Metrics are streamed to OTel backend during execution
const result = await agent.invoke('prompt')

// Metrics also available in result for programmatic access
console.log(result.metrics?.eventLoop.cycleCount)

OTel Instruments Created:

  • strands.event_loop.cycle.count (Counter)
  • strands.event_loop.cycle.duration (Histogram, milliseconds)
  • strands.model.invocation.count (Counter)
  • strands.model.latency (Histogram, milliseconds)
  • strands.model.input_tokens (Histogram)
  • strands.model.output_tokens (Histogram)
  • strands.model.cache_read_tokens (Histogram)
  • strands.model.cache_write_tokens (Histogram)
  • strands.tool.call.count (Counter, with tool_name attribute)
  • strands.tool.success.count (Counter, with tool_name attribute)
  • strands.tool.error.count (Counter, with tool_name attribute)
  • strands.tool.duration (Histogram, milliseconds, with tool_name attribute)
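For local inspection, a console exporter can be wired up. The package names below are real OTel JS packages, but the exact MeterProvider option shape varies across sdk-metrics versions, so treat this as a version-dependent sketch:

```typescript
// Sketch: periodic console export for local debugging.
// Recent @opentelemetry/sdk-metrics versions accept readers in the
// constructor; older versions use meterProvider.addMetricReader(reader).
import {
  MeterProvider,
  PeriodicExportingMetricReader,
  ConsoleMetricExporter,
} from '@opentelemetry/sdk-metrics'

const reader = new PeriodicExportingMetricReader({
  exporter: new ConsoleMetricExporter(),
  exportIntervalMillis: 1000,
})

const meterProvider = new MeterProvider({ readers: [reader] })
```

The resulting `meterProvider` is what would be passed as `otelMeterProvider` in AgentConfig.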

Benefits:

  • ✅ Real-time streaming to OTel backends (Langfuse, Jaeger, etc.)
  • ✅ No batch export needed - metrics emitted as they're collected
  • ✅ Works with any OTel-compatible backend
  • ✅ Still provides in-memory metrics in AgentResult
  • ✅ Optional - only enabled if MeterProvider is provided

Files to Create/Modify

New Files:

  1. src/types/metrics.ts - Metrics interface definitions (~150 lines)
  2. src/agent/metrics-collector.ts - Collection logic with OTel integration (~400 lines)
  3. src/agent/__tests__/metrics-collector.test.ts - Unit tests (~500 lines)
  4. tests_integ/metrics.test.ts - Integration test (~200 lines)

Modified Files:

  1. src/types/agent.ts - Update AgentResult interface, add Metrics exports (~10 lines)
  2. src/agent/agent.ts - Add config option, integrate collector (~100 lines changes)
  3. src/models/model.ts - Handle ModelMetadataEvent (~20 lines)
  4. src/index.ts - Export new metrics types (~5 lines)

Total Estimated Changes:

  • New code: ~1,250 lines
  • Modified code: ~135 lines
  • Test code: ~700 lines

Testing Strategy

Unit Tests (src/agent/__tests__/metrics-collector.test.ts):

  • Test MetricsCollector initialization (with and without OTel)
  • Test cycle tracking (start, end, durations)
  • Test model invocation recording (aggregated + per-invocation)
  • Test tool execution tracking (success/error, timing)
  • Test trace tree structure
  • Test metrics aggregation
  • Test OTel instrument emission (mock MeterProvider)
  • Test with metrics disabled

Integration Test (tests_integ/metrics.test.ts):

  • Full agent execution with metrics enabled
  • Verify all metrics are collected accurately
  • Verify trace tree structure
  • Test multiple cycles
  • Test tool invocations
  • Test model metadata capture
  • Test with metrics disabled
  • Test OTel integration with mock backend

Exit Criteria

  • ✅ Metrics interfaces defined and exported
  • ✅ MetricsCollector class implemented with all required methods
  • ✅ OpenTelemetry real-time emission integrated in MetricsCollector
  • ✅ Agent loop integrated with metrics collection
  • ✅ Model metadata (usage, latency) captured correctly
  • ✅ Tool execution metrics tracked (success/error, timing)
  • ✅ Trace tree built correctly for execution flow
  • ✅ AgentResult includes metrics when enabled
  • ✅ Metrics collection can be disabled via config
  • ✅ OTel integration works with real-time streaming
  • ✅ Unit tests pass with 80%+ coverage
  • ✅ Integration test validates end-to-end metrics collection
  • ✅ Documentation updated (TSDoc on all interfaces and methods)

Implementation Notes

  1. Trace IDs: Use crypto.randomUUID() for trace IDs (available in Node 14.17+)
  2. Timing: Use performance.now() for high-resolution timing
  3. Memory: Keep trace tree shallow (don't nest too deep)
  4. Serialization: Ensure all metrics structures are JSON-serializable
  5. Backwards Compatibility: Metrics is optional on AgentResult (existing code unaffected)
  6. Type Safety: Use TypeScript strict mode, no any types
  7. Testing: Test with both metrics enabled and disabled, and with/without OTel
  8. OTel Optional: OpenTelemetry integration is opt-in via otelMeterProvider config
  9. Real-time: Metrics are emitted to OTel instruments immediately as collected
  10. Dual Purpose: MetricsCollector both streams to OTel AND builds in-memory object
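Note 4 (JSON-serializability) is easy to verify mechanically; a quick round-trip check over an abbreviated snapshot:

```typescript
// Quick check that a metrics snapshot survives JSON.stringify/parse:
// plain numbers, strings, arrays, and objects only (no Dates, no cycles).
// The shape here is abbreviated, not the full Metrics interface.
const snapshot = {
  eventLoop: { cycleCount: 2, totalDurationMs: 120.5, cycleDurationsMs: [60, 60.5] },
  tools: {
    search: { callCount: 1, successCount: 1, errorCount: 0, totalDurationMs: 45, averageDurationMs: 45 },
  },
}

const roundTripped = JSON.parse(JSON.stringify(snapshot))
```

A deep-equality assertion on `roundTripped` versus `snapshot` is a cheap regression test to keep alongside the collector's unit tests.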

Related Documentation

  • Python SDK Metrics: src/strands/telemetry/metrics.py
  • Python SDK AgentResult: src/strands/agent/agent_result.py
  • Existing ModelMetadataEvent: src/models/streaming.ts (lines 209-267)
  • Usage and Metrics types: src/models/streaming.ts (lines 365-402)
  • OpenTelemetry Metrics API: https://opentelemetry.io/docs/instrumentation/js/instrumentation/

Future Enhancements (Out of Scope)

  • Metrics dashboard/visualization
  • Metrics persistence/storage
  • Metrics comparison across invocations
  • Custom metrics via hooks
  • Trace sampling/filtering for large executions
  • Span/trace integration (beyond metrics)
  • Automatic context propagation
