Agent Result and Metrics #70

@Unshure

Description

Implement comprehensive metrics collection and reporting as part of the AgentResult interface. Provide detailed execution metrics including event loop cycles, model invocations, and tool execution statistics.

Implementation Requirements

Based on analysis of the Python SDK implementation and clarification discussion, this feature will provide comprehensive metrics collection with OpenTelemetry integration support for real-time streaming to backends like Langfuse.

Design Decisions

  1. Include Traces - Execution tree structure for detailed debugging
  2. Per-invocation model metrics - Both aggregated totals and per-invocation details
  3. Optional collection - Configurable via AgentConfig.enableMetrics (default: true)
  4. Metrics in AgentResult - Added as optional property
  5. OpenTelemetry real-time streaming - Critical requirement for live trace backends (Langfuse, etc.)

Metrics Interface Structure

interface Metrics {
  eventLoop: EventLoopMetrics
  model: ModelMetrics
  tools: ToolMetrics
  traces: Trace[]
}

interface EventLoopMetrics {
  cycleCount: number
  totalDurationMs: number
  cycleDurationsMs: number[]
}

interface ModelMetrics {
  invocationCount: number
  totalLatencyMs: number
  aggregatedUsage: Usage
  invocations: ModelInvocationMetrics[]
}

interface ModelInvocationMetrics {
  latencyMs: number
  usage: Usage
  timeToFirstByteMs?: number
}

interface Usage {
  inputTokens: number
  outputTokens: number
  totalTokens: number
  cacheReadInputTokens?: number
  cacheWriteInputTokens?: number
}

interface ToolMetrics {
  [toolName: string]: ToolExecutionMetrics
}

interface ToolExecutionMetrics {
  callCount: number
  successCount: number
  errorCount: number
  totalDurationMs: number
  averageDurationMs: number
}

interface Trace {
  id: string
  name: string
  startTime: number
  endTime?: number
  durationMs?: number
  parentId?: string
  children: Trace[]
  metadata?: Record<string, unknown>
}
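To make the tree shape concrete, here is a hypothetical populated trace for one cycle that executed a single tool. All ids and timings are invented for illustration; real ids come from `crypto.randomUUID()` and times from `performance.now()`.

```typescript
// Hypothetical Trace snapshot: one event-loop cycle that ran one tool.
// All ids and timings are invented for illustration.
interface Trace {
  id: string
  name: string
  startTime: number
  endTime?: number
  durationMs?: number
  parentId?: string
  children: Trace[]
  metadata?: Record<string, unknown>
}

const toolTrace: Trace = {
  id: 'trace-tool-1',
  name: 'calculator',
  startTime: 112.5,
  endTime: 141,
  durationMs: 28.5,
  parentId: 'trace-cycle-1',
  children: [],
  metadata: { toolName: 'calculator', success: true },
}

const cycleTrace: Trace = {
  id: 'trace-cycle-1',
  name: 'Cycle 1',
  startTime: 110,
  endTime: 155,
  durationMs: 45,
  children: [toolTrace],
}
```

Note that `parentId` on the child mirrors the parent's `id`, so the tree can be flattened and rebuilt if needed.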

Updated AgentResult Interface

interface AgentResult {
  stopReason: string
  lastMessage: Message
  metrics?: Metrics  // Optional - present when enableMetrics is true
}

Agent Configuration

interface AgentConfig {
  // ... existing fields
  enableMetrics?: boolean  // Default: true
  otelMeterProvider?: MeterProvider  // Optional: for real-time OTel streaming
}

Technical Approach

1. Metrics Interfaces (src/types/metrics.ts)

Create a new file with all metrics-related interfaces:

  • Metrics (top-level container)
  • EventLoopMetrics (cycle tracking)
  • ModelMetrics (invocation tracking)
  • ModelInvocationMetrics (per-invocation details)
  • ToolMetrics (tool execution tracking)
  • ToolExecutionMetrics (per-tool stats)
  • Trace (execution tree node)

Note: Extend the existing Usage and Metrics types from src/models/streaming.ts rather than duplicating them.
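A sketch of that reuse pattern. The import path and the base field names are assumptions drawn from this plan; a local stand-in replaces the real import so the example is self-contained.

```typescript
// Sketch: reuse the streaming Usage type instead of redefining it.
// In the real code this would be:
//   import { Usage as StreamingUsage } from '../models/streaming'
// A local stand-in keeps this example self-contained.
interface StreamingUsage {
  inputTokens: number
  outputTokens: number
  totalTokens: number
}

// src/types/metrics.ts extends it with the optional cache fields
interface Usage extends StreamingUsage {
  cacheReadInputTokens?: number
  cacheWriteInputTokens?: number
}

const usage: Usage = {
  inputTokens: 10,
  outputTokens: 5,
  totalTokens: 15,
  cacheReadInputTokens: 4,
}
```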

2. Metrics Collector (src/agent/metrics-collector.ts)

Create a new class to handle metrics collection, with optional real-time OpenTelemetry emission:

export class MetricsCollector {
  private _eventLoopMetrics: EventLoopMetrics
  private _modelMetrics: ModelMetrics
  private _toolMetrics: ToolMetrics
  private _traces: Trace[]
  private _currentCycleTrace?: Trace
  
  // OpenTelemetry instruments (optional)
  private _otelMeter?: Meter
  private _otelInstruments?: {
    eventLoopCycleCount: Counter
    eventLoopCycleDuration: Histogram
    modelInvocationCount: Counter
    modelLatency: Histogram
    modelInputTokens: Histogram
    modelOutputTokens: Histogram
    modelCacheReadTokens: Histogram
    modelCacheWriteTokens: Histogram
    toolCallCount: Counter
    toolSuccessCount: Counter
    toolErrorCount: Counter
    toolDuration: Histogram
  }
  
  constructor(otelMeterProvider?: MeterProvider) {
    this._eventLoopMetrics = { cycleCount: 0, totalDurationMs: 0, cycleDurationsMs: [] }
    this._modelMetrics = { invocationCount: 0, totalLatencyMs: 0, aggregatedUsage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 }, invocations: [] }
    this._toolMetrics = {}
    this._traces = []
    
    // Initialize OTel instruments if provider given
    if (otelMeterProvider) {
      this._otelMeter = otelMeterProvider.getMeter('strands-agents-sdk')
      this._initializeOTelInstruments()
    }
  }
  
  private _initializeOTelInstruments(): void {
    // _otelMeter is assigned by the constructor before this is called,
    // so the non-null assertion is safe under strict mode
    const meter = this._otelMeter!
    this._otelInstruments = {
      eventLoopCycleCount: meter.createCounter('strands.event_loop.cycle.count'),
      eventLoopCycleDuration: meter.createHistogram('strands.event_loop.cycle.duration'),
      modelInvocationCount: meter.createCounter('strands.model.invocation.count'),
      modelLatency: meter.createHistogram('strands.model.latency'),
      modelInputTokens: meter.createHistogram('strands.model.input_tokens'),
      modelOutputTokens: meter.createHistogram('strands.model.output_tokens'),
      modelCacheReadTokens: meter.createHistogram('strands.model.cache_read_tokens'),
      modelCacheWriteTokens: meter.createHistogram('strands.model.cache_write_tokens'),
      toolCallCount: meter.createCounter('strands.tool.call.count'),
      toolSuccessCount: meter.createCounter('strands.tool.success.count'),
      toolErrorCount: meter.createCounter('strands.tool.error.count'),
      toolDuration: meter.createHistogram('strands.tool.duration'),
    }
  }
  
  startCycle(): { startTime: number; trace: Trace } {
    const startTime = performance.now()
    const trace: Trace = {
      id: crypto.randomUUID(),
      name: `Cycle ${this._eventLoopMetrics.cycleCount + 1}`,
      startTime,
      children: []
    }
    this._traces.push(trace)
    this._currentCycleTrace = trace
    
    // Emit to OTel in real-time
    this._otelInstruments?.eventLoopCycleCount.add(1)
    
    return { startTime, trace }
  }
  
  endCycle(startTime: number, trace: Trace): void {
    const endTime = performance.now()
    const durationMs = endTime - startTime
    
    trace.endTime = endTime
    trace.durationMs = durationMs
    
    this._eventLoopMetrics.cycleCount++
    this._eventLoopMetrics.totalDurationMs += durationMs
    this._eventLoopMetrics.cycleDurationsMs.push(durationMs)
    
    // Emit to OTel in real-time
    this._otelInstruments?.eventLoopCycleDuration.record(durationMs)
  }
  
  recordModelInvocation(latency: number, usage: Usage, timeToFirstByte?: number): void {
    // Store per-invocation metrics
    const invocation: ModelInvocationMetrics = {
      latencyMs: latency,
      usage: { ...usage },
      timeToFirstByteMs: timeToFirstByte
    }
    this._modelMetrics.invocations.push(invocation)
    
    // Update aggregated metrics
    this._modelMetrics.invocationCount++
    this._modelMetrics.totalLatencyMs += latency
    this._modelMetrics.aggregatedUsage.inputTokens += usage.inputTokens
    this._modelMetrics.aggregatedUsage.outputTokens += usage.outputTokens
    this._modelMetrics.aggregatedUsage.totalTokens += usage.totalTokens
    if (usage.cacheReadInputTokens) {
      this._modelMetrics.aggregatedUsage.cacheReadInputTokens = 
        (this._modelMetrics.aggregatedUsage.cacheReadInputTokens || 0) + usage.cacheReadInputTokens
    }
    if (usage.cacheWriteInputTokens) {
      this._modelMetrics.aggregatedUsage.cacheWriteInputTokens = 
        (this._modelMetrics.aggregatedUsage.cacheWriteInputTokens || 0) + usage.cacheWriteInputTokens
    }
    
    // Emit to OTel in real-time
    if (this._otelInstruments) {
      this._otelInstruments.modelInvocationCount.add(1)
      this._otelInstruments.modelLatency.record(latency)
      this._otelInstruments.modelInputTokens.record(usage.inputTokens)
      this._otelInstruments.modelOutputTokens.record(usage.outputTokens)
      if (usage.cacheReadInputTokens) {
        this._otelInstruments.modelCacheReadTokens.record(usage.cacheReadInputTokens)
      }
      if (usage.cacheWriteInputTokens) {
        this._otelInstruments.modelCacheWriteTokens.record(usage.cacheWriteInputTokens)
      }
    }
  }
  
  startToolExecution(toolName: string, parentTrace: Trace): { startTime: number; trace: Trace } {
    const startTime = performance.now()
    const trace: Trace = {
      id: crypto.randomUUID(),
      name: toolName,
      startTime,
      parentId: parentTrace.id,
      children: [],
      metadata: { toolName }
    }
    parentTrace.children.push(trace)
    return { startTime, trace }
  }
  
  endToolExecution(toolName: string, startTime: number, success: boolean, trace: Trace): void {
    const endTime = performance.now()
    const durationMs = endTime - startTime
    
    trace.endTime = endTime
    trace.durationMs = durationMs
    trace.metadata = { ...trace.metadata, success }
    
    // Initialize or update tool metrics
    if (!this._toolMetrics[toolName]) {
      this._toolMetrics[toolName] = {
        callCount: 0,
        successCount: 0,
        errorCount: 0,
        totalDurationMs: 0,
        averageDurationMs: 0
      }
    }
    
    const toolMetric = this._toolMetrics[toolName]
    toolMetric.callCount++
    toolMetric.totalDurationMs += durationMs
    toolMetric.averageDurationMs = toolMetric.totalDurationMs / toolMetric.callCount
    
    if (success) {
      toolMetric.successCount++
    } else {
      toolMetric.errorCount++
    }
    
    // Emit to OTel in real-time
    if (this._otelInstruments) {
      const attributes = { tool_name: toolName }
      this._otelInstruments.toolCallCount.add(1, attributes)
      this._otelInstruments.toolDuration.record(durationMs, attributes)
      if (success) {
        this._otelInstruments.toolSuccessCount.add(1, attributes)
      } else {
        this._otelInstruments.toolErrorCount.add(1, attributes)
      }
    }
  }
  
  getMetrics(): Metrics {
    // Return defensive copies so the snapshot is not mutated by later collection
    return {
      eventLoop: {
        ...this._eventLoopMetrics,
        cycleDurationsMs: [...this._eventLoopMetrics.cycleDurationsMs]
      },
      model: {
        ...this._modelMetrics,
        aggregatedUsage: { ...this._modelMetrics.aggregatedUsage },
        invocations: this._modelMetrics.invocations.map(i => ({ ...i, usage: { ...i.usage } }))
      },
      tools: Object.fromEntries(
        Object.entries(this._toolMetrics).map(([name, m]) => [name, { ...m }])
      ),
      traces: this._traces.map(t => this._cloneTrace(t))
    }
  }
  
  private _cloneTrace(trace: Trace): Trace {
    return {
      ...trace,
      children: trace.children.map(c => this._cloneTrace(c)),
      metadata: trace.metadata ? { ...trace.metadata } : undefined
    }
  }
}

Key Design Points:

  • Dual Purpose: Builds in-memory Metrics object AND emits to OTel in real-time
  • Optional OTel: Only creates OTel instruments if MeterProvider is provided
  • Real-time Emission: Metrics are emitted to OTel as they're collected, not batched
  • Attributes: Tool metrics include tool name as OTel attribute for filtering/grouping
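The per-tool aggregation in endToolExecution can be exercised in isolation; this stripped-down version (no OTel, no traces) mirrors the update math:

```typescript
// Minimal, OTel-free version of the per-tool aggregation performed by
// endToolExecution: running average plus success/error tallies.
interface ToolExecutionMetrics {
  callCount: number
  successCount: number
  errorCount: number
  totalDurationMs: number
  averageDurationMs: number
}

function recordToolRun(
  tools: Record<string, ToolExecutionMetrics>,
  toolName: string,
  durationMs: number,
  success: boolean
): void {
  const metric = (tools[toolName] ??= {
    callCount: 0,
    successCount: 0,
    errorCount: 0,
    totalDurationMs: 0,
    averageDurationMs: 0,
  })
  metric.callCount++
  metric.totalDurationMs += durationMs
  metric.averageDurationMs = metric.totalDurationMs / metric.callCount
  if (success) {
    metric.successCount++
  } else {
    metric.errorCount++
  }
}

const tools: Record<string, ToolExecutionMetrics> = {}
recordToolRun(tools, 'search', 100, true)
recordToolRun(tools, 'search', 300, false)
```

After these two calls, `tools['search']` holds a call count of 2, an average duration of 200 ms, and one success and one error.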

3. Agent Integration (src/agent/agent.ts)

Configuration:

constructor(config?: AgentConfig) {
  // ... existing code
  this._metricsCollector = config?.enableMetrics !== false 
    ? new MetricsCollector(config?.otelMeterProvider) 
    : undefined
}

Event Loop Integration:
In the _stream() method, add metrics collection:

private async *_stream(args: InvokeArgs): AsyncGenerator<AgentStreamEvent, AgentResult, undefined> {
  const cycleState = this._metricsCollector?.startCycle()
  
  try {
    // Main loop with metrics collection
    while (true) {
      const modelResult = yield* this.invokeModel(currentArgs)
      // ... existing logic
      
      if (modelResult.stopReason !== 'toolUse') {
        // cycleState is undefined when metrics are disabled
        if (cycleState) {
          this._metricsCollector?.endCycle(cycleState.startTime, cycleState.trace)
        }
        return {
          stopReason: modelResult.stopReason,
          lastMessage: modelResult.message,
          metrics: this._metricsCollector?.getMetrics()
        }
      }
      
      // Tool execution with metrics
      const toolResultMessage = yield* this.executeTools(modelResult.message, this._toolRegistry)
      // ...
    }
  } finally {
    // Cleanup
  }
}

4. Model Metadata Handling (src/models/model.ts)

Update streamAggregated() to handle ModelMetadataEvent:

case 'modelMetadataEvent':
  // Capture usage and metrics from event
  if (event.usage) {
    usage = event.usage
  }
  if (event.metrics) {
    metrics = event.metrics
  }
  break

Pass metrics to collector in agent loop after model invocation.
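A sketch of that hand-off; the event shape here is an assumption based on the ModelMetadataEvent handling above, and `latencyMs` is a placeholder field name:

```typescript
// Sketch: folding metadata events from streamAggregated() into the values
// recordModelInvocation needs. The event shape is an assumption based on
// ModelMetadataEvent in src/models/streaming.ts.
interface Usage {
  inputTokens: number
  outputTokens: number
  totalTokens: number
}

interface ModelMetadataEvent {
  type: 'modelMetadataEvent'
  usage?: Usage
  metrics?: { latencyMs: number }
}

function extractInvocation(
  events: ModelMetadataEvent[]
): { usage?: Usage; latencyMs?: number } {
  let usage: Usage | undefined
  let latencyMs: number | undefined
  for (const event of events) {
    // Later events win, matching the switch-case behavior above
    if (event.usage) usage = event.usage
    if (event.metrics) latencyMs = event.metrics.latencyMs
  }
  return { usage, latencyMs }
}

// The agent loop would then call:
//   collector.recordModelInvocation(latencyMs, usage)
const { usage, latencyMs } = extractInvocation([
  {
    type: 'modelMetadataEvent',
    usage: { inputTokens: 8, outputTokens: 4, totalTokens: 12 },
    metrics: { latencyMs: 950 },
  },
])
```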

5. Tool Execution Metrics (src/agent/agent.ts)

In the executeTool() method, add timing and success tracking:

private async *executeTool(
  toolUseBlock: ToolUseBlock,
  toolRegistry: ToolRegistry
): AsyncGenerator<AgentStreamEvent, ToolResultBlock, undefined> {
  // startToolExecution returns undefined when metrics are disabled
  const toolState = this._metricsCollector?.startToolExecution(
    toolUseBlock.name,
    this._currentCycleTrace
  )
  
  let success = false
  try {
    const toolResult = yield* toolGenerator
    success = toolResult.status === 'success'
    return toolResult
  } finally {
    if (toolState) {
      this._metricsCollector?.endToolExecution(
        toolUseBlock.name,
        toolState.startTime,
        success,
        toolState.trace
      )
    }
  }
}

OpenTelemetry Integration

Approach: Integrated directly into MetricsCollector for real-time streaming.

Configuration:

import { MeterProvider } from '@opentelemetry/sdk-metrics'

// Initialize OTel (using Langfuse or another backend). Note the concrete
// MeterProvider class lives in the SDK package; '@opentelemetry/api'
// exports only the interface.
const meterProvider = new MeterProvider({
  // Configure exporters, readers, etc.
})

// Create agent with OTel enabled
const agent = new Agent({ 
  enableMetrics: true,  // Enable metrics collection
  otelMeterProvider: meterProvider  // Enable real-time OTel streaming
})

// Metrics are streamed to OTel backend during execution
const result = await agent.invoke('prompt')

// Metrics also available in result for programmatic access
console.log(result.metrics?.eventLoop.cycleCount)

OTel Instruments Created:

  • strands.event_loop.cycle.count (Counter)
  • strands.event_loop.cycle.duration (Histogram, milliseconds)
  • strands.model.invocation.count (Counter)
  • strands.model.latency (Histogram, milliseconds)
  • strands.model.input_tokens (Histogram)
  • strands.model.output_tokens (Histogram)
  • strands.model.cache_read_tokens (Histogram)
  • strands.model.cache_write_tokens (Histogram)
  • strands.tool.call.count (Counter, with tool_name attribute)
  • strands.tool.success.count (Counter, with tool_name attribute)
  • strands.tool.error.count (Counter, with tool_name attribute)
  • strands.tool.duration (Histogram, milliseconds, with tool_name attribute)
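For local inspection, a console exporter can be wired up. The package names below are real OTel JS packages, but the exact MeterProvider option shape varies across sdk-metrics versions, so treat this as a version-dependent sketch:

```typescript
// Sketch: periodic console export for local debugging.
// Recent @opentelemetry/sdk-metrics versions accept readers in the
// constructor; older versions use meterProvider.addMetricReader(reader).
import {
  MeterProvider,
  PeriodicExportingMetricReader,
  ConsoleMetricExporter,
} from '@opentelemetry/sdk-metrics'

const reader = new PeriodicExportingMetricReader({
  exporter: new ConsoleMetricExporter(),
  exportIntervalMillis: 1000,
})

const meterProvider = new MeterProvider({ readers: [reader] })
```

The resulting `meterProvider` is what would be passed as `otelMeterProvider` in AgentConfig.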

Benefits:

  • ✅ Real-time streaming to OTel backends (Langfuse, Jaeger, etc.)
  • ✅ No batch export needed - metrics emitted as they're collected
  • ✅ Works with any OTel-compatible backend
  • ✅ Still provides in-memory metrics in AgentResult
  • ✅ Optional - only enabled if MeterProvider is provided

Files to Create/Modify

New Files:

  1. src/types/metrics.ts - Metrics interface definitions (~150 lines)
  2. src/agent/metrics-collector.ts - Collection logic with OTel integration (~400 lines)
  3. src/agent/__tests__/metrics-collector.test.ts - Unit tests (~500 lines)
  4. tests_integ/metrics.test.ts - Integration test (~200 lines)

Modified Files:

  1. src/types/agent.ts - Update AgentResult interface, add Metrics exports (~10 lines)
  2. src/agent/agent.ts - Add config option, integrate collector (~100 lines changes)
  3. src/models/model.ts - Handle ModelMetadataEvent (~20 lines)
  4. src/index.ts - Export new metrics types (~5 lines)

Total Estimated Changes:

  • New code: ~1,250 lines
  • Modified code: ~135 lines
  • Test code: ~700 lines

Testing Strategy

Unit Tests (src/agent/__tests__/metrics-collector.test.ts):

  • Test MetricsCollector initialization (with and without OTel)
  • Test cycle tracking (start, end, durations)
  • Test model invocation recording (aggregated + per-invocation)
  • Test tool execution tracking (success/error, timing)
  • Test trace tree structure
  • Test metrics aggregation
  • Test OTel instrument emission (mock MeterProvider)
  • Test with metrics disabled

Integration Test (tests_integ/metrics.test.ts):

  • Full agent execution with metrics enabled
  • Verify all metrics are collected accurately
  • Verify trace tree structure
  • Test multiple cycles
  • Test tool invocations
  • Test model metadata capture
  • Test with metrics disabled
  • Test OTel integration with mock backend

Exit Criteria

  • ✅ Metrics interfaces defined and exported
  • ✅ MetricsCollector class implemented with all required methods
  • ✅ OpenTelemetry real-time emission integrated in MetricsCollector
  • ✅ Agent loop integrated with metrics collection
  • ✅ Model metadata (usage, latency) captured correctly
  • ✅ Tool execution metrics tracked (success/error, timing)
  • ✅ Trace tree built correctly for execution flow
  • ✅ AgentResult includes metrics when enabled
  • ✅ Metrics collection can be disabled via config
  • ✅ OTel integration works with real-time streaming
  • ✅ Unit tests pass with 80%+ coverage
  • ✅ Integration test validates end-to-end metrics collection
  • ✅ Documentation updated (TSDoc on all interfaces and methods)

Implementation Notes

  1. Trace IDs: Use crypto.randomUUID() for trace IDs (available in Node 14.17+)
  2. Timing: Use performance.now() for high-resolution timing
  3. Memory: Keep trace tree shallow (don't nest too deep)
  4. Serialization: Ensure all metrics structures are JSON-serializable
  5. Backwards Compatibility: Metrics is optional on AgentResult (existing code unaffected)
  6. Type Safety: Use TypeScript strict mode, no any types
  7. Testing: Test with both metrics enabled and disabled, and with/without OTel
  8. OTel Optional: OpenTelemetry integration is opt-in via otelMeterProvider config
  9. Real-time: Metrics are emitted to OTel instruments immediately as collected
  10. Dual Purpose: MetricsCollector both streams to OTel AND builds in-memory object
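Note 4 (JSON-serializability) is easy to verify mechanically; a quick round-trip check over an abbreviated snapshot:

```typescript
// Quick check that a metrics snapshot survives JSON.stringify/parse:
// plain numbers, strings, arrays, and objects only (no Dates, no cycles).
// The shape here is abbreviated, not the full Metrics interface.
const snapshot = {
  eventLoop: { cycleCount: 2, totalDurationMs: 120.5, cycleDurationsMs: [60, 60.5] },
  tools: {
    search: { callCount: 1, successCount: 1, errorCount: 0, totalDurationMs: 45, averageDurationMs: 45 },
  },
}

const roundTripped = JSON.parse(JSON.stringify(snapshot))
```

A deep-equality assertion on `roundTripped` versus `snapshot` is a cheap regression test to keep alongside the collector's unit tests.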

Related Documentation

  • Python SDK Metrics: src/strands/telemetry/metrics.py
  • Python SDK AgentResult: src/strands/agent/agent_result.py
  • Existing ModelMetadataEvent: src/models/streaming.ts (lines 209-267)
  • Usage and Metrics types: src/models/streaming.ts (lines 365-402)
  • OpenTelemetry Metrics API: https://opentelemetry.io/docs/instrumentation/js/instrumentation/

Future Enhancements (Out of Scope)

  • Metrics dashboard/visualization
  • Metrics persistence/storage
  • Metrics comparison across invocations
  • Custom metrics via hooks
  • Trace sampling/filtering for large executions
  • Span/trace integration (beyond metrics)
  • Automatic context propagation
