Enhanced Load Balancing Metrics for Model and Endpoint Performance Tracking #227

**Is your feature request related to a problem? Please describe.**
The semantic router currently selects models primarily on accuracy, but it lacks load-balancing metrics that track endpoint and model performance over different time intervals. As a result, the highest-accuracy models and endpoints can receive a disproportionate share of traffic and become bottlenecks. Time-windowed metrics would let the router balance load: trading some accuracy for lower latency when needed and distributing requests more evenly across endpoints.

**Describe the solution you'd like**
## Time-Windowed Performance Tracking
#### Multiple time horizons
Configurable windows, for example 1m, 5m, 15m, 1h, and 24h.
#### Key metrics per window
  * Request rates and completion rates
  * Latency distributions (P50, P95, P99)
  * Token throughput (prompt/completion tokens)
  * Error rates and timeout frequencies
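A minimal sketch of how the per-window metrics above could be aggregated, assuming the router records one sample per finished request (latency, prompt/completion tokens, success flag). The names `RequestSample`, `WindowedStats`, and `DEFAULT_WINDOWS` are hypothetical, and "request rate" here counts finished requests because samples are recorded at completion time.

```python
import math
import time
from collections import deque
from dataclasses import dataclass

# Hypothetical default windows (seconds), mirroring the horizons proposed above.
DEFAULT_WINDOWS = {"1m": 60, "5m": 300, "15m": 900, "1h": 3600, "24h": 86400}


@dataclass
class RequestSample:
    ts: float               # completion timestamp in seconds
    latency_s: float        # end-to-end latency in seconds
    prompt_tokens: int
    completion_tokens: int
    ok: bool                # False for errors and timeouts


def percentile(sorted_vals, q):
    """Nearest-rank percentile of an already-sorted list (q in 0..100)."""
    if not sorted_vals:
        return 0.0
    rank = max(1, math.ceil(q / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


class WindowedStats:
    """Keeps raw samples for the longest window and aggregates per window on demand."""

    def __init__(self, windows=None):
        self.windows = windows or DEFAULT_WINDOWS
        self.max_window = max(self.windows.values())
        self.samples = deque()  # assumed to be appended in timestamp order

    def record(self, sample):
        self.samples.append(sample)

    def _evict(self, now):
        # Drop samples that have fallen out of the longest window.
        while self.samples and self.samples[0].ts < now - self.max_window:
            self.samples.popleft()

    def snapshot(self, now=None):
        now = time.time() if now is None else now
        self._evict(now)
        out = {}
        for name, span in self.windows.items():
            recent = [s for s in self.samples if s.ts >= now - span]
            lats = sorted(s.latency_s for s in recent)
            errors = sum(1 for s in recent if not s.ok)
            out[name] = {
                "request_rate": len(recent) / span,               # finished requests/s
                "completion_rate": (len(recent) - errors) / span,  # successful requests/s
                "p50": percentile(lats, 50),
                "p95": percentile(lats, 95),
                "p99": percentile(lats, 99),
                "prompt_tokens": sum(s.prompt_tokens for s in recent),
                "completion_tokens": sum(s.completion_tokens for s in recent),
                "error_rate": errors / len(recent) if recent else 0.0,
            }
        return out
```

Keeping raw samples for 24 hours is memory-hungry at high request rates. A production version would more likely use fixed-size time buckets, or export plain Prometheus counters and histograms and derive the windows at query time, e.g. `histogram_quantile(0.95, rate(<metric>_bucket[5m]))`.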

## Sample Endpoint-Level Metrics
```
llm_endpoint_latency_windowed_seconds{endpoint, model, time_window}
llm_endpoint_requests_windowed_total{endpoint, model, time_window}  
llm_endpoint_tokens_windowed_total{endpoint, model, token_type, time_window}
llm_endpoint_utilization_percentage{endpoint, time_window}
llm_endpoint_queue_depth_estimated{endpoint, model}
```
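To illustrate how these metrics could drive the accuracy/latency trade-off described above, here is a hypothetical scoring sketch. It assumes the router already has a per-model accuracy score and a known sustainable throughput per endpoint (`capacity_rps`); the weights, the SLO value, and all names (`EndpointView`, `score`, `pick_endpoint`) are illustrative assumptions, not part of the proposal. The queue-depth estimate uses Little's law (average number in system = arrival rate × mean time in system), which is one plausible way to populate `llm_endpoint_queue_depth_estimated`.

```python
from dataclasses import dataclass


@dataclass
class EndpointView:
    """Hypothetical per-endpoint snapshot built from one time window of metrics."""
    name: str
    accuracy: float         # model quality score in [0, 1], assumed input from the router
    request_rate: float     # requests/s over the chosen window
    mean_latency_s: float   # mean latency over the same window
    p95_latency_s: float
    error_rate: float       # fraction in [0, 1]
    capacity_rps: float     # assumed sustainable throughput of the endpoint


def estimated_queue_depth(v):
    # Little's law: L = lambda * W (average requests in flight).
    return v.request_rate * v.mean_latency_s


def utilization(v):
    # Fraction in [0, 1]; multiply by 100 to export as a percentage.
    return min(1.0, v.request_rate / v.capacity_rps) if v.capacity_rps > 0 else 1.0


def score(v, latency_weight=0.5, load_weight=0.3, latency_slo_s=2.0):
    """Higher is better: accuracy minus penalties for tail latency, load, and errors."""
    latency_penalty = min(1.0, v.p95_latency_s / latency_slo_s)
    return (v.accuracy
            - latency_weight * latency_penalty
            - load_weight * utilization(v)
            - v.error_rate)


def pick_endpoint(views, **weights):
    return max(views, key=lambda v: score(v, **weights))
```

Setting `latency_weight` to 0 reduces this to accuracy-first selection with only a load and error penalty; raising it shifts traffic toward faster, less loaded endpoints.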


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Enhanced Load Balancing Metrics for Model and Endpoint Performance Tracking #227

Time-Windowed Performance Tracking

Multiple time horizons

Key metrics per window:

Sample Endpoint-Level Metrics

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Enhanced Load Balancing Metrics for Model and Endpoint Performance Tracking #227

Description

Time-Windowed Performance Tracking

Multiple time horizons

Key metrics per window:

Sample Endpoint-Level Metrics

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions