Add performance statistics display to mlc_llm serve command #2
Summary
Adds real-time performance monitoring to the mlc_llm serve command by displaying request completion statistics in the terminal. When a request finishes processing, users will now see performance metrics including token counts and throughput rates.

Changes
cpp/serve/engine_actions/action_commons.cc:
- Added the #include <cstdio> header for printf functionality
- Added the statistics display to the ProcessFinishedRequestStateEntries function

Example Output
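As an illustration only (the exact wording and field layout of the printed line are an assumption here, not taken from the change itself), a finished request might produce a line such as:

```
[finished request] prefill: 128 tok (512.0 tok/s), decode: 256 tok (42.0 tok/s)
```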
Implementation Details
- Uses existing RequestMetrics data with no additional computation overhead
- Uses printf with fflush for immediate console output (a minimal sketch of such a block follows)
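The sketch below is a minimal, self-contained approximation of what the added display could look like, not the merged code itself; the struct and its field names (prefill_tokens, decode_tokens, prefill_time_s, decode_time_s) are hypothetical stand-ins for the actual RequestMetrics members.

```cpp
// Minimal sketch of a per-request statistics display, in the spirit of the
// block added to ProcessFinishedRequestStateEntries. The struct below only
// mirrors the fields this feature needs; the real RequestMetrics type in
// mlc-llm may use different names (field names here are assumptions).
#include <cstdio>

struct RequestMetricsSketch {
  int prefill_tokens = 0;       // tokens processed during prefill
  int decode_tokens = 0;        // tokens generated during decode
  double prefill_time_s = 0.0;  // wall-clock seconds spent in prefill
  double decode_time_s = 0.0;   // wall-clock seconds spent in decode
};

// Print per-request statistics when a request finishes. Output is skipped
// unless both token counts are positive and the timings are valid, so
// incomplete requests do not add noise to the terminal.
void PrintRequestStats(const RequestMetricsSketch& m) {
  if (m.prefill_tokens <= 0 || m.decode_tokens <= 0) return;
  if (m.prefill_time_s <= 0.0 || m.decode_time_s <= 0.0) return;
  std::printf(
      "[finished request] prefill: %d tok (%.1f tok/s), decode: %d tok (%.1f tok/s)\n",
      m.prefill_tokens, m.prefill_tokens / m.prefill_time_s,
      m.decode_tokens, m.decode_tokens / m.decode_time_s);
  std::fflush(stdout);  // flush immediately so stats appear as requests complete
}

int main() {
  // Example invocation with made-up numbers.
  PrintRequestStats({128, 256, 0.25, 6.1});
  return 0;
}
```

Flushing stdout right after each printf keeps the statistics visible as requests complete, even when the serve process is long-lived and output would otherwise be buffered.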
Benefits
The feature only displays statistics when meaningful data is available (both prefill and decode tokens > 0, with valid timing), ensuring clean output without noise from incomplete requests.
Testing