Copilot AI commented Jun 21, 2025

Summary

Adds real-time performance monitoring to the mlc_llm serve command by displaying request completion statistics in the terminal. When a request finishes processing, users will now see performance metrics including token counts and throughput rates.

Changes

  • Modified cpp/serve/engine_actions/action_commons.cc:
    • Added #include <cstdio> header for printf functionality
    • Added console output in ProcessFinishedRequestStateEntries function
    • Displays prefill/decode token counts and TPS (tokens per second) calculations
    • Added safety checks to handle edge cases (zero tokens, zero timing)

Example Output

[Request Completed] Prefill: 50 tokens, Decode: 100 tokens, Prefill TPS: 1200.5, Decode TPS: 25.3

Implementation Details

  • Minimal impact: Only 18 lines of code added using existing metrics infrastructure
  • Safe: Includes proper validation to prevent division by zero and invalid data
  • Efficient: Uses existing RequestMetrics data with no additional computation overhead
  • Immediate output: Uses printf followed by fflush so statistics appear in the terminal without stdio buffering delay
  • Style compliant: Code follows project formatting standards via clang-format

Benefits

  • Real-time monitoring: Users can track model performance without external tools
  • Performance debugging: Helps identify bottlenecks in prefill vs decode phases
  • Production monitoring: Enables live performance assessment during serving

The feature only displays statistics when meaningful data is available (both prefill and decode tokens > 0, with valid timing), ensuring clean output without noise from incomplete requests.

Testing

  • Code compiles successfully with existing build system
  • Follows project code style guidelines (clang-format compliant)
  • Uses existing metrics infrastructure with no breaking changes


@alohachen (Owner) commented:

thanks

Copilot AI changed the title from [WIP] Merge PR #1: Print TPS statistics when a request completes to Add performance statistics display to mlc_llm serve command Jun 21, 2025
Copilot AI requested a review from alohachen June 21, 2025 15:43
Copilot finished work on behalf of alohachen June 21, 2025 15:43
@alohachen alohachen marked this pull request as ready for review June 21, 2025 15:47
@alohachen alohachen merged commit 3624511 into main Jun 21, 2025