keti9 #5090
base: develop
Conversation
root seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Have you signed the CLA already but the status is still pending? Let us recheck it.

Thanks for your contribution!
Pull Request Overview
This PR appears to be a work-in-progress or development/testing branch (titled "keti9") that introduces conditional logic for selecting between two MOE expert dispatch implementations. The PR lacks proper documentation and contains debugging artifacts.
Key Changes:
- Adds a new `f_ep_moe_expert_dispatch` C++ custom operator implementation
- Introduces global state management via a `setting.py` module to control CUDA graph usage
- Adds conditional dispatch logic in the XPU MOE layer based on a `use_cuda_graph` flag
- Includes test files and debug logging
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| custom_ops/xpu_ops/src/ops/f_moe_ep_dispatch.cc | New C++ operator implementing MOE expert dispatch without CUDA graph support |
| fastdeploy/model_executor/layers/backends/xpu/moe/fused_moe.py | Adds conditional logic to choose between dispatch implementations with debug prints |
| fastdeploy/config.py | Imports and sets global setting.use_cuda_graph state with debug prints |
| fastdeploy/input/ernie4_5_processor.py | Adds debug logging for request processing |
| setting.py | Global module storing use_cuda_graph configuration |
| fastdeploy/cuda_graph_config.py | Duplicate configuration file for CUDA graph setting |
| test_1_query.py | Test script for API server queries |
| scripts/build_and_run.sh | Build and run script for XPU testing |
```python
    free_tensor,
    set_weight_attrs,
)
import setting
```
Copilot AI · Nov 17, 2025
Importing a top-level setting module creates a problematic global state dependency and breaks modularity. This makes the code difficult to test and creates hidden dependencies. Consider passing the use_cudagraph configuration through the existing config system or as a parameter to the class/method instead of relying on a global import.
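A minimal sketch of the suggested refactor, assuming illustrative names (`MoeDispatchConfig`, `XPUMoeLayer` are not the actual FastDeploy classes): the flag is injected through a config object instead of being read from a global `setting` module.

```python
# Hedged sketch: pass use_cudagraph through a config object rather than
# importing a module-level global. Names below are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class MoeDispatchConfig:
    use_cudagraph: bool = False  # mirrors the existing use_cudagraph flag


class XPUMoeLayer:
    def __init__(self, config: MoeDispatchConfig):
        # The flag travels with the object; no hidden module-level state.
        self.use_cudagraph = config.use_cudagraph

    def dispatch_op_name(self) -> str:
        # Explicit selection between the two dispatch implementations.
        if self.use_cudagraph:
            return "ep_moe_expert_dispatch"
        return "f_ep_moe_expert_dispatch"
```

With this shape, tests can construct a `MoeDispatchConfig` directly instead of monkey-patching a global module.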
```python
print(f"key is {key}")
print(f"value is {value}")
```
Copilot AI · Nov 17, 2025
Debug print statements should be removed before merging. Use the logging framework (e.g., logger.debug()) instead of print statements for production code.
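A short sketch of the suggested fix, assuming a hypothetical logger name and wrapper function: the key/value output goes through the logging framework at DEBUG level, so it is silent unless debugging is enabled.

```python
# Hedged sketch: replace the print() calls with DEBUG-level logging.
# The logger name and function name are assumptions for illustration.
import logging

logger = logging.getLogger("fastdeploy.model_executor")


def log_weight_attr(key, value):
    # Lazy %-formatting: the message string is only built if DEBUG is enabled.
    logger.debug("key is %s", key)
    logger.debug("value is %s", value)
```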
```python
import setting
print(f"setting.use_cuda_graph : {setting.use_cuda_graph}")
setting.use_cuda_graph = self.use_cudagraph
print(f"setting.use_cuda_graph : {setting.use_cuda_graph}")
```
Copilot AI · Nov 17, 2025
This creates a tight coupling to a global setting module and contains debug print statements. The configuration should be managed through the class's existing configuration system rather than mutating global state. Remove debug prints and refactor to avoid global state mutation.
```python
    self.moe_quant_type,
)
if setting.use_cuda_graph:
    print(f"调用ep_moe_expert_dispatch")
```
Copilot AI · Nov 17, 2025
Debug print statement in Chinese should be removed before merging. If this information is needed for debugging, use the logging framework with an appropriate log level (e.g., logger.debug()).
```python
    self.moe_quant_type,
)
else:
    print(f"调用f_ep_moe_expert_dispatch")
```
Copilot AI · Nov 17, 2025
Debug print statement in Chinese should be removed before merging. If this information is needed for debugging, use the logging framework with an appropriate log level (e.g., logger.debug()).
```diff
@@ -0,0 +1,76 @@
+import json
```
Copilot AI · Nov 17, 2025
This test file should not be in the repository root. Test files should be organized in the tests/ directory with proper naming conventions. If this is a temporary development file, it should be removed before merging.
```shell
# fi

# export XPUAPI_DEBUG=0x1
export PYTHONPATH=/opt/output/work_dir/ssd2/yangshuang/work/yinwei/FD_FOR_KETI9
```
Copilot AI · Nov 17, 2025
Hardcoded absolute path specific to a developer's environment should not be committed. This path is machine-specific and will break on other systems. Use relative paths or environment variables that users can configure.
```cpp
const int topk = topk_ids_shape[topk_ids_shape.size() - 1];
const int expert_num = token_nums_per_expert.size();
return {{token_nums_this_rank, hidden_size},
        {expert_num, m},
```
Copilot AI · Nov 17, 2025
The output shape {expert_num, m} appears incorrect. Comparing to the original moe_ep_dispatch.cc, line 174 shows the correct shape should be {m, topk} for permute_indices_per_token. The dimensions are swapped, which will cause shape mismatch errors.
```diff
-        {expert_num, m},
+        {m, topk},
```
```python
        request["enable_thinking"] = True

    data_processor_logger.info(f"Processed request dict: {request}")
    data_processor_logger.info(f"Processed request dict: {request}, len(request.get('prompt_token_ids')): {len(request.get('prompt_token_ids'))}")
```
Copilot AI · Nov 17, 2025
This long log line should be split into multiple lines or simplified. Logging the entire request dict may expose sensitive information and create excessive log output. Consider logging only essential fields or using debug level for verbose output.
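A hedged sketch of the suggested split, assuming the logger name and field names visible in the snippet above: the INFO line stays short, and the full request dump moves to DEBUG level.

```python
# Hedged sketch: log only essential fields at INFO; the verbose request
# dump moves to DEBUG. Logger and field names are assumptions.
import logging

data_processor_logger = logging.getLogger("fastdeploy.input")


def log_processed_request(request: dict) -> None:
    token_ids = request.get("prompt_token_ids") or []
    # Short INFO line: just the token count, not the whole dict.
    data_processor_logger.info("Processed request, %d prompt tokens", len(token_ids))
    # Full dump only when DEBUG is enabled.
    data_processor_logger.debug("Processed request dict: %s", request)
```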
```diff
@@ -0,0 +1,76 @@
+import json
```
Copilot AI · Nov 17, 2025
Import of 'json' is not used.
```diff
-import json
```
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- Add at least one tag in the PR title from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For PRs targeting the release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.