support high priority stream #1715
Pull Request Overview
This pull request adds support for high priority streams in the XCCL process group. Key changes include adding a new Options struct with high priority and group name parameters, introducing a new groupRanks() accessor, and updating constructor and logging logic to reflect high priority stream usage.
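To make the shape of these changes concrete, here is a minimal standalone sketch of an options struct carrying a high priority flag and a group name, plus a `groupRanks()` accessor. It does not depend on PyTorch. The class name `ProcessGroupSketch`, the `global_ranks_in_group` field, the `create()` helper, and the fallback to ranks `0..size-1` are assumptions made for illustration; the actual definitions in `ProcessGroupXCCL.hpp` may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Options struct described in the overview.
struct Options {
  explicit Options(bool is_high_priority_stream = false)
      : is_high_priority_stream(is_high_priority_stream) {}

  // Convenience factory (assumed, not taken from the PR).
  static std::shared_ptr<Options> create(bool is_high_priority_stream = false) {
    return std::make_shared<Options>(is_high_priority_stream);
  }

  bool is_high_priority_stream;
  std::vector<uint64_t> global_ranks_in_group;  // empty: use 0..size-1
  std::string group_name;
};

// Hypothetical process group that only stores its configuration.
class ProcessGroupSketch {
 public:
  ProcessGroupSketch(int rank, int size, std::shared_ptr<Options> options)
      : rank_(rank), size_(size), options_(std::move(options)) {
    assert(options_ != nullptr);
    assert(size_ > 0 && rank_ >= 0 && rank_ < size_);
  }

  bool useHighPriorityStream() const {
    return options_->is_high_priority_stream;
  }

  const std::string& groupName() const { return options_->group_name; }

  // Returns the explicit global ranks when provided, otherwise 0..size-1.
  std::vector<uint64_t> groupRanks() const {
    if (!options_->global_ranks_in_group.empty()) {
      return options_->global_ranks_in_group;
    }
    std::vector<uint64_t> ranks(static_cast<std::size_t>(size_));
    std::iota(ranks.begin(), ranks.end(), uint64_t{0});
    return ranks;
  }

 private:
  int rank_;
  int size_;
  std::shared_ptr<Options> options_;
};
```

In this sketch a caller would build the options with `Options::create(true)`, set `group_name`, and pass them to the constructor. A real backend would then consult `useHighPriorityStream()` when acquiring its device stream.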
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/xccl/ProcessGroupXCCL.hpp | Added a high priority stream option and new Options struct for configuration. |
| src/xccl/ProcessGroupXCCL.cpp | Updated constructor initialization, logging, and introduced groupRanks(). |
Comments suppressed due to low confidence (1)
src/xccl/ProcessGroupXCCL.hpp:25
- The constant TORCH_XCCL_HIGH_PRIORITY is defined as a non-const vector. Consider renaming and declaring it as a const container (or using constexpr) to clearly indicate its immutability.
static std::vector<std::string> TORCH_XCCL_HIGH_PRIORITY = {
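The suggestion above could look roughly like the following sketch, which declares the list of environment variable names as `const` and reads it through a small helper. The names `kHighPriorityEnvNames`, `parseBool`, and `envFlag`, the variable name `EXAMPLE_XCCL_HIGH_PRIORITY`, and the set of accepted truthy spellings are all hypothetical. The contents of the real initializer are not shown in the review, and nothing here reproduces PyTorch's own environment helpers.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical variable name, chosen for illustration only.
// Declaring the container const documents that it is never modified.
static const std::vector<std::string> kHighPriorityEnvNames = {
    "EXAMPLE_XCCL_HIGH_PRIORITY"};

// Interprets common truthy spellings (case-insensitive); anything else is false.
inline bool parseBool(std::string value) {
  std::transform(value.begin(), value.end(), value.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return value == "1" || value == "y" || value == "yes" || value == "t" ||
         value == "true";
}

// Returns the parsed value of the first variable that is set, else the default.
inline bool envFlag(const std::vector<std::string>& names, bool defaultValue) {
  for (const auto& name : names) {
    if (const char* raw = std::getenv(name.c_str())) {
      return parseBool(raw);
    }
  }
  return defaultValue;
}
```

A `std::vector` cannot be `constexpr` before C++20, and even then not as a variable with static storage, so `const` (or a `constexpr std::array` of string literals) is the practical choice for a list like this.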
@sys_pytorchxpubot triage result for run 15864761625
Triage bot UT analysis result, for reference only. Note that each unique error message is reported only once:
Triage bot response: {
"similar_issue_id": 645,
"similar_issue_state": "closed",
"issue_owner": "daisyden",
"issue_description": "UT got failed with FP64 emulation feature. The reporter is mengfei25, and the assignee is daisyden. The issue is closed.",
"root_causes": [
"Issues related to tensor operations and reductions leading to precision mismatches.",
"Potential differences in computation between CPU and XPU implementations.",
"Possible issues with the CrossEntropyLoss implementation on XPU."
],
"suggested_solutions": [
"Investigate the precision handling in CrossEntropyLoss on XPU.",
"Check for any implementation differences causing scalar mismatches.",
"Consider allowing a small tolerance in scalar comparisons for CPU-GPU parity tests.",
"Review and update test cases to handle potential precision discrepancies."
]
}
Support high priority stream for xccl. The test case is added in #2049.
This PR needs to be merged first, then the upstream op registration in pytorch/pytorch#163049, after which the test case can pass.
Co-authored-by: mengfei25 <[email protected]>