support high priority stream #1715

Chao1Han · 2025-06-04T02:57:19Z

Support high priority streams for XCCL. The test case is added in #2049.
This PR needs to be merged first, followed by the upstream op registration in pytorch/pytorch#163049; only then can the test case pass.

Copilot

Pull Request Overview

This pull request adds support for high priority streams in the XCCL process group. Key changes include adding a new Options struct with high priority and group name parameters, introducing a new groupRanks() accessor, and updating constructor and logging logic to reflect high priority stream usage.
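As a rough illustration of the kind of options struct the overview describes, here is a minimal standalone sketch. It does not reproduce the actual ProcessGroupXCCL declarations: the names `XcclOptionsSketch`, `StreamPrioritySketch`, `pickPrioritySketch`, and `describeSketch` are hypothetical, and the field names are assumptions based only on the "high priority and group name parameters" wording.

```cpp
#include <string>

// Hypothetical stand-in for the Options struct mentioned in the overview.
// Field names are assumptions, not the real ProcessGroupXCCL members.
struct XcclOptionsSketch {
  explicit XcclOptionsSketch(bool high_priority = false)
      : is_high_priority_stream(high_priority) {}

  bool is_high_priority_stream;  // request a high priority stream
  std::string group_name;        // name of the process group
};

// Hypothetical priority levels; a real backend maps these to device streams.
enum class StreamPrioritySketch { Normal, High };

// Choose the stream priority from the options flag.
StreamPrioritySketch pickPrioritySketch(const XcclOptionsSketch& opts) {
  return opts.is_high_priority_stream ? StreamPrioritySketch::High
                                      : StreamPrioritySketch::Normal;
}

// Build a log-style summary so the chosen priority shows up in logging.
std::string describeSketch(const XcclOptionsSketch& opts) {
  return "group_name=" + opts.group_name + ", high_priority_stream=" +
         (opts.is_high_priority_stream ? "true" : "false");
}
```

Defaulting the flag to `false` keeps existing callers on normal priority unless they opt in, which is one plausible reason to carry the setting in an options object rather than a global.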

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
src/xccl/ProcessGroupXCCL.hpp	Added a high priority stream option and new Options struct for configuration.
src/xccl/ProcessGroupXCCL.cpp	Updated constructor initialization, logging, and introduced groupRanks().

Comments suppressed due to low confidence (1)

src/xccl/ProcessGroupXCCL.hpp:25

The constant TORCH_XCCL_HIGH_PRIORITY is defined as a non-const vector. Consider renaming and declaring it as a const container (or using constexpr) to clearly indicate its immutability.

static std::vector<std::string> TORCH_XCCL_HIGH_PRIORITY = {
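The suggestion above can be sketched as follows. The initializer in the quoted line is truncated, so this example does not try to reproduce it: the container name `kHighPriorityEnvNamesSketch`, its single entry `EXAMPLE_HIGH_PRIORITY`, and the helper `envFlagSketch` are all hypothetical. It also assumes the vector holds environment variable names that are read as a boolean flag, which the source does not confirm.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical const container; the real initializer is not shown in the
// review, so the entry below is a placeholder, not the actual variable name.
static const std::vector<std::string> kHighPriorityEnvNamesSketch = {
    "EXAMPLE_HIGH_PRIORITY"};

// Return the boolean value of the first listed environment variable that is
// set, or default_value if none of them is set. (Assumed usage pattern.)
bool envFlagSketch(const std::vector<std::string>& names, bool default_value) {
  for (const auto& name : names) {
    const char* raw = std::getenv(name.c_str());
    if (raw == nullptr) {
      continue;  // not set, try the next name
    }
    const std::string value(raw);
    return value == "1" || value == "y" || value == "Y" || value == "true";
  }
  return default_value;
}
```

Declaring the container `const` means an accidental `push_back` or assignment elsewhere in the translation unit fails to compile, which is the immutability signal the comment asks for.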

src/xccl/ProcessGroupXCCL.cpp

pytorchxpubot · 2025-06-27T09:13:57Z

@sys_pytorchxpubot triage result for run 15864761625

Triage bot UT analysis result, for reference only. Note that each unique error message is reported only once:

third_party.torch-xpu-ops.test.xpu.test_modules_xpu.TestModuleXPU test_cpu_gpu_parity_nn_CrossEntropyLoss_xpu_float64 failed with error message:

 AssertionError: Scalars are not close!

Triage bot response:

{
  "similar_issue_id": 645,
  "similar_issue_state": "closed",
  "issue_owner": "daisyden",
  "issue_description": "UT got failed with FP64 emulation feature. The reporter is mengfei25, and the assignee is daisyden. The issue is closed.",
  "root_causes": [
    "Issues related to tensor operations and reductions leading to precision mismatches.",
    "Potential differences in computation between CPU and XPU implementations.",
    "Possible issues with the CrossEntropyLoss implementation on XPU."
  ],
  "suggested_solutions": [
    "Investigate the precision handling in CrossEntropyLoss on XPU.",
    "Check for any implementation differences causing scalar mismatches.",
    "Consider allowing a small tolerance in scalar comparisons for CPU-GPU parity tests.",
    "Review and update test cases to handle potential precision discrepancies."
  ]
}
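To make the "small tolerance in scalar comparisons" suggestion concrete, here is a minimal sketch of a combined relative and absolute tolerance check. The function name `scalarsCloseSketch` is hypothetical, and the rule used, |actual - expected| <= atol + rtol * |expected|, is a common convention chosen for illustration; it is not taken from the failing test, and appropriate tolerance values would have to be decided by the test owners.

```cpp
#include <cmath>

// Hypothetical helper: true when actual is within a combined absolute and
// relative tolerance of expected.
bool scalarsCloseSketch(double actual, double expected, double rtol,
                        double atol) {
  if (std::isnan(actual) || std::isnan(expected)) {
    return false;  // NaN never compares close in this sketch
  }
  if (actual == expected) {
    return true;  // exact match, including equal infinities
  }
  if (std::isinf(actual) || std::isinf(expected)) {
    return false;  // an infinity only matches the same infinity
  }
  return std::fabs(actual - expected) <= atol + rtol * std::fabs(expected);
}
```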

Support high priority streams for XCCL. The test case is added in #2049. This PR needs to be merged first, followed by the upstream op registration in pytorch/pytorch#163049; only then can the test case pass.

Co-authored-by: mengfei25

Copilot AI review requested due to automatic review settings June 4, 2025 02:57

Copilot AI reviewed Jun 4, 2025

src/xccl/ProcessGroupXCCL.cpp (outdated, resolved)

Chao1Han and others added 3 commits June 4, 2025 18:48

support high priority stream

63caf97

add test

1d2fc1b

Merge branch 'main' into xccl/high_stream

ee8bc32

Chao1Han added 6 commits September 15, 2025 13:25

Merge remote-tracking branch 'origin/main' into xccl/high_stream

8d495aa

Merge branch 'main' into xccl/high_stream

dd445be

update

2c05540

update

a41c77e

rm test case

f99275e

Merge branch 'main' into xccl/high_stream

62daafd

This was referenced Sep 16, 2025

[xpu] Support high stream for ProcessGroupXCCL Chao1Han/pytorch#24

Open

[xpu] Support high stream for ProcessGroupXCCL pytorch/pytorch#163049

Open

zhangxiaoli73 approved these changes Sep 16, 2025


Merge branch 'main' into xccl/high_stream

e664c4a

chuanqi129 added this pull request to the merge queue Sep 17, 2025

Merged via the queue into main with commit 74b11bf Sep 17, 2025
25 checks passed

chuanqi129 deleted the xccl/high_stream branch September 17, 2025 00:57
