
Conversation

@Chao1Han Chao1Han commented Jun 5, 2025

This test verifies that XCCL remains the default backend for XPU, even when other backends are registered as optional backends for the XPU device.

Copilot AI review requested due to automatic review settings June 5, 2025 05:16

Copilot AI left a comment


Pull Request Overview

Adds a new unit test to confirm that XCCL remains the default distributed backend on XPU even after another backend is registered.

  • Introduces test_xccl_priority to register a dummy backend and run an all-reduce call without specifying a backend.
  • Leverages the existing requires_xccl decorator to skip the test if XCCL isn't available.
Comments suppressed due to low confidence (2)

test/xpu/distributed/test_c10d_xccl.py:568

  • The test currently only invokes all_reduce but doesn't assert that the default backend is actually XCCL. Consider retrieving the process group (e.g., via dist.distributed_c10d._get_default_group()) and asserting its type or backend name to ensure the priority behavior is verified; see the sketch after the quoted line below.
dist.all_reduce(a)
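
For illustration, such an assertion might look like the following minimal sketch. It uses the private _get_default_group() helper mentioned above and assumes the XCCL backend registers under the name "xccl":

# Fetch the default process group and check which backend actually backs it.
pg = dist.distributed_c10d._get_default_group()
# Assumption: the XCCL backend's name reports as "xccl" on XPU.
self.assertEqual(dist.get_backend(pg), "xccl")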

test/xpu/distributed/test_c10d_xccl.py:555

  • [nitpick] The test name test_xccl_priority is a bit generic. Consider renaming to test_default_backend_is_xccl_when_fake_registered for clarity on what scenario is covered.
def test_xccl_priority(self):

Comment on lines +556 to +568
dist.Backend.register_backend(
    "fake",
    lambda store, rank, size, timeout: dist.ProcessGroup(rank, size),
    devices=["xpu"],
)
store = dist.FileStore(self.file_name, self.world_size)
dist.init_process_group(
    world_size=self.world_size,
    rank=self.rank,
    store=store,
)
a = torch.randn(2, device="xpu")
dist.all_reduce(a)

Copilot AI Jun 5, 2025


After registering the fake backend, consider unregistering it in a finally block or teardown step to avoid side effects on other tests.

Suggested change
-dist.Backend.register_backend(
-    "fake",
-    lambda store, rank, size, timeout: dist.ProcessGroup(rank, size),
-    devices=["xpu"],
-)
-store = dist.FileStore(self.file_name, self.world_size)
-dist.init_process_group(
-    world_size=self.world_size,
-    rank=self.rank,
-    store=store,
-)
-a = torch.randn(2, device="xpu")
-dist.all_reduce(a)
+try:
+    dist.Backend.register_backend(
+        "fake",
+        lambda store, rank, size, timeout: dist.ProcessGroup(rank, size),
+        devices=["xpu"],
+    )
+    store = dist.FileStore(self.file_name, self.world_size)
+    dist.init_process_group(
+        world_size=self.world_size,
+        rank=self.rank,
+        store=store,
+    )
+    a = torch.randn(2, device="xpu")
+    dist.all_reduce(a)
+finally:
+    dist.Backend.unregister_backend("fake")

@Chao1Han Chao1Han (Contributor, Author) replied:

The other test cases explicitly init with the xccl backend, so it is safe not to unregister.
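
For context, an explicit init in those other cases presumably looks like the sketch below (exact arguments assumed). Because the backend name is passed explicitly, the leftover "fake" registration is never consulted:

store = dist.FileStore(self.file_name, self.world_size)
dist.init_process_group(
    "xccl",  # backend given explicitly, so per-device backend resolution is skipped
    world_size=self.world_size,
    rank=self.rank,
    store=store,
)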

@Chao1Han Chao1Han (Contributor, Author) commented:

Closing, since pytorch/pytorch#155320 was merged.

@Chao1Han Chao1Han closed this Jun 16, 2025
@Chao1Han Chao1Han deleted the xccl/uts branch June 16, 2025 05:20