Labels
bug (Something isn't working), module: distributed (For distributed feature issue)
Description
🐛 Describe the bug
Please get the wheels from https://github.com/intel/torch-xpu-ops/actions/runs/17670118783, or download them with gh:
gh run download 17670118783 --repo intel/torch-xpu-ops --name Torch-XPU-Wheel-1826 --dir path --pattern "*.zip"
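The contents of the Torch-XPU-Wheel-1826 artifact are not listed in this report; assuming it unpacks to torch XPU wheels under path/, installing them would look roughly like this (the wildcard is an assumption, not a file name taken from the report):
pip install path/*.whl   # install the downloaded XPU wheels before running the tests (assumed artifact layout)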
git clone -b distributed_2.9 https://github.com/daisyden/pytorch.git
cd pytorch
pip install -r requirements.txt
python test/distributed/pipelining/test_stage.py StageTest.test_custom_dw_with_fb_schedule
sdp@3ceceffe05c2:~/xiangdong/pytorch$ python test/distributed/pipelining/test_stage.py StageTest.test_custom_dw_with_fb_schedule
I0918 19:19:02.223000 1627062 site-packages/torch/testing/_internal/common_distributed.py:1776] Testing class StageTest on 4 xpu
[2025-09-18 19:19:03.906] [warning] [sycl_collector.h:388] Another subscriber already subscribed to Sycl runtime events, so PTI will not subscribe to them. It will affect correctness of PTI profile: e.g. report zero XPU time for CPU callers of GPU kernels.
[2025-09-18 19:19:03.912] [warning] [sycl_collector.h:388] Another subscriber already subscribed to Sycl runtime events, so PTI will not subscribe to them. It will affect correctness of PTI profile: e.g. report zero XPU time for CPU callers of GPU kernels.
[2025-09-18 19:19:04.002] [warning] [sycl_collector.h:388] Another subscriber already subscribed to Sycl runtime events, so PTI will not subscribe to them. It will affect correctness of PTI profile: e.g. report zero XPU time for CPU callers of GPU kernels.
[2025-09-18 19:19:04.008] [warning] [sycl_collector.h:388] Another subscriber already subscribed to Sycl runtime events, so PTI will not subscribe to them. It will affect correctness of PTI profile: e.g. report zero XPU time for CPU callers of GPU kernels.
2025:09:18-19:19:04:1627257 |CCL_WARN| did not find MPI-launcher specific variables, switch to ATL/OFI, to force enable ATL/MPI set CCL_ATL_TRANSPORT=mpi
2025:09:18-19:19:04:1627257 |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
2025:09:18-19:19:04:1627258 |CCL_WARN| did not find MPI-launcher specific variables, switch to ATL/OFI, to force enable ATL/MPI set CCL_ATL_TRANSPORT=mpi
2025:09:18-19:19:04:1627258 |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
2025:09:18-19:19:04:1627259 |CCL_WARN| did not find MPI-launcher specific variables, switch to ATL/OFI, to force enable ATL/MPI set CCL_ATL_TRANSPORT=mpi
2025:09:18-19:19:04:1627259 |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
2025:09:18-19:19:04:1627256 |CCL_WARN| did not find MPI-launcher specific variables, switch to ATL/OFI, to force enable ATL/MPI set CCL_ATL_TRANSPORT=mpi
2025:09:18-19:19:04:1627256 |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
The test hangs here; no further output is produced.
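Not part of the original report, but as a triage suggestion: re-running with the standard PyTorch/oneCCL debug knobs, forcing the MPI transport mentioned in the CCL warning above, and dumping the Python stack of a hung rank may show where the rendezvous stalls:
# more verbose distributed / oneCCL logging (standard env vars, suggested here only as a triage aid)
TORCH_DISTRIBUTED_DEBUG=DETAIL TORCH_CPP_LOG_LEVEL=INFO CCL_LOG_LEVEL=debug \
  python test/distributed/pipelining/test_stage.py StageTest.test_custom_dw_with_fb_schedule
# force ATL/MPI, as the CCL warning itself suggests
CCL_ATL_TRANSPORT=mpi python test/distributed/pipelining/test_stage.py StageTest.test_custom_dw_with_fb_schedule
# dump the Python stack of a stuck rank (requires py-spy; <pid> is whichever rank is hung)
py-spy dump --pid <pid>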
Versions
pytorch: https://github.com/daisyden/pytorch/tree/distributed_2.9