
[distributed] error message does not match #1572


Closed
daisyden opened this issue Apr 11, 2025 · 2 comments
Labels
bug Something isn't working module: distributed For distributed feature issue

Comments

@daisyden
Contributor

daisyden commented Apr 11, 2025

🐛 Describe the bug

The unit test test_dp_state_dict_cpu_offload in test.distributed._composable.fsdp.test_fully_shard_state_dict.TestFullyShardStateDictMultiProcess fails with the following assertion error:

AssertionError: "Found following parameters on non-CPU device: [('0.weight', device(type={device_type}" does not match "FSDP parameters should be materialized on CPU when enabling CPU offloading. For example, load a CPU state dict or call module.to_empty(device="cpu"). Found following parameters on non-CPU device: [('0.weight', device(type='xpu', index=7))]
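The expected pattern in the assertion above contains a literal `{device_type}`, which suggests (as an assumption; the issue does not state the root cause) that the placeholder was never substituted, for example because the string was not an f-string. The sketch below reproduces that mismatch with plain `re`, using the message text from this report. The names `device_type`, `prefix`, `unformatted`, and `formatted` are hypothetical and are not taken from the test source.

```python
import re

# Actual error text, copied from the assertion in this report.
actual = (
    "FSDP parameters should be materialized on CPU when enabling CPU offloading. "
    'For example, load a CPU state dict or call module.to_empty(device="cpu"). '
    "Found following parameters on non-CPU device: "
    "[('0.weight', device(type='xpu', index=7))]"
)

# Hypothetical variable standing in for whatever the test uses to name the device.
device_type = "xpu"

prefix = "Found following parameters on non-CPU device: [('0.weight', device(type="

# Assumed buggy form: the placeholder stays literal because it is never formatted.
unformatted = re.escape(prefix + "{device_type}")

# Assumed intended form: the placeholder is substituted before escaping.
formatted = re.escape(prefix + f"'{device_type}'")

unformatted_matches = re.search(unformatted, actual) is not None
formatted_matches = re.search(formatted, actual) is not None

print("unformatted pattern matches:", unformatted_matches)
print("formatted pattern matches:", formatted_matches)
```

Escaping the whole pattern with `re.escape` keeps the brackets and parentheses in the message from being read as regex syntax, so the only difference between the two patterns is whether the placeholder was substituted.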

Versions

2025.1 oneapi
2025.15 oneCCL
4 XELINK 1110

@daisyden daisyden added the module: distributed For distributed feature issue label Apr 11, 2025
@daisyden daisyden added this to the PT2.8 milestone Apr 11, 2025
@zhangxiaoli73

@daisyden It seems this test has not been enabled correctly. Please check against the developer branch and fix it.

@daisyden daisyden added the bug Something isn't working label Apr 14, 2025
@daisyden
Contributor Author

daisyden commented May 8, 2025

Fixed and checked into distributed_2.8.

@daisyden daisyden closed this as completed May 8, 2025