Commit 65166d8
[MPS] Add regression test for sync deadlock (pytorch#141296)
See pytorch#140725 (comment)
Running `torch.mps.synchronize()` after metal kernel resulted in infinite wait inside `[_MTLCommandBuffer waitUntilCompleted]`
```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00000001aa919084 Metal`pthread_cond_wait + 12
frame #1: 0x00000001aa78b1b4 Metal`-[_MTLCommandBuffer waitUntilCompleted] + 84
frame #2: 0x00000001032bf358 libtorch_python.dylib`torch::mps::MPSModule_deviceSynchronize(_object*, _object*) + 40
frame #3: 0x0000000100e94c20 Python`cfunction_vectorcall_NOARGS + 100
frame #4: 0x0000000100e389b8 Python`PyObject_Vectorcall + 92
frame #5: 0x0000000100f61e38 Python`_PyEval_EvalFrameDefault + 19040
frame #6: 0x0000000100f5d180 Python`PyEval_EvalCode + 200
frame #7: 0x0000000100fcd1a4 Python`run_eval_code_obj + 104
frame #8: 0x0000000100fccbe4 Python`run_mod + 168
frame #9: 0x0000000100fcb518 Python`pyrun_file + 164
frame #10: 0x0000000100fca854 Python`_PyRun_SimpleFileObject + 256
frame #11: 0x0000000100fca4e8 Python`_PyRun_AnyFileObject + 80
frame #12: 0x0000000100ff2028 Python`pymain_run_file_obj + 164
frame #13: 0x0000000100ff1ce4 Python`pymain_run_file + 72
frame #14: 0x0000000100ff0f74 Python`Py_RunMain + 988
frame #15: 0x0000000100ff1564 Python`pymain_main + 304
frame #16: 0x0000000100ff1604 Python`Py_BytesMain + 40
frame #17: 0x000000019f630274 dyld`start + 2840
```
Pull Request resolved: pytorch#141296
Approved by: https://github.com/huydhn1 parent 25c0b91 commit 65166d8
1 file changed
+8
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
8385 | 8385 | | |
8386 | 8386 | | |
8387 | 8387 | | |
| 8388 | + | |
| 8389 | + | |
| 8390 | + | |
| 8391 | + | |
| 8392 | + | |
| 8393 | + | |
| 8394 | + | |
| 8395 | + | |
8388 | 8396 | | |
8389 | 8397 | | |
8390 | 8398 | | |
| |||
0 commit comments