
musa: workaround for Guilty Lockup in cleaning src0 in #10032 #10042


Merged
1 commit merged into ggml-org:master on Oct 28, 2024

Conversation

yeahdongcn
Collaborator

We’re encountering an MTGPU Guilty Lockup issue during the model warm-up stage after merging #10032. This PR reverts that change for MUSA only.

I've raised an internal issue and will remove this workaround once it has been resolved.
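
In essence, the workaround compiles out the src0 padding clear when ggml is built for MUSA. Below is a minimal sketch of that shape, assuming a hypothetical helper around the clear: only `GGML_USE_MUSA` and `cudaMemsetAsync` are real identifiers, while the function name and parameters are illustrative and not the actual code in ggml-cuda.

```cpp
// Sketch only, not the actual diff: the padding clear introduced in #10032
// is compiled out for MUSA builds. GGML_USE_MUSA is the real compile-time
// flag for the MUSA backend; the helper name and its parameters below are
// hypothetical, for illustration only.
#include <cuda_runtime.h>
#include <cstddef>

static void clear_src0_padding(char * src0_dd, size_t nbytes_data,
                               size_t nbytes_padding, cudaStream_t stream) {
#ifndef GGML_USE_MUSA
    // Zero the padding bytes that follow the quantized src0 data
    // (the behaviour added in #10032) so stale values cannot leak
    // into the matrix multiplication.
    cudaMemsetAsync(src0_dd + nbytes_data, 0, nbytes_padding, stream);
#else
    // MUSA: skip the clear for now. The memset triggers an MTGPU
    // "Guilty Lockup" during model warm-up; the cost is potentially
    // stale padding bytes (see the review note below about broken
    // K cache quantization).
    (void) src0_dd; (void) nbytes_data; (void) nbytes_padding; (void) stream;
#endif // GGML_USE_MUSA
}
```

Guarding at compile time keeps the CUDA path untouched and makes the revert easy to delete once the driver issue is resolved.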

@yeahdongcn yeahdongcn marked this pull request as ready for review October 25, 2024 08:16
@yeahdongcn
Collaborator Author

Hi @JohannesGaessler,

Could you please review this PR? I know the code looks ugly, but it works for now.

Collaborator

@JohannesGaessler left a comment

I assume you're aware that this results in broken K cache quantization.

@yeahdongcn
Collaborator Author

> I assume you're aware that this results in broken K cache quantization.

Yes. After reviewing all the context, this approach appears to be the only viable way to avoid the crash.

Thanks for approving this!

@JohannesGaessler JohannesGaessler merged commit 524afee into ggml-org:master Oct 28, 2024
53 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
yeahdongcn added a commit to makllama/llama.cpp that referenced this pull request Feb 12, 2025
yeahdongcn added a commit to makllama/llama.cpp that referenced this pull request Feb 13, 2025
yeahdongcn added a commit to makllama/llama.cpp that referenced this pull request Feb 13, 2025
ngxson pushed a commit that referenced this pull request Feb 13, 2025
* musa: Update MUSA SDK version to rc3.1.1

Signed-off-by: Xiaodong Ye <[email protected]>

* musa: Remove workaround in PR #10042

Signed-off-by: Xiaodong Ye <[email protected]>

---------

Signed-off-by: Xiaodong Ye <[email protected]>
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025

mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025