Vulkan: Fix mmq int dot float cache size #12722


Merged 1 commit into master on Apr 2, 2025

Conversation

@0cc4m (Collaborator) commented Apr 2, 2025

I don't know how I (and everyone else) missed this, considering it means models are completely incoherent when using the new int dot shaders, but here's the fix. The cache buffer for the quant dm values was too small and overflowed, leading to NaN results.

@0cc4m 0cc4m requested a review from jeffbolznv April 2, 2025 15:29
@jeffbolznv (Collaborator) left a comment


LGTM, I didn't try running it.

@github-actions bot added labels on Apr 2, 2025: Vulkan (Issues specific to the Vulkan backend), ggml (changes relating to the ggml tensor library for machine learning)
@0cc4m 0cc4m merged commit 92e3006 into master Apr 2, 2025
44 checks passed
@0cc4m 0cc4m deleted the 0cc4m/vulkan-mmq-dp4a-fix branch April 2, 2025 17:12
@0cc4m (Collaborator, Author) commented Apr 2, 2025

For some reason this change removed a large chunk of the shader's performance gain, and I'm not sure why. It added 8 bytes of register use, which may have crossed an occupancy limit, but that would be very odd. I hope it can be fixed.

Labels: ggml (changes relating to the ggml tensor library for machine learning), Vulkan (Issues specific to the Vulkan backend)
2 participants