Vulkan IQ4_NL Support #8613
0cc4m
commented
Jul 21, 2024
- I have read the contributing guidelines
- Self-reported review complexity:
- Low
- Medium
- High
- Add IQ4_NL support to Vulkan to resolve the issue with iq4_nl fallbacks in k-quants (llama : change fallback type IQ4_NL -> Q4_0 #8489 and Bug: QWEN2 quantization GGML_ASSERT #7805 (comment)).
- Increase the mat_mul_id matrix multiplication row_ids buffer size to allow larger MoEs (like DeepSeek-Coder-V2-Lite which had the iq4_nl issue) to work with Vulkan.
- Fix Vulkan test code that was broken after the last rework
How much effort is needed to support IQ4_XS in addition to IQ4_NL?
Can you elaborate on what specific cases that would enable?
IQ4_XS is commonly used in the community because of its small size and better PPL than Q4_K_M. It's a sweet spot in the GGUF quant series.
It's quite a bit of effort, but at least it's easier than the other i-quants. I can't do it now, but should be able to at some point in the not-too-distant future.
Where there's a will, there's a way =;-)
While testing this I got test failures with fp16/fp32 mul mat, but they also happen on master.
* Fix Vulkan matmul tests compile errors
* Add Vulkan IQ4_NL support
* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
@@ -3431,7 +3451,7 @@ static void ggml_vk_mul_mat_id_q_f16(ggml_backend_vk_context * ctx, vk_context *

     const uint64_t nei0 = ids->ne[0];
     const uint64_t nei1 = ids->ne[1];
-    GGML_ASSERT(nei0 * nei1 <= 2048);
+    GGML_ASSERT(nei0 * nei1 <= 3072);
Hi @0cc4m can I check, what exactly is this assert testing for?
ref: LostRuins#1337
For the maximum number of row_ids the mat_mul_id shader can handle.
Deepseek 16B MoE (6/64 experts)
nei0 = 6
nei1 = 1024
nei0 x nei1 = 6144
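To make the arithmetic in this thread concrete, here is a minimal standalone sketch of the check. It is not the backend code: `MAX_ROW_IDS`, `row_ids_needed`, `row_ids_fit`, `max_nei1` and `require_row_ids_fit` are hypothetical names used only for illustration, and plain `assert` stands in for `GGML_ASSERT`. It also assumes that `nei0` is the number of experts selected per token and `nei1` is the number of tokens in the batch, which matches the numbers quoted above but is not stated in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant: the row_ids limit after this PR (it was 2048 before).
constexpr uint64_t MAX_ROW_IDS = 3072;

// Number of row_ids entries needed for an ids tensor of shape [nei0, nei1].
constexpr uint64_t row_ids_needed(uint64_t nei0, uint64_t nei1) {
    return nei0 * nei1;
}

// True when the ids tensor fits within the given limit.
constexpr bool row_ids_fit(uint64_t nei0, uint64_t nei1,
                           uint64_t limit = MAX_ROW_IDS) {
    return row_ids_needed(nei0, nei1) <= limit;
}

// Largest nei1 that still fits for a given nei0 (nei0 must be non-zero).
constexpr uint64_t max_nei1(uint64_t nei0, uint64_t limit = MAX_ROW_IDS) {
    return limit / nei0;
}

// Mirrors the shape of the assert in the diff, using plain assert
// as a stand-in for GGML_ASSERT.
inline void require_row_ids_fit(uint64_t nei0, uint64_t nei1) {
    assert(row_ids_fit(nei0, nei1));
}
```

With the values reported above (`nei0 = 6`, `nei1 = 1024`), `row_ids_needed` returns 6144, so `row_ids_fit` is false under both the old limit of 2048 and the new limit of 3072. Under the stated assumption about what `nei1` means, `max_nei1(6)` gives 512 as the largest value that passes the new check.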