Support for architecture DeepseekV2ForCausalLM #512
Hi!
When trying to quantize the new DeepSeek Coder V2 https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct I got the following error:
Would it be possible to add support?
Comments
This is the same issue as #443. The architecture uses shared experts, which would have to be added to the implementation. Since I don't have the hardware to actually run the model (quantized or otherwise), I'd be developing remotely, which is slow and awkward compared to local development where I have a debugger, profiler and all these other tools available. Not to mention, it would be expensive for the kind of server that I would need. Something like $100/day, with some big changes (i.e. quite a few days) needed due to the difficulty of calibrating a model with 162 experts per layer. In the end I don't think there are that many people who could even run the model anyway, or afford to host it, so it seems like a waste of my time. Especially as it's the kind of model that really screams for CPU inference (using llama.cpp or whatever). You could build a fairly cheap CPU server with 256 GB of RAM and probably get quite reasonable speeds that way, since it's a sparse model.
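For context, here is a minimal sketch (purely illustrative; not exllamav2's actual code, and the class and parameter names are made up) of what makes the DeepseekV2 MoE layers different: on top of the routed experts there are shared experts that run for every token, so each layer has an extra always-on FFN path that the loader and quantizer would also need to handle.

```python
import torch
import torch.nn as nn

class DeepseekV2StyleMoE(nn.Module):
    """Illustrative sketch only: routed experts plus always-active shared experts."""
    def __init__(self, hidden, ffn_dim, n_routed=160, n_shared=2, top_k=6):
        # n_routed + n_shared matches the "162 experts per layer" figure above;
        # hidden/ffn_dim/top_k are model-specific and just placeholders here.
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden, n_routed, bias=False)        # router over routed experts only
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, ffn_dim), nn.SiLU(), nn.Linear(ffn_dim, hidden))
            for _ in range(n_routed)
        )
        # Shared experts are not routed: every token goes through them unconditionally
        self.shared = nn.Sequential(nn.Linear(hidden, ffn_dim * n_shared), nn.SiLU(),
                                    nn.Linear(ffn_dim * n_shared, hidden))

    def forward(self, x):                                          # x: [tokens, hidden]
        scores = self.gate(x).softmax(dim=-1)                      # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)             # top-k routed experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):                                # naive loop, fine for a sketch
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return out + self.shared(x)                                # shared-expert path added on top
```

With that many routed experts per layer, calibration has to see enough tokens routed through every expert, which is part of what makes quantizing this architecture awkward.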
Thanks a lot for taking the time to explain. How about the "Lite" version? https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct It's a 15B-parameter model with the same architecture.
Oh, I didn't know there was a smaller version. That does look more realistic. It would still need a lot of new code, so I'm not sure when exactly I can get to it. But definitely doable.
There are two small ones; one is the aforementioned Instruct, the other is the Base - https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Base
@turboderp - Regarding hardware, do you know of Matt Berman? He vlogs on AI - https://www.youtube.com/@matthew_berman - I emailed with him, and it sounds like he may know folks who can help you out, like cloud providers and some hardware companies, and he would love to help. His contact info is on the YouTube page, feel free to reach out.
Welp, guess I have to stick with some slow-mo GGUF.
FYI - The Lite Instruct model is amazingly good, easily the best coding model I've used, better than Codestral and much faster (even on GGUF), but it would really benefit from exllamav2's long-context and KV efficiency. I have 1x 3090 (24GB) and 2x A4000 (2x16GB); if you need me to test anything / run some builds, feel free to @ me or contact via profile.
I have a 4x 3090, 48-core EPYC, 512 GB RAM system and could provide access too if needed.
Ultimately it's not hardware I need, it's time. I have no doubt that it's the best-model-ever, but so were Yi, Orion, Gemma, Starcoder, GemMoE, Cohere, DBRX, Phi... Granite? All the time I spent on those architectures may or may not have been worth it, but it definitely took time away from other improvements I'd like to make, and there are core aspects of the library that really need attention as well. And with every new architecture I implement just in time for everyone to become disenchanted with it, I also add technical debt. So yeah, I'm hesitant. 🤷 Maybe. Just not right now, unless someone wants to contribute some code.
Totally understandable. Thank you for the response. There will be other great models in the future :) |
@turboderp Sorry to bring this up again, but I was wondering if it's even remotely possible to merge KTransformers with exllamav2. That way you could run larger models but benefit from exllama's optimizations. KTransformers lets you store some of the experts in DRAM. I'm not sure if that is compatible with exllama. Not sure if it's possible or even provides any value, just a thought. Also, I'm not suggesting you do it alone; if you can at least consider whether it's conceptually feasible and worth doing, I'd be willing to put in the time to attempt writing the code.
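Purely as a conceptual sketch of the offloading idea (this is not KTransformers' or exllamav2's actual API; every name below is made up), the expert weights could sit in pinned CPU memory, with only the experts the router actually selects copied to the GPU for the current batch:

```python
import torch

class OffloadedExpertBank:
    """Illustrative only: park expert weights in pinned host RAM and
    fetch just the routed experts to the GPU on demand."""
    def __init__(self, expert_state_dicts, device="cuda"):
        self.device = device
        # Pinned memory allows faster, asynchronous host-to-device copies
        self.cpu_experts = [
            {name: t.pin_memory() for name, t in sd.items()} for sd in expert_state_dicts
        ]

    def fetch(self, expert_ids):
        """Return GPU copies of only the experts selected for this batch."""
        gpu = {}
        for e in expert_ids:
            gpu[e] = {name: t.to(self.device, non_blocking=True)
                      for name, t in self.cpu_experts[e].items()}
        torch.cuda.synchronize()   # make sure the async copies have landed before use
        return gpu
```

Whether the transfer cost could be hidden behind exllamav2's kernels, or whether the offloaded experts would instead need to run on the CPU (which is, as I understand it, what KTransformers does), is exactly the feasibility question here.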
Anyone know if Deepseek R1 also uses shared experts and has the same issue for Exllama support as V2? |