Support for architecture DeepseekV2ForCausalLM #512


Open
RodriMora opened this issue Jun 17, 2024 · 12 comments

@RodriMora
Contributor

Hi!

When trying to quantize the new DeepSeek Coder V2 (https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct), I got the following error:

 !! Warning, unknown architecture: DeepseekV2ForCausalLM
 !! Loading as LlamaForCausalLM

Would it be possible to add support?
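
For reference, the converter decides how to load a checkpoint from the `architectures` field in its config.json, which is where the string in the warning comes from. A quick way to see what a downloaded repo declares (the local path below is just an example):

```python
import json

# Inspect the architecture string a local copy of the model declares;
# the path is an example, point it at wherever the repo was downloaded.
with open("/models/DeepSeek-Coder-V2-Instruct/config.json") as f:
    config = json.load(f)

print(config["architectures"])  # ['DeepseekV2ForCausalLM'] -> not recognized yet
```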

@turboderp
Member

This is the same issue as #443.

The model uses shared experts, which would have to be added to the implementation. Since I don't have the hardware to actually run the model (quantized or otherwise), I'd be developing remotely, which is slow and awkward compared to local development, where I have a debugger, profiler and all the other tools available. Not to mention, the kind of server I would need is expensive, something like $100/day, and the changes are big enough (i.e. quite a few days) given the difficulty of calibrating a model with 162 experts per layer.
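
Roughly, a DeepSeek-V2-style MoE block keeps a couple of always-on shared experts next to the routed ones and sums their outputs for every token, with a gate picking the top-k routed experts per token. A simplified PyTorch sketch of that structure (not exllamav2 code; sizes are placeholders and the expert MLP is a stand-in):

```python
import torch
import torch.nn as nn

class SharedExpertMoE(nn.Module):
    # Simplified sketch of a DeepSeek-V2-style MoE block: a gate picks top-k
    # routed experts per token, while the shared experts see every token.
    # Real hidden/FFN sizes are much larger than anything you'd pass here.
    def __init__(self, hidden, ffn, n_routed=160, n_shared=2, top_k=6):
        super().__init__()
        mlp = lambda: nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))
        self.routed = nn.ModuleList(mlp() for _ in range(n_routed))
        self.shared = nn.ModuleList(mlp() for _ in range(n_shared))
        self.gate = nn.Linear(hidden, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x):                                  # x: [tokens, hidden]
        out = sum(e(x) for e in self.shared)               # shared experts: always on
        weights, idx = torch.softmax(self.gate(x), dim=-1).topk(self.top_k, dim=-1)
        for t in range(x.shape[0]):                        # naive per-token routing
            for w, i in zip(weights[t], idx[t]):
                out[t] = out[t] + w * self.routed[int(i)](x[t])
        return out
```

With 160 routed plus 2 shared experts per layer that matches the 162 figure above; the Lite model keeps the same structure with far fewer routed experts.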

In the end I don't think there are that many people who could even run the model anyway, or afford to host it, so it seems like a waste of my time. Especially as it's the kind of model that really screams for CPU inference (using llama.cpp or whatever). You could build a fairly cheap CPU server with 256 GB of RAM and probably get quite reasonable speeds that way, since it's a sparse model.
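
As a rough illustration of why CPU inference is plausible for a sparse model: decoding is mostly memory-bandwidth-bound, and only the active parameters are touched per token. A back-of-envelope estimate (the bandwidth and active-parameter figures are assumptions, not measurements):

```python
# Back-of-envelope tokens/sec ceiling for bandwidth-bound CPU decoding.
# All figures below are rough assumptions, not benchmarks.
active_params = 21e9        # DeepSeek-Coder-V2 activates ~21B of its 236B params per token
bytes_per_weight = 0.5      # ~4-bit quantization
mem_bandwidth = 250e9       # ~250 GB/s for a many-channel DDR5 EPYC board

bytes_per_token = active_params * bytes_per_weight
print(f"upper bound: {mem_bandwidth / bytes_per_token:.1f} tokens/s")  # ~23.8
```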

@RodriMora
Contributor Author

Thanks a lot for taking the time to explain.

How about the "Lite" version?

https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct

It's a 15B-parameter model with the same architecture.

@turboderp
Member

Oh, I didn't know there was a smaller version. That does look more realistic. It would still need a lot of new code, so I'm not sure when exactly I can get to it. But definitely doable.

@nktice

nktice commented Jun 19, 2024

There are two small ones: one is the aforementioned Instruct model, the other is Base - https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Base

@nktice

nktice commented Jun 21, 2024

@turboderp - Regarding hardware, do you know of Matt Berman? He vlogs on AI - https://www.youtube.com/@matthew_berman - I emailed with him, and it sounds like he may know folks who can help you out, like cloud providers and some hardware companies, and he would love to help. His contact info is on the YouTube page; feel free to reach out.

@matbee-eth

Welp, guess I have to stick with some slow-mo GGUF.

@sammcj

sammcj commented Jun 25, 2024

FYI - the Lite Instruct model is amazingly good: easily the best coding model I've used, better than Codestral and much faster (even on GGUF), but it would really benefit from exllamav2's long context and KV-cache efficiency.

I have 1x 3090 (24GB) and 2x A4000 (2x 16GB); if you need me to test anything or run some builds, feel free to @ me or contact me via my profile.

@RodriMora
Contributor Author

I have a system with 4x 3090, a 48-core EPYC and 512GB RAM, and could provide access too if needed.

@turboderp
Member

Ultimately it's not hardware I need, it's time. I have no doubt that it's the best-model-ever, but so were Yi, Orion, Gemma, Starcoder, GemMoE, Cohere, DBRX, Phi... Granite? All the time I spent on those architectures may or may not have been worth it, but it definitely took time away from other improvements I'd like to make, and there are core aspects of the library that really need attention as well. And with every new architecture I implement just in time for everyone to become disenchanted with it, I also add technical debt.

So yeah, I'm hesitant. 🤷 Maybe. Just not right now, unless someone wants to contribute some code.

@sammcj

sammcj commented Jun 25, 2024

Totally understandable. Thank you for the response. There will be other great models in the future :)

@16x3b

16x3b commented Jan 14, 2025

@turboderp Sorry to bring this up again, but I was wondering if it's even remotely possible to merge KTransformers with exllamav2. That way you could run larger models but still benefit from exllama's optimizations. KTransformers lets you store some of the experts in DRAM; I'm not sure if that is compatible with exllama.

Not sure if it's possible or even provides any value, just a thought. Also, I'm not suggesting you do it alone; if you can at least consider whether it is conceptually feasible and worth doing, I would be willing to put in the time to attempt writing the code.
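
To make the idea concrete, the KTransformers-style approach is essentially a placement policy: the big routed-expert weights sit in system RAM (and run on CPU) while attention, norms and the shared experts stay on the GPU, so only activations cross the bus. A toy sketch of such a policy (the `.experts.` naming is an assumption about the HF DeepseekV2 module layout, and none of this is code from either project):

```python
import torch.nn as nn

def offload_routed_experts(model: nn.Module, gpu: str = "cuda:0") -> nn.Module:
    # Toy placement policy: leaf modules whose name marks them as routed
    # experts keep their weights in system RAM (device "cpu"); everything
    # else (attention, norms, shared experts, embeddings) moves to the GPU.
    for name, module in model.named_modules():
        if not list(module.children()):                   # leaf modules only
            target = "cpu" if ".experts." in f".{name}." else gpu
            module.to(target)
    return model
```

The forward pass would also need to route activations to the CPU for those experts (via hooks or a custom MoE forward), which is the part KTransformers actually implements; whether that could be bolted onto exllamav2's kernels is the open question.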

@grimulkan

Anyone know if DeepSeek R1 also uses shared experts and has the same issue for ExLlama support as V2?
