Request Support for Mistral-8x22B #6580
Comments
+1
It is not Mistral Medium, it's a new model. Mistral Medium has a different context length, etc., and Mistral Medium was leaked earlier.
Did someone download the torrent? Is it an HF model with modeling code, or only weights without the architecture?
Okay, I'll change the title.
Don't know if useful, but it's already up on Hugging Face: https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1 (you'll find many uploads).
It is useful, thanks, I did not notice they changed the org. Let's go then.
It just works. =D https://huggingface.co/MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF/tree/main
Confirmed the IQ3_XS quant runs without changes.
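For anyone who wants a quick scripted smoke test of one of those GGUF files, here is a minimal sketch using the llama-cpp-python bindings rather than the llama.cpp CLI used above; the local filename and the settings are assumptions, so adjust them to whichever quant you downloaded.

```python
# Minimal smoke test via the llama-cpp-python bindings (an alternative to
# running llama.cpp directly); the GGUF filename below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="Mixtral-8x22B-v0.1.IQ3_XS.gguf",  # hypothetical local path
    n_ctx=4096,        # modest context window for a quick test
    n_gpu_layers=-1,   # offload all layers if built with GPU support
)

out = llm("The Mixtral-8x22B model is", max_tokens=32)
print(out["choices"][0]["text"])
```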
Is it really the exact same architecture though? Perhaps there are some subtle optimizations.
Unfortunately, convert fails with Mixtral 8x22B Instruct: ValueError: Vocab size mismatch (model has 32768, but Mixtral-8x22B-Instruct-v0.1/tokenizer.json has 32769). This small mismatch (sometimes off by 1, sometimes by a few more) is actually a very common problem with older models that I quantize, but because they are older, I haven't bothered reporting it yet.
That is because of a bug in the original Mistral AI upload. Open the file tokenizer.json and change "TOOL_RESULT" to "TOOL_RESULTS" and the conversion should work. https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1/discussions/6
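In case it helps to script that edit instead of doing it by hand, here is a minimal sketch; the checkpoint path is an assumption, and the exact spelling of the token should be checked against your copy of tokenizer.json before and after running it.

```python
# Minimal sketch of the tokenizer.json fix described above; the path is a
# placeholder for wherever the instruct checkpoint was downloaded.
import re
from pathlib import Path

path = Path("Mixtral-8x22B-Instruct-v0.1/tokenizer.json")
text = path.read_text(encoding="utf-8")

# Keep a backup, then rename TOOL_RESULT -> TOOL_RESULTS, skipping tokens
# that already end in S so nothing becomes TOOL_RESULTSS.
backup = path.with_name(path.name + ".bak")
backup.write_text(text, encoding="utf-8")
path.write_text(re.sub(r"TOOL_RESULT(?!S)", "TOOL_RESULTS", text), encoding="utf-8")
```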
@tholin: indeed, thanks a lot!
@tholin: while convert.py succeeds, it results in an 11GB output file, so something still doesn't work. (b2699) Update: no longer happens with b2715.
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Feature Description
Support for Mixtral-8x22B
Mistral AI has just open-sourced a large model, Mixtral 8x22B, once again via a magnet link; the model files total 281.24 GB.
Going by the name, Mixtral 8x22B is the scaled-up successor to the "mixtral-8x7b" released last year, with more than triple the parameters: it is made up of eight expert networks of 22 billion parameters each (8x22B).
magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=http%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
Motivation
It should be a good model.