Request Support for Mistral-8x22B #6580


Closed
rankaiyx opened this issue Apr 10, 2024 · 16 comments
Labels
enhancement New feature or request stale

Comments

@rankaiyx
Contributor

Feature Description

Support for Mixtral-8x22B

Mistral AI has just released another large model via magnet link: Mistral 8x22B, with a model file size of 281.24 GB.

Judging by the name, Mistral 8x22B is a scaled-up version of "mixtral-8x7b", which was released last year, with more than triple the parameter count: it is made up of eight expert networks with 22 billion parameters each (8 x 22B).

magnet:?xt=urn:btih:9238b09245d0d8cd915be09927769d5f7584c1c9&dn=mixtral-8x22b&tr=udp%3A%2F%2Fopen.demonii.com%3A1337%2Fannounce&tr=http%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

Motivation

It should be a good model.

@rankaiyx rankaiyx added the enhancement New feature or request label Apr 10, 2024
@LiuChaoXD

+1

@anunknowperson

It is not Mistral Medium, it's a new model. Mistral Medium has a different context length, etc., and Mistral Medium was leaked earlier.
They said it's a brand-new model.

@phymbert
Collaborator

Did someone download the torrent? Is it an HF model with modeling code, or only weights without the architecture?

@rankaiyx
Contributor Author

It is not Mistral Medium, it's a new model. Mistral Medium has a different context length, etc., and Mistral Medium was leaked earlier. They said it's a brand-new model.

Okay, I'll change the title.

@simsi-andy

simsi-andy commented Apr 10, 2024

@phymbert

Don't know if it's useful, but it's already up on Hugging Face. https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1

(You'll find many uploads).

@rankaiyx rankaiyx changed the title Request Support for Mistral-Medium:8x22B Request Support for Mistral-8x22B Apr 10, 2024
@phymbert
Collaborator

Don't know if it's useful, but it's already up on Hugging Face. https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1

It is useful, thanks. I did not notice they changed the org. Let's go then.

@simsi-andy

It just works. =D

https://huggingface.co/MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF/tree/main

@digiwombat
Contributor

Confirmed the IQ3_XS runs without changes.

@Dampfinchen

Dampfinchen commented Apr 10, 2024

Is it really the exact same architecture though? Perhaps there are some subtle optimizations.

@phymbert
Collaborator

@schmorp

schmorp commented Apr 18, 2024

Unfortunately, convert fails with Mixtral 8x22b instruct:

ValueError: Vocab size mismatch (model has 32768, but Mixtral-8x22B-Instruct-v0.1/tokenizer.json has 32769).

This small mismatch (sometimes off by 1, sometimes by a few more) is actually a very common problem with older models that I quantize, but because they are older, I haven't bothered reporting it until now.
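The mismatch can be checked before running convert: compare the `vocab_size` declared in `config.json` with the number of distinct tokens actually present in `tokenizer.json` (base vocab plus any `added_tokens` not already in the vocab). A minimal sketch, assuming the standard Hugging Face file layout; the function names here are illustrative, not part of convert.py:

```python
import json


def count_tokenizer_vocab(tokenizer_json: dict) -> int:
    """Count distinct tokens in a HF tokenizer.json structure:
    the base vocab plus any added_tokens not already in it."""
    vocab = tokenizer_json["model"]["vocab"]
    added = [t for t in tokenizer_json.get("added_tokens", [])
             if t["content"] not in vocab]
    return len(vocab) + len(added)


def vocab_mismatch(config_json: dict, tokenizer_json: dict) -> int:
    """Return actual - declared; non-zero means convert will complain."""
    return count_tokenizer_vocab(tokenizer_json) - config_json["vocab_size"]


def check_model_dir(model_dir: str) -> int:
    """Load the two files from a model directory and report the mismatch."""
    with open(f"{model_dir}/config.json") as f:
        cfg = json.load(f)
    with open(f"{model_dir}/tokenizer.json") as f:
        tok = json.load(f)
    return vocab_mismatch(cfg, tok)
```

For the error quoted above, `vocab_mismatch` would return 1 (32769 tokens in tokenizer.json against a declared 32768).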

@stefanvarunix

#6740

@tholin

tholin commented Apr 19, 2024

Unfortunately, convert fails with Mixtral 8x22b instruct:

ValueError: Vocab size mismatch (model has 32768, but Mixtral-8x22B-Instruct-v0.1/tokenizer.json has 32769).

This small mismatch (sometimes off by 1, sometimes by a few more) is actually a very common problem with older models that I quantize, but because they are older, I haven't bothered reporting it until now.

That is because of a bug in the original Mistral AI upload. Open the file tokenizer.json and change "TOOL_RESULT" to "TOOL_RESULTS" and the conversion should work.

https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1/discussions/6
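The manual edit above can also be scripted. A minimal sketch of the same fix, assuming the file is edited as plain text; the negative lookahead keeps any already-correct TOOL_RESULTS occurrences from gaining a double S:

```python
import re
from pathlib import Path


def fix_tool_result(text: str) -> str:
    """Rename the misspelled TOOL_RESULT token to TOOL_RESULTS.
    The (?!S) lookahead leaves existing TOOL_RESULTS occurrences untouched."""
    return re.sub(r"TOOL_RESULT(?!S)", "TOOL_RESULTS", text)


def patch_tokenizer(path: str) -> None:
    """Apply the fix in place to a tokenizer.json file."""
    p = Path(path)
    p.write_text(fix_tool_result(p.read_text()))
```

After patching, re-running the conversion should no longer hit the vocab size mismatch.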

@schmorp

schmorp commented Apr 20, 2024

@tholin: indeed, thanks a lot!

@schmorp

schmorp commented Apr 20, 2024

@tholin: while convert.py succeeds, it results in an 11 GB output file, so something still doesn't work. (b2699)

Update: no longer happens with b2715

@github-actions github-actions bot added the stale label May 24, 2024
Contributor

github-actions bot commented Jun 7, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Jun 7, 2024
Development

No branches or pull requests

10 participants