Mistral AI released their new model, Mixtral, a mixture-of-experts (MoE) architecture based on MegaBlocks. It uses 8 experts of roughly 7 billion parameters each.
Here is the model configuration:
- dim: 4096
- n_layers: 32
- head_dim: 128
- hidden_dim: 14336
- n_heads: 32
- n_kv_heads: 8
- norm_eps: 1e-05
- vocab_size: 32000
- moe:
  - num_experts_per_tok: 2
  - num_experts: 8
Weights: https://twitter.com/MistralAI/status/1733150512395038967
Paper: https://arxiv.org/pdf/2211.15841.pdf
Code: https://github.com/stanford-futuredata/megablocks
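For context, here is a minimal sketch of what a Mixtral-style MoE feed-forward layer with the configuration above could look like, assuming the usual top-k gating (softmax over the selected expert logits). This is purely illustrative; the class and parameter names are hypothetical and are not vLLM's or Mistral's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """SwiGLU feed-forward block matching dim=4096 / hidden_dim=14336."""

    def __init__(self, dim: int = 4096, hidden_dim: int = 14336):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


class MoELayer(nn.Module):
    """Route each token to its top-k experts and mix their outputs."""

    def __init__(self, dim: int = 4096, num_experts: int = 8,
                 num_experts_per_tok: int = 2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(Expert(dim) for _ in range(num_experts))
        self.top_k = num_experts_per_tok

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim)
        logits = self.gate(x)  # (num_tokens, num_experts)
        weights, selected = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1, dtype=torch.float).to(x.dtype)

        out = torch.zeros_like(x)
        # Loop over experts and gather the tokens routed to each one.
        for expert_id, expert in enumerate(self.experts):
            token_idx, slot = torch.where(selected == expert_id)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot, None] * expert(x[token_idx])
        return out
```

The per-expert loop above is only for clarity; an efficient serving implementation would need fused/grouped expert GEMMs (as in MegaBlocks) rather than dispatching experts one by one.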
CC: @WoosukKwon @zhuohan123 for visibility.