
Conversation

@pierrestock (Contributor) commented Dec 11, 2023

Adding support for the mistralai/Mixtral-8x7B-v0.1 and mistralai/Mixtral-8x7B-Instruct-v0.1 models as described in our blog post.

This is joint work between @zhuohan123 and @WoosukKwon from the vLLM project, and Mistral AI.

It integrates fast sparse mixture-of-experts kernels from the MegaBlocks project.
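For readers landing here, a minimal usage sketch of how the new models can be loaded through vLLM's offline Python API once this branch is installed; the tensor-parallel and sampling settings below are illustrative, not prescribed by this PR:

```python
# Minimal usage sketch (illustrative settings, not part of this PR):
# load Mixtral with vLLM's offline API and generate from a prompt.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    tensor_parallel_size=2,  # adjust to the number of GPUs available
)
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(
    ["[INST] Explain a sparse mixture of experts in two sentences. [/INST]"],
    sampling_params,
)
print(outputs[0].outputs[0].text)
```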

@zhuohan123 (Member) left a comment

LGTM! Thank you for your contribution and the official support of vLLM from Mistral AI!!

@zhuohan123 merged commit b5f882c into vllm-project:main Dec 11, 2023
    hidden_states: Optional[torch.Tensor],
    sampling_metadata: SamplingMetadata,
) -> SamplerOutput:
    hidden_states = self.norm(hidden_states)
Collaborator comment on the diff above:

nit: Can we do this in forward?
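To make the nit concrete, here is a toy sketch (a hypothetical module, not the vLLM Mixtral code) of the difference: applying the final norm at the end of forward() means sample() no longer has to normalize the hidden states itself.

```python
# Toy illustration of the suggestion (hypothetical module, not vLLM code):
# do the final normalization inside forward() rather than in sample().
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self, hidden: int = 8, vocab: int = 16):
        super().__init__()
        self.layer = nn.Linear(hidden, hidden)
        self.norm = nn.LayerNorm(hidden)   # stand-in for the final RMSNorm
        self.lm_head = nn.Linear(hidden, vocab)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize here, so every consumer of forward() sees normalized states.
        return self.norm(self.layer(x))

    def sample(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # No self.norm() call needed here anymore.
        return self.lm_head(hidden_states).argmax(dim=-1)

model = TinyModel()
print(model.sample(model(torch.randn(2, 8))))
```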

@zhuohan123 (Member) replied:

Yes, I merged the PR ASAP from my phone to let people use the model. I think one other small thing is to add Mixtral to the supported model list. I am AFK now, can you help fix this if possible?

@draganjovanovich commented:

I installed from the latest main, installed stk, megablocks, the latest flash_attn, transformers, etc., and got the following error:

  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 32, in execute_method
    return executor(*args, **kwargs)
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/vllm/worker/worker.py", line 88, in profile_num_available_blocks
    self.model_runner.profile_run()
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 321, in profile_run
    self.execute_model(seqs, kv_caches)
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 279, in execute_model
    hidden_states = self.model(
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/vllm/model_executor/models/mixtral.py", line 488, in forward
    hidden_states = layer(
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/vllm/model_executor/models/mixtral.py", line 439, in forward
    r = self.block_sparse_moe(self.ffn_norm(h))
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/vllm/model_executor/models/mixtral.py", line 353, in forward
    x = ops.padded_gather(x, indices, bin_ids, bins, padded_bins,
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubuntu/mambaforge/lib/python3.10/site-packages/stk/backend/autocast.py", line 28, in decorate_fwd
    return fwd(*args, **kwargs)
TypeError: PaddedGatherOp.forward() takes 6 positional arguments but 7 were given

@WoosukKwon (Collaborator) commented:

@draganjovanovich Thanks for reporting the error! Please re-install megablocks with pip install git+https://github.com/stanford-futuredata/[email protected].

@draganjovanovich commented Dec 11, 2023

Np, I tried installing git+https://github.com/stanford-futuredata/[email protected], but it fails in the build step.

/tmp/pip-install-1xstsqs1/grouped-gemm_8825c4917c174628acfa2bd08ed8c9d2/third_party/cutlass/include/cute/atom/copy_traits_sm90_tma.hpp(587): error: identifier "cuTensorMapEncodeTiled" is undefined
          CUresult result = cuTensorMapEncodeTiled(

Failed to build grouped_gemm

Now I have created a new env and am starting from scratch. I will comment if it succeeds.

@tgale96 commented Dec 11, 2023

Hi, it looks like the errors you're getting are from MegaBlocks. Can you share more details on your environment? The latest issue looks like it may be caused by the CUDA toolkit version you're using (maybe let's start a separate issue?).

@Ededu1984 commented:

I tried to install MegaBlocks and got this error:

CalledProcessError: Command 'pip --disable-pip-version-check install git+https://github.com/stanford-futuredata/[email protected]' returned non-zero exit status 1.

@tgale96 commented Dec 11, 2023

Hi Edson, you should be able to run Mixtral now by just installing MegaBlocks with pip install megablocks. Could you give that a try?

See related issues:
#2017
#2022

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
Co-authored-by: Pierre Stock <[email protected]>
Co-authored-by: Zhuohan Li <[email protected]>