
Mixtral-8x7B-v0.1 TP 8 GPUs, EDIT: TypeError: PaddedGatherOp.forward() takes 6 positional arguments but 7 were given #2022

@orellavie1212

Description


The earlier error,
KeyError: 'model.layers.13.block_sparse_moe.experts.4.w2.weight'
was fixed with 'pt', as mentioned in #2020 (see the construction sketch after the environment details below).

The new error is:
TypeError: PaddedGatherOp.forward() takes 6 positional arguments but 7 were given

model: https://huggingface.co/mistralai/Mixtral-8x7B-v0.1
python: 3.10
gpus: AWS g5.48xlarge (8 x 24 GB A10G)
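
For context, a minimal sketch of the engine construction that the traces below come from. The dtype, trust_remote_code, and prompt values are assumptions on my part, and load_format="pt" is my reading of the 'pt' workaround from #2020 (vLLM's weight-format option), not a confirmed detail.

from vllm import LLM, SamplingParams

# Sketch only: dtype, trust_remote_code, and the prompt are assumed values;
# load_format="pt" is the presumed meaning of the 'pt' fix from #2020.
llm = LLM(
    model="mistralai/Mixtral-8x7B-v0.1",
    tensor_parallel_size=8,   # one shard per A10G on g5.48xlarge
    dtype="float16",
    trust_remote_code=True,
    load_format="pt",         # works around the earlier KeyError on expert weights
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16)))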

Logs of the original KeyError (PyProcess W-1511 model stderr, 2023-12-11T17:25:22.531+02:00):

llm = LLM(model=model_name, tensor_parallel_size=num_gpus, dtype=dtype, trust_remote_code=trust_remote_code,
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 93, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 246, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 107, in __init__
    self._init_workers_ray(placement_group)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 194, in _init_workers_ray
    self._run_workers(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 750, in _run_workers
    self._run_workers_in_batch(workers, method, *args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 727, in _run_workers_in_batch
    all_outputs = ray.get(all_outputs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2524, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(KeyError): ray::RayWorkerVllm.execute_method() (pid=5603, ip=169.254.181.2, actor_id=9f26cd20cd05e16c7c08648d01000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7f84a1d667a0>)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/ray_utils.py", line 32, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 72, in load_model
    self.model_runner.load_model()
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 36, in load_model
    self.model = get_model(self.model_config)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 124, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 531, in load_weights
    param = params_dict[name]
KeyError: 'model.layers.13.block_sparse_moe.experts.4.w2.weight'

The failure now happens later, after the weights load, during the cache profiling run (PyProcess W-575 model stderr, 2023-12-11T18:13:26.782+02:00; a signature-check sketch follows the trace):

llm = LLM(model=model_name, tensor_parallel_size=num_gpus, dtype=dtype, trust_remote_code=trust_remote_code,
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 93, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 246, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 112, in __init__
    self._init_cache()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 208, in _init_cache
    num_blocks = self._run_workers(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 750, in _run_workers
    self._run_workers_in_batch(workers, method, *args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 727, in _run_workers_in_batch
    all_outputs = ray.get(all_outputs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2524, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::RayWorkerVllm.execute_method() (pid=4660, ip=169.254.181.2, actor_id=f424ccef141bef01e9887cf001000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7ed62ad827a0>)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/ray_utils.py", line 32, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 88, in profile_num_available_blocks
    self.model_runner.profile_run()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 321, in profile_run
    self.execute_model(seqs, kv_caches)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 279, in execute_model
    hidden_states = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 488, in forward
    hidden_states = layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 439, in forward
    r = self.block_sparse_moe(self.ffn_norm(h))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 353, in forward
    x = ops.padded_gather(x, indices, bin_ids, bins, padded_bins,
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/stk/backend/autocast.py", line 28, in decorate_fwd
    return fwd(*args, **kwargs)
TypeError: PaddedGatherOp.forward() takes 6 positional arguments but 7 were given
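
The trace shows vLLM's mixtral.py handing ops.padded_gather one more tensor than the installed megablocks kernel accepts, which looks like a version mismatch between the pinned vLLM commit and whatever pip resolved for megablocks/stanford-stk. Below is a small diagnostic sketch to confirm this on the machine; the import path for PaddedGatherOp is assumed from the traceback.

import inspect
# Assumed module path, inferred from the traceback; adjust if the class lives elsewhere.
from megablocks.ops.padded_gather import PaddedGatherOp

# The error says forward() takes 6 positional arguments (ctx + 5 tensors) while the
# vLLM call site passes 7 (ctx + 6 tensors); printing the installed signature shows
# which extra argument the local megablocks build is missing.
print(inspect.signature(PaddedGatherOp.forward))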

requirements.txt:

langdetect
fastapi
uvicorn[standard]
ninja # For faster builds.
psutil
ray==2.6.3
numpy
huggingface-hub>=0.16.4
wrapt-timeout-decorator
pydantic < 2 # Required for OpenAI server.
scipy
pandas
pyarrow
safetensors
sentencepiece
einops
torch==2.1.0
torchvision==0.16.0
deepspeed>=0.12.3
transformers==4.36.0
accelerate>=0.24.1
peft>=0.6.2
bitsandbytes>=0.41.2.post2
auto_gptq>=0.5.1
datasets==2.15.0
megablocks #mixtral
stanford-stk #mixtral
git+https://github.com/vllm-project/vllm.git@b5f882cc98e2c9c6dde7357dbac2ec0c2c57d8cd
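
The requirements above pin vLLM to a specific commit but leave megablocks and stanford-stk unpinned, so the resolved versions may not match what that commit was written against. A small sketch to record what pip actually installed (distribution names assumed to match the entries above):

import importlib.metadata as metadata

# Print the resolved versions of the packages involved in the MoE path so they
# can be compared against the versions the pinned vLLM commit expects.
for dist in ("vllm", "megablocks", "stanford-stk", "torch"):
    try:
        print(dist, metadata.version(dist))
    except metadata.PackageNotFoundError:
        print(dist, "not installed")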

Note:
With TP 4 I can't even run it; it OOMs, since the combined memory of 4 GPUs is presumably not enough for this model (a rough estimate is sketched below).
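
A rough, back-of-envelope estimate of the memory situation, assuming the commonly cited ~46.7B total parameters for Mixtral-8x7B and 16-bit weights; it ignores KV cache, activations, and CUDA overhead, so treat it as an order-of-magnitude check only:

# Rough estimate only; parameter count and sharding assumptions are approximate.
params = 46.7e9                  # ~46.7B total parameters for Mixtral-8x7B
weights_gb = params * 2 / 1e9    # 16-bit weights -> ~93 GB

for tp in (4, 8):
    per_gpu = weights_gb / tp    # weights are sharded across the TP group
    print(f"TP={tp}: ~{per_gpu:.1f} GB of weights per 24 GB A10G, "
          f"~{24 - per_gpu:.1f} GB left for KV cache and activations")

# TP=4 leaves well under 1 GB of headroom per GPU, hence the OOM;
# TP=8 leaves roughly 12 GB per GPU.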
