
Mixtral-8x7B-v0.1 TP 8 GPUs, EDIT: TypeError: PaddedGatherOp.forward() takes 6 positional arguments but 7 were given #2022

@orellavie1212

Description


The earlier error,
KeyError: 'model.layers.13.block_sparse_moe.experts.4.w2.weight'
was fixed with 'pt', as mentioned in #2020 (see the construction sketch after the environment details below).

The new error is:
TypeError: PaddedGatherOp.forward() takes 6 positional arguments but 7 were given

model: https://huggingface.co/mistralai/Mixtral-8x7B-v0.1
python: 3.10
gpus: AWS g5.48xlarge (8 x 24 GB A10G)
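
For context, a minimal sketch of the engine construction that the traces below come from. The dtype, trust_remote_code, and prompt values are assumptions on my part, and load_format="pt" is my reading of the 'pt' workaround from #2020 (vLLM's weight-format option), not a confirmed detail.

from vllm import LLM, SamplingParams

# Sketch only: dtype, trust_remote_code, and the prompt are assumed values;
# load_format="pt" is the presumed meaning of the 'pt' fix from #2020.
llm = LLM(
    model="mistralai/Mixtral-8x7B-v0.1",
    tensor_parallel_size=8,   # one shard per A10G on g5.48xlarge
    dtype="float16",
    trust_remote_code=True,
    load_format="pt",         # works around the earlier KeyError on expert weights
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16)))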

Logs of the original KeyError (PyProcess W-1511 model stderr, 2023-12-11T17:25:22.531+02:00):

llm = LLM(model=model_name, tensor_parallel_size=num_gpus, dtype=dtype, trust_remote_code=trust_remote_code,
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 93, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 246, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 107, in __init__
    self._init_workers_ray(placement_group)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 194, in _init_workers_ray
    self._run_workers(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 750, in _run_workers
    self._run_workers_in_batch(workers, method, *args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 727, in _run_workers_in_batch
    all_outputs = ray.get(all_outputs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2524, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(KeyError): ray::RayWorkerVllm.execute_method() (pid=5603, ip=169.254.181.2, actor_id=9f26cd20cd05e16c7c08648d01000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7f84a1d667a0>)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/ray_utils.py", line 32, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 72, in load_model
    self.model_runner.load_model()
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 36, in load_model
    self.model = get_model(self.model_config)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 124, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 531, in load_weights
    param = params_dict[name]
KeyError: 'model.layers.13.block_sparse_moe.experts.4.w2.weight'

The failure now happens later, after the weights load, during the cache profiling run (PyProcess W-575 model stderr, 2023-12-11T18:13:26.782+02:00; a signature-check sketch follows the trace):

llm = LLM(model=model_name, tensor_parallel_size=num_gpus, dtype=dtype, trust_remote_code=trust_remote_code,
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 93, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 246, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 112, in __init__
    self._init_cache()
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 208, in _init_cache
    num_blocks = self._run_workers(
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 750, in _run_workers
    self._run_workers_in_batch(workers, method, *args, **kwargs))
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 727, in _run_workers_in_batch
    all_outputs = ray.get(all_outputs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2524, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TypeError): ray::RayWorkerVllm.execute_method() (pid=4660, ip=169.254.181.2, actor_id=f424ccef141bef01e9887cf001000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7ed62ad827a0>)
  File "/usr/local/lib/python3.10/dist-packages/vllm/engine/ray_utils.py", line 32, in execute_method
    return executor(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 88, in profile_num_available_blocks
    self.model_runner.profile_run()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 321, in profile_run
    self.execute_model(seqs, kv_caches)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 279, in execute_model
    hidden_states = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 488, in forward
    hidden_states = layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 439, in forward
    r = self.block_sparse_moe(self.ffn_norm(h))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 353, in forward
    x = ops.padded_gather(x, indices, bin_ids, bins, padded_bins,
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/stk/backend/autocast.py", line 28, in decorate_fwd
    return fwd(*args, **kwargs)
TypeError: PaddedGatherOp.forward() takes 6 positional arguments but 7 were given
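
The trace shows vLLM's mixtral.py handing ops.padded_gather one more tensor than the installed megablocks kernel accepts, which looks like a version mismatch between the pinned vLLM commit and whatever pip resolved for megablocks/stanford-stk. Below is a small diagnostic sketch to confirm this on the machine; the import path for PaddedGatherOp is assumed from the traceback.

import inspect
# Assumed module path, inferred from the traceback; adjust if the class lives elsewhere.
from megablocks.ops.padded_gather import PaddedGatherOp

# The error says forward() takes 6 positional arguments (ctx + 5 tensors) while the
# vLLM call site passes 7 (ctx + 6 tensors); printing the installed signature shows
# which extra argument the local megablocks build is missing.
print(inspect.signature(PaddedGatherOp.forward))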

requirements.txt:

langdetect
fastapi
uvicorn[standard]
ninja # For faster builds.
psutil
ray==2.6.3
numpy
huggingface-hub>=0.16.4
wrapt-timeout-decorator
pydantic < 2 # Required for OpenAI server.
scipy
pandas
pyarrow
safetensors
sentencepiece
einops
torch==2.1.0
torchvision==0.16.0
deepspeed>=0.12.3
transformers==4.36.0
accelerate>=0.24.1
peft>=0.6.2
bitsandbytes>=0.41.2.post2
auto_gptq>=0.5.1
datasets==2.15.0
megablocks #mixtral
stanford-stk #mixtral
git+https://github.com/vllm-project/vllm.git@b5f882cc98e2c9c6dde7357dbac2ec0c2c57d8cd
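
The requirements above pin vLLM to a specific commit but leave megablocks and stanford-stk unpinned, so the resolved versions may not match what that commit was written against. A small sketch to record what pip actually installed (distribution names assumed to match the entries above):

import importlib.metadata as metadata

# Print the resolved versions of the packages involved in the MoE path so they
# can be compared against the versions the pinned vLLM commit expects.
for dist in ("vllm", "megablocks", "stanford-stk", "torch"):
    try:
        print(dist, metadata.version(dist))
    except metadata.PackageNotFoundError:
        print(dist, "not installed")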

Note:
With TP 4 I can't even run it; it OOMs, since the combined memory of 4 GPUs is presumably not enough for this model (a rough estimate is sketched below).
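
A rough, back-of-envelope estimate of the memory situation, assuming the commonly cited ~46.7B total parameters for Mixtral-8x7B and 16-bit weights; it ignores KV cache, activations, and CUDA overhead, so treat it as an order-of-magnitude check only:

# Rough estimate only; parameter count and sharding assumptions are approximate.
params = 46.7e9                  # ~46.7B total parameters for Mixtral-8x7B
weights_gb = params * 2 / 1e9    # 16-bit weights -> ~93 GB

for tp in (4, 8):
    per_gpu = weights_gb / tp    # weights are sharded across the TP group
    print(f"TP={tp}: ~{per_gpu:.1f} GB of weights per 24 GB A10G, "
          f"~{24 - per_gpu:.1f} GB left for KV cache and activations")

# TP=4 leaves well under 1 GB of headroom per GPU, hence the OOM;
# TP=8 leaves roughly 12 GB per GPU.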
