Description
Fixed error:
KeyError: 'model.layers.13.block_sparse_moe.experts.4.w2.weight'
This was fixed by using 'pt', as mentioned in #2020.
The new error is:
TypeError: PaddedGatherOp.forward() takes 6 positional arguments but 7 were given
model:
https://huggingface.co/mistralai/Mixtral-8x7B-v0.1
python: 3.10
gpus:
g5.48xlarge
8 x 24G A10 on AWS
logs (original KeyError, before the fix):
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: llm = LLM(model=model_name, tensor_parallel_size=num_gpus, dtype=dtype, trust_remote_code=trust_remote_code,
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 93, in __init__
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: self.llm_engine = LLMEngine.from_engine_args(engine_args)
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 246, in from_engine_args
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: engine = cls(*engine_configs,
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 107, in __init__
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: self._init_workers_ray(placement_group)
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 194, in _init_workers_ray
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: self._run_workers(
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 750, in _run_workers
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: self._run_workers_in_batch(workers, method, *args, **kwargs))
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 727, in _run_workers_in_batch
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: all_outputs = ray.get(all_outputs)
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: return fn(*args, **kwargs)
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: return func(*args, **kwargs)
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2524, in get
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: raise value.as_instanceof_cause()
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: ray.exceptions.RayTaskError(KeyError): ray::RayWorkerVllm.execute_method() (pid=5603, ip=169.254.181.2, actor_id=9f26cd20cd05e16c7c08648d01000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7f84a1d667a0>)
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/ray_utils.py", line 32, in execute_method
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: return executor(*args, **kwargs)
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 72, in load_model
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: self.model_runner.load_model()
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 36, in load_model
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: self.model = get_model(self.model_config)
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/model_loader.py", line 124, in get_model
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: model.load_weights(model_config.model, model_config.download_dir,
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 531, in load_weights
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: param = params_dict[name]
| 2023-12-11T17:25:22.531+02:00 | [WARN ] PyProcess - W-1511-model-stderr: KeyError: 'model.layers.13.block_sparse_moe.experts.4.w2.weight'
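The traceback above ends at `param = params_dict[name]`, which means a tensor name found in the checkpoint has no parameter of the same name in the model. A minimal sketch of that kind of lookup, assuming plain Python collections as stand-ins (the helper `missing_params` and the example parameter names in `params_dict` are hypothetical, not vLLM code):

```python
def missing_params(checkpoint_names, params_dict):
    """Return checkpoint tensor names that have no matching model parameter."""
    return sorted(name for name in checkpoint_names if name not in params_dict)


# Hypothetical stand-ins: one name taken from the KeyError above, one made-up
# name that the model does know about.
checkpoint_names = [
    "model.layers.13.block_sparse_moe.experts.4.w2.weight",
    "model.layers.13.input_layernorm.weight",
]
params_dict = {"model.layers.13.input_layernorm.weight": object()}

for name in missing_params(checkpoint_names, params_dict):
    print("checkpoint tensor with no matching parameter:", name)
```

Listing every mismatched name at once, instead of failing on the first one, makes it easier to tell whether a whole checkpoint layout differs from what the loader expects.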
With that fixed, the failure is now the TypeError. Logs:
llm = LLM(model=model_name, tensor_parallel_size=num_gpus, dtype=dtype, trust_remote_code=trust_remote_code,
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/llm.py", line 93, in __init__
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: self.llm_engine = LLMEngine.from_engine_args(engine_args)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 246, in from_engine_args
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: engine = cls(*engine_configs,
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 112, in __init__
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: self._init_cache()
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 208, in _init_cache
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: num_blocks = self._run_workers(
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 750, in _run_workers
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: self._run_workers_in_batch(workers, method, *args, **kwargs))
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 727, in _run_workers_in_batch
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: all_outputs = ray.get(all_outputs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return fn(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return func(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2524, in get
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: raise value.as_instanceof_cause()
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: ray.exceptions.RayTaskError(TypeError): ray::RayWorkerVllm.execute_method() (pid=4660, ip=169.254.181.2, actor_id=f424ccef141bef01e9887cf001000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7ed62ad827a0>)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/ray_utils.py", line 32, in execute_method
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return executor(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return func(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 88, in profile_num_available_blocks
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: self.model_runner.profile_run()
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return func(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 321, in profile_run
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: self.execute_model(seqs, kv_caches)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return func(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 279, in execute_model
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: hidden_states = self.model(
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return self._call_impl(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return forward_call(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 488, in forward
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: hidden_states = layer(
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return self._call_impl(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return forward_call(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 439, in forward
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: r = self.block_sparse_moe(self.ffn_norm(h))
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return self._call_impl(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return forward_call(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return func(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 353, in forward
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: x = ops.padded_gather(x, indices, bin_ids, bins, padded_bins,
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 539, in apply
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return super().apply(*args, **kwargs) # type: ignore[misc]
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: File "/usr/local/lib/python3.10/dist-packages/stk/backend/autocast.py", line 28, in decorate_fwd
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: return fwd(*args, **kwargs)
| 2023-12-11T18:13:26.782+02:00 | [WARN ] PyProcess - W-575-model-stderr: TypeError: PaddedGatherOp.forward() takes 6 positional arguments but 7 were given
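The TypeError above has the shape of a version mismatch: the caller in `mixtral.py` passes one more positional argument than the installed `PaddedGatherOp.forward` accepts. The sketch below reproduces that shape with a stand-in class. `OldPaddedGatherOp`, its parameter list, and the extra argument value are hypothetical and only chosen to match the argument counts in the message. They are not the real megablocks signature.

```python
import inspect


def positional_arity(fn):
    """Count the parameters of fn that can be passed positionally."""
    kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    return sum(p.kind in kinds for p in inspect.signature(fn).parameters.values())


class OldPaddedGatherOp:
    """Hypothetical stand-in for an op whose forward() takes 6 positional args."""

    @staticmethod
    def forward(ctx, x, indices, bin_ids, bins, padded_bins):
        return x


def call_with_extra_arg(op):
    """Call forward() with one extra positional argument, as a newer caller might."""
    ctx = object()
    return op.forward(ctx, "x", "indices", "bin_ids", "bins", "padded_bins", 2)


print("forward() accepts", positional_arity(OldPaddedGatherOp.forward), "positional args")
try:
    call_with_extra_arg(OldPaddedGatherOp)
except TypeError as exc:
    print("TypeError:", exc)
```

If this is the cause here, pointing `positional_arity` at the `forward` of the installed op would show whether the installed package matches what the vLLM commit expects. Since `megablocks` and `stanford-stk` are unpinned in the requirements below, pinning them to versions compatible with that commit may be worth trying.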
requirements.txt:
langdetect
fastapi
uvicorn[standard]
ninja # For faster builds.
psutil
ray==2.6.3
numpy
huggingface-hub>=0.16.4
wrapt-timeout-decorator
pydantic < 2 # Required for OpenAI server.
scipy
pandas
pyarrow
safetensors
sentencepiece
einops
torch==2.1.0
torchvision==0.16.0
deepspeed>=0.12.3
transformers==4.36.0
accelerate>=0.24.1
peft>=0.6.2
bitsandbytes>=0.41.2.post2
auto_gptq>=0.5.1
datasets==2.15.0
megablocks #mixtral
stanford-stk #mixtral
git+https://github.com/vllm-project/vllm.git@b5f882cc98e2c9c6dde7357dbac2ec0c2c57d8cd
Note:
With TP 4 I can't run it at all (OOM, of course; I guess the combined GPU memory of 4 GPUs is not enough).
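A rough back-of-the-envelope check of the OOM guess, as a sketch under stated assumptions: about 46.7 billion total parameters for Mixtral-8x7B, 16-bit weights (the actual `dtype` is not shown above), 24 GiB per GPU, and a 0.9 memory-utilization budget per GPU. It counts weights only and ignores KV cache and activations, so real requirements are higher.

```python
GIB = 2**30

# Assumptions, not values taken from the logs above.
MIXTRAL_TOTAL_PARAMS = 46.7e9   # approximate total parameter count
BYTES_PER_PARAM = 2             # 16-bit weights (fp16 / bf16)
GPU_MEMORY_GIB = 24             # per-GPU memory, treating "24G" as 24 GiB
MEMORY_UTILIZATION = 0.9        # fraction of each GPU assumed usable


def weights_gib(num_params=MIXTRAL_TOTAL_PARAMS, bytes_per_param=BYTES_PER_PARAM):
    """Memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def budget_gib(num_gpus):
    """Total usable memory across num_gpus, in GiB."""
    return num_gpus * GPU_MEMORY_GIB * MEMORY_UTILIZATION


def weights_fit(num_gpus):
    """True if the weights alone fit in the combined budget."""
    return weights_gib() <= budget_gib(num_gpus)


for tp in (4, 8):
    print(f"TP={tp}: weights {weights_gib():.1f} GiB, "
          f"budget {budget_gib(tp):.1f} GiB, fit={weights_fit(tp)}")
```

Under these assumptions the weights alone already exceed the budget of 4 GPUs, while 8 GPUs leave room for KV cache, which is consistent with the OOM seen at TP 4.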