Description
Your current environment
Running GLM-4.1V-9B-Thinking on NPU.
The model starts normally on an A2 machine.
On an A3 machine, started from the image vllm-ascend:v0.10.2rc1-a3-openeuler, the rope operator fails with the error below (see the minimal repro sketch after the log):
rs: 40) with 43 sizes
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:00<00:01, 2.04it/s]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:01<00:01, 1.69it/s]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:01<00:00, 1.72it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00, 1.81it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00, 1.79it/s]
(EngineCore_DPO pid=7998)
(EngineCore_DPO pid=7998) INFO 10-23 17:02:10 [default_loader.py:268] Loading weights took 2.36 seconds
(EngineCore_DPO pid=7998) INFO 10-23 17:02:11 [model_runner_v1.py:2373] Loading model weights took 19.2315 GB
(EngineCore_DPO pid=7998) INFO 10-23 17:02:17 [backends.py:539] Using cache directory: /root/.cache/vllm/torch_compile_cache/e40cf8839e/rank_0_0/backbone for vLLM's torch.compile
(EngineCore_DPO pid=7998) INFO 10-23 17:02:17 [backends.py:550] Dynamo bytecode transform time: 5.88 s
(EngineCore_DPO pid=7998) INFO 10-23 17:02:19 [backends.py:215] Compiling a graph for dynamic shape takes 1.93 s
[rank0]:[E1023 17:02:23.062203480 compiler_depend.ts:429] RopeOperation setup failed!
Exception raised from OperationSetup at build/third_party/op-plugin/op_plugin/CMakeFiles/op_plugin_atb.dir/compiler_depend.ts:151 (most recent call first):
frame #0: c10::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xd4 (0xfffffa77e3ea4 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xe4 (0xfffffa7783e44 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #2: atb::OperationSetup(atb::VariantPack, atb::Operation*, atb::Context*) + 0x254 (0xffffded16ac24 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libop_plugin_atb.so)
frame #3: + 0x8b7bc (0xffffded16b7bc in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libop_plugin_atb.so)
frame #4: + 0x22887d4 (0xffffdfe1987d4 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #5: + 0x8fb170 (0xffffdfc80b170 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame
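For reference, a minimal offline launch sketch that reaches the failing point, assuming vLLM's offline LLM API; the exact launch command is not included in the report, and the model id, parallelism, and context length below are assumptions:

```python
# Minimal repro sketch, intended to be run inside the
# vllm-ascend:v0.10.2rc1-a3-openeuler container.
# The model id, tensor_parallel_size, and max_model_len are assumptions;
# the original report does not include the exact launch command.
from vllm import LLM

llm = LLM(
    model="zai-org/GLM-4.1V-9B-Thinking",  # assumed HF repo id; substitute a local path if needed
    trust_remote_code=True,
    tensor_parallel_size=1,  # assumed; match the number of NPUs in use
    max_model_len=8192,      # assumed
)
# On the failing A3 setup, engine construction aborts after weight loading and
# graph compilation, at the "RopeOperation setup failed!" point shown in the log above.
```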
How would you like to use vllm on ascend
I want to run inference of GLM-4.1V-9B-Thinking (the model described above) on the A3 machine. I don't know how to integrate it with vLLM given the rope operator error.
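A minimal text-only usage sketch of the intended inference path, again assuming vLLM's offline LLM API; the model id and sampling settings are assumptions, and multimodal (image) inputs are omitted:

```python
# Text-only inference sketch for GLM-4.1V-9B-Thinking through vLLM's offline API.
# Model id, max_model_len, and sampling parameters are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-4.1V-9B-Thinking",  # assumed HF repo id
    trust_remote_code=True,
    max_model_len=8192,  # assumed
)
sampling = SamplingParams(temperature=0.6, max_tokens=512)

outputs = llm.generate(
    ["Describe what the GLM-4.1V-9B-Thinking model is designed for."],
    sampling,
)
print(outputs[0].outputs[0].text)
```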