
[Usage]: openEuler 22.03 aarch64, single Atlas 300I Duo (310P3) card: with v0.10.0rc1, both the openEuler and Ubuntu images fail at startup inside the container #2720

@1448163534

Description


Your current environment

openEuler 22.03 aarch64, single Atlas 300I Duo (310P3) card, v0.10.0rc1, reproduced with both the openEuler and the Ubuntu image.

Started inside the container with: vllm serve Qwen/Qwen2.5-0.5B-Instruct &
Error output:
INFO 09-03 07:21:08 [default_loader.py:262] Loading weights took 0.22 seconds
INFO 09-03 07:21:09 [model_runner_v1.py:2114] Loading model weights took 0.9278 GB
INFO 09-03 07:21:15 [backends.py:530] Using cache directory: /root/.cache/vllm/torch_compile_cache/fdf6355244/rank_0_0/backbone for vLLM's torch.compile
INFO 09-03 07:21:15 [backends.py:541] Dynamo bytecode transform time: 5.92 s
INFO 09-03 07:21:18 [backends.py:215] Compiling a graph for dynamic shape takes 1.81 s
.[rank0]:[E903 07:21:23.263967696 compiler_depend.ts:429] RopeOperation setup failed!
Exception raised from OperationSetup at build/third_party/op-plugin/op_plugin/CMakeFiles/op_plugin_atb.dir/compiler_depend.ts:151 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0xd4 (0xffff8c4e3ea4 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xe4 (0xffff8c483e44 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #2: atb::OperationSetup(atb::VariantPack, atb::Operation*, atb::Context*) + 0x254 (0xfffe92f2ac24 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libop_plugin_atb.so)
frame #3: + 0x8b7bc (0xfffe92f2b7bc in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libop_plugin_atb.so)
frame #4: + 0x22887d4 (0xfffeab1787d4 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #5: + 0x8fb170 (0xfffea97eb170 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #6: + 0x8fd504 (0xfffea97ed504 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #7: + 0x8f9e2c (0xfffea97e9e2c in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #8: + 0xd31fc (0xffff8c2f31fc in /lib/aarch64-linux-gnu/libstdc++.so.6)
frame #9: + 0x7d5b8 (0xffff985bd5b8 in /lib/aarch64-linux-gnu/libc.so.6)
frame #10: + 0xe5edc (0xffff98625edc in /lib/aarch64-linux-gnu/libc.so.6)

Traceback (most recent call last):
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/fx/graph_module.py", line 393, in __call__
return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<eval_with_key>.9", line 9, in forward
linear = torch._C._nn.linear(view_1, l_self_modules_layers_modules_3_modules_self_attn_modules_o_proj_parameters_weight, None); view_1 = l_self_modules_layers_modules_3_modules_self_attn_modules_o_proj_parameters_weight = None
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is RopeOperation.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
[ERROR] 2025-09-03-07:21:23 (PID:5146, Device:0, RankID:-1) ERR00100 PTA call acl api failed.

Call using an FX-traced Module, line 9 of the traced Module's generated forward function:
view_1 = output_7.view(-1, 896); output_7 = None
linear = torch._C._nn.linear(view_1, l_self_modules_layers_modules_3_modules_self_attn_modules_o_proj_parameters_weight, None); view_1 = l_self_modules_layers_modules_3_modules_self_attn_modules_o_proj_parameters_weight = None

    npu_add_rms_norm = torch.ops.npu.npu_add_rms_norm(linear, residual_6, l_self_modules_layers_modules_3_modules_post_attention_layernorm_parameters_weight_, 1e-06);  linear = residual_6 = l_self_modules_layers_modules_3_modules_post_attention_layernorm_parameters_weight_ = None

    getitem_1 = npu_add_rms_norm[0]

ERROR 09-03 07:21:23 [core.py:632] EngineCore failed to start.
ERROR 09-03 07:21:23 [core.py:632] Traceback (most recent call last):
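
The error log itself names the next diagnostic step: because NPU ops are launched asynchronously, the stacktrace above may not point at the real failure site, and setting ASCEND_LAUNCH_BLOCKING=1 forces synchronous execution. A minimal re-run sketch, reusing the launch command from this report (this only sharpens the stacktrace; it is not a fix):

```shell
# Force synchronous op launch so the stacktrace points at the op
# that actually failed (RopeOperation, per the log above).
export ASCEND_LAUNCH_BLOCKING=1
vllm serve Qwen/Qwen2.5-0.5B-Instruct

# Unset afterwards: synchronous mode degrades performance.
unset ASCEND_LAUNCH_BLOCKING
```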






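
Since the crash happens right after the torch.compile graph finishes compiling, one way to narrow it down is to check whether eager mode starts cleanly on the 310P. This is a diagnostic sketch, not a confirmed workaround; `--enforce-eager` is a standard vLLM flag that disables graph compilation:

```shell
# Diagnostic: skip torch.compile / graph mode entirely.
# If this starts cleanly, the failure is specific to the compiled
# graph path (e.g. ATB RopeOperation setup), not the model weights.
vllm serve Qwen/Qwen2.5-0.5B-Instruct --enforce-eager
```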
