
[Usage]: Support for GLM-4.1V-9B-Thinking #3706

@jinshurui618

Description


Your current environment

Running GLM-4.1V-9B-Thinking on NPU.

The model starts and runs normally on an A2 machine.

On an A3 machine, launching with the image vllm-ascend:v0.10.2rc1-a3-openeuler fails with a RoPE operator error:

rs: 40) with 43 sizes
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:00<00:01, 2.04it/s]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:01<00:01, 1.69it/s]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:01<00:00, 1.72it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00, 1.81it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00, 1.79it/s]
(EngineCore_DPO pid=7998)
(EngineCore_DPO pid=7998) INFO 10-23 17:02:10 [default_loader.py:268] Loading weights took 2.36 seconds
(EngineCore_DPO pid=7998) INFO 10-23 17:02:11 [model_runner_v1.py:2373] Loading model weights took 19.2315 GB
(EngineCore_DPO pid=7998) INFO 10-23 17:02:17 [backends.py:539] Using cache directory: /root/.cache/vllm/torch_compile_cache/e40cf8839e/rank_0_0/backbone for vLLM's torch.compile
(EngineCore_DPO pid=7998) INFO 10-23 17:02:17 [backends.py:550] Dynamo bytecode transform time: 5.88 s
(EngineCore_DPO pid=7998) INFO 10-23 17:02:19 [backends.py:215] Compiling a graph for dynamic shape takes 1.93 s
[rank0]:[E1023 17:02:23.062203480 compiler_depend.ts:429] RopeOperation setup failed!
Exception raised from OperationSetup at build/third_party/op-plugin/op_plugin/CMakeFiles/op_plugin_atb.dir/compiler_depend.ts:151 (most recent call first):
frame #0: c10::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xd4 (0xfffffa77e3ea4 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xe4 (0xfffffa7783e44 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #2: atb::OperationSetup(atb::VariantPack, atb::Operation*, atb::Context*) + 0x254 (0xffffded16ac24 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libop_plugin_atb.so)
frame #3: + 0x8b7bc (0xffffded16b7bc in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libop_plugin_atb.so)
frame #4: + 0x22887d4 (0xffffdfe1987d4 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #5: + 0x8fb170 (0xffffdfc80b170 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame
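
For reference, a minimal sketch of the load path that appears to hit this failure, assuming the model is started through vLLM's offline API inside the container named above. The model id and settings are assumptions; the actual launch command was not included in the report.

```python
from vllm import LLM

# Hedged sketch: "zai-org/GLM-4.1V-9B-Thinking" is an assumed model id; replace
# it with the checkpoint path actually used. Constructing the LLM triggers
# weight loading and the torch.compile warm-up, which is where the
# RopeOperation setup error above is raised on the A3 machine.
llm = LLM(
    model="zai-org/GLM-4.1V-9B-Thinking",
    max_model_len=8192,
    tensor_parallel_size=1,
)
```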

How would you like to use vllm on ascend

I want to run inference of GLM-4.1V-9B-Thinking on Ascend NPU (A3). I don't know how to integrate it with vllm.
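
A minimal offline-inference sketch, assuming the GLM-4.1V-9B-Thinking weights are available locally or from Hugging Face; the model id, prompt, and sampling settings below are placeholders, not values taken from this report.

```python
from vllm import LLM, SamplingParams

# Assumed model id; substitute the real local path or Hugging Face repo.
llm = LLM(model="zai-org/GLM-4.1V-9B-Thinking", max_model_len=8192)

sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Give a one-sentence summary of rotary position embeddings."],
    sampling,
)

for out in outputs:
    print(out.outputs[0].text)
```

On Ascend hardware this requires the vllm-ascend plugin (for example inside the vllm-ascend:v0.10.2rc1-a3-openeuler image mentioned above); on A3 the same RoPE operator error reported above is expected until the underlying issue is resolved.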
