Misc. bug: Server crash with use of lora on CPU #12587

Closed
amakropoulos opened this issue Mar 26, 2025 · 0 comments · Fixed by #12593

Name and Version

version: 4960 (fd7855f)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server -m qwen2-0_5b-instruct-q4_k_m.gguf -c 8192 -b 512 -np 1 --lora Qwen2-0.5B-Instruct-ru-lora.gguf
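
(Here -c 8192 sets the context size, -b 512 the logical batch size, -np 1 a single parallel slot, and --lora loads the LoRA adapter on top of the base model.)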

Problem description & steps to reproduce

The server crashes with a segmentation fault when using a LoRA adapter on the latest master, running on CPU (AVX2).

Model: qwen2-0_5b-instruct-q4_k_m.gguf
LoRA: Qwen2-0.5B-Instruct-ru-lora.gguf

I tracked the issue down to commit 3d82dbc; reverting that commit fixes the problem.
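
The crash site in the stack trace below dereferences tensor_traits unconditionally (auto OK = tensor_traits->repack(tensor, data, size);). A plausible reading is that tensor->extra was never populated for the LoRA tensors placed in the CPU_AARCH64 buffer, so the pointer is null. The following is a minimal, self-contained C++ sketch of that failure mode with a defensive guard; all type and function names are illustrative stand-ins, not ggml's actual internals, and this is not necessarily the fix that landed in #12593.

    #include <cstddef>
    #include <cstdio>
    #include <cstring>

    // Illustrative stand-in for the per-tensor traits object that the
    // CPU_AARCH64 buffer stores in tensor->extra.
    struct tensor_traits {
        int repack(const void * data, std::size_t size) {
            (void) data; (void) size; // pretend to repack into an AArch64 layout
            return 0;
        }
    };

    struct tensor {
        void        * extra;      // traits pointer; may be null
        unsigned char data[1024]; // backing storage
    };

    // Guarded version of the set_tensor path: fall back to a plain copy
    // when no traits were assigned, instead of dereferencing a null pointer.
    static void set_tensor(tensor * t, const void * data,
                           std::size_t offset, std::size_t size) {
        auto * traits = static_cast<tensor_traits *>(t->extra);
        if (traits == nullptr) {
            std::memcpy(t->data + offset, data, size);
            return;
        }
        traits->repack(data, size); // the unguarded call is where the SIGSEGV hits
    }

    int main() {
        unsigned char src[16] = {};
        tensor lora_weight = {}; // extra == nullptr, like the crashing LoRA tensor
        set_tensor(&lora_weight, src, 0, sizeof(src)); // safe with the guard
        std::printf("ok\n");
        return 0;
    }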

First Bad Commit

3d82dbc

Relevant log output

Stacktrace:

llama_adapter_lora_init_impl: loading lora adapter from '/home/benuix/.config/LLMUnity/models/Qwen2-0.5B-Instruct-ru-lora.gguf' ...
llama_adapter_lora_init_impl: CPU_Mapped LoRA buffer size =    14.67 MiB
llama_adapter_lora_init_impl: CPU_AARCH64 LoRA buffer size =     2.11 MiB

Thread 1 "llama-server" received signal SIGSEGV, Segmentation fault.
0x00007ffff7ecc32a in ggml_backend_cpu_aarch64_buffer_set_tensor (buffer=0x5555581abd90, tensor=0x5555564b2700, data=0x5555564d0150, offset=0, size=155648) at /home/benuix/codes/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:5632
5632	    auto OK            = tensor_traits->repack(tensor, data, size);
(gdb) where
#0  0x00007ffff7ecc32a in ggml_backend_cpu_aarch64_buffer_set_tensor (buffer=0x5555581abd90, 
    tensor=0x5555564b2700, data=0x5555564d0150, offset=0, size=155648)
    at /home/benuix/codes/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:5632
#1  0x00007ffff772ac03 in ggml_backend_tensor_set (tensor=0x5555564b2700, data=0x5555564d0150, 
    offset=0, size=155648) at /home/benuix/codes/llama.cpp/ggml/src/ggml-backend.cpp:268
#2  0x00007ffff7b6dcc7 in operator() (__closure=0x7fffffff94b0, orig=0x555555b53a80, dev=0x5555564b2700)
    at /home/benuix/codes/llama.cpp/src/llama-adapter.cpp:316
#3  0x00007ffff7b6efc0 in llama_adapter_lora_init_impl (model=..., 
    path_lora=0x555555adf860 "/home/benuix/.config/LLMUnity/models/Qwen2-0.5B-Instruct-ru-lora.gguf", 
    adapter=...) at /home/benuix/codes/llama.cpp/src/llama-adapter.cpp:321
#4  0x00007ffff7b6f619 in llama_adapter_lora_init (model=0x555555b13930, 
    path_lora=0x555555adf860 "/home/benuix/.config/LLMUnity/models/Qwen2-0.5B-Instruct-ru-lora.gguf")
    at /home/benuix/codes/llama.cpp/src/llama-adapter.cpp:333
#5  0x000055555582ab0a in common_init_from_params (params=...)
    at /home/benuix/codes/llama.cpp/common/common.cpp:993
#6  0x0000555555645333 in server_context::load_model (this=0x7fffffffc370, params=...)
    at /home/benuix/codes/llama.cpp/examples/server/server.cpp:1849
#7  0x000055555560127d in main (argc=11, argv=0x7fffffffdb28)
    at /home/benuix/codes/llama.cpp/examples/server/server.cpp:4488