### Name and Version

`llama-server` or `llama-mtmd-cli`, built from commit 05f6ac6
### Operating systems

Linux
### GGML backends

CPU
### Hardware

Ryzen Threadripper 7970X, with or without GPUs
### Models

Both Llama 4 Scout and Maverick crash; Gemma 3 works fine in the same build/environment.
### Problem description & steps to reproduce

Generate an empty 400x800 image:

```shell
convert -size 400x800 xc:none 400x800.png
```
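If ImageMagick's `convert` is not available, an equivalent fully transparent 400x800 PNG can be produced with a stdlib-only Python sketch (filename and dimensions taken from the step above):

```python
import struct
import zlib

def make_blank_png(path, width, height):
    """Write a fully transparent 8-bit RGBA PNG of the given size."""
    def chunk(tag, data):
        # Each PNG chunk: big-endian length, 4-byte tag, data, CRC over tag+data.
        return (struct.pack(">I", len(data)) + tag + data
                + struct.pack(">I", zlib.crc32(tag + data)))

    # IHDR: width, height, bit depth 8, color type 6 (RGBA), no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    # Each scanline: filter byte 0 followed by width * 4 zero bytes
    # (transparent black pixels).
    raw = b"".join(b"\x00" + b"\x00" * (width * 4) for _ in range(height))
    png = (b"\x89PNG\r\n\x1a\n"
           + chunk(b"IHDR", ihdr)
           + chunk(b"IDAT", zlib.compress(raw))
           + chunk(b"IEND", b""))
    with open(path, "wb") as f:
        f.write(png)

make_blank_png("400x800.png", 400, 800)
```

Any blank image of that aspect ratio should do; what matters for the repro is the 400x800 size, not the pixel content.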
Quick debug with `llama-mtmd-cli`:

```shell
llama-mtmd-cli -c 32768 --mmproj ~/models/mmproj-llama4-109b.gguf -m ~/models/llama4-109b-q4_0.gguf -dev none
```

```
> /image /home/david/Desktop/400x800.png
/home/david/Desktop/400x800.png image loaded

> test
encoding image slice...
image slice encoded in 447 ms
decoding image batch 1/1, n_tokens_batch = 144
image decoded (batch 1/1) in 1483 ms
encoding image slice...
image slice encoded in 430 ms
decoding image batch 1/1, n_tokens_batch = 144
image decoded (batch 1/1) in 1481 ms
encoding image slice...

Thread 1 "llama-mtmd-cli" received signal SIGSEGV, Segmentation fault.
0x00005555555d7a03 in mtmd_encode ()
(gdb) bt
#0  0x00005555555d7a03 in mtmd_encode ()
#1  0x00005555555df748 in mtmd_helper_eval_chunk_single ()
#2  0x00005555555dfbd1 in mtmd_helper_eval_chunks ()
#3  0x00005555555d2e92 in eval_message(mtmd_cli_context&, common_chat_msg&, bool) ()
#4  0x00005555555d2831 in main ()
(gdb)
```
`llama-server` also segfaults:

```shell
llama-server -c 32768 --no-context-shift --jinja --mmproj ~/models/mmproj-llama4-109b.gguf -m ~/models/llama4-109b-q4_0.gguf -dev none -ctk q8_0 -ctv q8_0 -fa --temp 0.6 --min-p 0.01 --top-p 0.9 --host 0.0.0.0 --port 8000 --no-mmap -t 32
```

Use Open WebUI to upload the 400x800 PNG image generated above.
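Open WebUI is presumably not essential to the repro: `llama-server` exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so a direct request with the PNG inlined as a base64 data URL should exercise the same image-tokenization path. A minimal sketch of the request body (the model name is a cosmetic placeholder; `llama-server` serves whichever model it loaded):

```python
import base64

def build_payload(png_bytes, prompt="test"):
    """OpenAI-style chat request with the PNG inlined as a data URL."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "model": "llama4-109b",  # placeholder; ignored by llama-server
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64," + b64}},
            ],
        }],
    }
```

The resulting dict can be serialized with `json.dumps` and POSTed to `http://localhost:8000/v1/chat/completions` (host/port from the command above) with `Content-Type: application/json`, e.g. via `curl -d @payload.json`.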
```
Thread 6 "llama-server" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffccdfb6c0 (LWP 89543)]
0x00005555556b7dbd in mtmd_input_chunk_get_n_pos ()
(gdb) bt
#0  0x00005555556b7dbd in mtmd_input_chunk_get_n_pos ()
#1  0x000055555568dc7e in server_tokens::push_back(mtmd_input_chunk const*) ()
#2  0x000055555560743d in main::$_2::operator()(server_task_type, nlohmann::json_abi_v3_11_3::basic_json<nlohmann::json_abi_v3_11_3::ordered_map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_3::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> >, void>&, std::vector<std::vector<unsigned char, std::allocator<unsigned char> >, std::allocator<std::vector<unsigned char, std::allocator<unsigned char> > > > const&, std::function<bool ()> const&, httplib::Response&, oaicompat_type) const ()
#3  0x000055555560d380 in std::_Function_handler<void (httplib::Request const&, httplib::Response&), main::$_14>::_M_invoke(std::_Any_data const&, httplib::Request const&, httplib::Response&) ()
#4  0x000055555565b4ea in httplib::Server::routing(httplib::Request&, httplib::Response&, httplib::Stream&) ()
#5  0x000055555565927a in httplib::Server::process_request(httplib::Stream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, bool, bool&, std::function<void (httplib::Request&)> const&) ()
#6  0x0000555555657c76 in httplib::detail::process_server_socket<httplib::Server::process_and_close_socket(int)::{lambda(httplib::Stream&, bool, bool&)#1}>(std::atomic<int> const&, int, unsigned long, long, long, long, long, long, httplib::Server::process_and_close_socket(int)::{lambda(httplib::Stream&, bool, bool&)#1})::{lambda(bool, bool&)#1}::operator()(bool, bool&) const ()
#7  0x000055555562dcbf in httplib::Server::process_and_close_socket(int) ()
#8  0x0000555555630ff1 in httplib::ThreadPool::worker::operator()() ()
#9  0x00007fffe74e1224 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007fffe729cb7b in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:448
#11 0x00007fffe731a7b8 in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
```
### First Bad Commit

No response
### Relevant log output

See the quick debug section above.