Description
I tested with Qwen-7B-Chat.q4_0.gguf using a long prompt. It ran perfectly a couple of weeks ago, but with the current code it fails.
Command
CUDA_VISIBLE_DEVICES=0 ./main -ngl 99 -m /data1/models/qwen/Qwen-7B-Chat/gguf/Qwen-7B-Chat.q4_0.gguf
-c 1024 -b 512 -n 512 -s 19861102 -p "xxxxxxxxxxxx"
While the prompt is being processed, it fails with this error:
ggml_tallocr_alloc: not enough space in the buffer (needed 155582464, largest block available 151388160)
GGML_ASSERT: ggml-alloc.c:114: !"not enough space in the buffer"
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
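To put the allocator message in perspective, the gap between the two numbers it reports can be worked out directly. This is only a minimal arithmetic sketch in Python: the variable names are my own, and the two values are copied from the `ggml_tallocr_alloc` line in the log.

```python
# Values copied from the ggml_tallocr_alloc message in the log.
needed = 155_582_464      # bytes the allocator asked for
available = 151_388_160   # largest free block in the buffer, in bytes

MIB = 1024 * 1024

# How far short the buffer is of the requested size.
shortfall = needed - available

print(f"needed:    {needed / MIB:.2f} MiB")
print(f"available: {available / MIB:.2f} MiB")
print(f"shortfall: {shortfall} bytes = {shortfall / MIB:.2f} MiB")
```

The buffer is short by 4194304 bytes, which is exactly 4 MiB. I don't know what that corresponds to in the compute graph, but the round number may help whoever looks into it.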
I tested with a smaller batch size (128), and it still doesn't work.
I tested with Mistral-7B-Instruct, and it works.
I tested with Llama2 13B, and it also works.
So I think a recent code change specifically affected the Qwen model.