[not enough space in the buffer error] Qwen model long prompt  #5082

@JianbangZ

Description

I tested Qwen-7B-Chat.q4_0.gguf with a long prompt. It ran perfectly a couple of weeks ago, but with the current code it no longer runs.

Command:
CUDA_VISIBLE_DEVICES=0 ./main -ngl 99 -m /data1/models/qwen/Qwen-7B-Chat/gguf/Qwen-7B-Chat.q4_0.gguf \
  -c 1024 -b 512 -n 512 -s 19861102 -p "xxxxxxxxxxxx"

The following error occurs while the prompt is being processed:
ggml_tallocr_alloc: not enough space in the buffer (needed 155582464, largest block available 151388160)
GGML_ASSERT: ggml-alloc.c:114: !"not enough space in the buffer"
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
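
To put the two numbers in the allocator message in perspective, here is a small sketch that parses the error line and reports the shortfall. The helper name `parse_alloc_error` is made up for illustration and is not part of ggml; the only input is the log line quoted above.

```python
import re

# Error line copied verbatim from the log above.
LOG_LINE = (
    "ggml_tallocr_alloc: not enough space in the buffer "
    "(needed 155582464, largest block available 151388160)"
)

MIB = 1024 * 1024


def parse_alloc_error(line):
    """Hypothetical helper: return (needed, available) in bytes, or None."""
    m = re.search(r"needed (\d+), largest block available (\d+)", line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


needed, available = parse_alloc_error(LOG_LINE)
shortfall = needed - available

print(f"needed:    {needed} bytes ({needed / MIB:.3f} MiB)")
print(f"available: {available} bytes ({available / MIB:.3f} MiB)")
print(f"shortfall: {shortfall} bytes ({shortfall / MIB:.3f} MiB)")
```

The request exceeds the largest free block by 4194304 bytes, which is exactly 4 MiB, so the buffer is only slightly too small rather than off by a large factor.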

I also tried a smaller batch size (e.g. 128); it still fails.
Mistral-7B-Instruct works.
Llama2 13B also works.
So I think a recent code change specifically affected the Qwen model.
