[not enough space in the buffer error] Qwen model long prompt 

I tested Qwen-7B-Chat.q4_0.gguf with a long prompt. It ran perfectly a couple of weeks ago, but with the current code it no longer runs.

Command:
CUDA_VISIBLE_DEVICES=0 ./main -ngl 99 -m /data1/models/qwen/Qwen-7B-Chat/gguf/Qwen-7B-Chat.q4_0.gguf \
-c 1024 -b 512 -n 512 -s 19861102 -p "xxxxxxxxxxxx" 

While the prompt is being processed, this error occurs:
**ggml_tallocr_alloc: not enough space in the buffer (needed 155582464, largest block available 151388160)
GGML_ASSERT: ggml-alloc.c:114: !"not enough space in the buffer"
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.**
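
To make the numbers in the assert easier to read, here is a minimal sketch that parses the `ggml_tallocr_alloc` line and reports how far short the buffer was. The helper name `parse_alloc_error` is made up for illustration and is not part of llama.cpp; the log line is copied from the output above.

```python
import re

# Error line copied verbatim from the log in this report.
LOG_LINE = (
    "ggml_tallocr_alloc: not enough space in the buffer "
    "(needed 155582464, largest block available 151388160)"
)

MIB = 1024 * 1024


def parse_alloc_error(line):
    """Hypothetical helper: return (needed, available) in bytes, or None if no match."""
    match = re.search(r"needed (\d+), largest block available (\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


needed, available = parse_alloc_error(LOG_LINE)
shortfall = needed - available

print(f"needed:    {needed / MIB:.3f} MiB")
print(f"available: {available / MIB:.3f} MiB")
print(f"shortfall: {shortfall} bytes ({shortfall / MIB:.3f} MiB)")
```

For this log the shortfall is 4194304 bytes, which is exactly 4 MiB. That only quantifies the gap; it does not show which tensor or which code change is responsible.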

I tested with a smaller batch size (128); it still doesn't work.
I tested with Mistral-7B-Instruct; it works.
I tested with Llama2 13B; it also works.
So I think some recent code change specifically affects the Qwen model.
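
To check whether the failure depends on the batch size, the command above could be re-run over several `-b` values. The following is a rough sketch, not a tested reproduction script: the function names are hypothetical, the flags and model path are taken from the command in this report, and it assumes the binary is `./main` in the current directory.

```python
import os
import subprocess

# Model path and flags are copied from the command in this report.
MODEL = "/data1/models/qwen/Qwen-7B-Chat/gguf/Qwen-7B-Chat.q4_0.gguf"
ERROR_TEXT = "not enough space in the buffer"


def build_command(batch_size, prompt, binary="./main"):
    """Build the argument list for one run with the given -b value."""
    return [
        binary, "-ngl", "99", "-m", MODEL,
        "-c", "1024", "-b", str(batch_size), "-n", "512",
        "-s", "19861102", "-p", prompt,
    ]


def hits_alloc_error(batch_size, prompt):
    """Run once and report whether the allocator error shows up in the output."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")
    proc = subprocess.run(
        build_command(batch_size, prompt),
        capture_output=True, text=True, env=env,
    )
    return ERROR_TEXT in proc.stdout or ERROR_TEXT in proc.stderr
```

A loop such as `for b in (512, 256, 128, 64): print(b, hits_alloc_error(b, prompt))`, with `prompt` set to the long prompt, would show which batch sizes trigger the assert.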



[not enough space in the buffer error] Qwen model long prompt #5082
