
[not enough space in the buffer error] Qwen model long prompt #5082


Closed
JianbangZ opened this issue Jan 22, 2024 · 1 comment · Fixed by #5086

@JianbangZ

I tested with Qwen-7B-Chat.q4_0.gguf using a long prompt. It ran perfectly a couple of weeks ago, but with the current code it fails.

Command
CUDA_VISIBLE_DEVICES=0 ./main -ngl 99 -m /data1/models/qwen/Qwen-7B-Chat/gguf/Qwen-7B-Chat.q4_0.gguf
-c 1024 -b 512 -n 512 -s 19861102 -p "xxxxxxxxxxxx"

During prompt reading, here is the error
ggml_tallocr_alloc: not enough space in the buffer (needed 155582464, largest block available 151388160)
GGML_ASSERT: ggml-alloc.c:114: !"not enough space in the buffer"
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.

I tested with a smaller batch size like 128; it still doesn't work.
I tested with Mistral-7B-Instruct; it works.
I tested with Llama2 13B; it also works.
So I think some recent code change specifically affected the Qwen model.

@JianbangZ
Author

I located the commit that causes the issue: 6df465a (#5049).
@slaren would you please take a look? The issue above started occurring after this commit.
