
Bug: llama-parallel crashes when adding more tokens to llama_batch than context size #9667

Closed
matiaslin opened this issue Sep 27, 2024 · 0 comments · Fixed by #9668
Labels
bug-unconfirmed · low severity (used to report low severity bugs in llama.cpp, e.g. cosmetic issues, non-critical UI glitches)

Comments

matiaslin commented Sep 27, 2024

What happened?

Observation

  • The executable llama-parallel crashes with a Segmentation fault when the number of tokens added to a batch exceeds the context size.

Command to reproduce:

$ ./llama-parallel -m [MODEL] -ngl 100 -np 100 -ns 100 -c 512 -n 1024

Explanation of error:

  • The llama_batch is created with a capacity of n_ctx tokens. If the n_parallel parameter is large enough (and/or the input prompts are long enough) for the total number of tokens in the batch to exceed n_ctx, llama_batch_add writes past the end of the batch's allocated buffers and we observe a segmentation fault (see the sketch below).
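
For context, here is a minimal sketch of the failure mode, paraphrasing how llama_batch_add looked in common/common.cpp around this version (the exact code may differ). The function writes at index batch.n_tokens without checking the capacity the batch was allocated with, so once more tokens are added than the n_ctx passed to llama_batch_init, the writes go out of bounds:

#include "llama.h"

#include <vector>

// Paraphrased sketch, not verbatim: there is no check that n_tokens is
// still below the capacity passed to llama_batch_init(n_ctx, 0, n_seq_max).
void llama_batch_add(struct llama_batch & batch, llama_token id, llama_pos pos,
                     const std::vector<llama_seq_id> & seq_ids, bool logits) {
    batch.token   [batch.n_tokens] = id;              // out-of-bounds write
    batch.pos     [batch.n_tokens] = pos;             // once n_tokens reaches
    batch.n_seq_id[batch.n_tokens] = seq_ids.size();  // the allocated size ->
    for (size_t i = 0; i < seq_ids.size(); ++i) {     // segmentation fault
        batch.seq_id[batch.n_tokens][i] = seq_ids[i];
    }
    batch.logits  [batch.n_tokens] = logits;
    batch.n_tokens++;
}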

Expected behavior:

  • Return 1 with a clear explanation of the error (e.g. the number of tokens being processed within a batch exceeds the context size) and a recommendation of what to do (e.g. increase the context size). A possible guard is sketched below.
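
One way this could fail gracefully (an illustration only; the actual fix landed in #9668 and may differ): llama_batch does not store its own capacity, so the caller, or a wrapper like the hypothetical helper below, has to carry the size that was passed to llama_batch_init and refuse to write past it:

#include "llama.h"

#include <cstdio>

// Hypothetical helper: 'capacity' must be the n_tokens value that was
// passed to llama_batch_init, since llama_batch does not record it.
static bool llama_batch_add_checked(struct llama_batch & batch, int32_t capacity,
                                    llama_token id, llama_pos pos, llama_seq_id seq_id) {
    if (batch.n_tokens >= capacity) {
        fprintf(stderr, "batch capacity exceeded (%d tokens): "
                        "increase the context size (-c) or reduce -np/-ns\n", capacity);
        return false; // caller can report the error and exit with status 1
    }
    batch.token   [batch.n_tokens]    = id;
    batch.pos     [batch.n_tokens]    = pos;
    batch.n_seq_id[batch.n_tokens]    = 1;
    batch.seq_id  [batch.n_tokens][0] = seq_id;
    batch.logits  [batch.n_tokens]    = false;
    batch.n_tokens++;
    return true;
}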

Name and Version

$ ./llama-cli --version
version: 3828 (95bc82fb)
built with gcc (GCC) 11.3.0 for x86_64-pc-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

...
main: Simulating parallel requests from clients:
main: n_parallel = 100, n_sequences = 100, cont_batching = 1, system tokens = 271

main: Evaluating the system prompt ...

Processing requests ...

main: clearing the KV cache
Client   0, seq    0, started decoding ...
Client   1, seq    1, started decoding ...
Client   2, seq    2, started decoding ...
Client   3, seq    3, started decoding ...
Client   4, seq    4, started decoding ...
Client   5, seq    5, started decoding ...
Client   6, seq    6, started decoding ...
Client   7, seq    7, started decoding ...
Client   8, seq    8, started decoding ...
Client   9, seq    9, started decoding ...
Client  10, seq   10, started decoding ...
Client  11, seq   11, started decoding ...
Client  12, seq   12, started decoding ...
Client  13, seq   13, started decoding ...
Client  14, seq   14, started decoding ...
Client  15, seq   15, started decoding ...
Client  16, seq   16, started decoding ...
Client  17, seq   17, started decoding ...
Client  18, seq   18, started decoding ...
Client  19, seq   19, started decoding ...
Client  20, seq   20, started decoding ...
Client  21, seq   21, started decoding ...
Client  22, seq   22, started decoding ...
Client  23, seq   23, started decoding ...
Client  24, seq   24, started decoding ...
Client  25, seq   25, started decoding ...
Client  26, seq   26, started decoding ...
Client  27, seq   27, started decoding ...
Client  28, seq   28, started decoding ...
Client  29, seq   29, started decoding ...
Client  30, seq   30, started decoding ...
Client  31, seq   31, started decoding ...
Client  32, seq   32, started decoding ...
Client  33, seq   33, started decoding ...
Client  34, seq   34, started decoding ...
Client  35, seq   35, started decoding ...
Client  36, seq   36, started decoding ...
Client  37, seq   37, started decoding ...
Client  38, seq   38, started decoding ...
Client  39, seq   39, started decoding ...
Segmentation fault
@matiaslin added the bug-unconfirmed and low severity labels on Sep 27, 2024