
Bug: llama-parallel crashes when adding more tokens to llama_batch than context size #9667

Closed
matiaslin opened this issue Sep 27, 2024 · 0 comments · Fixed by #9668
Labels
bug-unconfirmed · low severity (used to report low severity bugs in llama.cpp, e.g. cosmetic issues, non-critical UI glitches)

Comments

matiaslin commented Sep 27, 2024

What happened?

Observation

  • The executable llama-parallel crashes with a Segmentation fault when the number of tokens added to a batch exceeds the context size.

Command to reproduce:

$ ./llama-parallel -m [MODEL] -ngl 100 -np 100 -ns 100 -c 512 -n 1024

Explanation of error:

  • The llama_batch is created with a capacity of n_ctx tokens. If the n_parallel parameter is large enough (and/or the input prompts are long enough) for the total number of tokens in the batch to exceed n_ctx, llama_batch_add writes past the end of the batch's allocated buffers and we observe a segmentation fault (see the sketch below).
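
For context, here is a minimal sketch of the failure mode, paraphrasing how llama_batch_add looked in common/common.cpp around this version (the exact code may differ). The function writes at index batch.n_tokens without checking the capacity the batch was allocated with, so once more tokens are added than the n_ctx passed to llama_batch_init, the writes go out of bounds:

#include "llama.h"

#include <vector>

// Paraphrased sketch, not verbatim: there is no check that n_tokens is
// still below the capacity passed to llama_batch_init(n_ctx, 0, n_seq_max).
void llama_batch_add(struct llama_batch & batch, llama_token id, llama_pos pos,
                     const std::vector<llama_seq_id> & seq_ids, bool logits) {
    batch.token   [batch.n_tokens] = id;              // out-of-bounds write
    batch.pos     [batch.n_tokens] = pos;             // once n_tokens reaches
    batch.n_seq_id[batch.n_tokens] = seq_ids.size();  // the allocated size ->
    for (size_t i = 0; i < seq_ids.size(); ++i) {     // segmentation fault
        batch.seq_id[batch.n_tokens][i] = seq_ids[i];
    }
    batch.logits  [batch.n_tokens] = logits;
    batch.n_tokens++;
}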

Expected behavior:

  • Return 1 with a clear explanation of the error (e.g. the number of tokens being processed within a batch exceeds the context size) and a recommendation of what to do (e.g. increase the context size). A possible guard is sketched below.
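
One way this could fail gracefully (an illustration only; the actual fix landed in #9668 and may differ): llama_batch does not store its own capacity, so the caller, or a wrapper like the hypothetical helper below, has to carry the size that was passed to llama_batch_init and refuse to write past it:

#include "llama.h"

#include <cstdio>

// Hypothetical helper: 'capacity' must be the n_tokens value that was
// passed to llama_batch_init, since llama_batch does not record it.
static bool llama_batch_add_checked(struct llama_batch & batch, int32_t capacity,
                                    llama_token id, llama_pos pos, llama_seq_id seq_id) {
    if (batch.n_tokens >= capacity) {
        fprintf(stderr, "batch capacity exceeded (%d tokens): "
                        "increase the context size (-c) or reduce -np/-ns\n", capacity);
        return false; // caller can report the error and exit with status 1
    }
    batch.token   [batch.n_tokens]    = id;
    batch.pos     [batch.n_tokens]    = pos;
    batch.n_seq_id[batch.n_tokens]    = 1;
    batch.seq_id  [batch.n_tokens][0] = seq_id;
    batch.logits  [batch.n_tokens]    = false;
    batch.n_tokens++;
    return true;
}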

Name and Version

$ ./llama-cli --version
version: 3828 (95bc82fb)
built with gcc (GCC) 11.3.0 for x86_64-pc-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

...
main: Simulating parallel requests from clients:
main: n_parallel = 100, n_sequences = 100, cont_batching = 1, system tokens = 271

main: Evaluating the system prompt ...

Processing requests ...

main: clearing the KV cache
Client   0, seq    0, started decoding ...
Client   1, seq    1, started decoding ...
Client   2, seq    2, started decoding ...
Client   3, seq    3, started decoding ...
Client   4, seq    4, started decoding ...
Client   5, seq    5, started decoding ...
Client   6, seq    6, started decoding ...
Client   7, seq    7, started decoding ...
Client   8, seq    8, started decoding ...
Client   9, seq    9, started decoding ...
Client  10, seq   10, started decoding ...
Client  11, seq   11, started decoding ...
Client  12, seq   12, started decoding ...
Client  13, seq   13, started decoding ...
Client  14, seq   14, started decoding ...
Client  15, seq   15, started decoding ...
Client  16, seq   16, started decoding ...
Client  17, seq   17, started decoding ...
Client  18, seq   18, started decoding ...
Client  19, seq   19, started decoding ...
Client  20, seq   20, started decoding ...
Client  21, seq   21, started decoding ...
Client  22, seq   22, started decoding ...
Client  23, seq   23, started decoding ...
Client  24, seq   24, started decoding ...
Client  25, seq   25, started decoding ...
Client  26, seq   26, started decoding ...
Client  27, seq   27, started decoding ...
Client  28, seq   28, started decoding ...
Client  29, seq   29, started decoding ...
Client  30, seq   30, started decoding ...
Client  31, seq   31, started decoding ...
Client  32, seq   32, started decoding ...
Client  33, seq   33, started decoding ...
Client  34, seq   34, started decoding ...
Client  35, seq   35, started decoding ...
Client  36, seq   36, started decoding ...
Client  37, seq   37, started decoding ...
Client  38, seq   38, started decoding ...
Client  39, seq   39, started decoding ...
Segmentation fault
@matiaslin added the bug-unconfirmed and low severity labels on Sep 27, 2024