
Llama Ignoring Reverse Prompt Every Other Time #1224

Closed

@loukylor

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Generation is expected to stop once the reverse prompt is encountered.

Current Behavior

Generation continues until the reverse prompt has been encountered twice.
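
For reference, here is my rough understanding of the reverse-prompt check in examples/main/main.cpp (a paraphrase from reading the code, not the actual source; the helper name is mine): the last generated tokens are detokenized into a string, and control should return to the user as soon as that string ends with a reverse prompt.

#include <string>

// Paraphrase of the reverse-prompt check (helper name is mine): generation
// should stop when the detokenized tail of the output ends with the
// reverse prompt.
static bool ends_with_antiprompt(const std::string & last_output,
                                 const std::string & antiprompt) {
    return last_output.size() >= antiprompt.size() &&
           last_output.compare(last_output.size() - antiprompt.size(),
                               antiprompt.size(), antiprompt) == 0;
}

// e.g. ends_with_antiprompt("...829 meters tall.\nUser:", "User:") == true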

Environment and Context

Windows 10 version 19045.2728
Intel Core i7-9700K
Python 3.10.7

Make and g++ installed from w64devkit version 1.18.0

Failure Information (for bugs)

Steps to Reproduce

  1. Run llama.cpp in interactive mode with the reverse prompt User: and the prompt file prompts/chat-with-bob.txt.
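
For example, this is the exact invocation from the first failure log below:

main -m ./models/7B/ggml-model-q4_0.bin -r "User:" -f prompts/chat-with-bob.txt --in-prefix " "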

For me, it happens with both my 7B and 13B models; I don't have the hardware to test the 30B and 65B models.
For reference, this issue started as discussion #1200.

Failure Logs

E:/Code/AI/llama.cpp $ git log | head -1
commit 7fc50c051ae8a78e9643fdf172d12e20f2dd9b6c

E:/Code/AI/llama.cpp $ pip list | egrep "torch|numpy|sentencepiece"
numpy              1.24.0
sentencepiece      0.1.98
torch              2.0.0
torchaudio         2.0.1
torchvision        0.15.1

E:/Code/AI/llama.cpp $ make --version | head -1
GNU Make 4.4

E:/Code/AI/llama.cpp $ md5sum ./models/13B/ggml-model-q4_0.bin
6a24283bfe9c9e891dac896aa968ef83  ./models/13B/ggml-model-q4_0.bin

E:/Code/AI/llama.cpp $ md5sum ./models/7B/ggml-model-q4_0.bin
d5491b344991049d00b0acfa6b728023  ./models/7B/ggml-model-q4_0.bin

For context, the only user input was "whats the tallest tower"; the rest is the prompt or generated text.

E:\Code\AI\llama.cpp>main -m ./models/7B/ggml-model-q4_0.bin -r "User:" -f prompts/chat-with-bob.txt --in-prefix " "
main: seed = 1682750178
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 5809.32 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: 'User:'
Input prefix: ' '
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.

 Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User: whats the tallest tower
Bob: The tallest building in the world is Burj Khalifa in Dubai, UAE. It is 829 meters tall.
User: Bob: You're welcome. Here are some more answers to your questions. What's the most populated country?
User:

Here's what happens without the --in-prefix argument. Again, the only user input was "whats the tallest tower"; the rest is the prompt or generated text.

E:\Code\AI\llama.cpp>main -m ./models/7B/ggml-model-q4_0.bin -r "User:" -f prompts/chat-with-bob.txt
main: seed = 1682750302
llama.cpp: loading model from ./models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 5809.32 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: 'User:'
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.

 Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User:whats the tallest tower
Bob: Oh, that's easy. It's the Eiffel Tower located in Paris, France!
User:what is the name of the capital of russia?
Bob: That would be Moscow!
User:
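
In case it helps whoever picks this up: one unverified guess is that the detokenized text being compared against the reverse prompt doesn't always match "User:" byte-for-byte, e.g. because of how a leading space tokenizes, which might also explain why --in-prefix " " changes the behavior. A quick probe along these lines, using llama_tokenize and llama_token_to_str from llama.h, would show how "User:" round-trips (the model path is just an example):

#include <cstdio>
#include <vector>
#include "llama.h"

// Diagnostic sketch (not code from the repo): tokenize "User:" with and
// without a leading space and print the round-tripped pieces, to see
// whether the text the antiprompt check reassembles can differ from the
// literal "User:".
int main() {
    llama_context_params lparams = llama_context_default_params();

    // example path; point this at a local model
    llama_context * ctx = llama_init_from_file("./models/7B/ggml-model-q4_0.bin", lparams);
    if (ctx == nullptr) {
        return 1;
    }

    const char * inputs[2] = { "User:", " User:" };
    for (int j = 0; j < 2; j++) {
        std::vector<llama_token> tokens(32);
        const int n = llama_tokenize(ctx, inputs[j], tokens.data(), (int) tokens.size(), false);
        printf("'%s' -> %d token(s):", inputs[j], n);
        for (int i = 0; i < n; i++) {
            printf(" [%s]", llama_token_to_str(ctx, tokens[i]));
        }
        printf("\n");
    }

    llama_free(ctx);
    return 0;
}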

Labels: bug (Something isn't working), help wanted (Extra attention is needed)