Description
I'm seeing a strange issue where batches created via `llama_batch_get_one` give better results than batches populated with `llama_batch_add`.

I was trying to convert my code to use `llama_batch_add` because `llama_batch_get_one` carries a deprecation note, but after making this conversion, the quality of the responses I was getting went down. This appears to be the case whether or not layers are offloaded to the GPU.

I may not understand the batch API correctly, so it seems plausible that there is a mistake in my code rather than this being a true bug. However, if I am using the API correctly, it seemed worth raising, since removing `llama_batch_get_one` as the comment suggests would leave my project with either a speed or a quality regression.
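For context, the difference between the two paths as I understand it: `llama_batch_get_one` leaves the per-token arrays of the batch unset and relies on the legacy `all_pos_0`/`all_pos_1`/`all_seq_id` fields, while `llama_batch_add` (the helper in common/common.h) fills the per-token arrays explicitly. Abridged and paraphrased from llama.h at this revision (check the header for the exact definition):

```cpp
typedef struct llama_batch {
    int32_t         n_tokens;

    llama_token  *  token;     // per-token ids
    float        *  embd;      // or raw embeddings instead of ids
    llama_pos    *  pos;       // per-token positions
    int32_t      *  n_seq_id;  // per-token sequence-id counts
    llama_seq_id ** seq_id;    // per-token sequence ids
    int8_t       *  logits;    // per-token "compute logits" flags

    // legacy fields used by llama_batch_get_one:
    llama_pos    all_pos_0;  // used if pos == NULL: pos[i] = all_pos_0 + i*all_pos_1
    llama_pos    all_pos_1;  // used if pos == NULL
    llama_seq_id all_seq_id; // used if seq_id == NULL
} llama_batch;
```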
System Information
- llama.cpp hash: f87f7b8
- llama.cpp backend: Vulkan
- OS: Windows 10 Pro 64-bit
- GPU: Nvidia GeForce RTX 3080
- CPU: AMD Ryzen 9 3950X
- Model: mistral-7b-instruct-v0.2.Q6_K.gguf (https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF)
Repro Demonstration Code
main.cpp.txt
This cpp file, when compiled, creates a program that can be called with two arguments:
- The first argument is one of `new` | `old` | `single`, to swap between methods of filling a `llama_batch`.
- The second argument is a path to the model to load for testing.
Bad Result
`main.exe new "C:\\Dev\\SDK\\models\\gguf\\mistral-7b-instruct-v0.2.Q6_K.gguf"`
- This uses `llama_batch_add` to parse the prompt, similar to the `simple` example (a sketch of this pattern follows the sample output below).
- Results always begin with "Qu"-like tokens, usually resulting in the first English word being something like "Question:" or "Questioner,".
- Changing the last instruction usually still yields things like "Questioner" or "User" as the first word.
"""
Questioner, allow me to paint a vivid tableau of the three most distinguished realms within the intricately woven tapestry of my fantastical universe:
"""
Good Result A
`main.exe old "C:\\Dev\\SDK\\models\\gguf\\mistral-7b-instruct-v0.2.Q6_K.gguf"`
- This uses `llama_batch_get_one` to parse the prompt, similar to the `main` example (a sketch of this pattern follows the sample output below).
- The first non-prompt word is highly varied and leads into a logical response.
- Changing the last instruction yields logical changes, such as "Who is the most famous person in your books?" yielding "Once," and other such first words.
"""
In the heart of my fantastical realm, where towering mountains meet vast emerald forests and azure seas stretch as far as the eye can see, lie the three grand kingdoms: Valoria, Elidor, and Thundertop.
"""
Good Result B
`main.exe single "C:\\Dev\\SDK\\models\\gguf\\mistral-7b-instruct-v0.2.Q6_K.gguf"`
- This uses `llama_batch_get_one` to parse the prompt, but dispatches a batch with only a single token each time (a sketch of this pattern follows the sample output below).
- The first non-prompt word is highly varied and leads into a logical response.
- Changing the last instruction yields logical changes, such as "Who is the most famous person in your books?" yielding "Once," and other such first words.
- This method takes longer to evaluate than `old`.
"""
In the vast expanse of Eldoria, the realm of magic and wonder, three distinct kingdoms rose like proud pillars against the ever-changing tapestry of the land. Each unique in its history, culture, and people, they stood as beacons of hope and prosperity for their inhabitants.
"""
Thank you!