Commit 2e17dfd

rabidcopy, slaren, and ggerganov authored
Replace EOS with newline to prevent context/memory being flushed by EOS in interactive mode (#333)
* Improve interactive mode's coherence after EOS

  Aims to improve coherence and ability to resume the interactive session when the user is given input back after an end of text token is reached. Not sure what token 13 is or why it seems to help. See conversation for examples.

* Make newline token a constant

* dynamically determine newline token

* relocate previous newline token const

* cleanup whitespace

* print a new line on end of text in interactive

  this may need to be looked into further when not using a reverse prompt

* only print manual newline with reverse prompt

  fix formatting of reverse prompts so they don't end up at the end of the current line while not introducing unnecessary new lines otherwise

* alternate approach to replace end of text tokens

* Inject the reverse prompt again after eos in interactive mode

* tokenize reverse prompt when needed

  makes this PR compatible with #330

* tokenize and inject only first reverse prompt

  thanks to tjohnman

* tokenize first reverse prompt once

* add newline token

* add newline token

* tokenize/inject reverse prompt for refactor

  this doesn't seem right though

* tokenize nothing for antiprompt if no reverse

* Update main.cpp

* Update main.cpp

* tokenize and inject reverse prompt as needed

  this doesn't seem to work if the reverse prompt is tokenized outside earlier on

* not needed

* remove newline token

* remove newline token

* tokenize newline token

* add space to comment

* Update main.cpp

  Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Slaren <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
1 parent 20a1a4e commit 2e17dfd


1 file changed: +15 -6 lines changed


main.cpp

Lines changed: 15 additions & 6 deletions
@@ -258,6 +258,9 @@ int main(int argc, char ** argv) {
         params.interactive = true;
     }
 
+    // determine newline token
+    auto llama_token_newline = ::llama_tokenize(ctx, "\n", false);
+
     fprintf(stderr, "\n");
     fprintf(stderr, "%s: prompt: '%s'\n", __func__, params.prompt.c_str());
     fprintf(stderr, "%s: number of tokens in prompt = %zu\n", __func__, embd_inp.size());
@@ -359,6 +362,16 @@ int main(int argc, char ** argv) {
                 last_n_tokens.push_back(id);
             }
 
+            // replace end of text token with newline token when in interactive mode
+            if (id == llama_token_eos() && params.interactive) {
+                id = llama_token_newline.front();
+                if (params.antiprompt.size() != 0) {
+                    // tokenize and inject first reverse prompt
+                    const auto first_antiprompt = ::llama_tokenize(ctx, params.antiprompt.front(), false);
+                    embd_inp.insert(embd_inp.end(), first_antiprompt.begin(), first_antiprompt.end());
+                }
+            }
+
             // add it to the context
             embd.push_back(id);
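The substitution in this hunk can be read in isolation with a small sketch. Everything named here is a hypothetical stand-in rather than llama.cpp code: `token` stands for `llama_token`, `kEos` for the value returned by `llama_token_eos()`, and `replace_eos` for the inline `if` block. The ids 2 and 13 are assumed for illustration only; the commit itself looks the newline id up at runtime by tokenizing `"\n"`.

```cpp
#include <cassert>
#include <vector>

using token = int;  // hypothetical stand-in for llama_token

// Assumed ids for illustration; real values come from the model's vocabulary.
constexpr token kEos     = 2;
constexpr token kNewline = 13;

// Returns the token that should be appended to the context.
// When an end-of-text token is sampled in interactive mode, it is swapped for
// the first token of the tokenized newline, and the first reverse prompt
// (already tokenized by the caller) is queued as pending input.
token replace_eos(token id, bool interactive,
                  const std::vector<token> & newline_tokens,
                  const std::vector<token> & first_antiprompt,
                  std::vector<token> & pending_input) {
    if (id != kEos || !interactive) {
        return id;  // ordinary token, or non-interactive run: leave untouched
    }
    // front() on an empty vector is undefined behavior, so check first.
    assert(!newline_tokens.empty());
    // An empty first_antiprompt means no reverse prompt was given; the insert is then a no-op.
    pending_input.insert(pending_input.end(),
                         first_antiprompt.begin(), first_antiprompt.end());
    return newline_tokens.front();
}
```

One difference from the hunk is deliberate: the sketch takes the reverse prompt pre-tokenized, whereas the commit tokenizes `params.antiprompt.front()` at the point of use, which the commit message reports was needed for the injection to work.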

@@ -451,12 +464,8 @@ int main(int argc, char ** argv) {
 
         // end of text token
         if (embd.back() == llama_token_eos()) {
-            if (params.interactive) {
-                is_interacting = true;
-            } else {
-                fprintf(stderr, " [end of text]\n");
-                break;
-            }
+            fprintf(stderr, " [end of text]\n");
+            break;
         }
 
         // In interactive mode, respect the maximum number of tokens and drop back to user input when reached.
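Taken together, the last two hunks mean the end-of-text check can fire only in non-interactive runs, because an interactive run never lets EOS reach the context. The sketch below illustrates that control flow over a scripted token stream. It is a simplified model, not the loop in `main.cpp`: `run_loop`, `token`, `kEos`, and `kNewline` are hypothetical names, the ids are assumed, and sampling, reverse-prompt handling, and user input are left out.

```cpp
#include <cassert>
#include <vector>

using token = int;  // hypothetical stand-in for llama_token

// Assumed ids for illustration; real values come from the model's vocabulary.
constexpr token kEos     = 2;
constexpr token kNewline = 13;

// Feeds a scripted sequence of "sampled" tokens through a simplified
// generation loop and returns the resulting context.
std::vector<token> run_loop(const std::vector<token> & sampled, bool interactive) {
    std::vector<token> context;
    for (token id : sampled) {
        // Second hunk: swap EOS for newline before it is added to the context.
        if (id == kEos && interactive) {
            id = kNewline;
        }
        // Invariant after the swap: EOS never enters an interactive context.
        assert(!interactive || id != kEos);
        context.push_back(id);

        // Third hunk: the check is now unconditional, but it can only be
        // reached with EOS in non-interactive mode.
        if (context.back() == kEos) {
            break;
        }
    }
    return context;
}
```

With the stream `{7, 8, kEos, 9}`, a non-interactive run stops with EOS as the last context token, while an interactive run records a newline in its place and keeps consuming tokens.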
