
Reset token budget after every user intervention. #306


Closed
Wants to merge 2 commits
4 changes: 2 additions & 2 deletions main.cpp
@@ -1054,11 +1054,11 @@ int main(int argc, char ** argv) {
embd_inp.insert(embd_inp.end(), inp_sfx.begin(), inp_sfx.end());
}

- remaining_tokens -= line_inp.size();
+ remaining_tokens = params.n_predict - line_inp.size();
Collaborator

This can get bigger than the remaining space in the context.
And after https://github.com/ggerganov/llama.cpp/blob/da5303c1ea68aa19db829c634f1e10d08d409680/main.cpp#L850, remaining_tokens is actually all the space that is left, no?
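To make the concern concrete, here is a small self-contained C++ sketch with hypothetical numbers; the names mirror params.n_predict, model.hparams.n_ctx, embd_inp and line_inp from main.cpp, but the values are made up for illustration:

#include <cstdio>
#include <vector>

int main() {
    // Hypothetical stand-ins for the values used in main.cpp.
    int n_predict = 128;             // params.n_predict
    int n_ctx     = 512;             // model.hparams.n_ctx
    std::vector<int> embd_inp(500);  // tokens already in the context
    std::vector<int> line_inp(4);    // the new user input

    // The unclamped reset proposed in this patch:
    int remaining_tokens = n_predict - (int) line_inp.size();  // 124

    int space_left = n_ctx - (int) embd_inp.size();            // 12
    std::printf("budget %d vs. space left %d -> over by %d tokens\n",
                remaining_tokens, space_left, remaining_tokens - space_left);
    return 0;
}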

Collaborator

@Green-Sky Green-Sky Mar 19, 2023

Resetting remaining_tokens to params.n_predict would only make sense when we reset the memory, which we don't do right now. See #71.

Contributor Author

@tjohnman tjohnman Mar 19, 2023

I see. Yes, going over the context size can be a problem. But remaining_tokens is usually smaller than the context size (because params.n_predict is), so it should still be reset after every interaction with the user, so that each series of tokens can be as long as the first one, or as long as the remaining context space allows. It should be clamped so it never goes over the remaining space in the context.

std::min(params.n_predict, model.hparams.n_ctx - (int) embd_inp.size()) is exactly the formula that should be used instead of the simple assignment I did to reset it, so that it doesn't overflow the context.

And in fact, it should also be used when resetting it after running out of tokens. Good catch.
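For illustration, a self-contained version of that clamped reset with hypothetical numbers; the names mirror params.n_predict, model.hparams.n_ctx and embd_inp from main.cpp, and this is a sketch of the idea rather than the exact patch:

#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical stand-ins for the values used in main.cpp.
    int n_predict = 128;             // params.n_predict
    int n_ctx     = 512;             // model.hparams.n_ctx
    std::vector<int> embd_inp(500);  // tokens already in the context

    // Clamped reset: the new budget never exceeds the space left
    // in the context window.
    int remaining_tokens = std::min(n_predict, n_ctx - (int) embd_inp.size());

    std::printf("remaining_tokens = %d\n", remaining_tokens);  // prints 12, not 128
    return 0;
}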


input_noecho = true; // do not echo this again
- is_interacting = false;
}
+ is_interacting = false;
}

// end of text token