Commit ad3a6b9

Fix two bugs in kv-cache backtrack loop (mlc-ai#856)

Bug 1: the old code could stop early because output_ids_ was shortened in place during the loop, shrinking the bound the loop counter was compared against.
Bug 2: an off-by-one in the backoff size, because the early break skipped the loop's increment for the final pop.
1 parent 898db76 commit ad3a6b9

File tree

1 file changed (+1, -2 lines)

cpp/llm_chat.cc

Lines changed: 1 addition & 2 deletions
```diff
@@ -1107,10 +1107,9 @@ class LLMChat {
   // back tracking, find the first set of token that is smaller
   // than the length
   size_t backoff = 0;
-  for (; backoff < output_ids_.size(); ++backoff) {
+  for (; (output_ids_.size() > 0) && (output_message_.length() > stop_pos); ++backoff) {
     output_ids_.pop_back();
     output_message_ = tokenizer_->Decode(output_ids_);
-    if (output_message_.length() <= stop_pos) break;
   }
   // resize kv to remove the context
   ft_.fkvcache_array_popn_(kv_cache_, backoff);
```
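The fixed loop can be sketched as a standalone unit. This is a minimal sketch, not the actual class: `Backtrack` and `FakeDecode` are hypothetical stand-ins for the member loop and `tokenizer_->Decode` (here one character per token), but the loop condition and increment mirror the patched code, so `backoff` counts every pop, including the last one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for tokenizer_->Decode: decodes one 'x' per token id.
std::string FakeDecode(const std::vector<int>& ids) {
  return std::string(ids.size(), 'x');
}

// Pop tokens until the decoded message no longer extends past stop_pos,
// returning how many entries must also be popped from the kv-cache.
// The condition tests the live container size and message length each
// iteration (fixing bug 1), and there is no early break, so ++backoff
// runs after every pop (fixing bug 2).
size_t Backtrack(std::vector<int>& output_ids, std::string& output_message,
                 size_t stop_pos) {
  size_t backoff = 0;
  for (; !output_ids.empty() && output_message.length() > stop_pos; ++backoff) {
    output_ids.pop_back();
    output_message = FakeDecode(output_ids);
  }
  return backoff;  // caller would pass this to fkvcache_array_popn_
}
```

With five tokens and `stop_pos = 2`, three pops are needed (message length 5 → 4 → 3 → 2), so `Backtrack` returns 3, the same count the kv-cache must shrink by.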
