Description
I was working on some samples using the Stateless Executor, and I kept running into instances where the results were pretty crummy and the model would go off the rails pretty frequently, rambling until the token limit was hit. See this example:

Googling around, most advice pointed to better models, throwing more stop words at it, or a more precise token limit. But I found that results going through ChatSession didn't seem to have this issue, which made me feel like there was something else at play beyond model configuration. Digging in, it seems ChatSession uses LLamaTemplate, which ultimately calls llama_chat_apply_template.
I added a manual call to LLamaTemplate to the StatelessModeExecute sample, and I immediately started getting better results as well as a stop to the rambling:
```csharp
while (true)
{
    await foreach (var text in executor.InferAsync(GetWrappedPrompt(prompt), inferenceParams))
    {
        Console.Write(text);
    }

    Console.ForegroundColor = ConsoleColor.Green;
    prompt = Console.ReadLine();
    Console.ForegroundColor = ConsoleColor.White;
}

string GetWrappedPrompt(string input)
{
    var template = new LLamaTemplate(model.NativeHandle)
    {
        AddAssistant = true
    };
    template.Add("system", "I am a helpful bot, specializing in bear names.");
    template.Add("user", input);
    return PromptTemplateTransformer.ToModelPrompt(template);
}
```

However!
I tried to apply a similar fix to the Instruct mode chat sample. Originally it started repeating immediately from the opening prompt, showing obvious signs of the same issue.

Applying the fix resolved the original repetition, but then it struggled with the rest of the chat. I suspect that's because my wrapped prompt was getting intermixed with the existing history; I haven't had a chance to fully dig in, but it did give me pause about my "fix".
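For what it's worth, one way I could imagine avoiding that intermixing is to rebuild the templated prompt from the full history each turn rather than wrapping only the newest message. This is just a rough sketch I haven't tested against the instruct sample; it assumes `ChatHistory`/`AuthorRole` from `LLama.Common`, a `LLamaWeights` instance named `model`, and the same `LLamaTemplate`/`PromptTemplateTransformer` calls as above:

```csharp
using LLama;
using LLama.Common;
using LLama.Transformers;

// Hypothetical helper: replay the whole accumulated history through the
// template every turn, so already-templated text never gets wrapped again.
string BuildPromptFromHistory(LLamaWeights model, ChatHistory history)
{
    var template = new LLamaTemplate(model.NativeHandle)
    {
        AddAssistant = true // leave the prompt open for the assistant's reply
    };

    foreach (var message in history.Messages)
    {
        // AuthorRole values (System/User/Assistant) map onto the lowercase
        // role names the template expects.
        template.Add(message.AuthorRole.ToString().ToLowerInvariant(), message.Content);
    }

    return PromptTemplateTransformer.ToModelPrompt(template);
}
```

Each turn you'd append the user's message to `history`, call this to build the prompt for `InferAsync`, and then append the assistant's reply back into `history`, so the template only ever sees raw message text.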

I'm not really sure what best practice should be here. I feel like a fix in the guts of the built-in executors would be the move, but honestly I've only been using LLamaSharp for a few days, so I don't know whether people are applying these system prompts manually in their code, in which case a fix like that would cause the kind of breakage I saw in the instruct mode chat. From looking around at other projects, and even llama.cpp itself, it seems like the use of llama_chat_apply_template is a bit of a crapshoot.
Btw, I'm a maintainer of Spectre.Console, so it brought a smile to my face to see it here.
