Description
I was working on some samples using the Stateless Executor, and I kept running into instances where the results were pretty crummy and the model would go off the rails pretty frequently, rambling until the token limit was hit. See this example:

Googling around, most advice pointed to better models, throwing more stop words at it, or a more precise token limit. But I found that results going through ChatSession didn't seem to have this issue, which made me feel like there was something else at play beyond model configuration. Digging in, it seems ChatSession uses LLamaTemplate, which ultimately calls llama_chat_apply_template.
I added a manual call to LLamaTemplate to the StatelessModeExecute sample, and I immediately started getting better results as well as a stop to the rambling:
```csharp
while (true)
{
    await foreach (var text in executor.InferAsync(GetWrappedPrompt(prompt), inferenceParams))
    {
        Console.Write(text);
    }

    Console.ForegroundColor = ConsoleColor.Green;
    prompt = Console.ReadLine();
    Console.ForegroundColor = ConsoleColor.White;
}

string GetWrappedPrompt(string input)
{
    var template = new LLamaTemplate(model.NativeHandle)
    {
        AddAssistant = true
    };
    template.Add("system", "I am a helpful bot, specializing in bear names.");
    template.Add("user", input);
    return PromptTemplateTransformer.ToModelPrompt(template);
}
```

However!
I tried to apply a similar fix to the Instruct mode chat sample. Originally it started repeating immediately from the opening prompt, showing obvious signs of the same issue.

Applying the fix resolved the original repetition, but then it struggled with the rest of the chat. I suspect that's because my wrapped prompt was getting intermixed with the existing history; I haven't had a chance to fully dig in, but it did give me pause about my "fix".
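For what it's worth, one way I could imagine avoiding that intermixing is to rebuild the templated prompt from the full history each turn rather than wrapping only the newest message. This is just a rough sketch I haven't tested against the instruct sample; it assumes `ChatHistory`/`AuthorRole` from `LLama.Common`, a `LLamaWeights` instance named `model`, and the same `LLamaTemplate`/`PromptTemplateTransformer` calls as above:

```csharp
using LLama;
using LLama.Common;
using LLama.Transformers;

// Hypothetical helper: replay the whole accumulated history through the
// template every turn, so already-templated text never gets wrapped again.
string BuildPromptFromHistory(LLamaWeights model, ChatHistory history)
{
    var template = new LLamaTemplate(model.NativeHandle)
    {
        AddAssistant = true // leave the prompt open for the assistant's reply
    };

    foreach (var message in history.Messages)
    {
        // AuthorRole values (System/User/Assistant) map onto the lowercase
        // role names the template expects.
        template.Add(message.AuthorRole.ToString().ToLowerInvariant(), message.Content);
    }

    return PromptTemplateTransformer.ToModelPrompt(template);
}
```

Each turn you'd append the user's message to `history`, call this to build the prompt for `InferAsync`, and then append the assistant's reply back into `history`, so the template only ever sees raw message text.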

I'm not really sure what best practice should be here. I feel like a fix in the guts of the built-in executors would be the move, but honestly I've only been using LLamaSharp for a few days, so I don't know whether people are applying these system prompts manually in their code, in which case a fix like that would cause the kind of breakage I saw in the instruct mode chat. From looking around at other projects, and even llama.cpp itself, it seems like the use of llama_chat_apply_template is a bit of a crapshoot.
Btw, I'm a maintainer of Spectre.Console, so it brought a smile to my face to see it here.
