Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Without access to raw tokens, we cannot properly use models like gpt-oss: detokenisation destroys the Harmony schema, making it impossible to cleanly parse the output.
Alternatively, llama-server could parse the Harmony framing itself and emit deltas with role=thinking and role=assistant, or more generally role=<channel_name>, with the Harmony framing tokens stripped.
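For context, here is roughly what client-side decoding could look like if the server streamed token IDs. This is a minimal sketch based on the openai_harmony library's StreamableParser as documented in its README; the raw_tokens list is a stand-in for whatever the server would actually return:

```python
# Sketch: feeding raw token IDs into openai_harmony's streaming parser.
# raw_tokens is a placeholder for token IDs streamed back by llama-server.
from openai_harmony import (
    HarmonyEncodingName,
    Role,
    StreamableParser,
    load_harmony_encoding,
)

encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
parser = StreamableParser(encoding, role=Role.ASSISTANT)

raw_tokens: list[int] = []  # placeholder: token IDs from the server

for token in raw_tokens:
    parser.process(token)
    # current_channel is e.g. "analysis" (thinking) or "final" (assistant)
    if parser.current_content_delta:
        print(parser.current_channel, parser.current_content_delta)
```

This only works if every token, including markers like <|end|>, reaches the parser, which is exactly what the current detokenised stream does not guarantee.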
I checked this is up to date:

```
$ llama-server --version
version: 6100 (65c797c4)
built with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0
```
Motivation
We need the gpt-oss model to work properly, and since neither ollama nor llama.cpp appears to have implemented Harmony, we need the tools to make it work ourselves. Hence raw tokens, please. I am tripping over the fact that the detokenised stream does not include some of the Harmony markers (for example, <|end|> is missing), which breaks the openai_harmony Python library when decoding the stream.
Possible Implementation
Ollama can already dump raw tokens from llama.cpp through the C API; the REST endpoint just needs a change so that when we set a "raw" flag in the request, we get back token IDs instead of the current partially detokenised response (why let <|start|> through but not <|end|>?).
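To make the proposal concrete, here is a hedged sketch of what the client side might look like. The "raw" flag and the "tokens" response field are hypothetical additions proposed here, not existing llama-server parameters; /completion, "prompt", and "stream" are existing ones:

```python
# Sketch of the proposed API. "raw" and the "tokens" response field are
# hypothetical additions, not existing llama-server parameters.
import json
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Hello",
        "stream": True,
        "raw": True,  # hypothetical: return token IDs, not detokenised text
    },
    stream=True,
)

for line in resp.iter_lines():
    # llama-server streams SSE lines of the form "data: {...}"
    if not line or not line.startswith(b"data: "):
        continue
    chunk = json.loads(line[len(b"data: "):])
    # hypothetical response shape: {"tokens": [1234, 5678], "stop": false}
    print(chunk.get("tokens"))
```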
Cheers