@IsaacDynamo
Contributor

Add parse_special option to /tokenize endpoint.

The parse_special option is explained in #9379.

Tested with:
llama-server.exe --model C:\tools\llm\qwen2.5-coder-1.5b-instruct-q4_k_m.gguf --port 8088

New behavior when parse_special = false:

curl --request POST --url http://localhost:8088/tokenize --header "Content-Type: application/json" \
    --data '{"content": "<|im_start|>Hello World<|im_end|>", "with_pieces": true, "parse_special": false}'
{"tokens":[
    {"id":27,"piece":"<"},
    {"id":91,"piece":"|"},
    {"id":318,"piece":"im"},
    {"id":4906,"piece":"_start"},
    {"id":91,"piece":"|"},
    {"id":29,"piece":">"},
    {"id":9707,"piece":"Hello"},
    {"id":4337,"piece":" World"},
    {"id":27,"piece":"<"},
    {"id":91,"piece":"|"},
    {"id":318,"piece":"im"},
    {"id":6213,"piece":"_end"},
    {"id":91,"piece":"|"},
    {"id":29,"piece":">"}
]}
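
As a quick sanity check on the output above, the following sketch (not part of the PR) concatenates the returned pieces and compares them with the submitted content. The token list is copied from the response above; the helper name `joined_pieces` is made up for illustration.

```python
import json

# Response body copied from the parse_special = false example above.
RESPONSE = json.loads("""
{"tokens":[
    {"id":27,"piece":"<"},
    {"id":91,"piece":"|"},
    {"id":318,"piece":"im"},
    {"id":4906,"piece":"_start"},
    {"id":91,"piece":"|"},
    {"id":29,"piece":">"},
    {"id":9707,"piece":"Hello"},
    {"id":4337,"piece":" World"},
    {"id":27,"piece":"<"},
    {"id":91,"piece":"|"},
    {"id":318,"piece":"im"},
    {"id":6213,"piece":"_end"},
    {"id":91,"piece":"|"},
    {"id":29,"piece":">"}
]}
""")

CONTENT = "<|im_start|>Hello World<|im_end|>"


def joined_pieces(response):
    """Concatenate the "piece" field of every token (hypothetical helper)."""
    return "".join(token["piece"] for token in response["tokens"])


# With parse_special = false the special-token text is split into ordinary
# tokens, so no single piece equals a whole special token.
pieces = [token["piece"] for token in RESPONSE["tokens"]]
has_special_piece = any(p in ("<|im_start|>", "<|im_end|>") for p in pieces)
```

Here `joined_pieces(RESPONSE)` equals `CONTENT` and `has_special_piece` is false, which matches the intent of the option: the markers are tokenized as plain text.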

Old / default behavior (parse_special = true) is unchanged:

curl --request POST --url http://localhost:8088/tokenize --header "Content-Type: application/json" \
    --data '{"content": "<|im_start|>Hello World<|im_end|>", "with_pieces": true}'
{"tokens":[
    {"id":151644,"piece":"<|im_start|>"},
    {"id":9707,"piece":"Hello"},
    {"id":4337,"piece":" World"},
    {"id":151645,"piece":"<|im_end|>"}
]}
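
For callers who prefer a script over curl, here is a minimal Python sketch of the same two requests. It assumes only what the curl commands above show: a POST to /tokenize with a JSON body containing content, with_pieces, and optionally parse_special. The function names `build_tokenize_request` and `tokenize` are hypothetical and not part of llama-server.

```python
import json
import urllib.request


def build_tokenize_request(base_url, content, with_pieces=True, parse_special=None):
    """Build a POST request for the /tokenize endpoint (hypothetical helper).

    parse_special is only added to the body when it is not None, so leaving
    it out exercises the server's default behavior.
    """
    payload = {"content": content, "with_pieces": with_pieces}
    if parse_special is not None:
        payload["parse_special"] = parse_special
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tokenize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokenize(base_url, content, with_pieces=True, parse_special=None):
    """Send the request and return the "tokens" list from the JSON response."""
    request = build_tokenize_request(base_url, content, with_pieces, parse_special)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["tokens"]
```

Against the server started above, `tokenize("http://localhost:8088", "<|im_start|>Hello World<|im_end|>", parse_special=False)` should correspond to the first curl call, and the same call without `parse_special` to the second.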

@ggerganov merged commit b4efd77 into ggml-org:master on Jul 21, 2025
47 checks passed