Special tokens are not rendered correctly (as empty) -- llama3 specific? #6770

Closed
@DreamGenX

Description

Hello!

Using this GGUF: https://huggingface.co/LoneStriker/opus-v1.2-llama-3-8b-GGUF

When the output contains any of the special tokens, like <|im_start|> or <|im_end|>, they are rendered as an empty string. This breaks custom stop-string functionality (e.g. adding "<|im_end|>" to the stop strings does not work, since stop detection relies on string comparison).

The tokens are tokenized correctly, just not rendered:

main: prompt: '<|im_end|>'
main: number of tokens in prompt = 1
128009 -> ''
main: prompt: '<|im_start|>'
main: number of tokens in prompt = 1
128006 -> ''
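The failure mode can be illustrated with a minimal, self-contained sketch. The vocabulary table and helper names below are hypothetical (not llama.cpp's actual API); they only model the reported behavior of special tokens detokenizing to empty strings:

```python
# Hypothetical token-to-piece table reproducing the reported bug:
# special tokens map to '' instead of their literal text.
TOKEN_TO_PIECE = {
    128006: "",        # <|im_start|> rendered empty (buggy behavior)
    128009: "",        # <|im_end|> rendered empty (buggy behavior)
    1000: "Hello",
    1001: " world",
}

def detokenize(tokens):
    """Concatenate rendered pieces, as a text-based frontend would."""
    return "".join(TOKEN_TO_PIECE[t] for t in tokens)

def hit_stop_string(text, stop_strings):
    """String-comparison stop check, as described in the issue."""
    return any(s in text for s in stop_strings)

generated = [1000, 1001, 128009]   # model emits <|im_end|> last
text = detokenize(generated)       # the end marker is lost in the text
print(repr(text))
print(hit_stop_string(text, ["<|im_end|>"]))  # never matches
```

Because the special token contributes nothing to the rendered text, a stop check that compares strings can never fire; matching on the token ID (128009) instead would still work.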

I first observed this on an older commit:

version: 2243 (201294ae)
201294ae177b308fb3a99dc504dd6d27e8afa907

and reproduced it on a fresh main:

version: 2698 (637e9a86)
637e9a86c220718d008b54842dfd294aa96d3b7a
