Closed
Labels
bug-unconfirmed, medium severity (Used to report medium severity bugs in llama.cpp, e.g. malfunctioning features that are still usable)
Description
What happened?
The tokenizer.json from Gemma 2 defines this token: "[toxicity=0]": 255968.
When tokenizing that text using llama.cpp, we get [235309, 1373, 235293, 235276, 235307] instead.
If I ask llama.cpp's Gemma 2 to repeat the text [toxicity=0], it does so effortlessly.
If I ask the corporate-hosted Gemma 2 to repeat it, it fails, responding as though there were no text there:
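To illustrate the expected behavior, here is a minimal sketch (not llama.cpp's actual code) of how a tokenizer that honors user-defined tokens from tokenizer.json would emit a single id for "[toxicity=0]" before falling back to ordinary subword tokenization. The id 255968 is from the report; the `fallback_tokenize` function is a hypothetical stand-in for the byte/BPE fallback that currently produces [235309, 1373, 235293, 235276, 235307].

```python
# Sketch: greedy whole-unit matching of user-defined tokens.
# "[toxicity=0]" -> 255968 comes from Gemma 2's tokenizer.json;
# the fallback logic below is purely illustrative.

SPECIAL_TOKENS = {"[toxicity=0]": 255968}

def fallback_tokenize(text):
    # Placeholder for the real BPE/byte fallback path.
    return [ord(c) for c in text]

def tokenize(text):
    """Scan left to right, matching user-defined tokens as whole units first."""
    out, i = [], 0
    while i < len(text):
        for tok, tok_id in SPECIAL_TOKENS.items():
            if text.startswith(tok, i):
                out.append(tok_id)   # emit the single defined id
                i += len(tok)
                break
        else:
            out.extend(fallback_tokenize(text[i]))  # no match: fall back
            i += 1
    return out

print(tokenize("[toxicity=0]"))  # -> [255968]
```

The point of the sketch is only that the defined token should win over the fallback split; whether Gemma 2 treats this entry as a user-defined token or an added-vocab entry is what the maintainers would need to confirm.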
Name and Version
version: 3317 (8e55830)
built with MSVC 19.29.30154.0 for x64
What operating system are you seeing the problem on?
Windows
Relevant log output
No response
oldgithubman