
UGM tokenizer takes much longer than other tokenizers #9180

@walsons

Description


Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

The UGM tokenizer is time-consuming when embedding text and is slower than the other tokenizers. Debugging the source code, I found the cause in the function llama_tokenize_internal() in llama-vocab.cpp: llm_tokenizer_ugm is constructed on every call, and its constructor runs a for-loop over the vocab that is very time-consuming. Is there any way to accelerate the process, such as running the for-loop once and caching the needed data as long as the vocab does not change? A rough sketch of the idea follows.
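A minimal sketch of the caching idea, assuming the vocab is immutable for the lifetime of the model. The type names (vocab_t, ugm_tokenizer) and the helper get_cached_ugm_tokenizer() are illustrative placeholders, not llama.cpp's actual internals:

```cpp
// Hypothetical sketch: construct the UGM tokenizer once per vocab and reuse
// it, so the expensive constructor loop is not repeated on every tokenize call.
#include <memory>
#include <mutex>
#include <unordered_map>

struct vocab_t { /* token table, scores, ... (placeholder) */ };

struct ugm_tokenizer {
    explicit ugm_tokenizer(const vocab_t & vocab) {
        // the expensive for-loop over the vocab would run here, once
        (void) vocab;
    }
};

// Return a tokenizer for `vocab`, constructing it only on first use.
static const ugm_tokenizer & get_cached_ugm_tokenizer(const vocab_t & vocab) {
    static std::mutex mtx;
    static std::unordered_map<const vocab_t *, std::unique_ptr<ugm_tokenizer>> cache;

    std::lock_guard<std::mutex> lock(mtx);
    auto & entry = cache[&vocab];
    if (!entry) {
        entry = std::make_unique<ugm_tokenizer>(vocab); // cache miss: build once
    }
    return *entry;
}
```

Keying the cache by the vocab's address assumes the vocab object is not mutated or moved after loading; alternatively, the cached tokenizer could simply be stored as a member of the vocab structure itself and initialized lazily.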

Motivation

It would benefit llama.cpp users who run an embedding model with a UGM tokenizer, such as multilingual-e5 or bge-m3.

Possible Implementation

No response
