BPE Tokenizer doesn't have model > vocab. #5180


Closed
likejazz opened this issue Jan 29, 2024 · 2 comments

Comments

@likejazz
Contributor

likejazz commented Jan 29, 2024

In a recent patch (https://github.com/ggerganov/llama.cpp/blame/d2f650cb5b04ee2726663e79b47da5efe196ce00/convert.py#L337), you imported the vocab list from self.bpe_tokenizer['model']['vocab'], which was originally taken from the vocab.json file. However, the BPE tokenizer's vocab.json file does not have a model > vocab structure: it contains no other metadata and consists only of a vocabulary list.

So, in my opinion, line 337 should be modified as follows:

self.vocab = self.bpe_tokenizer

I hope this helps. Thanks.

@ggerganov
Member

If I modify it like this, it will stop working for all the other cases that do have ["model"]["vocab"]. You can probably add a check to see which of the two layouts is present and use that one.
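A minimal sketch of such a check, outside the thread's actual patch (the function name and the loader shape here are assumptions for illustration, not the real convert.py code):

```python
def load_bpe_vocab(tokenizer_json: dict) -> dict:
    """Return the token -> id mapping from either tokenizer file layout."""
    # tokenizer.json style: vocabulary nested under ["model"]["vocab"]
    model = tokenizer_json.get("model")
    if isinstance(model, dict) and "vocab" in model:
        return model["vocab"]
    # plain vocab.json style: the file itself is the token -> id mapping
    return tokenizer_json

# Both layouts yield the same mapping:
nested = {"model": {"vocab": {"hello": 0, "world": 1}}}
flat = {"hello": 0, "world": 1}
```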

@likejazz
Contributor Author

OK, I'll send you a little patch for that.
