[Bug]: Bert tokenizer is tokenizing some tokens as `UNK`

### Your current environment

<details>
<summary>The output of `python collect_env.py`</summary>

```text
Your output of `python collect_env.py` here
```

</details>


### Model Input Dumps

_No response_

### 🐛 Describe the bug

With some BERT and RoBERTa models, such as `sentence-transformers/all-MiniLM-L12-v2`, the output differs from the one generated by `sentence-transformers`. If I add the following print to `_normalize_prompt_text_to_input()` in `serving_engine.py`:
```python
        print(f"{input_ids=}")
```
I get `[101, 100, 3007, 1997, 100, 2003, 100, 1012, 102]` for the sentence "The capital of France is Paris.", where 100 is the `UNK` token. When I run the same sentence with sentence-transformers, I get `[101, 1996, 3007, 1997, 2605, 2003, 3000, 1012, 102]`. The problem occurs with both `--tokenizer-mode auto` and `--tokenizer-mode slow`.
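One pattern worth noting: the three `UNK` positions line up with the three capitalized words ("The", "France", "Paris"), while every lowercase word matches. That would be consistent with the input not being lowercased before lookup in an uncased vocabulary, though this is only a hypothesis and is not confirmed here. The sketch below illustrates the idea with a toy whole-word lookup. The vocabulary is reconstructed by aligning the sentence with the sentence-transformers IDs above (an assumption, not the real vocab file), and `toy_encode` is a hypothetical helper, not vLLM or `transformers` code.

```python
import re

# Toy vocabulary: word -> ID pairs inferred by aligning the sentence with the
# sentence-transformers output above. This is an assumption for illustration.
VOCAB = {
    "[UNK]": 100, "[CLS]": 101, "[SEP]": 102,
    "the": 1996, "capital": 3007, "of": 1997,
    "france": 2605, "is": 2003, "paris": 3000, ".": 1012,
}


def toy_encode(text: str, do_lower_case: bool) -> list[int]:
    """Hypothetical whole-word lookup; real WordPiece also splits subwords."""
    if do_lower_case:
        text = text.lower()
    # Split into words and individual punctuation marks.
    pieces = re.findall(r"\w+|[^\w\s]", text)
    ids = [VOCAB.get(piece, VOCAB["[UNK]"]) for piece in pieces]
    return [VOCAB["[CLS]"], *ids, VOCAB["[SEP]"]]


sentence = "The capital of France is Paris."
with_lowercasing = toy_encode(sentence, do_lower_case=True)
without_lowercasing = toy_encode(sentence, do_lower_case=False)
print(f"{with_lowercasing=}")
print(f"{without_lowercasing=}")
```

With lowercasing the toy lookup reproduces the sentence-transformers IDs, and without it the capitalized words fall back to 100, matching what I see from vLLM.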

cc: @DarkLight1337 

### Before submitting a new issue...

- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Bug]: Bert tokenizer is tokenizing some tokens as `UNK` #11184

Your current environment

Model Input Dumps

🐛 Describe the bug

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Bug]: Bert tokenizer is tokenizing some tokens as UNK #11184

Description

Your current environment

Model Input Dumps

🐛 Describe the bug

Before submitting a new issue...

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

[Bug]: Bert tokenizer is tokenizing some tokens as `UNK` #11184