[Bug]: [V1] wrong output when using kv cache fp8 #13133

@sleepwalker2017

Your current environment

The fp8 KV cache works correctly when the V1 engine is disabled, but with the V1 engine enabled, the output is completely wrong.

vLLM version: 0.7.1; GPU: H100

import subprocess

port = 10011
model = "/data/models/qwen2.5_72b-FP8"

# Launch the OpenAI-compatible server with the V1 engine and an fp8 KV cache.
cmd = f"VLLM_USE_V1=1 python3 -m vllm.entrypoints.openai.api_server \
        --port {port} \
        --model {model} \
        --dtype auto \
        -tp 2 \
        --max-model-len 8192 \
        --max-num-seqs 512 \
        --gpu-memory-utilization 0.9 \
        --enable-prefix-caching \
        --swap-space 16 \
        --disable-log-stats \
        --disable-log-requests \
        --trust-remote-code --kv-cache-dtype fp8"

# The original snippet only builds the command string; running it is assumed.
subprocess.run(cmd, shell=True, check=True)
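
For the working baseline, the same launch with only the V1 engine toggled off should reproduce the correct behavior (an assumption from the description above; only the VLLM_USE_V1 flag changes):

# Assumed baseline for comparison: identical flags, V1 engine disabled.
baseline_cmd = cmd.replace("VLLM_USE_V1=1", "VLLM_USE_V1=0", 1)
subprocess.run(baseline_cmd, shell=True, check=True)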

🐛 Describe the bug

With kv fp8 disabled, the output length is 140 tokens. With it enabled, the output length grows to 1000+ tokens and the text is unreadable.
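
A minimal client-side sketch to observe the difference, assuming the server above is running locally and using the stock openai client library; the prompt is a placeholder:

from openai import OpenAI

# Point the stock OpenAI client at the local vLLM server (assumed address).
client = OpenAI(base_url="http://localhost:10011/v1", api_key="EMPTY")

resp = client.completions.create(
    model="/data/models/qwen2.5_72b-FP8",
    prompt="Explain the difference between TCP and UDP.",  # placeholder prompt
    max_tokens=2048,
    temperature=0.0,
)

# Per the report: ~140 tokens without fp8 KV cache, but 1000+ unreadable
# tokens with it enabled under the V1 engine.
print("completion tokens:", resp.usage.completion_tokens)
print(resp.choices[0].text[:200])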
