[Bug]: [TPU] Prefix caching + w8a8 + long context results in degraded performance and corrupted output #12371

### Your current environment

<details>
<summary>The environment is the TPU nightly docker image</summary>

</details>

### Model Input Dumps

_No response_

### 🐛 Describe the bug

Model: https://huggingface.co/neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8
Machine: TPU v6e-8
Image: vllm/vllm-tpu:2fc6944c5e69d5d0ce15d09a855452c795d75c3c

I suggest running this on the TPU VM inside tmux, with one pane for the server and one for the benchmark.

First, start the server:
```
docker run --privileged -it --network host --rm \
  -v /dev/shm:/data \
  -e HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN} \
  -e VLLM_XLA_CACHE_PATH=/data/jax \
  --shm-size=10.24gb \
  vllm/vllm-tpu:2fc6944c5e69d5d0ce15d09a855452c795d75c3c \
  python3 -m vllm.entrypoints.openai.api_server \
    --host=0.0.0.0 \
    --port=8000 \
    --tensor-parallel-size=8 \
    --max-model-len=65536 \
    --gpu-memory-utilization=0.75 \
    --max-num-seqs=32 \
    --model=neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8 \
    --download-dir /data \
    --disable-log-requests \
    --enable_prefix_caching
```

Then run the benchmark from a second container instance (another tmux pane):
```
# Start a second container; the remaining commands run inside it.
docker run -it --rm --network host vllm/vllm-tpu:2fc6944c5e69d5d0ce15d09a855452c795d75c3c
python3 -m pip install -r requirements-test.txt
cd benchmarks
python3 benchmark_serving.py \
  --model neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8 \
  --dataset-name sonnet --dataset-path sonnet.txt \
  --num-prompts 32 --sonnet-input-len 65536 --sonnet-output-len 4096 \
  --sonnet-prefix-len 32768 --port 8000
```
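For reference, here is a quick arithmetic check of the token budget these two commands imply. It is only a sketch: the constants are copied from the flags above, the helper name `token_budget` is made up for illustration, and it assumes the benchmark honors the requested lengths exactly and that the shared prefix counts toward the input length. Whether the overflow it shows is related to the behavior reported here has not been established.

```python
# Values copied from the server and benchmark commands above.
MAX_MODEL_LEN = 65536       # --max-model-len
SONNET_INPUT_LEN = 65536    # --sonnet-input-len
SONNET_OUTPUT_LEN = 4096    # --sonnet-output-len
SONNET_PREFIX_LEN = 32768   # --sonnet-prefix-len


def token_budget(input_len: int, output_len: int, max_model_len: int) -> tuple[int, int]:
    """Return (total tokens per request, tokens beyond the context limit).

    Hypothetical helper for illustration; a negative second value means
    the request fits with room to spare.
    """
    total = input_len + output_len
    return total, total - max_model_len


total, overflow = token_budget(SONNET_INPUT_LEN, SONNET_OUTPUT_LEN, MAX_MODEL_LEN)

# Fraction of each prompt that is the shared prefix, i.e. the part
# prefix caching could reuse across requests (assumption: the prefix
# is part of the input length).
shared_fraction = SONNET_PREFIX_LEN / SONNET_INPUT_LEN

print(f"requested tokens per request: {total}")
print(f"tokens beyond max-model-len:  {overflow}")
print(f"shared prefix fraction:       {shared_fraction:.2f}")
```

Under those assumptions, each request asks for more tokens than the configured context length, and half of every prompt is shared prefix.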

Performance is severely degraded, as the benchmark results show:

![Image](https://github.com/user-attachments/assets/278da05f-3973-4df8-8d36-864651c4b5d6)

Under certain conditions that are difficult to reproduce on demand but have occurred twice, the server eventually gets locked into a corrupted state, and the client receives `max_tokens` worth of garbage output:

```
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!<|reserved_special_token_247|>!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!<|reserved_special_token_247|><|reserved_special_token_247|>!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!<|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_speci
al_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247|><|reserved_special_token_247>
```
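Since the corrupted state is hard to trigger on demand, it may help to flag it automatically while re-running the benchmark. The sketch below is one possible heuristic, not part of vLLM: the names `garbage_fraction` and `looks_corrupted` and the 0.9 threshold are assumptions chosen for illustration. It measures how much of a completion consists of `!` characters and `<|reserved_special_token_N|>` strings, the two patterns seen in the output above.

```python
import re

# Matches the reserved-token strings seen in the garbage output. The closing
# "|" is optional because the final token in the sample above lacks it.
RESERVED_TOKEN = re.compile(r"<\|reserved_special_token_\d+\|?>")


def garbage_fraction(text: str) -> float:
    """Fraction of characters that are '!' or part of a reserved special token."""
    if not text:
        return 0.0
    stripped = RESERVED_TOKEN.sub("", text)
    reserved_chars = len(text) - len(stripped)
    bang_chars = stripped.count("!")
    return (reserved_chars + bang_chars) / len(text)


def looks_corrupted(text: str, threshold: float = 0.9) -> bool:
    """Heuristic: treat a completion as corrupted if it is almost entirely garbage.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return garbage_fraction(text) >= threshold


if __name__ == "__main__":
    sample = "!" * 50 + "<|reserved_special_token_247|>" * 3
    print(looks_corrupted(sample))
    print(looks_corrupted("Shall I compare thee to a summer's day?"))
```

Applying `looks_corrupted` to each completion returned by the server would give a timestamp for when the server first enters the bad state, which could then be matched against the server logs.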

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
