Proposal to improve performance
The following statement comes from #10980:
The vLLM v1 engine can exploit APC when a prompt repeats within a batch, even if that prompt was not seen in a previous batch. Therefore, no warmup request is required.
Could you please point me to the PR that implements this feature? I've tested on v0.7.3, and it seems a warmup request is still required in the n>1 case.
Here is a simple command to reproduce the problem:

```bash
VLLM_USE_V1=1 python3 benchmarks/benchmark_latency.py --model meta-llama/Llama-3.1-8B -tp 1 --input-len 10 --n 2 --output-len 1 --batch-size 1 --trust-remote-code --num-iters 1 --num-iters-warmup 0 --load-format dummy
```
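For reference, here is a minimal offline-inference sketch of the same scenario (the prompt text and the explicit warmup call are illustrative, not taken from the benchmark script). It issues the prompt once as a warmup and then again with n=2, which is the workaround that should be unnecessary if APC reused the prompt within a single batch:

```python
# Hypothetical repro sketch using vLLM's offline API (not the benchmark script).
# Assumes VLLM_USE_V1=1 is set in the environment.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B",
    tensor_parallel_size=1,
    enable_prefix_caching=True,  # APC; enabled by default in the v1 engine
)

prompt = "The quick brown fox jumps over the lazy dog"

# Warmup request: populates the prefix cache for this prompt.
# Per the statement quoted above, this step should not be needed when the same
# prompt repeats within one batch, but on v0.7.3 the n=2 request below still
# recomputes the prefill twice without it.
llm.generate([prompt], SamplingParams(max_tokens=1))

# n=2 request: both sequences share the same prompt, so ideally the prefill
# would be computed once and reused for the second sequence.
outputs = llm.generate([prompt], SamplingParams(n=2, max_tokens=1))
for out in outputs:
    print(len(out.outputs), "completions for one prompt")
```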
You can see that the `input_ids` passed to `LlamaModel.forward` repeat twice, which wastes computation on the prefill tokens.
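One way to observe the duplicated prefill is to wrap the model's forward and log the shape of `input_ids`. A rough sketch is below; the module path, the forward signature, and the monkey-patching approach are assumptions and may need adjusting for the installed vLLM version and its worker/process model:

```python
# Rough instrumentation sketch: log the input_ids shapes seen by the model forward.
# The forward signature varies across vLLM versions, so the wrapper takes
# *args/**kwargs and only inspects what it can find.
import torch
from vllm.model_executor.models.llama import LlamaModel

_orig_forward = LlamaModel.forward

def logged_forward(self, *args, **kwargs):
    input_ids = kwargs.get("input_ids", args[0] if args else None)
    if isinstance(input_ids, torch.Tensor):
        # With n=2 and a 10-token prompt, a duplicated prefill shows up here as
        # roughly 20 prompt tokens instead of 10.
        print("LlamaModel.forward input_ids shape:", tuple(input_ids.shape))
    return _orig_forward(self, *args, **kwargs)

# Apply the patch before the model is loaded so the wrapper is in place.
LlamaModel.forward = logged_forward
```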
Report of performance regression
No response
Misc discussion on performance
No response
Your current environment (if you think it is necessary)
The output of `python collect_env.py`
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.