Your current environment
main branch, `Dockerfile.rocm`, default dependencies.
🐛 Describe the bug
- `--max-num-batched-tokens=131072 --enable-chunked-prefill` -- perfect response (temp 0)
- `--max-num-batched-tokens=16384 --enable-chunked-prefill` -- gibberish response (temp 0)
Using a prompt with a sequence length of 100001 and generating 100 tokens.
With temp 0, the gibberish does NOT match itself across iterations:
- Good response 1: `What? What?”\n\n“Why, the bridge was mined [...]`
- Bad response 1: `So far as Jiedgilliesgillies-illies-illies-er. A Jemel-er-illies-ied-: \xa0 [...]`
- Bad response 2 (entirely different from bad response 1): `\xa0gillies in England-ied. A Jiedgeld-eren [...]`
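
For context, this is roughly how I'm hitting it (a minimal sketch using the offline `LLM` API; the model name, prompt construction, and `max_model_len` are placeholders, not my actual setup):

```python
from vllm import LLM, SamplingParams

# Minimal repro sketch. Model name, prompt construction, and max_model_len
# are placeholders -- any long-context model should do.
llm = LLM(
    model="some-long-context-model",   # placeholder
    max_num_batched_tokens=16384,      # the failing setting; 131072 is fine
    enable_chunked_prefill=True,
    max_model_len=110_000,             # placeholder; must cover the prompt
)

prompt = "word " * 100_001             # placeholder ~100k-token prompt

params = SamplingParams(temperature=0.0, max_tokens=100)
out1 = llm.generate([prompt], params)[0].outputs[0].text
out2 = llm.generate([prompt], params)[0].outputs[0].text

# At temp 0 the two runs should match; instead the gibberish differs.
print(out1 == out2)
```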
I haven't looked into the vLLM implementation yet. It seems like the tensors may not be initialized correctly somewhere and are inheriting whatever values were already in memory at the time.
I have seen this kind of thing happen before when someone uses `x = torch.empty(size)` -- which leaves the tensor holding whatever that memory segment already contained -- when they meant to use `torch.zeros` and actually wanted zeros.
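
To illustrate what I mean about `torch.empty` (a generic PyTorch demonstration, not vLLM code):

```python
import torch

# torch.empty allocates memory without initializing it, so the tensor
# contains whatever values happened to be in that memory segment.
x = torch.empty(4)
print(x)  # arbitrary values, can differ from run to run

# torch.zeros initializes the memory deterministically.
y = torch.zeros(4)
print(y)  # tensor([0., 0., 0., 0.])
```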