[Bug]: Chunked prefill returning gibberish in some cases. #4697

@fmmoret

Description

Your current environment

Built from the main-branch Dockerfile.rocm with default dependencies.

🐛 Describe the bug

--max-num-batched-tokens=131072 --enable-chunked-prefill -- perfect response. temp 0
--max-num-batched-tokens=16384 --enable-chunked-prefill -- gibberish response. temp 0

Using a prompt of 100,001 tokens and generating 100 tokens.

With temp 0, the gibberish does NOT even match itself across iterations.
E.g.: Good response 1 = What? What?”\n\n“Why, the bridge was mined [...]
Bad response 1 = So far as Jiedgilliesgillies-illies-illies-er. A Jemel-er-illies-ied-: \xa0 [...]
Bad response 2 is entirely different from bad response 1 = \xa0gillies in England-ied. A Jiedgeld-eren [...]

I haven't looked into the vLLM implementation yet, but it seems like some tensors may not be initialized correctly and are inheriting whatever values were already in memory at the time of allocation.

I have seen this kind of thing happen before when someone uses x = torch.empty(size) -- which leaves the tensor filled with whatever that memory segment already contained -- when they actually wanted zeros.
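A minimal sketch of the failure mode I mean (plain PyTorch; the variable names are illustrative, not taken from the vLLM code):

```python
import torch

# torch.empty allocates a tensor WITHOUT initializing the backing memory:
# its contents are whatever happened to be in that allocation already,
# so they are arbitrary and can differ from run to run.
scratch = torch.empty(4, 4)

# torch.zeros allocates AND fills the memory, so the result is deterministic.
accum = torch.zeros(4, 4)

# If code accumulates into a buffer it assumed was zeroed, using
# torch.empty silently mixes stale memory into the result:
logits = torch.ones(4, 4)
bad = scratch + logits   # depends on leftover memory; nondeterministic
good = accum + logits    # deterministic: all ones

print(torch.equal(good, torch.ones(4, 4)))  # True
```

That would also explain why the bad outputs differ across iterations at temp 0: whatever garbage is left in the uninitialized region changes between runs.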

Metadata

Assignees: no one assigned

Labels: bug (Something isn't working), stale (over 90 days of inactivity)
