Your current environment
main branch, `Dockerfile.rocm`, default dependencies.
🐛 Describe the bug
- `--max-num-batched-tokens=131072 --enable-chunked-prefill` -- perfect response (temp 0)
- `--max-num-batched-tokens=16384 --enable-chunked-prefill` -- gibberish response (temp 0)
Using a prompt with a sequence length of 100001 and generating 100 tokens.
With temp 0, the gibberish does NOT match itself across iterations:
- Good response 1: `What? What?”\n\n“Why, the bridge was mined [...]`
- Bad response 1: `So far as Jiedgilliesgillies-illies-illies-er. A Jemel-er-illies-ied-: \xa0 [...]`
- Bad response 2 (entirely different from bad response 1): `\xa0gillies in England-ied. A Jiedgeld-eren [...]`
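
For context, this is roughly how I'm hitting it (a minimal sketch using the offline `LLM` API; the model name, prompt construction, and `max_model_len` are placeholders, not my actual setup):

```python
from vllm import LLM, SamplingParams

# Minimal repro sketch. Model name, prompt construction, and max_model_len
# are placeholders -- any long-context model should do.
llm = LLM(
    model="some-long-context-model",   # placeholder
    max_num_batched_tokens=16384,      # the failing setting; 131072 is fine
    enable_chunked_prefill=True,
    max_model_len=110_000,             # placeholder; must cover the prompt
)

prompt = "word " * 100_001             # placeholder ~100k-token prompt

params = SamplingParams(temperature=0.0, max_tokens=100)
out1 = llm.generate([prompt], params)[0].outputs[0].text
out2 = llm.generate([prompt], params)[0].outputs[0].text

# At temp 0 the two runs should match; instead the gibberish differs.
print(out1 == out2)
```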
I haven't looked into the vLLM implementation yet. It seems like the tensors may not be initialized correctly somewhere and are inheriting whatever values were already in memory at the time.
I have seen this kind of thing happen before when someone uses `x = torch.empty(size)` -- which leaves the tensor holding whatever that memory segment already contained -- when they meant to use `torch.zeros` and actually wanted zeros.
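
To illustrate what I mean about `torch.empty` (a generic PyTorch demonstration, not vLLM code):

```python
import torch

# torch.empty allocates memory without initializing it, so the tensor
# contains whatever values happened to be in that memory segment.
x = torch.empty(4)
print(x)  # arbitrary values, can differ from run to run

# torch.zeros initializes the memory deterministically.
y = torch.zeros(4)
print(y)  # tensor([0., 0., 0., 0.])
```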