Bug: runtime error in llama_get_logits_ith after "simplify Mamba with advanced batch splits" commit #9224
Comments
Author of #8526 here.
Basically it should not happen if it worked before. It's possible the internal changes in batch splits caused some external changes of behavior, but I tried to keep the external behavior identical. I'm curious about what exactly is causing the problem you've noticed.
My guess is you need to change the default to -1. If that solves it, good! Not sure why it worked before, though. If the problem is still there, try to find other divergences from upstream. If that still doesn't fix it, the easiest way to debug this is to get a backtrace (with line numbers) from where the problem is detected. If you compile with debug symbols, you should be able to guess which batch (from which part of the code) might have caused this, and then I'd like to know what you find.
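(For reference, a minimal sketch of what that suggestion amounts to, assuming the standard llama.cpp C API from llama.h; the batch setup below is illustrative and not taken from the reporter's app:)

```cpp
// Sketch only: request logits for the last token of a batch, then read them back with idx -1.
#include "llama.h"

void eval_and_read_logits(llama_context * ctx, const llama_token * tokens, int32_t n_tokens) {
    llama_batch batch = llama_batch_init(n_tokens, /*embd*/ 0, /*n_seq_max*/ 1);

    for (int32_t i = 0; i < n_tokens; ++i) {
        batch.token[i]     = tokens[i];
        batch.pos[i]       = i;
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = 0;
        batch.logits[i]    = (i == n_tokens - 1); // only the last token outputs logits
    }
    batch.n_tokens = n_tokens;

    if (llama_decode(ctx, batch) == 0) {
        // idx -1 means "the last token that has logits enabled";
        // asking for an index whose batch.logits[] entry is false triggers the error above.
        float * logits = llama_get_logits_ith(ctx, -1);
        (void) logits; // sampling would happen here
    }

    llama_batch_free(batch);
}
```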
Hi, @compilade! Thank you for the answer! Changing the default idx to -1 fixed it. I looked at the change, and it was introduced in e3c337d (5 months ago!), which I probably missed. It looks like the feature was unused before your PR. Was it supposed to affect all model architectures, though? From your PR it reads like a big change, but the name of that commit overly simplifies the changes. I also see a significant decrease in prompt processing speed for relatively long prompts (400+ characters) after merging this change.
That is very good to know!
That's weird, because I don't really see how the batch splits relate to negative indices in llama_get_logits_ith.
It should not make it slower. For Transformer-based models it avoids copies and uses views into the logical batch to split it into physical batches. From some tests with extremely small models (with 50k parameters), which make most overhead very measurable, prompt processing did not slow down for heterogeneous prompts like in the HellaSwag benchmark. At least not after I made it avoid copies for Transformer-based models (it was already avoiding copies before). Basically, that change was mostly internal refactoring to allow more elaborate batch splits for recurrent models. It also improves how the legacy batch API is handled.
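(To illustrate the views-vs-copies idea in generic terms: a "physical" batch can simply point into the logical batch's arrays with an offset and a length, so the split costs no copying. This is a simplified, hypothetical sketch, not the actual llama.cpp implementation; the struct and field names are made up for the example:)

```cpp
// Simplified illustration of splitting a logical batch into views (no copies).
// Hypothetical types; not the actual llama.cpp code.
#include <algorithm>
#include <cstdint>
#include <vector>

struct batch_view {
    int32_t        n_tokens;
    const int32_t *tokens; // points into the logical batch, nothing is copied
    const int8_t  *logits; // per-token flag: should this token output logits?
};

static std::vector<batch_view> split_into_views(const batch_view & logical, int32_t n_ubatch) {
    std::vector<batch_view> views;
    for (int32_t off = 0; off < logical.n_tokens; off += n_ubatch) {
        const int32_t n = std::min(n_ubatch, logical.n_tokens - off);
        views.push_back({ n, logical.tokens + off, logical.logits + off });
    }
    return views;
}
```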
Interesting. To clarify, I currently test it on CPU only, compiled with OpenBLAS. No such issues happened previously, but I disabled OpenMP after the Threadpool 2 commit due to slightly slower prompt processing and inference. Now it seems to be required to get adequate processing speeds. I will keep testing it; maybe the prompt processing issue will fix itself after I update sampling completely. In any case, thanks again for pointing out the root cause.
This issue was closed because it has been inactive for 14 days since being marked as stale.
What happened?
Today I merged the latest changes from llama.cpp into my app and suddenly got this error instead of inference:
llama_get_logits_ith: invalid logits id 0, reason: batch.logits[0] != true
After checking multiple past commits I figured out that it was a1631e5; however, I don't use Mamba models, which is also why I didn't merge this commit earlier.
This issue happens with Nemo and Llama3 (and probably all other models). Previous commits work fine. Why can this happen? What can I do to fix it?
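(A hedged sketch of the call pattern that can produce this message, assuming the llama.cpp C API; `ctx` and the batch setup are placeholders, and only the last token of the batch has its `batch.logits[i]` flag set:)

```cpp
// Sketch: why "invalid logits id 0, reason: batch.logits[0] != true" can appear.
float * logits_bad = llama_get_logits_ith(ctx, 0);  // index 0 had no logits requested -> error
float * logits_ok  = llama_get_logits_ith(ctx, -1); // -1 = last token with logits enabled -> OK
```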
Name and Version
b3614
What operating system are you seeing the problem on?
Windows
Relevant log output
No response