
Bug: runtime error in llama_get_logits_ith after the "simplify Mamba with advanced batch splits" commit #9224


Closed
MaggotHATE opened this issue Aug 28, 2024 · 5 comments
Labels: bug-unconfirmed, medium severity, stale

Comments

@MaggotHATE
Contributor

What happened?

Today I merged the latest changes from llama.cpp into my app and suddenly got this error instead of inference:

llama_get_logits_ith: invalid logits id 0, reason: batch.logits[0] != true

After checking multiple past commits, I figured out that it was a1631e5 - however, I don't use Mamba models, which is also why I didn't merge this commit earlier.

This issue happens with Nemo and Llama3 (and probably all other models). Previous commits work fine. Why can this happen? What can I do to fix it?

Name and Version

b3614

What operating system are you seeing the problem on?

Windows

Relevant log output

No response

MaggotHATE added the bug-unconfirmed and medium severity labels on Aug 28, 2024
@compilade
Collaborator

compilade commented Sep 2, 2024

Author of #8526 here.

> Why can this happen?

Basically it should not happen if it worked before.

It's possible the internal changes in batch splits caused some external changes of behavior, but I tried to keep the external behavior identical. I'm curious about what exactly is causing the problem you've noticed.

> What can I do to fix it?

My guess is you need to change the default to -1 here: https://github.com/MaggotHATE/Llama_chat/blob/b9a9c2775d61824a8225284f700029dd81390a63/base/sampling.h#L145
(otherwise this diverges from upstream llama.cpp)
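
For illustration, the change would look roughly like this. This is a sketch based on upstream's common/sampling.h declaration from around this time; your fork's actual parameter list (with the extra samplers) will differ.

```cpp
#include "llama.h"

// Sketch only: before the change, the last parameter read `int idx = 0`.
// With -1, llama_get_logits_ith resolves the index to the last token that
// produced logits, which matches upstream behavior.
llama_token llama_sampling_sample(
        struct llama_sampling_context * ctx_sampling,
        struct llama_context          * ctx_main,
        struct llama_context          * ctx_cfg,
        int                             idx = -1);
```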

If that solves it, good! Not sure why it worked before, though.

If the problem is still there, try to find other divergences from upstream llama.cpp which shouldn't be different (apart from the extra samplers), and then correct them if relevant.

If that still doesn't fix it, the easiest way to debug this is to get a backtrace (with line numbers) from where the problem is detected. If you compile llama.cpp without NDEBUG, then there should be a GGML_ASSERT triggered in llama_get_logits_ith which should create a backtrace if you have gdb or lldb installed.

Then you should be able to guess which batch (from which part of the code) might have caused this and then I'd like to know:

  1. What is calling llama_get_logits_ith?
  2. Is llama_batch_get_one used to create the problematic batch?
  3. Was there more than one token in the batch? If so, the suggested change to -1 should fix the problem (see the sketch after this list). Why it worked before with 0, I have no idea.
  4. Was there only a single token in the batch? I don't see why this would cause the problem you've seen.
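
To make points 3 and 4 concrete, here is a minimal sketch of the usual failure mode. It assumes an already-loaded llama_context, a tokenized prompt, and the llama.h API around b3614 (the llama_batch_get_one signature may differ in other versions); the helper name is made up for illustration and is not your app's actual code.

```cpp
#include <vector>
#include "llama.h"

// Decode a multi-token prompt with the legacy batch helper, then read logits.
// Hypothetical helper for illustration only.
static float * last_token_logits(llama_context * ctx, std::vector<llama_token> & tokens) {
    // llama_batch_get_one leaves batch.logits == NULL, so llama_decode computes
    // logits only for the last token of the batch.
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size(),
                                            /*pos_0 =*/ 0, /*seq_id =*/ 0);
    llama_decode(ctx, batch);

    // idx 0 refers to a position whose logits were never computed, so a call like
    // this is what trips the "invalid logits id 0" error quoted above:
    //     float * bad = llama_get_logits_ith(ctx, 0);

    // idx -1 means "the logits of the last token that produced any", which is
    // what sampling wants after a multi-token prompt.
    return llama_get_logits_ith(ctx, -1);
}
```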

@MaggotHATE
Contributor Author

Hi, @compilade! Thank you for the answer! Changing the default idx to -1 helps; the error no longer occurs. I guess I will have to update my sampling.h/.cpp soon anyway - especially since the sampling refactoring PR is ready.

I looked at the change, and it was introduced in e3c337d (5 months ago!), which I probably missed. It looks like the feature was unused before your PR.

Was it supposed to affect all model architectures though? From your PR it reads like a big change, but the name of that commit overly simplifies the changes.

I also see a significant decrease in prompt processing speed for relatively long prompts (400+ characters) after merging ggml.h/.cpp and llama.h/.cpp. I had to enable OpenMP again to fix it. Just to clarify, can the new batch splits affect prompt processing? I didn't test long prompts back then because of the runtime error, so I'm not sure whether it was there too.

@compilade
Collaborator

> Thank you for the answer! Changing the default idx to -1 helps; the error no longer occurs.

That is very good to know!

> It looks like the feature was unused before your PR.

That's weird, because I don't really see how the batch splits relate to negative indices in llama_get_logits_ith. Like I said, I don't know why it worked for you for a while with the default idx at 0 even though the upstream code assumes -1. It should have failed months ago.

> Just to clarify, can the new batch splits affect prompt processing?

It should not make it slower. For Transformer-based models it avoids copies and uses views into the logical batch to split it into physical batches.

From some tests with extremely small models (around 50k parameters), which make most overhead very measurable, prompt processing did not slow down for heterogeneous prompts like in the HellaSwag benchmark. At least not after I made it avoid copies for Transformer-based models (it was already avoiding copies before).

Basically that change was mostly internal refactoring to allow more elaborate batch splits for recurrent models. It also improves how the legacy batch API is handled (e.g. llama_batch_get_one) by re-using the same buffers instead of re-allocating new ones at each llama_decode. But the behavior should be the same.

@MaggotHATE
Contributor Author

Interesting. To clarify, I currently test it on CPU only, compiled with OpenBLAS. No such issues happened previously, but I disabled OpenMP after the Threadpool 2 commit due to slightly slower prompt processing and inference. Now it seems to be required to get adequate processing speeds.

I will keep testing it, maybe the prompt processing issue will fix itself after I update sampling completely. In any case, thanks again for pointing out the root cause.

github-actions bot added the stale label on Oct 3, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
