Description
Your current environment
The output of python collect_env.py
Your output of `python collect_env.py` here
🐛 Describe the bug
Recently, we ran into illegal memory access errors. When we turned off flashinfer sampling, the problem disappeared, or at least became much harder to trigger. Unfortunately, I can't provide a minimal reproduction here.
After some investigation, I believe the problem is caused by `apply_temperature`. Suppose we have a decode batch that mixes random sampling and greedy sampling. The temperature for the greedy-sampling requests is set to -1.0, and all the logits are then divided by the temperature. If the logits for a greedy-sampling request contain `-inf`, those entries become `inf` after the division, and hence the `probs` tensor ends up fully `nan`. My guess is that this causes bad things to happen in flashinfer top_p sampling. I'm not 100% sure about this, because when I ran flashinfer top_p sampling on `probs` containing `nan`, compute-sanitizer did not find any problems. Either way, `nan` in `probs` is dangerous, and I think it should be fixed.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.