whisper_process_logits() was computing the softmax on its own many times, so
I changed it to call my vectorized expf() function ggml_vec_soft_max_f32,
which was upstreamed to llama.cpp a few months ago. Since this is pretty
much the only CPU operation that happens in GPU mode, the change has a much
larger impact on performance here than it does on llama.cpp's large language
model inference.
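For reference, a minimal sketch of the pattern this change follows: a per-token scalar softmax loop over the logits replaced by a single call to the vectorized helper. The helper body below is a local stand-in, and the signature `ggml_float ggml_vec_soft_max_f32(int n, float *y, const float *x, float max)` is an assumption about the ggml API, not copied from the source tree; the actual whisper.cpp code differs.

```c
// Sketch only: scalar softmax over logits vs. the vectorized path.
// The helper below is a stand-in for ggml's ggml_vec_soft_max_f32,
// whose exact signature is assumed here.
#include <math.h>
#include <stdio.h>

typedef double ggml_float;

// Assumed shape of the ggml helper: writes expf(x[i] - max) into y and
// returns the sum, so the caller can normalize in one more pass.
static ggml_float ggml_vec_soft_max_f32(const int n, float * y,
                                        const float * x, float max) {
    ggml_float sum = 0.0;
    for (int i = 0; i < n; i++) {   // the real version uses a SIMD expf()
        y[i] = expf(x[i] - max);
        sum += (ggml_float) y[i];
    }
    return sum;
}

static void softmax_logits(const float * logits, float * probs, int n_vocab) {
    float max = logits[0];
    for (int i = 1; i < n_vocab; i++) {
        if (logits[i] > max) max = logits[i];
    }
    // One vectorized exp pass instead of n_vocab scalar expf() calls.
    const ggml_float sum = ggml_vec_soft_max_f32(n_vocab, probs, logits, max);
    for (int i = 0; i < n_vocab; i++) {
        probs[i] /= (float) sum;
    }
}

int main(void) {
    const float logits[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float probs[4];
    softmax_logits(logits, probs, 4);
    for (int i = 0; i < 4; i++) printf("%f\n", probs[i]);
    return 0;
}
```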