whisper_process_logits() was computing the softmax on its own many times, so
I changed it to call my vectorized expf() function ggml_vec_soft_max_f32,
which was upstreamed to llama.cpp a few months ago. Since this is pretty
much the only CPU operation that happens in GPU mode, the change has a much
larger impact on performance here than it does on llama.cpp's large language
model inference.
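For reference, a minimal sketch of the pattern this change follows: a per-token scalar softmax loop over the logits replaced by a single call to the vectorized helper. The helper body below is a local stand-in, and the signature `ggml_float ggml_vec_soft_max_f32(int n, float *y, const float *x, float max)` is an assumption about the ggml API, not copied from the source tree; the actual whisper.cpp code differs.

```c
// Sketch only: scalar softmax over logits vs. the vectorized path.
// The helper below is a stand-in for ggml's ggml_vec_soft_max_f32,
// whose exact signature is assumed here.
#include <math.h>
#include <stdio.h>

typedef double ggml_float;

// Assumed shape of the ggml helper: writes expf(x[i] - max) into y and
// returns the sum, so the caller can normalize in one more pass.
static ggml_float ggml_vec_soft_max_f32(const int n, float * y,
                                        const float * x, float max) {
    ggml_float sum = 0.0;
    for (int i = 0; i < n; i++) {   // the real version uses a SIMD expf()
        y[i] = expf(x[i] - max);
        sum += (ggml_float) y[i];
    }
    return sum;
}

static void softmax_logits(const float * logits, float * probs, int n_vocab) {
    float max = logits[0];
    for (int i = 1; i < n_vocab; i++) {
        if (logits[i] > max) max = logits[i];
    }
    // One vectorized exp pass instead of n_vocab scalar expf() calls.
    const ggml_float sum = ggml_vec_soft_max_f32(n_vocab, probs, logits, max);
    for (int i = 0; i < n_vocab; i++) {
        probs[i] /= (float) sum;
    }
}

int main(void) {
    const float logits[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float probs[4];
    softmax_logits(logits, probs, 4);
    for (int i = 0; i < 4; i++) printf("%f\n", probs[i]);
    return 0;
}
```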