Description
I compared an older version from Nov 23 with Apr 24, and the older version is much faster.
total time = 6225.76 ms
vs
total time = 3817.54 ms
Same CPU, same compiler and settings, same test:
- git clone whisper.cpp
- git reset --hard $COMMIT (with the commits below)
- make -j
- bash ./models/download-ggml-model.sh base.en
- ./bench -w 0
CPU: AMD Ryzen 9 7950X3D 16-Core
- commit 858452d Date: Wed Apr 24 14:56:30 2024 +0300
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_init_state: kv self size = 16.52 MB
whisper_init_state: kv cross size = 18.43 MB
whisper_init_state: compute buffer (conv) = 16.39 MB
whisper_init_state: compute buffer (encode) = 132.07 MB
whisper_init_state: compute buffer (cross) = 4.78 MB
whisper_init_state: compute buffer (decode) = 96.48 MB
system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0
whisper_print_timings: load time = 64.61 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 878.59 ms / 1 runs ( 878.59 ms per run)
whisper_print_timings: decode time = 935.20 ms / 256 runs ( 3.65 ms per run)
whisper_print_timings: batchd time = 544.69 ms / 320 runs ( 1.70 ms per run)
whisper_print_timings: prompt time = 3865.51 ms / 4096 runs ( 0.94 ms per run)
whisper_print_timings: total time = 6225.76 ms
- commit d03c60d Date: Wed Nov 8 04:53:31 2023 +0700
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: model ctx = 140.66 MB
whisper_model_load: model size = 140.54 MB
whisper_init_state: kv self size = 5.25 MB
whisper_init_state: kv cross size = 17.58 MB
whisper_init_state: compute buffer (conv) = 18.50 MB
whisper_init_state: compute buffer (encode) = 81.95 MB
whisper_init_state: compute buffer (cross) = 4.49 MB
whisper_init_state: compute buffer (decode) = 24.70 MB
system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |
whisper_print_timings: load time = 83.24 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 0.00 ms
whisper_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: encode time = 693.48 ms / 1 runs ( 693.48 ms per run)
whisper_print_timings: decode time = 874.80 ms / 256 runs ( 3.42 ms per run)
whisper_print_timings: prompt time = 2249.08 ms / 16 runs ( 140.57 ms per run)
whisper_print_timings: total time = 3817.54 ms
See #89 (comment)