CPU Performance Regression? (Older version much faster)

I compared an older version from Nov 2023 with one from Apr 2024, and the older version is much faster.

total time = 6225.76 ms (Apr 2024)
vs
total time = 3817.54 ms (Nov 2023)

Same CPU, same compiler and settings, same test: 

- git clone whisper.cpp 
- git reset --hard $COMMIT (with the commits below)
- make -j
- bash ./models/download-ggml-model.sh base.en
- ./bench -w 0
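
For anyone who wants to repeat this on several commits, the steps above could be scripted roughly like this. It is only a sketch: it assumes the repository has already been cloned into a directory named `whisper.cpp` (that directory name and the helper names are my own), and it simply replays the same commands in order.

```python
import subprocess

# Assumption: the repo was already cloned into this directory.
WORKDIR = "whisper.cpp"


def bench_commands(commit):
    """Return the steps listed above for one commit, in order."""
    return [
        ["git", "reset", "--hard", commit],
        ["make", "-j"],
        ["bash", "./models/download-ggml-model.sh", "base.en"],
        ["./bench", "-w", "0"],
    ]


def run_bench(commit, workdir=WORKDIR):
    """Run every step inside workdir and return the bench output as text."""
    result = None
    for cmd in bench_commands(commit):
        # check=True stops at the first failing step (e.g. a build error).
        result = subprocess.run(
            cmd, cwd=workdir, check=True, capture_output=True, text=True
        )
    # Return both streams, since the log may go to either one.
    return result.stdout + result.stderr


# Example (not executed here, it needs the checkout and a compiler):
# log = run_bench("858452d58dba3acdc3431c9bced2bb8cfd9bf418")
```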

CPU: AMD Ryzen 9 7950X3D 16-Core

- commit 858452d58dba3acdc3431c9bced2bb8cfd9bf418 Date:   Wed Apr 24 14:56:30 2024 +0300

whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:      CPU total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_init_state: kv self size  =   16.52 MB
whisper_init_state: kv cross size =   18.43 MB
whisper_init_state: compute buffer (conv)   =   16.39 MB
whisper_init_state: compute buffer (encode) =  132.07 MB
whisper_init_state: compute buffer (cross)  =    4.78 MB
whisper_init_state: compute buffer (decode) =   96.48 MB

system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0

whisper_print_timings:     load time =    64.61 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     0.00 ms
whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   encode time =   878.59 ms /     1 runs (  878.59 ms per run)
whisper_print_timings:   decode time =   935.20 ms /   256 runs (    3.65 ms per run)
whisper_print_timings:   batchd time =   544.69 ms /   320 runs (    1.70 ms per run)
whisper_print_timings:   prompt time =  3865.51 ms /  4096 runs (    0.94 ms per run)
whisper_print_timings:    total time =  6225.76 ms

- commit d03c60dd7fa94f3df927c5c90db5a038412ef0b6 Date:   Wed Nov 8 04:53:31 2023 +0700

whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load: model ctx     =  140.66 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
whisper_init_state: compute buffer (conv)   =   18.50 MB
whisper_init_state: compute buffer (encode) =   81.95 MB
whisper_init_state: compute buffer (cross)  =    4.49 MB
whisper_init_state: compute buffer (decode) =   24.70 MB

system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

whisper_print_timings:     load time =    83.24 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     0.00 ms
whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   encode time =   693.48 ms /     1 runs (  693.48 ms per run)
whisper_print_timings:   decode time =   874.80 ms /   256 runs (    3.42 ms per run)
whisper_print_timings:   prompt time =  2249.08 ms /    16 runs (  140.57 ms per run)
whisper_print_timings:    total time =  3817.54 ms
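
To make the two logs easier to compare, here is a small Python sketch that parses the `whisper_print_timings` and `system_info` lines pasted above and prints what changed. The function names and regexes are my own and only assume the line format shown in these two logs.

```python
import re

# Lines copied verbatim from the two logs above.
NEW_LOG = """\
system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0
whisper_print_timings:   encode time =   878.59 ms /     1 runs (  878.59 ms per run)
whisper_print_timings:   decode time =   935.20 ms /   256 runs (    3.65 ms per run)
whisper_print_timings:   batchd time =   544.69 ms /   320 runs (    1.70 ms per run)
whisper_print_timings:   prompt time =  3865.51 ms /  4096 runs (    0.94 ms per run)
whisper_print_timings:    total time =  6225.76 ms
"""
OLD_LOG = """\
system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |
whisper_print_timings:   encode time =   693.48 ms /     1 runs (  693.48 ms per run)
whisper_print_timings:   decode time =   874.80 ms /   256 runs (    3.42 ms per run)
whisper_print_timings:   prompt time =  2249.08 ms /    16 runs (  140.57 ms per run)
whisper_print_timings:    total time =  3817.54 ms
"""

TIMING_RE = re.compile(r"whisper_print_timings:\s+(\w+) time =\s+([\d.]+) ms")
FLAG_RE = re.compile(r"(\w+) = (\d+)")


def parse_timings(log):
    """Map stage name (encode, decode, ...) to its total time in ms."""
    return {stage: float(ms) for stage, ms in TIMING_RE.findall(log)}


def parse_flags(log):
    """Map each 'NAME = N' entry on the system_info line to an int."""
    line = next(l for l in log.splitlines() if l.startswith("system_info:"))
    return {name: int(value) for name, value in FLAG_RE.findall(line)}


def diff(old, new):
    """Return {key: (old_value, new_value)} for keys whose values differ."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


old_t, new_t = parse_timings(OLD_LOG), parse_timings(NEW_LOG)
for stage, (o, n) in diff(old_t, new_t).items():
    delta = "n/a" if o is None or n is None else f"{n - o:+.2f} ms"
    print(f"{stage:>7}: old={o} new={n} delta={delta}")
print(f"total ratio new/old: {new_t['total'] / old_t['total']:.2f}x")
print("flag differences:", diff(parse_flags(OLD_LOG), parse_flags(NEW_LOG)))
```

On the numbers above, the newer build takes about 1.63x as long in total. The prompt stage accounts for the largest difference (about 1616 ms), and the batchd stage appears only in the newer log. The only differences on the `system_info` line are AVX512 (0 in the old build, 1 in the new one) and CUDA, which is listed only in the newer log. Note that the prompt stage is reported over 16 runs in the old log and 4096 runs in the new one, so the per-run figures are not directly comparable.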


See https://github.com/ggerganov/whisper.cpp/issues/89#issuecomment-2081571638


CPU Performance Regression? (Older version much faster) #2099
