Commit 18f794e

readme : add OpenVINO support details (ggml-org#1112)
1 parent c601f93


README.md

Lines changed: 80 additions & 0 deletions

@@ -22,6 +22,7 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp

- [Partial GPU support for NVIDIA via cuBLAS](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
- [Partial OpenCL GPU support via CLBlast](https://github.com/ggerganov/whisper.cpp#opencl-gpu-support-via-clblast)
- [BLAS CPU support via OpenBLAS](https://github.com/ggerganov/whisper.cpp#blas-cpu-support-via-openblas)
- [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)

Supported platforms:

@@ -311,6 +312,85 @@ speed-up - more than x3 faster compared with CPU-only execution. Here are the in

For more information about the Core ML implementation please refer to PR [#566](https://github.com/ggerganov/whisper.cpp/pull/566).

## OpenVINO support

On platforms that support [OpenVINO](https://github.com/openvinotoolkit/openvino), the Encoder inference can be executed on OpenVINO-supported devices, including x86 CPUs and Intel GPUs (integrated and discrete).

This can result in a significant speedup of encoder performance. Here are the instructions for generating the OpenVINO model and using it with `whisper.cpp`:

- First, set up a Python virtual environment and install the Python dependencies. Python 3.10 is recommended.

  Windows:

  ```
  cd models
  python -m venv openvino_conv_env
  openvino_conv_env\Scripts\activate
  python -m pip install --upgrade pip
  pip install -r openvino-conversion-requirements.txt
  ```

  Linux and macOS:

  ```
  cd models
  python3 -m venv openvino_conv_env
  source openvino_conv_env/bin/activate
  python -m pip install --upgrade pip
  pip install -r openvino-conversion-requirements.txt
  ```

- Generate an OpenVINO encoder model. For example, to generate a `base.en` model, use:

  ```
  python convert-whisper-to-openvino.py --model base.en
  ```

  This will produce `ggml-base.en-encoder-openvino.xml`/`.bin` IR model files. It is recommended to relocate these to the same folder as the `ggml` models, as that is the default location that the OpenVINO extension will search at runtime.

- Build `whisper.cpp` with OpenVINO support:

  Download the OpenVINO package from the [release page](https://github.com/openvinotoolkit/openvino/releases). The recommended version to use is [2023.0.0](https://github.com/openvinotoolkit/openvino/releases/tag/2023.0.0).

  After downloading and extracting the package onto your development system, set up the required environment by sourcing the setupvars script. For example:

  Linux:

  ```bash
  source /path/to/l_openvino_toolkit_ubuntu22_2023.0.0.10926.b4452d56304_x86_64/setupvars.sh
  ```

  Windows (cmd):

  ```
  C:\Path\To\w_openvino_toolkit_windows_2023.0.0.10926.b4452d56304_x86_64\setupvars.bat
  ```

  And then build the project using cmake:

  ```bash
  cd build
  cmake -DWHISPER_OPENVINO=1 ..
  make -j
  ```

- Run the examples as usual. For example:

  ```bash
  ./main -m models/ggml-base.en.bin -f samples/jfk.wav

  ...

  whisper_ctx_init_openvino_encoder: loading OpenVINO model from 'models/ggml-base.en-encoder-openvino.xml'
  whisper_ctx_init_openvino_encoder: first run on a device may take a while ...
  whisper_openvino_init: path_model = models/ggml-base.en-encoder-openvino.xml, device = GPU, cache_dir = models/ggml-base.en-encoder-openvino-cache
  whisper_ctx_init_openvino_encoder: OpenVINO model loaded

  system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 1 |

  ...
  ```

  The first run on an OpenVINO device is slow, since the OpenVINO framework compiles the IR (Intermediate Representation) model into a device-specific 'blob'. This blob is cached for subsequent runs.
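
The `whisper_ctx_init_openvino_encoder` calls in the log above are part of the public C API in `whisper.h`, so applications embedding `whisper.cpp` can enable the OpenVINO encoder themselves. Below is a minimal sketch, assuming the signature from `whisper.h` at the time of this PR and a 0-on-success return convention; verify both against your checkout:

```c
// Minimal sketch: enable the OpenVINO encoder from application code.
// Assumes whisper.h as of PR #1037; check the signature and the
// 0-on-success return convention against your checkout.
#include <stdio.h>
#include "whisper.h"

int main(void) {
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-base.en.bin");
    if (!ctx) {
        fprintf(stderr, "failed to load ggml model\n");
        return 1;
    }

    // model_path = NULL: look for "ggml-base.en-encoder-openvino.xml" next to
    //                    the ggml model (the default search location above)
    // device    = "GPU": an OpenVINO device name; "CPU" also works
    // cache_dir = NULL:  assumed to derive a default cache directory next to
    //                    the model, as in the log output above
    if (whisper_ctx_init_openvino_encoder(ctx, NULL, "GPU", NULL) != 0) {
        fprintf(stderr, "OpenVINO init failed - falling back to the ggml encoder\n");
    }

    // ... run whisper_full(...) as usual; the Encoder now executes via OpenVINO ...

    whisper_free(ctx);
    return 0;
}
```

The bundled `main` example makes an equivalent call internally, which is why the log lines above appear without any extra command-line flags.
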
For more information about the OpenVINO implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).
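
An application can also check at runtime whether its `whisper.cpp` build has OpenVINO compiled in. A minimal sketch, assuming the string returned by `whisper_print_system_info()` contains the same `OPENVINO = 1` token shown in the `system_info` line above:

```c
// Minimal sketch: detect OpenVINO support at runtime. Assumes the
// "OPENVINO = 1" token in the capability string returned by
// whisper_print_system_info(), matching the system_info line above.
#include <stdio.h>
#include <string.h>
#include "whisper.h"

int main(void) {
    const char * info = whisper_print_system_info();
    printf("%s\n", info);

    if (strstr(info, "OPENVINO = 1") == NULL) {
        fprintf(stderr, "built without OpenVINO support - rebuild with -DWHISPER_OPENVINO=1\n");
        return 1;
    }
    return 0;
}
```
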
## NVIDIA GPU support via cuBLAS
With NVIDIA cards the Encoder processing can to a large extent be offloaded to the GPU through cuBLAS.
