For more information about the Core ML implementation please refer to PR [#566](https://github.com/ggerganov/whisper.cpp/pull/566).
## OpenVINO support
On platforms that support [OpenVINO](https://github.com/openvinotoolkit/openvino), the Encoder inference can be executed
on OpenVINO-supported devices, including x86 CPUs and Intel GPUs (integrated & discrete).
This can result in a significant speedup in encoder performance. Here are the instructions for generating the OpenVINO model and using it with `whisper.cpp`:
- First, set up a Python virtual environment and install the Python dependencies. Python 3.10 is recommended.
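  A Linux sketch of the setup and conversion steps, assuming the `requirements-openvino.txt` file and the `convert-whisper-to-openvino.py` script that ship in the `models/` folder of the repo (verify both names against your checkout):

  ```bash
  cd models
  python3.10 -m venv openvino_conv_env        # Python 3.10 recommended
  source openvino_conv_env/bin/activate
  python -m pip install --upgrade pip
  pip install -r requirements-openvino.txt

  # convert, e.g., the base.en model to an OpenVINO IR model
  python convert-whisper-to-openvino.py --model base.en
  ```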
  This will produce ggml-base.en-encoder-openvino.xml/.bin IR model files. It's recommended to relocate these to the same folder as the ggml models, as that
  is the default location that the OpenVINO extension will search at runtime.
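  For instance, if the conversion was run from a different directory, a hypothetical relocation step could look like:

  ```bash
  mv ggml-base.en-encoder-openvino.xml ggml-base.en-encoder-openvino.bin /path/to/whisper.cpp/models/
  ```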
- Build `whisper.cpp` with OpenVINO support:
  Download the OpenVINO package from the [release page](https://github.com/openvinotoolkit/openvino/releases). The recommended version to use is [2023.0.0](https://github.com/openvinotoolkit/openvino/releases/tag/2023.0.0).
  After downloading & extracting the package onto your development system, set up the required environment by sourcing the setupvars script. For example:
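  A Linux sketch follows (the extracted directory name depends on the exact package you downloaded), together with a CMake build; the `WHISPER_OPENVINO` option matches the whisper.cpp build system around PR #1037, but verify it against your checkout:

  ```bash
  source /path/to/openvino_toolkit/setupvars.sh

  cd /path/to/whisper.cpp
  cmake -B build -DWHISPER_OPENVINO=1
  cmake --build build -j --config Release
  ```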
The first run on an OpenVINO device is slow, since the OpenVINO framework will compile the IR (Intermediate Representation) model to a device-specific 'blob'. This device-specific blob gets
cached for subsequent runs.
For more information about the OpenVINO implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).

## NVIDIA GPU support via cuBLAS
With NVIDIA cards, the Encoder processing can be offloaded to the GPU to a large extent through cuBLAS.
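A build sketch, assuming the CUDA toolkit is installed and that the Makefile honors the `WHISPER_CUBLAS` flag (as it did around this revision; verify against your checkout):

```bash
make clean
WHISPER_CUBLAS=1 make -j
```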