|
| 1 | +# whisper.cpp for SYCL |
| 2 | + |
| 3 | +[Background](#background) |
| 4 | + |
| 5 | +[OS](#os) |
| 6 | + |
| 7 | +[Intel GPU](#intel-gpu) |
| 8 | + |
| 9 | +[Linux](#linux) |
| 10 | + |
| 11 | +[Environment Variable](#environment-variable) |
| 12 | + |
| 13 | +[Known Issue](#known-issue) |
| 14 | + |
| 15 | +[Todo](#todo) |
| 16 | + |
| 17 | +## Background |
| 18 | + |
| 19 | +SYCL is a higher-level programming model that improves programming productivity on various hardware accelerators, such as CPUs, GPUs, and FPGAs. It is a single-source embedded domain-specific language based on pure C++17. |
| 20 | + |
| 21 | +oneAPI is a specification that is open and standards-based, supporting multiple architecture types including but not limited to GPU, CPU, and FPGA. The spec has both direct programming and API-based programming paradigms. |
| 22 | + |
| 23 | +Intel uses SYCL as its direct programming language to support CPUs, GPUs, and FPGAs. |
| 24 | + |
| 25 | +To avoid re-inventing the wheel, this code refers to other code paths in llama.cpp (such as OpenBLAS, cuBLAS, and CLBlast). We use the open-source tool [SYCLomatic](https://github.com/oneapi-src/SYCLomatic) (commercial release: [Intel® DPC++ Compatibility Tool](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compatibility-tool.html)) to migrate the code to SYCL. |
| 26 | + |
| 27 | +whisper.cpp for SYCL is used to support Intel GPUs. |
| 28 | + |
| 29 | +For Intel CPUs, we recommend using whisper.cpp for x86 (Intel MKL build). |
| 30 | + |
| 31 | +## OS |
| 32 | + |
| 33 | +|OS|Status|Verified| |
| 34 | +|-|-|-| |
| 35 | +|Linux|Supported|Ubuntu 22.04| |
| 36 | +|Windows|Ongoing| | |
| 37 | + |
| 38 | + |
| 39 | +## Intel GPU |
| 40 | + |
| 41 | +|Intel GPU| Status | Verified Model| |
| 42 | +|-|-|-| |
| 43 | +|Intel Data Center Max Series| Supported| Max 1550| |
| 44 | +|Intel Data Center Flex Series| Supported| Flex 170| |
| 45 | +|Intel Arc Series| Supported| Arc A770| |
| 46 | +|Intel built-in Arc GPU| Supported| built-in Arc GPU in Meteor Lake| |
| 47 | +|Intel iGPU| Supported| iGPU in i5-1250P, i7-1165G7| |
| 48 | + |
| 49 | + |
| 50 | +## Linux |
| 51 | + |
| 52 | +### Setup Environment |
| 53 | + |
| 54 | +1. Install Intel GPU driver. |
| 55 | + |
| 56 | +a. Install the Intel GPU driver by following the official guide: [Install GPU Drivers](https://dgpu-docs.intel.com/driver/installation.html). |
| 57 | + |
| 58 | +Note: for iGPU, please install the client GPU driver. |
| 59 | + |
| 60 | +b. Add the user to the video and render groups (replace `username` with your login name). |
| 61 | + |
| 62 | +``` |
| 63 | +sudo usermod -aG render username |
| 64 | +sudo usermod -aG video username |
| 65 | +``` |
| 66 | + |
| 67 | +Note: log out and log back in for the group change to take effect. |
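
Once logged back in, the group membership can be double-checked with a small sketch like the one below. The `missing_groups` helper is a hypothetical name introduced here for illustration and is not part of whisper.cpp; it only inspects the output of the standard `id -nG` command.

```shell
#!/bin/sh
# Hypothetical helper (not part of whisper.cpp): print which of the
# groups required above (render, video) are absent from a group list.
missing_groups() {
  current="$1"
  missing=""
  for g in render video; do
    case " $current " in
      *" $g "*) ;;                  # group present, nothing to do
      *) missing="$missing $g" ;;   # remember the missing group
    esac
  done
  echo "${missing# }"               # strip the leading space
}

# Check the current user; an empty result means both groups are set.
result="$(missing_groups "$(id -nG)")"
if [ -n "$result" ]; then
  echo "missing groups: $result"
else
  echo "render and video groups are set"
fi
```

If a group is reported as missing, rerun the matching `usermod` command and log in again.
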
| 68 | + |
| 69 | +c. Verify the driver installation |
| 70 | + |
| 71 | +``` |
| 72 | +sudo apt install clinfo |
| 73 | +sudo clinfo -l |
| 74 | +``` |
| 75 | + |
| 76 | +Output (example): |
| 77 | + |
| 78 | +``` |
| 79 | +Platform #0: Intel(R) OpenCL Graphics |
| 80 | + `-- Device #0: Intel(R) Arc(TM) A770 Graphics |
| 81 | +
|
| 82 | +
|
| 83 | +Platform #0: Intel(R) OpenCL HD Graphics |
| 84 | + `-- Device #0: Intel(R) Iris(R) Xe Graphics [0x9a49] |
| 85 | +``` |
| 86 | + |
| 87 | +2. Install the Intel® oneAPI Base Toolkit. |
| 88 | + |
| 89 | + |
| 90 | +a. Follow the procedure in [Get the Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html). |
| 91 | + |
| 92 | +We recommend installing to the default folder: **/opt/intel/oneapi**. |
| 93 | + |
| 94 | +The rest of this guide uses the default folder as an example. If you installed to a different folder, adjust the paths in the following steps accordingly. |
| 95 | + |
| 96 | +b. Verify the installation |
| 97 | + |
| 98 | +``` |
| 99 | +source /opt/intel/oneapi/setvars.sh |
| 100 | +
|
| 101 | +sycl-ls |
| 102 | +``` |
| 103 | + |
| 104 | +There should be one or more Level Zero devices, such as **[ext_oneapi_level_zero:gpu:0]**. |
| 105 | + |
| 106 | +Output (example): |
| 107 | +``` |
| 108 | +[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000] |
| 109 | +[opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000] |
| 110 | +[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50] |
| 111 | +[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918] |
| 112 | +
|
| 113 | +``` |
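
The check above can also be scripted. The following is a minimal sketch, assuming the device tag format `[ext_oneapi_level_zero:gpu:N]` shown in the example output (other oneAPI releases may print a different tag). The `count_level_zero_gpus` helper is a hypothetical name, not something shipped with oneAPI or whisper.cpp.

```shell
#!/bin/sh
# Hypothetical helper (not part of oneAPI): count Level Zero GPU entries
# in sycl-ls style output read from stdin.
count_level_zero_gpus() {
  # grep -c prints 0 and exits non-zero when nothing matches,
  # so "|| true" keeps the function from failing in that case.
  grep -c '^\[ext_oneapi_level_zero:gpu:' || true
}

# Only query the real tool when it is on PATH (after sourcing setvars.sh).
if command -v sycl-ls >/dev/null 2>&1; then
  n="$(sycl-ls | count_level_zero_gpus)"
  echo "Level Zero GPU devices found: $n"
else
  echo "sycl-ls not found; source /opt/intel/oneapi/setvars.sh first"
fi
```

A count of 0 suggests that the GPU driver or the oneAPI environment is not set up correctly.
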
| 114 | + |
| 115 | +3. Build locally: |
| 116 | + |
| 117 | +``` |
| 118 | +mkdir -p build |
| 119 | +cd build |
| 120 | +source /opt/intel/oneapi/setvars.sh |
| 121 | +
|
| 122 | +#for FP16 |
| 123 | +#cmake .. -DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DWHISPER_SYCL_F16=ON |
| 124 | +
|
| 125 | +#for FP32 |
| 126 | +cmake .. -DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx |
| 127 | +
|
| 128 | +#build example/main only |
| 129 | +#cmake --build . --config Release --target main |
| 130 | +
|
| 131 | +#build all binary |
| 132 | +cmake --build . --config Release -v |
| 133 | +
|
| 134 | +``` |
| 135 | + |
| 136 | +or |
| 137 | + |
| 138 | +``` |
| 139 | +./examples/sycl/build.sh |
| 140 | +``` |
| 141 | + |
| 142 | +Note: |
| 143 | + |
| 144 | +- By default, all binaries are built, which takes longer. To reduce the build time, we recommend building **example/main** only. |
| 145 | + |
| 146 | +### Run |
| 147 | + |
| 148 | +1. Put the model file in the **models** folder |
| 149 | + |
| 150 | +2. Enable the oneAPI runtime environment |
| 151 | + |
| 152 | +``` |
| 153 | +source /opt/intel/oneapi/setvars.sh |
| 154 | +``` |
| 155 | + |
| 156 | +3. List the device IDs |
| 157 | + |
| 158 | +Run without parameters: |
| 159 | + |
| 160 | +``` |
| 161 | +./build/bin/ls-sycl-device |
| 162 | +
|
| 163 | +# or |
| 164 | +
|
| 165 | +./build/bin/main |
| 166 | +``` |
| 167 | + |
| 168 | +Check the device IDs in the startup log, for example: |
| 169 | + |
| 170 | +``` |
| 171 | +found 4 SYCL devices: |
| 172 | + Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3, |
| 173 | + max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136 |
| 174 | + Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2, |
| 175 | + max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280 |
| 176 | + Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0, |
| 177 | + max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280 |
| 178 | + Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0, |
| 179 | + max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136 |
| 180 | +
|
| 181 | +``` |
| 182 | + |
| 183 | +|Attribute|Note| |
| 184 | +|-|-| |
| 185 | +|compute capability 1.3|Level Zero runtime, recommended | |
| 186 | +|compute capability 3.0|OpenCL runtime, slower than Level Zero in most cases| |
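
To pick the recommended device automatically, the startup log can be filtered for the Level Zero entry. This is only a sketch: it assumes the log line format shown in the example above and that Level Zero devices report `compute capability 1.3`, which may change with other driver versions. The `first_level_zero_device` helper is a hypothetical name introduced here.

```shell
#!/bin/sh
# Hypothetical helper: print the ID of the first device whose log line
# reports "compute capability 1.3" (Level Zero per the table above).
first_level_zero_device() {
  sed -n 's/^ *Device \([0-9][0-9]*\):.*compute capability 1\.3,.*$/\1/p' | head -n 1
}

# Sample lines copied from the example startup log.
sample_log='found 4 SYCL devices:
  Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
  Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,'

dev="$(printf '%s\n' "$sample_log" | first_level_zero_device)"
echo "GGML_SYCL_DEVICE=$dev"
```

With real output, the sample text would be replaced by the captured log of `./build/bin/ls-sycl-device`. Redirecting with `2>&1` may be needed if the log is written to stderr.
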
| 187 | + |
| 188 | +4. Set device ID and execute whisper.cpp |
| 189 | + |
| 190 | +Set the device ID to 0 with **GGML_SYCL_DEVICE=0**: |
| 191 | + |
| 192 | +``` |
| 193 | +GGML_SYCL_DEVICE=0 ./build/bin/main -m models/ggml-base.en.bin -f samples/jfk.wav |
| 194 | +``` |
| 195 | +or run the script: |
| 196 | + |
| 197 | +``` |
| 198 | +./examples/sycl/run_whisper.sh |
| 199 | +``` |
| 200 | + |
| 201 | + |
| 202 | + |
| 203 | +5. Check the device ID in the output |
| 204 | + |
| 205 | +For example: |
| 206 | +``` |
| 207 | +Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device |
| 208 | +``` |
| 209 | + |
| 210 | + |
| 211 | +## Environment Variable |
| 212 | + |
| 213 | +#### Build |
| 214 | + |
| 215 | +|Name|Value|Function| |
| 216 | +|-|-|-| |
| 217 | +|WHISPER_SYCL|ON (mandatory)|Enable build with SYCL code path. <br>For FP32/FP16, WHISPER_SYCL=ON is mandatory.| |
| 218 | +|WHISPER_SYCL_F16|ON (optional)|Enable FP16 build with SYCL code path. For FP32, do not set it.| |
| 219 | +|CMAKE_C_COMPILER|icx|Use the icx compiler for the SYCL code path| |
| 220 | +|CMAKE_CXX_COMPILER|icpx|Use the icpx compiler for the SYCL code path| |
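
The options in this table combine into a single `cmake` configure line. The sketch below assembles that line for either precision. The `sycl_cmake_flags` helper is a hypothetical name introduced here and is not part of the repository; it only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: assemble the CMake options from the table above.
# Pass "fp16" to add WHISPER_SYCL_F16=ON; anything else yields FP32.
sycl_cmake_flags() {
  flags="-DWHISPER_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx"
  if [ "${1:-fp32}" = "fp16" ]; then
    flags="$flags -DWHISPER_SYCL_F16=ON"
  fi
  echo "$flags"
}

# Print the configure commands instead of running them.
echo "cmake .. $(sycl_cmake_flags fp32)"
echo "cmake .. $(sycl_cmake_flags fp16)"
```
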
| 221 | + |
| 222 | +#### Running |
| 223 | + |
| 224 | + |
| 225 | +|Name|Value|Function| |
| 226 | +|-|-|-| |
| 227 | +|GGML_SYCL_DEVICE|0 (default) or 1|Set the ID of the device to use. Check the device IDs in the default startup output| |
| 228 | +|GGML_SYCL_DEBUG|0 (default) or 1|Enable debug logging via the GGML_SYCL_DEBUG macro| |
| 229 | + |
| 230 | +## Known Issue |
| 231 | + |
| 232 | +- Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`. |
| 233 | + |
| 234 | +  Cause: the oneAPI runtime environment has not been enabled. |
| 235 | + |
| 236 | +  Solution: install the oneAPI Base Toolkit and enable it with: `source /opt/intel/oneapi/setvars.sh`. |
| 237 | + |
| 238 | + |
| 239 | +- Hang during startup |
| 240 | + |
| 241 | +  llama.cpp uses mmap by default to read the model file and copy it to the GPU. On some systems, memcpy can behave abnormally and block. |
| 242 | + |
| 243 | + Solution: add **--no-mmap**. |
| 244 | + |
| 245 | +## Todo |
| 246 | + |
| 247 | +- Support building on Windows. |
| 248 | + |
| 249 | +- Support multiple cards. |