
Commit 8c4aa67

NeoZhangJianyu authored and abhilash1910 committed
support SYCL backend windows build (ggml-org#5208)
* support SYCL backend windows build
* add windows build in CI
* add for win build CI
* correct install oneMKL
* fix install issue
* fix ci
* fix install cmd
* fix install cmd
* fix install cmd
* fix install cmd
* fix install cmd
* fix win build
* fix win build
* fix win build
* restore other CI part
* restore as base
* rm no new line
* fix no new line issue, add -j
* fix grammer issue
* allow to trigger manually, fix format issue
* fix format
* add newline
* fix format
* fix format
* fix format issuse

---------

Co-authored-by: Abhilash Majumder <[email protected]>
1 parent 98725e8 commit 8c4aa67

9 files changed, +279 −12 lines changed


.github/workflows/build.yml

Lines changed: 25 additions & 0 deletions
```diff
@@ -565,6 +565,31 @@ jobs:
           path: |
             cudart-llama-bin-win-cu${{ matrix.cuda }}-x64.zip
 
+  windows-latest-cmake-sycl:
+    runs-on: windows-latest
+    defaults:
+      run:
+        shell: bash
+
+    env:
+      WINDOWS_BASEKIT_URL: https://registrationcenter-download.intel.com/akdlm/IRC_NAS/62641e01-1e8d-4ace-91d6-ae03f7f8a71f/w_BaseKit_p_2024.0.0.49563_offline.exe
+      WINDOWS_DPCPP_MKL: intel.oneapi.win.cpp-dpcpp-common:intel.oneapi.win.mkl.devel
+
+
+    steps:
+      - name: Clone
+        id: checkout
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+
+      - name: Install
+        run: scripts/install-oneapi.bat $WINDOWS_BASEKIT_URL $WINDOWS_DPCPP_MKL
+
+      - name: Build
+        id: cmake_build
+        run: examples/sycl/win-build-sycl.bat
+
   ios-xcode-build:
     runs-on: macos-latest
 
```
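For anyone reproducing this job by hand, a rough local equivalent in Git Bash on Windows (the env values are exactly the ones the workflow defines; the scripts are the ones the workflow invokes):

```
export WINDOWS_BASEKIT_URL="https://registrationcenter-download.intel.com/akdlm/IRC_NAS/62641e01-1e8d-4ace-91d6-ae03f7f8a71f/w_BaseKit_p_2024.0.0.49563_offline.exe"
export WINDOWS_DPCPP_MKL="intel.oneapi.win.cpp-dpcpp-common:intel.oneapi.win.mkl.devel"

# mirror the Install and Build steps above
scripts/install-oneapi.bat $WINDOWS_BASEKIT_URL $WINDOWS_DPCPP_MKL
examples/sycl/win-build-sycl.bat
```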
.github/workflows/editorconfig.yml

Lines changed: 6 additions & 0 deletions
```diff
@@ -1,6 +1,12 @@
 name: EditorConfig Checker
 
 on:
+  workflow_dispatch: # allows manual triggering
+    inputs:
+      create_release:
+        description: 'Create new release'
+        required: true
+        type: boolean
   push:
     branches:
       - master
```
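
With `workflow_dispatch` in place the checker can also be started by hand; a hypothetical manual trigger via the GitHub CLI would look like:

```
gh workflow run editorconfig.yml -f create_release=true
```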

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -89,3 +89,4 @@ examples/jeopardy/results.txt
 
 poetry.lock
 poetry.toml
+nppBackup
```

CMakeLists.txt

Lines changed: 5 additions & 1 deletion
```diff
@@ -507,7 +507,11 @@ if (LLAMA_SYCL)
     set(GGML_HEADERS_SYCL ggml.h ggml-sycl.h)
     set(GGML_SOURCES_SYCL ggml-sycl.cpp)
 
-    set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} sycl OpenCL mkl_core pthread m dl mkl_sycl_blas mkl_intel_ilp64 mkl_tbb_thread)
+    if (WIN32)
+        set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} -fsycl sycl7 OpenCL mkl_sycl_blas_dll.lib mkl_intel_ilp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib)
+    else()
+        set(LLAMA_EXTRA_LIBS ${LLAMA_EXTRA_LIBS} -fsycl OpenCL mkl_core pthread m dl mkl_sycl_blas mkl_intel_ilp64 mkl_tbb_thread)
+    endif()
 endif()
 
 if (LLAMA_KOMPUTE)
```
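
The Windows branch links the SYCL runtime as `sycl7` plus the oneMKL DLL import libraries, while the Linux branch keeps `pthread`/`m`/`dl` and the TBB-threaded MKL. The configure commands that exercise each branch, as documented in the READMEs of this commit, are:

```
# Windows (oneAPI command prompt)
cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release

# Linux
cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
```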

README_sycl.md renamed to README-sycl.md

Lines changed: 184 additions & 10 deletions
````diff
@@ -8,10 +8,14 @@
 
 [Linux](#linux)
 
+[Windows](#windows)
+
 [Environment Variable](#environment-variable)
 
 [Known Issue](#known-issue)
 
+[Q&A](#q&a)
+
 [Todo](#todo)
 
 ## Background
@@ -33,7 +37,7 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 |OS|Status|Verified|
 |-|-|-|
 |Linux|Support|Ubuntu 22.04|
-|Windows|Ongoing| |
+|Windows|Support|Windows 11|
 
 
 ## Intel GPU
@@ -42,7 +46,7 @@ For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).
 |-|-|-|
 |Intel Data Center Max Series| Support| Max 1550|
 |Intel Data Center Flex Series| Support| Flex 170|
-|Intel Arc Series| Support| Arc 770|
+|Intel Arc Series| Support| Arc 770, 730M|
 |Intel built-in Arc GPU| Support| built-in Arc GPU in Meteor Lake|
 |Intel iGPU| Support| iGPU in i5-1250P, i7-1165G7|
 
@@ -131,6 +135,7 @@ cmake .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
 #build all binary
 cmake --build . --config Release -v
 
+cd ..
 ```
 
 or
@@ -195,7 +200,7 @@ GGML_SYCL_DEVICE=0 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "Building
 or run by script:
 
 ```
-./examples/sycl/run_llama2.sh
+./examples/sycl/run-llama2.sh
 ```
 
 Note:
@@ -205,11 +210,175 @@ Note:
 
 5. Check the device ID in output
 
-Like
+Like:
 ```
 Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 ```
 
+## Windows
+
+### Setup Environment
+
+1. Install Intel GPU driver.
+
+Please install Intel GPU driver by official guide: [Install GPU Drivers](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/software/drivers.html).
+
+2. Install Intel® oneAPI Base toolkit.
+
+a. Please follow the procedure in [Get the Intel® oneAPI Base Toolkit ](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html).
+
+Recommend to install to default folder: **/opt/intel/oneapi**.
+
+Following guide uses the default folder as example. If you use other folder, please modify the following guide info with your folder.
+
+b. Enable oneAPI running environment:
+
+- In Search, input 'oneAPI'.
+
+Search & open "Intel oneAPI command prompt for Intel 64 for Visual Studio 2022"
+
+- In Run:
+
+In CMD:
+```
+"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
+```
+
+c. Check GPU
+
+In oneAPI command line:
+
+```
+sycl-ls
+```
+
+There should be one or more level-zero devices. Like **[ext_oneapi_level_zero:gpu:0]**.
+
+Output (example):
+```
+[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
+[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
+[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO [31.0.101.5186]
+[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.28044]
+
+```
+
+3. Install cmake & make
+
+a. Download & install cmake for windows: https://cmake.org/download/
+
+b. Download & install make for windows provided by mingw-w64: https://www.mingw-w64.org/downloads/
+
+
+### Build locally:
+
+In oneAPI command line window:
+
+```
+mkdir -p build
+cd build
+@call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force
+
+:: for FP16
+:: faster for long-prompt inference
+:: cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release -DLLAMA_SYCL_F16=ON
+
+:: for FP32
+cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release
+
+
+:: build example/main only
+:: make main
+
+:: build all binary
+make -j
+cd ..
+```
+
+or
+
+```
+.\examples\sycl\win-build-sycl.bat
+```
+
+Note:
+
+- By default, it will build for all binary files. It will take more time. To reduce the time, we recommend to build for **example/main** only.
+
+### Run
+
+1. Put model file to folder **models**
+
+2. Enable oneAPI running environment
+
+- In Search, input 'oneAPI'.
+
+Search & open "Intel oneAPI command prompt for Intel 64 for Visual Studio 2022"
+
+- In Run:
+
+In CMD:
+```
+"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64
+```
+
+3. List device ID
+
+Run without parameter:
+
+```
+build\bin\ls-sycl-device.exe
+
+or
+
+build\bin\main.exe
+```
+
+Check the ID in startup log, like:
+
+```
+found 4 SYCL devices:
+  Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
+    max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
+  Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
+    max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280
+  Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
+    max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280
+  Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
+    max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
+
+```
+
+|Attribute|Note|
+|-|-|
+|compute capability 1.3|Level-zero running time, recommended |
+|compute capability 3.0|OpenCL running time, slower than level-zero in most cases|
+
+4. Set device ID and execute llama.cpp
+
+Set device ID = 0 by **set GGML_SYCL_DEVICE=0**
+
+```
+set GGML_SYCL_DEVICE=0
+build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0
+```
+or run by script:
+
+```
+.\examples\sycl\win-run-llama2.bat
+```
+
+Note:
+
+- By default, mmap is used to read model file. In some cases, it leads to the hang issue. Recommend to use parameter **--no-mmap** to disable mmap() to skip this issue.
+
+
+5. Check the device ID in output
+
+Like:
+```
+Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
+```
 
 ## Environment Variable
 
````
````diff
@@ -220,7 +389,7 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 |LLAMA_SYCL|ON (mandatory)|Enable build with SYCL code path. <br>For FP32/FP16, LLAMA_SYCL=ON is mandatory.|
 |LLAMA_SYCL_F16|ON (optional)|Enable FP16 build with SYCL code path. Faster for long-prompt inference. <br>For FP32, not set it.|
 |CMAKE_C_COMPILER|icx|Use icx compiler for SYCL code path|
-|CMAKE_CXX_COMPILER|icpx|use icpx for SYCL code path|
+|CMAKE_CXX_COMPILER|icpx (Linux), icx (Windows)|use icpx/icx for SYCL code path|
 
 #### Running
 
````
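For reference, a single Windows configure line combining the build variables from this table (a sketch assembled from the build scripts in this commit; include `LLAMA_SYCL_F16` only for the FP16 path):

```
cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DLLAMA_SYCL_F16=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release
```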
````diff
@@ -232,18 +401,23 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
 
 ## Known Issue
 
+- Hang during startup
+
+  llama.cpp use mmap as default way to read model file and copy to GPU. In some system, memcpy will be abnormal and block.
+
+  Solution: add **--no-mmap**.
+
+## Q&A
+
 - Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`.
 
   Miss to enable oneAPI running environment.
 
   Install oneAPI base toolkit and enable it by: `source /opt/intel/oneapi/setvars.sh`.
 
+- In Windows, no result, not error.
 
-- Hang during startup
-
-  llama.cpp use mmap as default way to read model file and copy to GPU. In some system, memcpy will be abnormal and block.
-
-  Solution: add **--no-mmap**.
+  Miss to enable oneAPI running environment.
 
 ## Todo
 
````
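Putting the **--no-mmap** workaround from "Known Issue" together with the Windows run command documented above gives, as a sketch:

```
build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0 --no-mmap
```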

README.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -10,6 +10,8 @@ Inference of [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++
 
 ### Hot topics
 
+- ⚠️ Incoming backends: https://github.com/ggerganov/llama.cpp/discussions/5138
+- [SYCL backend](README-sycl.md) is ready (1/28/2024), support Linux/Windows in Intel GPUs (iGPU, Arc/Flex/Max series)
 - New SOTA quantized models, including pure 2-bits: https://huggingface.co/ikawrakow
 - Collecting Apple Silicon performance stats:
   - M-series: https://github.com/ggerganov/llama.cpp/discussions/4167
@@ -604,7 +606,7 @@ Building the program with BLAS support may lead to some performance improvements
 
 llama.cpp based on SYCL is used to support Intel GPU (Data Center Max series, Flex series, Arc series, Built-in GPU and iGPU).
 
-For detailed info, please refer to [llama.cpp for SYCL](README_sycl.md).
+For detailed info, please refer to [llama.cpp for SYCL](README-sycl.md).
 
 
 ### Prepare Data & Run
```

examples/sycl/win-build-sycl.bat

Lines changed: 23 additions & 0 deletions
```diff
@@ -0,0 +1,23 @@
+
+:: MIT license
+:: Copyright (C) 2024 Intel Corporation
+:: SPDX-License-Identifier: MIT
+
+mkdir -p build
+cd build
+@call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force
+
+:: for FP16
+:: faster for long-prompt inference
+:: cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release -DLLAMA_SYCL_F16=ON
+
+:: for FP32
+cmake -G "MinGW Makefiles" .. -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release
+
+
+:: build example/main only
+:: make main
+
+:: build all binary
+make -j
+cd ..
```
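
The script assumes it is run from the repository root inside the "Intel oneAPI command prompt" described in README-sycl.md, e.g.:

```
.\examples\sycl\win-build-sycl.bat
```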

examples/sycl/win-run-llama2.bat

Lines changed: 13 additions & 0 deletions
```diff
@@ -0,0 +1,13 @@
+:: MIT license
+:: Copyright (C) 2024 Intel Corporation
+:: SPDX-License-Identifier: MIT
+
+set INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
+@call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force
+
+
+set GGML_SYCL_DEVICE=0
+rem set GGML_SYCL_DEBUG=1
+.\build\bin\main.exe -m models\llama-2-7b.Q4_0.gguf -p %INPUT2% -n 400 -e -ngl 33 -s 0
+
+
```
