
Commit b94a9b0

Merge branch 'master' into Nexes_CQ_10

2 parents: 18677c8 + 6374743

17 files changed: +965 additions, -475 deletions

CONTRIBUTING.md

Lines changed: 5 additions & 6 deletions
```diff
@@ -1,24 +1,23 @@
 # Pull requests (for contributors)
 
 - Test your changes:
-    - Using the commands in the [`tests`](tests) folder. For instance, running the `./tests/test-backend-ops` command tests different backend implementations of the GGML library
+    - Using the commands in the [`tests`](tests) folder. For instance, running the `./tests/test-backend-ops` command tests different backend implementations of the `ggml` library
     - Execute [the full CI locally on your machine](ci/README.md) before publishing
-- Please rate the complexity of your PR (i.e. `Review Complexity : Low`, `Review Complexity : Medium`, `Review Complexity : High`). This makes it easier for maintainers to triage the PRs.
-- The PR template has a series of review complexity checkboxes `[ ]` that [you can mark as](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/about-task-lists) `[X]` for your convenience
-- Consider allowing write access to your branch for faster review
+- Optionally rate the complexity of your PR (i.e. `Review Complexity : Low`, `Review Complexity : Medium`, `Review Complexity : High`). This makes it easier for maintainers to triage the PRs
+- Consider allowing write access to your branch for faster reviews, as reviewers can push commits directly
 - If your PR becomes stale, don't hesitate to ping the maintainers in the comments
 
 # Pull requests (for collaborators)
 
 - Squash-merge PRs
 - Use the following format for the squashed commit title: `<module> : <commit title> (#<issue_number>)`. For example: `utils : fix typo in utils.py (#1234)`
-- Optionally, pick a `<module>` from here: https://github.com/ggerganov/llama.cpp/wiki/Modules
+- Optionally pick a `<module>` from here: https://github.com/ggerganov/llama.cpp/wiki/Modules
 
 # Coding guidelines
 
 - Avoid adding third-party dependencies, extra files, extra headers, etc.
 - Always consider cross-compatibility with other operating systems and architectures
-- Avoid fancy looking modern STL constructs, use basic `for` loops, avoid templates, keep it simple
+- Avoid fancy-looking modern STL constructs, use basic `for` loops, avoid templates, keep it simple
 - There are no strict rules for the code style, but try to follow the patterns in the code (indentation, spaces, etc.). Vertical alignment makes things more readable and easier to batch edit
 - Clean-up any trailing whitespaces, use 4 spaces for indentation, brackets on the same line, `void * ptr`, `int & a`
 - Naming usually optimizes for common prefix (see https://github.com/ggerganov/ggml/pull/302#discussion_r1243240963)
```
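To make the style bullets above concrete, here is a small hedged sketch (illustrative only, not code from the repo) following the stated conventions: 4-space indentation, brackets on the same line, `void * ptr` / `int & a` spacing, and vertical alignment of related assignments:

```cpp
#include <cmath>
#include <cstddef>

// hypothetical helper, shown only to illustrate the formatting conventions
static void example_scale(float * data, size_t size, int & n_scaled) {
    float sum   = 0.0f; // vertically aligned with the line below
    float scale = 1.0f;

    for (size_t i = 0; i < size; i++) {
        sum += fabsf(data[i]);
    }
    if (sum > 0.0f) {
        scale = 1.0f/sum;
    }
    for (size_t i = 0; i < size; i++) {
        data[i] *= scale;
    }
    n_scaled = (int) size;
}
```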

docs/android.md

Lines changed: 55 additions & 28 deletions
````diff
@@ -2,55 +2,82 @@
 # Android
 
 ## Build on Android using Termux
-[Termux](https://github.com/termux/termux-app#installation) is a method to execute `llama.cpp` on an Android device (no root required).
+
+[Termux](https://termux.dev/en/) is an Android terminal emulator and Linux environment app (no root required). As of writing, Termux is available experimentally in the Google Play Store; otherwise, it may be obtained directly from the project repo or on F-Droid.
+
+With Termux, you can install and run `llama.cpp` as if the environment were Linux. Once in the Termux shell:
+
+```
+$ apt update && apt upgrade -y
+$ apt install git cmake
+```
+
+Then, follow the [build instructions](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md), specifically for CMake.
+
+Once the binaries are built, download your model of choice (e.g., from Hugging Face). It's recommended to place it in the `~/` directory for best performance:
+
 ```
-apt update && apt upgrade -y
-apt install git make cmake
+$ curl -L {model-url} -o ~/{model}.gguf
 ```
 
-It's recommended to move your model inside the `~/` directory for best performance:
+Then, if you are not already in the repo directory, `cd` into `llama.cpp` and:
+
 ```
-cd storage/downloads
-mv model.gguf ~/
+$ ./build/bin/llama-simple -m ~/{model}.gguf -c {context-size} -p "{your-prompt}"
 ```
 
-[Get the code](https://github.com/ggerganov/llama.cpp#get-the-code) & [follow the Linux build instructions](https://github.com/ggerganov/llama.cpp#build) to build `llama.cpp`.
+Here, we show `llama-simple`, but any of the executables under `examples` should work, in theory. Be sure to set `context-size` to a reasonable number (say, 4096) to start with; otherwise, memory could spike and kill your terminal.
+
+To see what it might look like visually, here's an old demo of an interactive session running on a Pixel 5 phone:
+
+https://user-images.githubusercontent.com/271616/225014776-1d567049-ad71-4ef2-b050-55b0b3b9274c.mp4
+
+## Cross-compile using Android NDK
+It's possible to build `llama.cpp` for Android on your host system via CMake and the Android NDK. If you are interested in this path, ensure you already have an environment prepared to cross-compile programs for Android (i.e., install the Android SDK). Note that, unlike desktop environments, the Android environment ships with a limited set of native libraries, and so only those libraries are available to CMake when building with the Android NDK (see: https://developer.android.com/ndk/guides/stable_apis.)
 
-## Building the Project using Android NDK
-Obtain the [Android NDK](https://developer.android.com/ndk) and then build with CMake.
+Once you're ready and have cloned `llama.cpp`, invoke the following in the project directory:
 
-Execute the following commands on your computer to avoid downloading the NDK to your mobile. Alternatively, you can also do this in Termux:
 ```
-$ mkdir build-android
-$ cd build-android
-$ export NDK=<your_ndk_directory>
-$ cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-23 -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod ..
-$ make
+$ cmake \
+  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
+  -DANDROID_ABI=arm64-v8a \
+  -DANDROID_PLATFORM=android-28 \
+  -DCMAKE_C_FLAGS="-march=armv8.7a" \
+  -DCMAKE_CXX_FLAGS="-march=armv8.7a" \
+  -DGGML_OPENMP=OFF \
+  -DGGML_LLAMAFILE=OFF \
+  -B build-android
 ```
 
-Install [termux](https://github.com/termux/termux-app#installation) on your device and run `termux-setup-storage` to get access to your SD card (if Android 11+ then run the command twice).
+Notes:
+  - While later versions of Android NDK ship with OpenMP, it must still be installed by CMake as a dependency, which is not supported at this time
+  - `llamafile` does not appear to support Android devices (see: https://github.com/Mozilla-Ocho/llamafile/issues/325)
+
+The above command should configure `llama.cpp` with the most performant options for modern devices. Even if your device is not running `armv8.7a`, `llama.cpp` includes runtime checks for available CPU features it can use.
 
-Finally, copy these built `llama` binaries and the model file to your device storage. Because the file permissions in the Android sdcard cannot be changed, you can copy the executable files to the `/data/data/com.termux/files/home/bin` path, and then execute the following commands in Termux to add executable permission:
+Feel free to adjust the Android ABI for your target. Once the project is configured:
 
-(Assumed that you have pushed the built executable files to the /sdcard/llama.cpp/bin path using `adb push`)
 ```
-$cp -r /sdcard/llama.cpp/bin /data/data/com.termux/files/home/
-$cd /data/data/com.termux/files/home/bin
-$chmod +x ./*
+$ cmake --build build-android --config Release -j{n}
+$ cmake --install build-android --prefix {install-dir} --config Release
 ```
 
-Download model [llama-2-7b-chat.Q4_K_M.gguf](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/blob/main/llama-2-7b-chat.Q4_K_M.gguf), and push it to `/sdcard/llama.cpp/`, then move it to `/data/data/com.termux/files/home/model/`
+After installing, go ahead and download the model of your choice to your host system. Then:
 
 ```
-$mv /sdcard/llama.cpp/llama-2-7b-chat.Q4_K_M.gguf /data/data/com.termux/files/home/model/
+$ adb shell "mkdir /data/local/tmp/llama.cpp"
+$ adb push {install-dir} /data/local/tmp/llama.cpp/
+$ adb push {model}.gguf /data/local/tmp/llama.cpp/
+$ adb shell
 ```
 
-Now, you can start chatting:
+In the `adb shell`:
+
 ```
-$cd /data/data/com.termux/files/home/bin
-$./llama-cli -m ../model/llama-2-7b-chat.Q4_K_M.gguf -n 128 -cml
+$ cd /data/local/tmp/llama.cpp
+$ LD_LIBRARY_PATH=lib ./bin/llama-simple -m {model}.gguf -c {context-size} -p "{your-prompt}"
 ```
 
-Here's a demo of an interactive session running on Pixel 5 phone:
+That's it!
 
-https://user-images.githubusercontent.com/271616/225014776-1d567049-ad71-4ef2-b050-55b0b3b9274c.mp4
+Be aware that Android will not find the library path `lib` on its own, so we must specify `LD_LIBRARY_PATH` in order to run the installed executables. Android does support `RPATH` in later API levels, so this could change in the future. Refer to the previous section for information about `context-size` (very important!) and running other `examples`.
````
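A note on the runtime CPU-feature checks mentioned in the new docs above: this is why a build configured for `armv8.7a` can still run on older cores. As a hedged sketch of the general mechanism (ggml's actual detection code differs in detail), AArch64 feature probing on Android/Linux can use `getauxval`:

```cpp
// AArch64 Linux/Android only: probe the dot-product extension at runtime.
// Illustrative sketch; not the repo's actual detection code.
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#include <asm/hwcap.h>  // HWCAP_ASIMDDP
#include <cstdio>

int main() {
    const unsigned long hwcap = getauxval(AT_HWCAP);
    std::printf("dotprod: %s\n", (hwcap & HWCAP_ASIMDDP) ? "available" : "not available");
    return 0;
}
```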

examples/main/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -69,7 +69,7 @@ In this section, we cover the most commonly used options for running the `llama-
 - `-c N, --ctx-size N`: Set the size of the prompt context. The default is 512, but LLaMA models were built with a context of 2048, which will provide better results for longer input/inference.
 - `-mli, --multiline-input`: Allows you to write or paste multiple lines without ending each in '\'
 - `-t N, --threads N`: Set the number of threads to use during generation. For optimal performance, it is recommended to set this value to the number of physical CPU cores your system has.
-- - `-ngl N, --n-gpu-layers N`: When compiled with GPU support, this option allows offloading some layers to the GPU for computation. Generally results in increased performance.
+- `-ngl N, --n-gpu-layers N`: When compiled with GPU support, this option allows offloading some layers to the GPU for computation. Generally results in increased performance.
 
 ## Input Prompts
```

flake.lock

Lines changed: 10 additions & 10 deletions
Some generated files are not rendered by default.

ggml/include/ggml-backend.h

Lines changed: 3 additions & 0 deletions
```diff
@@ -127,6 +127,8 @@ extern "C" {
         bool async;
         // pinned host buffer
         bool host_buffer;
+        // creating buffers from host ptr
+        bool buffer_from_host_ptr;
         // event synchronization
         bool events;
     };
@@ -168,6 +170,7 @@ extern "C" {
 
     // Functions that may be obtained using ggml_backend_reg_get_proc_address
    typedef ggml_backend_buffer_type_t (*ggml_backend_split_buffer_type_t)(const float *);
+    typedef void (*ggml_backend_set_n_threads_t)(ggml_backend_t, int);
 
    //
    // Backend registry
```
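As a hedged illustration of how the new `ggml_backend_set_n_threads_t` typedef is meant to be used: a minimal sketch assuming the `ggml_backend_reg_get_proc_address` lookup mechanism named in the header. The key string `"ggml_backend_set_n_threads"` is an assumption, not taken from this diff:

```cpp
#include "ggml-backend.h"

// Sketch: fetch the optional set_n_threads entry point from a backend's
// registry and call it only if the backend provides one.
static void try_set_n_threads(ggml_backend_reg_t reg, ggml_backend_t backend, int n_threads) {
    ggml_backend_set_n_threads_t fn = (ggml_backend_set_n_threads_t)
        ggml_backend_reg_get_proc_address(reg, "ggml_backend_set_n_threads"); // assumed key
    if (fn != NULL) {
        fn(backend, n_threads);
    }
}
```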

ggml/include/ggml-blas.h

Lines changed: 2 additions & 0 deletions
```diff
@@ -17,6 +17,8 @@ GGML_API bool ggml_backend_is_blas(ggml_backend_t backend);
 // for openblas and blis, this will also set the number of threads used for blas operations
 GGML_API void ggml_backend_blas_set_n_threads(ggml_backend_t backend_blas, int n_threads);
 
+GGML_API ggml_backend_reg_t ggml_backend_blas_reg(void);
+
 
 #ifdef __cplusplus
 }
```
ggml/include/ggml-metal.h

Lines changed: 5 additions & 1 deletion
```diff
@@ -43,7 +43,9 @@ GGML_API ggml_backend_t ggml_backend_metal_init(void);
 
 GGML_API bool ggml_backend_is_metal(ggml_backend_t backend);
 
-GGML_API ggml_backend_buffer_t ggml_backend_metal_buffer_from_ptr(void * data, size_t size, size_t max_size);
+GGML_DEPRECATED(
+        GGML_API ggml_backend_buffer_t ggml_backend_metal_buffer_from_ptr(void * data, size_t size, size_t max_size),
+        "obsoleted by the new device interface - https://github.com/ggerganov/llama.cpp/pull/9713");
 
 GGML_API void ggml_backend_metal_set_abort_callback(ggml_backend_t backend, ggml_abort_callback abort_callback, void * user_data);
 
@@ -57,6 +59,8 @@ GGML_API bool ggml_backend_metal_supports_family(ggml_backend_t backend, int fam
 // capture all command buffers committed the next time `ggml_backend_graph_compute` is called
 GGML_API void ggml_backend_metal_capture_next_compute(ggml_backend_t backend);
 
+GGML_API ggml_backend_reg_t ggml_backend_metal_reg(void);
+
 #ifdef __cplusplus
 }
 #endif
```
ggml/src/CMakeLists.txt

Lines changed: 12 additions & 6 deletions
```diff
@@ -190,22 +190,24 @@ if (GGML_BLAS)
             # see https://gitlab.kitware.com/cmake/cmake/-/issues/20268
             find_package(PkgConfig REQUIRED)
             if (${GGML_BLAS_VENDOR} MATCHES "Generic")
-                pkg_check_modules(DepBLAS REQUIRED blas)
+                pkg_check_modules(DepBLAS blas)
             elseif (${GGML_BLAS_VENDOR} MATCHES "OpenBLAS")
                 # As of openblas v0.3.22, the 64-bit is named openblas64.pc
                 pkg_check_modules(DepBLAS openblas64)
                 if (NOT DepBLAS_FOUND)
-                    pkg_check_modules(DepBLAS REQUIRED openblas)
+                    pkg_check_modules(DepBLAS openblas)
                 endif()
             elseif (${GGML_BLAS_VENDOR} MATCHES "FLAME")
-                pkg_check_modules(DepBLAS REQUIRED blis)
+                add_compile_definitions(GGML_BLAS_USE_BLIS)
+                pkg_check_modules(DepBLAS blis)
             elseif (${GGML_BLAS_VENDOR} MATCHES "ATLAS")
-                pkg_check_modules(DepBLAS REQUIRED blas-atlas)
+                pkg_check_modules(DepBLAS blas-atlas)
             elseif (${GGML_BLAS_VENDOR} MATCHES "FlexiBLAS")
-                pkg_check_modules(DepBLAS REQUIRED flexiblas_api)
+                pkg_check_modules(DepBLAS flexiblas_api)
             elseif (${GGML_BLAS_VENDOR} MATCHES "Intel")
+                add_compile_definitions(GGML_BLAS_USE_MKL)
                 # all Intel* libraries share the same include path
-                pkg_check_modules(DepBLAS REQUIRED mkl-sdl)
+                pkg_check_modules(DepBLAS mkl-sdl)
             elseif (${GGML_BLAS_VENDOR} MATCHES "NVHPC")
                 # this doesn't provide pkg-config
                 # suggest to assign BLAS_INCLUDE_DIRS on your own
@@ -1361,6 +1363,10 @@ if (MATH_LIBRARY)
     endif()
 endif()
 
+if (CMAKE_SYSTEM_NAME MATCHES "Android")
+    list(APPEND GGML_EXTRA_LIBS_PRIVATE dl) # Must be linked explicitly
+endif()
+
 list(REMOVE_DUPLICATES GGML_EXTRA_LIBS_PRIVATE)
 list(REMOVE_DUPLICATES GGML_EXTRA_LIBS_PUBLIC)
 target_link_libraries(ggml PRIVATE ${GGML_EXTRA_LIBS_PRIVATE} PUBLIC ${GGML_EXTRA_LIBS_PUBLIC})
```
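The new `GGML_BLAS_USE_BLIS` and `GGML_BLAS_USE_MKL` compile definitions exist so the C++ source can select the matching vendor header. A hedged sketch of the consuming side (the real `ggml-blas` source may arrange this differently):

```cpp
// Sketch: vendor-specific BLAS header selection driven by the compile
// definitions added in this CMakeLists.txt change.
#if defined(GGML_BLAS_USE_MKL)
#    include <mkl.h>
#elif defined(GGML_BLAS_USE_BLIS)
#    include <blis.h>
#else
#    include <cblas.h>
#endif
```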

ggml/src/ggml-backend-impl.h

Lines changed: 3 additions & 11 deletions
```diff
@@ -88,6 +88,7 @@ extern "C" {
 
     void (*free)(ggml_backend_t backend);
 
+    // Will be moved to the device interface
     // buffer allocation
     ggml_backend_buffer_type_t (*get_default_buffer_type)(ggml_backend_t backend);
 
@@ -112,17 +113,9 @@ extern "C" {
 
     // IMPORTANT: these functions have been moved to the device interface and will be removed from the backend interface
     // new backends should implement the device interface instead
-
     // These functions are being moved to the device interface
-    // check if the backend can compute an operation
     bool (*supports_op) (ggml_backend_t backend, const struct ggml_tensor * op);
-
-    // check if the backend can use tensors allocated in a buffer type
     bool (*supports_buft)(ggml_backend_t backend, ggml_backend_buffer_type_t buft);
-
-    // check if the backend wants to run an operation, even if the weights are allocated in a CPU buffer
-    // these should be expensive operations with large batch sizes that may benefit from running on this backend
-    // even if the weight has to be copied from the CPU temporarily
     bool (*offload_op) (ggml_backend_t backend, const struct ggml_tensor * op);
 
     // (optional) event synchronization
@@ -184,9 +177,8 @@
     // check if the backend can use tensors allocated in a buffer type
     bool (*supports_buft)(ggml_backend_dev_t dev, ggml_backend_buffer_type_t buft);
 
-    // check if the backend wants to run an operation, even if the weights are allocated in a CPU buffer
-    // these should be expensive operations with large batch sizes that may benefit from running on this backend
-    // even if the weight has to be copied from the CPU temporarily
+    // (optional) check if the backend wants to run an operation, even if the weights are allocated in an incompatible buffer
+    // these should be expensive operations that may benefit from running on this backend instead of the CPU backend
     bool (*offload_op)(ggml_backend_dev_t dev, const struct ggml_tensor * op);
 
     // (optional) event synchronization
```
