
How to utilize GPU on Android to accelerate inference? #8705


Closed
ElaineWu66 opened this issue Jul 26, 2024 Discussed in #8704 · 18 comments

@ElaineWu66

Discussed in #8704

Originally posted by ElaineWu66 July 26, 2024
I am trying to compile and run the llama.cpp demo on my Android device (Qualcomm Adreno GPU) under Linux with Termux.
Any suggestions on how to utilize the GPU?
I followed the tutorial at https://github.com/JackZeng0208/llama.cpp-android-tutorial, but since the OpenCL backend is broken and has been removed, it no longer works.

Thanks!!!

@ngxson
Collaborator

ngxson commented Jul 26, 2024

The Android docs can be found here (please let me know if they are still up to date): https://github.com/ggerganov/llama.cpp/blob/master/docs/android.md

A while ago I saw someone build android+vulkan; it kind of worked but was buggy. I couldn't test it myself, but you can give it a try: https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+android+vulkan+

@ElaineWu66
Author

> The Android docs can be found here (please let me know if they are still up to date): https://github.com/ggerganov/llama.cpp/blob/master/docs/android.md
>
> A while ago I saw someone build android+vulkan; it kind of worked but was buggy. I couldn't test it myself, but you can give it a try: https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+android+vulkan+

I was able to compile and run on the Android CPU by following the Android docs. I just want to know how I can utilize the GPU.

I've seen the posts saying they built android+vulkan and it was buggy, but there are no detailed instructions. I'm very new to Vulkan; is there any step-by-step tutorial I can follow? Really appreciate it, thanks a ton!!!

@ngxson
Collaborator

ngxson commented Jul 26, 2024

Unfortunately, I'm not working on Vulkan or Android, so I can't help much. It would be nice if someone could share how to do that. You can probably follow this thread for some clues: #5186

@FranzKafkaYu

FranzKafkaYu commented Jul 29, 2024

I am also trying to accelerate inference by enabling the GPU (Vulkan) backend on Android, but I encountered a build error like this:

franzkafka95@franzkafka95:~/Desktop/llama/llama.cpp/build-android$ make -j8
[  1%] Built target build_info
[  1%] Built target sha256
[  2%] Built target xxhash
[  3%] Built target sha1
[  4%] Built target vulkan-shaders-gen
[  5%] Generate vulkan shaders
/bin/sh: 1: vulkan-shaders-gen: not found
make[2]: *** [ggml/src/CMakeFiles/ggml.dir/build.make:123: ggml/src/ggml-vulkan-shaders.hpp] Error 127
make[1]: *** [CMakeFiles/Makefile2:1617: ggml/src/CMakeFiles/ggml.dir/all] Error 2
make: *** [Makefile:146: all] Error 2  

build configuration:

cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=latest -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod  -DGGML_VULKAN=1 ..  

I am using Ubuntu 22.04 with CMake 3.22.0 and Android NDK r25. I have installed the Vulkan SDK, including these libraries:

libvulkan-dev/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed]
libvulkan1/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
lunarg-vulkan-layers/jammy,now 1.3.290.0~rc2-1lunarg22.04-1 amd64 [installed,automatic]
mesa-vulkan-drivers/jammy-updates,now 23.2.1-1ubuntu3.1~22.04.2 amd64 [installed,automatic]
vulkan-extensionlayer/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-headers/jammy,jammy,now 1.3.290.0~rc1-1lunarg22.04-1 all [installed,automatic]
vulkan-profiles/jammy,now 1.3.290.0~rc2-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-sdk/jammy,jammy,now 1.3.290.0~rc1-1lunarg22.04-1 all [installed]
vulkan-tools/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-utility-libraries-dev/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-utility-libraries/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-validationlayers/jammy,now 1.3.290.0~rc2-1lunarg22.04-1 amd64 [installed,automatic]
vulkancapsviewer/jammy,now 3.41~rc1-1lunarg22.04-1 amd64 [installed,automatic]  

Can anyone help me fix this compile error?

@FranzKafkaYu

I guess this problem occurs because the compiled vulkan-shaders-gen binary targets the ARM architecture, so when CMake (or a script) invokes it to generate the *.hpp files, it cannot be executed on the x86_64 build host.

I tried moving an x86_64 vulkan-shaders-gen binary into the build-android/bin/ directory, but I still got this error.
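[Editor's note: a common workaround for this cross-compilation chicken-and-egg problem is a two-stage build: first build vulkan-shaders-gen natively for the host, put it on PATH, then cross-compile for Android so the shader-generation step finds the host binary. A sketch, assuming the cross-build locates the tool via PATH (which matches the "vulkan-shaders-gen: not found" error above); the NDK location and Android platform level are illustrative:]

```shell
# Stage 1: native host build, just to obtain a runnable vulkan-shaders-gen
# (the target name appears in the build log above: "Built target vulkan-shaders-gen").
cmake -B build-host -DGGML_VULKAN=1
cmake --build build-host --config Release --target vulkan-shaders-gen

# Make the host binary visible to the cross-build.
export PATH="$PWD/build-host/bin:$PATH"

# Stage 2: cross-compile for Android; shader generation now executes the
# x86_64 vulkan-shaders-gen instead of the freshly built ARM one.
cmake -B build-android \
  -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-33 \
  -DGGML_VULKAN=1
cmake --build build-android --config Release
```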

@FranzKafkaYu

FranzKafkaYu commented Jul 29, 2024

> As things stand right now, Vulkan on Android for llama.cpp only works (and is fairly performant, even with Llama 3 8B) on Exynos devices with RDNA GPUs. For the rest, your best bet is to use MLC-LLM/MediaPipe/ExecuTorch based solutions.

Refer to the issue; the Qualcomm Adreno GPU can work.

@ElaineWu66
Author

ElaineWu66 commented Jul 29, 2024

image

I encountered this issue when compiling with 'make GGML_VULKAN=1' through Termux on my Android device.
It seems that I do not have the root permission needed to create the output directory.

I guess the only way out is to cross-compile on my laptop? Any suggestions on how to do that? (I'm quite new to Android and I really need a followable tutorial for the compilation parameters.)

Or any suggestions on how I can compile with Termux on the Android device?

Big thanksss!!!

@FranzKafkaYu

> As things stand right now, Vulkan on Android for llama.cpp only works (and is fairly performant, even with Llama 3 8B) on Exynos devices with RDNA GPUs. For the rest, your best bet is to use MLC-LLM/MediaPipe/ExecuTorch based solutions.
>
> Refer to the issue; the Qualcomm Adreno GPU can work.

Well, of course it can work; it is just a matter of development. However, in its current state, you have to manually disable feature checks and contend with 1 GB of VRAM, which means either a model as smart as a parakeet or splitting layers between GPU and CPU, which will probably make inference slower than pure CPU.

Thank you very much. I am new to AI and appreciate the other frameworks you recommended. I will try to understand and learn more about them. As for this issue, I still hope that a developer can investigate this problem.

@jsamol

jsamol commented Aug 6, 2024

I found myself looking for an answer to the same question and struggling with the same issues. Eventually, after gathering solutions from many other issues, I prepared this Dockerfile, which helps me cross-compile the library with Vulkan support. Maybe someone will find it useful:

FROM ubuntu:24.04

### Prepare environment ###

# Install essential tools
RUN apt-get update -qqy && \
    apt-get install -qqy build-essential cmake make default-jre wget unzip git && \
    apt-get clean && \
    apt-get autoremove -y

# Set env vars
ENV ANDROID_TARGET_PLATFORM=android-33
ENV NDK_VERSION=27.0.12077973
ENV VULKAN_VERSION=1.3.292

ENV ANDROID_HOME=/usr/lib/android-sdk
ENV ANDROID_NDK_HOME=${ANDROID_HOME}/ndk/${NDK_VERSION}

### Install Android NDK ###

# Download command line tools
RUN wget -q https://dl.google.com/android/repository/commandlinetools-linux-11076708_latest.zip -O android-commandlinetools.zip && \
    unzip -q android-commandlinetools.zip -d ${ANDROID_HOME} && \
    mv ${ANDROID_HOME}/cmdline-tools ${ANDROID_HOME}/latest && \
    mkdir ${ANDROID_HOME}/cmdline-tools && \
    mv ${ANDROID_HOME}/latest ${ANDROID_HOME}/cmdline-tools && \
    rm android-commandlinetools.zip

ENV PATH=${PATH}:${ANDROID_HOME}/cmdline-tools/latest/bin

# Accept licenses
RUN yes | sdkmanager --licenses
# Download NDK
RUN sdkmanager "ndk;${NDK_VERSION}"

### Install Vulkan SDK ###

RUN wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | tee /etc/apt/trusted.gpg.d/lunarg.asc
RUN wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list http://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list

RUN apt-get update -qqy && \
    apt-get install -qqy vulkan-sdk

# Replace outdated Vulkan headers in Android NDK
RUN wget -q https://github.com/KhronosGroup/Vulkan-Headers/archive/refs/tags/v${VULKAN_VERSION}.zip -O vulkan-headers.zip && \
    unzip -q vulkan-headers.zip -d . && \
    rm -r ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/vk_video && \
    rm -r ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/vulkan && \
    mv ./Vulkan-Headers-${VULKAN_VERSION}/include/* ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include && \
    rm -r ./Vulkan-Headers-${VULKAN_VERSION} && \
    rm vulkan-headers.zip

### Build ###

RUN mkdir proj
COPY . proj
WORKDIR /proj

# Compile for host
RUN cmake -B build -DGGML_VULKAN=1
RUN cmake --build build --config Release

# Use host vulkan-shaders-gen
RUN mkdir bin
RUN mv build/bin/vulkan-shaders-gen bin/
RUN rm -rf build
ENV PATH=${PATH}:/proj/bin

# Compile for target (arm64-v8a)
RUN cmake -B build-android/arm64-v8a -DGGML_VULKAN=1 \
    -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=${ANDROID_TARGET_PLATFORM} \
    -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod
RUN cmake --build build-android/arm64-v8a --config Release

Unfortunately, when I then copy the binaries to my device and try to follow the next steps described in the guide, I run into CANNOT LINK EXECUTABLE "./llama-cli": library "libllama.so" not found. However, the library seems to work when I copy the output shared and static libraries instead and use them inside the llama.android example.
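[Editor's note: that linker error usually means the Android dynamic linker cannot locate the shared libraries next to the executable. A sketch of two things worth trying, not verified on every device; the on-device path and model name are illustrative:]

```shell
# On the device (e.g. via Termux or adb shell), assuming llama-cli and the
# .so files were pushed into the same directory:
cd /data/local/tmp/llama          # illustrative location
chmod +x ./llama-cli
# Tell the dynamic linker where to find libllama.so and the ggml libraries:
export LD_LIBRARY_PATH="$PWD:$LD_LIBRARY_PATH"
./llama-cli -m model.gguf -p "Hello"

# Alternatively, sidestep the problem with a static build by adding
# -DBUILD_SHARED_LIBS=OFF to the cmake configure step on the build host.
```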

@FranzKafkaYu

May I ask one question: why do we need to replace the outdated Vulkan headers in the Android NDK?

@jsamol

jsamol commented Aug 8, 2024

It appears that the NDK version I've been using ships with an incomplete set of Vulkan headers, missing some that llama.cpp depends on. Without updating them in the NDK, compilation fails with a missing-headers error.

@github-actions github-actions bot added the stale label Sep 8, 2024
@dermotfix

> image
>
> I have encountered this issue when compiling with 'make GGML_VULKAN=1' through Termux on my Android device. It seems like that I do not have the root permission to create the output directory
>
> I guess the only way out is to cross compile with my laptop? Any suggestion on how to do that.... (I'm quite new to Android and I really need some followable tutorials to deal with the parameters in compilation)
>
> Or any suggestion on how I can compile with termux on the Android device?
>
> Big thanksss!!!

Replace "/tmp" in vulkan-shaders-gen.cpp with just "tmp".
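[Editor's note: the hard-coded "/tmp" fails because Termux has no world-writable /tmp. The substitution can also be done with sed; the demo below runs it on a stand-in file, since the exact source path of vulkan-shaders-gen.cpp varies between llama.cpp revisions:]

```shell
# Demonstrate the fix on a stand-in file; in the real tree, run the same
# sed against vulkan-shaders-gen.cpp (its path is revision-dependent).
printf 'std::string path = "/tmp" + filename;\n' > demo.cpp
# Swap the absolute "/tmp" for a relative "tmp" directory:
sed -i 's|"/tmp"|"tmp"|g' demo.cpp
cat demo.cpp
```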

@awgr

awgr commented Oct 25, 2024

@jsamol
As one small note, I was under the impression that ggml-vulkan is implemented against Vulkan 1.2, so using Vulkan 1.3 headers with NDK 27 seems like it might risk some compatibility issues, though ggml-vulkan does reference both the 1.2 and 1.3 versions of the API.

https://github.com/ggerganov/llama.cpp/blob/master/ggml/src/ggml-vulkan.cpp#L31

Also, the Android NDK doesn't ship vulkan.hpp because they don't want to support it; there's an anecdote in the other discussion suggesting that vulkan.hpp at the right header version probably works. So this approach is probably fine:
android/ndk#1767

@0cc4m If you're around: it seems like a lot of people want to build llama.cpp for Adreno using the Vulkan backend for GPU support, but they're struggling to build it. Do you think there's a way I can make it easier for people to build it? Would it be weird to contribute an include dir with the vulkan.hpp headers at the correct version? ty

@awgr

awgr commented Oct 25, 2024

Sorry, I got confused there between "Occ4m" and @0cc4m.

@0cc4m
Collaborator

0cc4m commented Oct 25, 2024

> @jsamol As one small note, I was under the impression that ggml-vulkan is implemented against Vulkan 1.2, so using Vulkan 1.3 headers with NDK 27 seems like it might risk some compatibility issues, though ggml-vulkan does reference both the 1.2 and 1.3 versions of the API.
>
> https://github.com/ggerganov/llama.cpp/blob/master/ggml/src/ggml-vulkan.cpp#L31

I've tried to stick with Vulkan 1.2, but it's not always obvious which version a feature requires. That remains the goal, however. Vulkan is fully backwards compatible, so a Vulkan 1.3 runtime will run Vulkan 1.2 code just fine.

> Also, the Android NDK doesn't ship vulkan.hpp because they don't want to support it; there's an anecdote in the other discussion suggesting that vulkan.hpp at the right header version probably works. So this approach is probably fine: android/ndk#1767
>
> @0cc4m If you're around: it seems like a lot of people want to build llama.cpp for Adreno using the Vulkan backend for GPU support, but they're struggling to build it. Do you think there's a way I can make it easier for people to build it? Would it be weird to contribute an include dir with the vulkan.hpp headers at the correct version? ty

We shouldn't include external dependencies in this repo if not absolutely necessary. What you could do is figure out a reliable way to install the header file on Android and get the project to build, then document the steps in the BUILD.md file. If you run into issues, those can be handled separately.

Sadly, in my experience Qualcomm is not handling Vulkan in a proper way, like AMD, Nvidia and Intel are, so there are a lot of quirks to deal with. It would be cool to support mobile devices, but even if we get it to build properly, a lot of work would have to go into optimizing the matrix multiplication shader for Adreno and other phone GPUs. I don't have the time or motivation for that, but if someone else wants to try it, I'll do my best to help.

@github-actions github-actions bot removed the stale label Oct 26, 2024
@Nottlespike

OK, so I may have found an oddity that improves Android/ARM/Qualcomm Snapdragon 8 Gen 2 performance for Q4_0_4_4 .gguf quants. I simply built llama.cpp build ae8de6d5 following the current Android docs. I did the basic CPU cmake build:

cmake -B build
cmake --build build --config Release

I did not add -DGGML_LLAMAFILE=OFF even though it's suggested for Q4_0_4_4 quants.

Going by timestamps (I couldn't find do_sample=false), it is about ~2x faster with -ngl 99 than with -ngl 0.

Video attached.

Screen_Recording_20241115_103340_Termux.mp4

@slaren
Member

slaren commented Nov 16, 2024

I am not sure if I understand correctly, but -ngl should have no effect on CPU-only builds.

@github-actions github-actions bot added the stale label Dec 16, 2024
Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale.
