
How to utilize GPU on Android to accelerate inference? #8705


Closed
ElaineWu66 opened this issue Jul 26, 2024 Discussed in #8704 · 18 comments

@ElaineWu66

Discussed in #8704

Originally posted by ElaineWu66 July 26, 2024
I am trying to compile and run the llama.cpp demo on my Android device (Qualcomm Adreno GPU) under Linux with Termux.
Any suggestions on how to utilize the GPU?
I followed the tutorial at https://github.com/JackZeng0208/llama.cpp-android-tutorial, but since the OpenCL backend is broken and has been removed, it no longer works.

Thanks!!!

@ngxson
Collaborator

ngxson commented Jul 26, 2024

The Android docs can be found here (please let me know if they are still up to date): https://github.com/ggerganov/llama.cpp/blob/master/docs/android.md

A while ago I saw someone build android+vulkan; it kind of worked but was buggy. I couldn't test it myself, but you can give it a try: https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+android+vulkan+

@ElaineWu66
Author

> The Android docs can be found here (please let me know if they are still up to date): https://github.com/ggerganov/llama.cpp/blob/master/docs/android.md
>
> A while ago I saw someone build android+vulkan; it kind of worked but was buggy. I couldn't test it myself, but you can give it a try: https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+android+vulkan+

I was able to compile and run on the Android CPU by following the Android docs. I just want to know how I can utilize the GPU.

I've seen the posts saying they built android+vulkan and it was buggy, but there are no detailed instructions. I'm very new to Vulkan; is there any step-by-step tutorial I can follow? Really appreciate it, thanks a ton!!!

@ngxson
Collaborator

ngxson commented Jul 26, 2024

Unfortunately, I'm not working on Vulkan or Android, so I can't help much. It would be nice if someone could share how to do that. You can probably follow this thread for some clues: #5186

@FranzKafkaYu

FranzKafkaYu commented Jul 29, 2024

I am also trying to accelerate inference by enabling the GPU (Vulkan) backend on Android, but I encountered a build error like this:

franzkafka95@franzkafka95:~/Desktop/llama/llama.cpp/build-android$ make -j8
[  1%] Built target build_info
[  1%] Built target sha256
[  2%] Built target xxhash
[  3%] Built target sha1
[  4%] Built target vulkan-shaders-gen
[  5%] Generate vulkan shaders
/bin/sh: 1: vulkan-shaders-gen: not found
make[2]: *** [ggml/src/CMakeFiles/ggml.dir/build.make:123: ggml/src/ggml-vulkan-shaders.hpp] Error 127
make[1]: *** [CMakeFiles/Makefile2:1617: ggml/src/CMakeFiles/ggml.dir/all] Error 2
make: *** [Makefile:146: all] Error 2  

build configuration:

cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=latest -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod  -DGGML_VULKAN=1 ..  

I am using Ubuntu 22.04 with CMake 3.22.0 and Android NDK r25. I have installed the Vulkan SDK, including these libraries:

libvulkan-dev/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed]
libvulkan1/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
lunarg-vulkan-layers/jammy,now 1.3.290.0~rc2-1lunarg22.04-1 amd64 [installed,automatic]
mesa-vulkan-drivers/jammy-updates,now 23.2.1-1ubuntu3.1~22.04.2 amd64 [installed,automatic]
vulkan-extensionlayer/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-headers/jammy,jammy,now 1.3.290.0~rc1-1lunarg22.04-1 all [installed,automatic]
vulkan-profiles/jammy,now 1.3.290.0~rc2-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-sdk/jammy,jammy,now 1.3.290.0~rc1-1lunarg22.04-1 all [installed]
vulkan-tools/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-utility-libraries-dev/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-utility-libraries/jammy,now 1.3.290.0~rc1-1lunarg22.04-1 amd64 [installed,automatic]
vulkan-validationlayers/jammy,now 1.3.290.0~rc2-1lunarg22.04-1 amd64 [installed,automatic]
vulkancapsviewer/jammy,now 3.41~rc1-1lunarg22.04-1 amd64 [installed,automatic]  

Can anyone help me fix this compile error?

@FranzKafkaYu

I guess this problem occurs because the compiled vulkan-shaders-gen binary targets the ARM architecture, so when CMake (or a script) invokes it to generate the *.hpp files, it cannot be executed on the x86_64 build host.

I tried moving an x86_64 vulkan-shaders-gen binary into the build-android/bin/ directory, but I still got this error.
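[Editor's note: a common workaround for this cross-compilation chicken-and-egg problem is a two-stage build: first build vulkan-shaders-gen natively for the host, put it on PATH, then cross-compile for Android so the shader-generation step finds the host binary. A sketch, assuming the cross-build locates the tool via PATH (which matches the "vulkan-shaders-gen: not found" error above); the NDK location and Android platform level are illustrative:]

```shell
# Stage 1: native host build, just to obtain a runnable vulkan-shaders-gen
# (the target name appears in the build log above: "Built target vulkan-shaders-gen").
cmake -B build-host -DGGML_VULKAN=1
cmake --build build-host --config Release --target vulkan-shaders-gen

# Make the host binary visible to the cross-build.
export PATH="$PWD/build-host/bin:$PATH"

# Stage 2: cross-compile for Android; shader generation now executes the
# x86_64 vulkan-shaders-gen instead of the freshly built ARM one.
cmake -B build-android \
  -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-33 \
  -DGGML_VULKAN=1
cmake --build build-android --config Release
```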

@FranzKafkaYu

FranzKafkaYu commented Jul 29, 2024

> As things stand right now, Vulkan on Android for llama.cpp only works (and is fairly performant, even with Llama 3 8B) on Exynos devices with RDNA GPUs. For the rest, your best bet is to use MLC-LLM/MediaPipe/ExecuTorch based solutions.

Refer to the issue; the Qualcomm Adreno GPU can work.

@ElaineWu66
Author

ElaineWu66 commented Jul 29, 2024

image

I encountered this issue when compiling with 'make GGML_VULKAN=1' through Termux on my Android device.
It seems that I do not have the root permission needed to create the output directory.

I guess the only way out is to cross-compile on my laptop? Any suggestions on how to do that? (I'm quite new to Android and I really need a followable tutorial for the compilation parameters.)

Or any suggestions on how I can compile with Termux on the Android device?

Big thanksss!!!

@FranzKafkaYu

> As things stand right now, Vulkan on Android for llama.cpp only works (and is fairly performant, even with Llama 3 8B) on Exynos devices with RDNA GPUs. For the rest, your best bet is to use MLC-LLM/MediaPipe/ExecuTorch based solutions.
>
> Refer to the issue; the Qualcomm Adreno GPU can work.

Well, of course it can work; it is just a matter of development. However, in its current state, you have to manually disable feature checks and contend with 1 GB of VRAM, which means either a model as smart as a parakeet or splitting layers between GPU and CPU, which will probably make inference slower than pure CPU.

Thank you very much. I am new to AI and appreciate the other frameworks you recommended. I will try to understand and learn more about them. As for this issue, I still hope that a developer can investigate this problem.

@jsamol

jsamol commented Aug 6, 2024

I found myself looking for an answer to the same question and struggling with the same issues. Eventually, after gathering solutions from many other issues, I prepared this Dockerfile, which helps me cross-compile the library with Vulkan support. Maybe someone will find it useful:

FROM ubuntu:24.04

### Prepare environment ###

# Install essential tools
RUN apt-get update -qqy && \
    apt-get install -qqy build-essential cmake make default-jre wget unzip git && \
    apt-get clean && \
    apt-get autoremove -y

# Set env vars
ENV ANDROID_TARGET_PLATFORM=android-33
ENV NDK_VERSION=27.0.12077973
ENV VULKAN_VERSION=1.3.292

ENV ANDROID_HOME=/usr/lib/android-sdk
ENV ANDROID_NDK_HOME=${ANDROID_HOME}/ndk/${NDK_VERSION}

### Install Android NDK ###

# Download command line tools
RUN wget -q https://dl.google.com/android/repository/commandlinetools-linux-11076708_latest.zip -O android-commandlinetools.zip && \
    unzip -q android-commandlinetools.zip -d ${ANDROID_HOME} && \
    mv ${ANDROID_HOME}/cmdline-tools ${ANDROID_HOME}/latest && \
    mkdir ${ANDROID_HOME}/cmdline-tools && \
    mv ${ANDROID_HOME}/latest ${ANDROID_HOME}/cmdline-tools && \
    rm android-commandlinetools.zip

ENV PATH=${PATH}:${ANDROID_HOME}/cmdline-tools/latest/bin

# Accept licenses
RUN yes | sdkmanager --licenses
# Download NDK
RUN sdkmanager "ndk;${NDK_VERSION}"

### Install Vulkan SDK ###

RUN wget -qO- https://packages.lunarg.com/lunarg-signing-key-pub.asc | tee /etc/apt/trusted.gpg.d/lunarg.asc
RUN wget -qO /etc/apt/sources.list.d/lunarg-vulkan-noble.list http://packages.lunarg.com/vulkan/lunarg-vulkan-noble.list

RUN apt-get update -qqy && \
    apt-get install -qqy vulkan-sdk

# Replace outdated Vulkan headers in Android NDK
RUN wget -q https://github.com/KhronosGroup/Vulkan-Headers/archive/refs/tags/v${VULKAN_VERSION}.zip -O vulkan-headers.zip && \
    unzip -q vulkan-headers.zip -d . && \
    rm -r ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/vk_video && \
    rm -r ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/vulkan && \
    mv ./Vulkan-Headers-${VULKAN_VERSION}/include/* ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include && \
    rm -r ./Vulkan-Headers-${VULKAN_VERSION} && \
    rm vulkan-headers.zip

### Build ###

RUN mkdir proj
COPY . proj
WORKDIR /proj

# Compile for host
RUN cmake -B build -DGGML_VULKAN=1
RUN cmake --build build --config Release

# Use host vulkan-shaders-gen
RUN mkdir bin
RUN mv build/bin/vulkan-shaders-gen bin/
RUN rm -rf build
ENV PATH=${PATH}:/proj/bin

# Compile for target (arm64-v8a)
RUN cmake -B build-android/arm64-v8a -DGGML_VULKAN=1 \
    -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=${ANDROID_TARGET_PLATFORM} \
    -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod
RUN cmake --build build-android/arm64-v8a --config Release

Unfortunately, when I then copy the binaries to my device and try to follow the next steps described in the guide, I run into CANNOT LINK EXECUTABLE "./llama-cli": library "libllama.so" not found. However, the library seems to work when I copy the output shared and static libraries instead and use them inside the llama.android example.
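[Editor's note: that linker error usually means the Android dynamic linker cannot locate the shared libraries next to the executable. A sketch of two things worth trying, not verified on every device; the on-device path and model name are illustrative:]

```shell
# On the device (e.g. via Termux or adb shell), assuming llama-cli and the
# .so files were pushed into the same directory:
cd /data/local/tmp/llama          # illustrative location
chmod +x ./llama-cli
# Tell the dynamic linker where to find libllama.so and the ggml libraries:
export LD_LIBRARY_PATH="$PWD:$LD_LIBRARY_PATH"
./llama-cli -m model.gguf -p "Hello"

# Alternatively, sidestep the problem with a static build by adding
# -DBUILD_SHARED_LIBS=OFF to the cmake configure step on the build host.
```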

@FranzKafkaYu

May I ask one question: why do we need to replace the outdated Vulkan headers in the Android NDK?

@jsamol

jsamol commented Aug 8, 2024

It appears that the NDK version I've been using ships with an incomplete set of Vulkan headers, missing some that llama.cpp depends on. Without updating them in the NDK, compilation fails with a missing-headers error.

@github-actions github-actions bot added the stale label Sep 8, 2024
@dermotfix

> image
>
> I have encountered this issue when compiling with 'make GGML_VULKAN=1' through Termux on my Android device. It seems like that I do not have the root permission to create the output directory
>
> I guess the only way out is to cross compile with my laptop? Any suggestion on how to do that.... (I'm quite new to Android and I really need some followable tutorials to deal with the parameters in compilation)
>
> Or any suggestion on how I can compile with termux on the Android device?
>
> Big thanksss!!!

Replace "/tmp" in vulkan-shaders-gen.cpp with just "tmp".
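[Editor's note: the hard-coded "/tmp" fails because Termux has no world-writable /tmp. The substitution can also be done with sed; the demo below runs it on a stand-in file, since the exact source path of vulkan-shaders-gen.cpp varies between llama.cpp revisions:]

```shell
# Demonstrate the fix on a stand-in file; in the real tree, run the same
# sed against vulkan-shaders-gen.cpp (its path is revision-dependent).
printf 'std::string path = "/tmp" + filename;\n' > demo.cpp
# Swap the absolute "/tmp" for a relative "tmp" directory:
sed -i 's|"/tmp"|"tmp"|g' demo.cpp
cat demo.cpp
```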

@awgr

awgr commented Oct 25, 2024

@jsamol
As one small note, I was under the impression that ggml-vulkan is implemented against Vulkan 1.2, so using Vulkan 1.3 headers with NDK 27 seems like it might risk some compatibility issues, though ggml-vulkan does reference both the 1.2 and 1.3 versions of the API.

https://github.com/ggerganov/llama.cpp/blob/master/ggml/src/ggml-vulkan.cpp#L31

Also, the Android NDK doesn't ship vulkan.hpp because they don't want to support it; there's an anecdote in the other discussion suggesting that vulkan.hpp at the right header version probably works. So this approach is probably fine:
android/ndk#1767

@0cc4m If you're around: it seems like a lot of people want to build llama.cpp for Adreno using the Vulkan backend for GPU support, but they're struggling to build it. Do you think there's a way I can make it easier for people to build it? Would it be weird to contribute an include dir with the vulkan.hpp headers at the correct version? ty

@awgr

awgr commented Oct 25, 2024

Sorry, I got confused there between "Occ4m" and @0cc4m.

@0cc4m
Collaborator

0cc4m commented Oct 25, 2024

> @jsamol As one small note, I was under the impression that ggml-vulkan is implemented against Vulkan 1.2, so using Vulkan 1.3 headers with NDK 27 seems like it might risk some compatibility issues, though ggml-vulkan does reference both the 1.2 and 1.3 versions of the API.
>
> https://github.com/ggerganov/llama.cpp/blob/master/ggml/src/ggml-vulkan.cpp#L31

I've tried to stick with Vulkan 1.2, but it's not always obvious which version a feature requires. That remains the goal, however. Vulkan is fully backwards compatible, so a Vulkan 1.3 runtime will run Vulkan 1.2 code just fine.

> Also, the Android NDK doesn't ship vulkan.hpp because they don't want to support it; there's an anecdote in the other discussion suggesting that vulkan.hpp at the right header version probably works. So this approach is probably fine: android/ndk#1767
>
> @0cc4m If you're around: it seems like a lot of people want to build llama.cpp for Adreno using the Vulkan backend for GPU support, but they're struggling to build it. Do you think there's a way I can make it easier for people to build it? Would it be weird to contribute an include dir with the vulkan.hpp headers at the correct version? ty

We shouldn't include external dependencies in this repo if not absolutely necessary. What you could do is figure out a reliable way to install the header file on Android and get the project to build, then document the steps in the BUILD.md file. If you run into issues, those can be handled separately.

Sadly, in my experience Qualcomm is not handling Vulkan in a proper way, like AMD, Nvidia and Intel are, so there are a lot of quirks to deal with. It would be cool to support mobile devices, but even if we get it to build properly, a lot of work would have to go into optimizing the matrix multiplication shader for Adreno and other phone GPUs. I don't have the time or motivation for that, but if someone else wants to try it, I'll do my best to help.

@github-actions github-actions bot removed the stale label Oct 26, 2024
@Nottlespike

OK, so I may have found an oddity that improves Android/ARM/Qualcomm Snapdragon 8 Gen 2 performance for Q4_0_4_4 .gguf quants. I simply built llama.cpp build ae8de6d5 following the current Android docs. I did the basic CPU cmake build:

cmake -B build
cmake --build build --config Release

I did not add -DGGML_LLAMAFILE=OFF even though it's suggested for Q4_0_4_4 quants.

Going by timestamps (I couldn't find do_sample=false), it is about ~2x faster with -ngl 99 than with -ngl 0.

Video attached.

Screen_Recording_20241115_103340_Termux.mp4

@slaren
Member

slaren commented Nov 16, 2024

I am not sure if I understand correctly, but -ngl should have no effect on CPU-only builds.

@github-actions github-actions bot added the stale label Dec 16, 2024
Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale.
