How to Build
This guide describes how to build Android and Windows versions of the QNN backend for llama.cpp, enabling efficient inference on Qualcomm hardware.
- Docker Engine
  - Install following the official Docker guide
  - Ensure Docker Compose is included with your installation (a quick verification sketch follows this list)
- Source Code
  - Clone the repository:

    ```bash
    git clone https://github.com/chraac/llama-cpp-qnn-builder.git
    cd llama-cpp-qnn-builder
    ```

  Note: Use the latest `main` branch, as the build relies on NDK r27c with important optimization flags for Release builds.
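Before building, a quick sanity check can save time. This is a minimal sketch; it only assumes a standard Docker installation and the clone from above:

```bash
# Confirm Docker Engine and the Compose plugin are installed and the daemon is reachable
docker --version
docker compose version          # or: docker-compose --version (standalone binary)
docker info > /dev/null && echo "Docker daemon reachable"

# From inside the cloned repository, confirm you are on the latest main branch
git branch --show-current
git pull origin main
```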
- Basic Build
  - Navigate to the project root directory and run:

    ```bash
    ./docker/docker_compose_compile.sh
    ```
- Build Output
  - Executables will be in `build_qnn_arm64-v8a/bin/`
  - The console will show build progress and completion status
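A minimal sketch for inspecting the build output and pushing a binary to a connected device over adb. The `llama-cli` binary name and the on-device path are illustrative; adjust them to your setup, and note that shared-library builds may also require pushing the accompanying `.so` files:

```bash
# List the Android binaries produced by the build
ls build_qnn_arm64-v8a/bin/

# Optionally copy one to a connected device for a quick smoke test (illustrative paths)
adb push build_qnn_arm64-v8a/bin/llama-cli /data/local/tmp/
adb shell chmod +x /data/local/tmp/llama-cli
```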
Parameter | Short | Description | Default |
---|---|---|---|
--rebuild | -r | Force rebuild of the project | false |
--repo-dir | | Specify llama.cpp repository directory | ../llama.cpp |
--debug | -d | Build in Debug mode | Release |
--asan | | Enable AddressSanitizer | false |
--build-linux-x64 | | Build for Linux x86_64 platform | android arm64-v8a |
--perf-log | | Enable Hexagon performance tracking | false |
--enable-hexagon-backend | | Enable Hexagon backend support | false |
--hexagon-npu-only | | Build Hexagon NPU backend only | false |
--disable-hexagon-and-qnn | | Disable both Hexagon and QNN backends | false |
--qnn-only | | Build QNN backend only | false |
--enable-dequant | | Enable quantized tensor support in Hexagon | false |
```bash
# Basic build (default: Release mode, QNN + Hexagon backends)
./docker/docker_compose_compile.sh

# Debug build with Hexagon NPU backend
./docker/docker_compose_compile.sh -d --enable-hexagon-backend

# Debug build with Hexagon NPU backend only
./docker/docker_compose_compile.sh -d --hexagon-npu-only

# Debug build with Hexagon NPU backend and quantized tensor support
./docker/docker_compose_compile.sh -d --hexagon-npu-only --enable-dequant

# QNN-only build with performance logging
./docker/docker_compose_compile.sh --qnn-only --perf-log

# Force rebuild with debug symbols
./docker/docker_compose_compile.sh -r -d
```
To build with Hexagon NPU backend support, you need to create a Docker image that includes the Hexagon SDK.
- Hexagon SDK
  - Option 1: Download the SDK from Hexagon NPU SDK - Getting started (version 6.3.0.0 for Linux)
  - Option 2: Use an existing SDK installation
- Base Docker Image
  - Required image: `chraac/llama-cpp-qnn-builder:2.36.0.250627-ndk-r27`
  - Contains Android NDK r27c and build tools
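If you want to fetch the base image ahead of time, it can be pulled directly (a small sketch using the tag listed above):

```bash
# Pre-pull the base builder image (contains Android NDK r27c and build tools)
docker pull chraac/llama-cpp-qnn-builder:2.36.0.250627-ndk-r27
```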
If you already have the Hexagon SDK extracted on your machine:
- Create Dockerfile (save as `Dockerfile.hexagon_sdk.local`):

  ```dockerfile
  FROM chraac/llama-cpp-qnn-builder:2.36.0.250627-ndk-r27

  ENV HEXAGON_SDK_VERSION='6.3.0.0'
  ENV HEXAGON_SDK_BASE=/local/mnt/workspace/Qualcomm/Hexagon_SDK
  ENV HEXAGON_SDK_PATH=${HEXAGON_SDK_BASE}/${HEXAGON_SDK_VERSION}
  ENV ANDROID_NDK_HOME=/android-ndk/android-ndk-r27c
  ENV ANDROID_ROOT_DIR=${ANDROID_NDK_HOME}/

  RUN mkdir -p ${HEXAGON_SDK_PATH}

  ARG LOCAL_SDK_PATH
  ADD ${LOCAL_SDK_PATH} ${HEXAGON_SDK_PATH}/6.3.0.0

  # Install required dependencies
  RUN apt update && apt install -y \
      python-is-python3 \
      libncurses5 \
      lsb-base \
      lsb-release \
      sqlite3 \
      rsync \
      git \
      build-essential \
      libc++-dev \
      clang \
      cmake

  # Dummy version info for hexagon-sdk
  RUN echo 'VERSION_ID="20.04"' > /etc/os-release
  ```
- Create Setup Script (save as `docker_compose_hexagon_local.sh`):

  ```bash
  #!/bin/bash

  # Check if SDK path is provided
  if [ -z "$1" ]; then
      echo "Usage: $0 /path/to/hexagon/sdk/6.3.0.0"
      exit 1
  fi

  SDK_PATH="$1"

  # Check if SDK path exists
  if [ ! -d "$SDK_PATH" ]; then
      echo "Error: SDK path does not exist: $SDK_PATH"
      exit 1
  fi

  # Build the Docker image with SDK embedded
  docker build -f Dockerfile.hexagon_sdk.local --build-arg LOCAL_SDK_PATH="$SDK_PATH" -t llama-cpp-qnn-hexagon:embedded .

  # Create a Docker Compose configuration file
  cat > docker-compose.hexagon.yml << EOF
  version: '3'
  services:
    hexagon-builder:
      image: llama-cpp-qnn-hexagon:embedded
      volumes:
        - ./:/workspace
      working_dir: /workspace
  EOF

  echo "Setup complete! Use the following command to compile with Hexagon support:"
  echo "./docker/docker_compose_compile.sh --enable-hexagon-backend"
  ```
- Run Setup:

  ```bash
  chmod +x docker_compose_hexagon_local.sh
  ./docker_compose_hexagon_local.sh /path/to/your/Hexagon_SDK/6.3.0.0
  ```
- Build with Hexagon Support:

  ```bash
  # Enable Hexagon NPU backend
  ./docker/docker_compose_compile.sh --enable-hexagon-backend

  # Or build with Hexagon NPU backend only
  ./docker/docker_compose_compile.sh --hexagon-npu-only

  # Access container shell for manual builds
  docker-compose -f docker-compose.hexagon.yml run --rm hexagon-builder bash
  ```
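To confirm the SDK was embedded correctly, you can list the SDK path inside the container. This is a minimal check; `HEXAGON_SDK_PATH` is the environment variable defined in the Dockerfile above:

```bash
# The listing should show the Hexagon SDK contents baked into the image
docker-compose -f docker-compose.hexagon.yml run --rm hexagon-builder \
  bash -c 'ls "$HEXAGON_SDK_PATH"'
```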
- Qualcomm AI Engine Direct SDK
  - Download from the Qualcomm Developer Portal
  - Extract to a folder (example: `C:/ml/qnn_sdk/qairt/2.31.0.250130/`)
- Visual Studio 2022
  - Required components:
    - Clang toolchain for ARM64 compilation
    - CMake tools for Visual Studio
- Hexagon SDK (optional, only for the Hexagon NPU backend)
  - Follow Hexagon NPU SDK - Getting started
  - Install Qualcomm Package Manager (QPM) first
  - Use QPM to install the Hexagon SDK
  - Set the environment variable `HEXAGON_SDK_ROOT` to your installation directory
- Open Project
  - Launch Visual Studio 2022
  - Click `Continue without code`
  - Navigate to `File` → `Open` → `CMake`
  - Select `CMakeLists.txt` in the llama.cpp root directory
- Configure CMake

  Edit `llama.cpp/CMakePresets.json` to modify the `arm64-windows-llvm` configuration:

  ```diff
  {
      "name": "arm64-windows-llvm",
      "hidden": true,
      "architecture": { "value": "arm64", "strategy": "external" },
      "toolset": { "value": "host=x64", "strategy": "external" },
      "cacheVariables": {
  -        "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake"
  +        "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake",
  +        "GGML_QNN": "ON",
  +        "GGML_QNN_SDK_PATH": "C:/ml/qnn_sdk/qairt/2.31.0.250130/",
  +        "BUILD_SHARED_LIBS": "OFF"
      }
  },
  ```

  Important: Replace the QNN SDK path with your actual installation path. (A command-line alternative using the same preset is sketched after these steps.)
- Select Configuration
  - Choose the `arm64-windows-llvm-debug` configuration from the dropdown menu
- Build
  - Select `Build` → `Build All`
  - Output will be in `build-arm64-windows-llvm-debug/bin/`
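If you prefer the command line to the Visual Studio UI, the same presets can in principle be driven with CMake directly. This is a sketch only; it assumes `cmake` and the LLVM ARM64 toolchain are on your PATH and that `CMakePresets.json` was edited as above:

```bash
# Configure and build using the edited preset (run from the llama.cpp root)
cmake --preset arm64-windows-llvm-debug
cmake --build build-arm64-windows-llvm-debug
```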
After successful compilation, you'll have these executables:

- `llama-cli.exe` - Main inference executable
- `llama-bench.exe` - Benchmarking tool
- `test-backend-ops.exe` - Backend operation tests
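A minimal smoke test of the resulting binaries (the model path is illustrative; point `-m` at any GGUF model available locally):

```bash
# Run the backend operation tests
build-arm64-windows-llvm-debug/bin/test-backend-ops.exe

# Generate a few tokens with llama-cli (illustrative model path)
build-arm64-windows-llvm-debug/bin/llama-cli.exe -m C:/ml/models/model.gguf -p "Hello" -n 32
```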
- Docker Permission Issues
  - Add your user to the docker group:

    ```bash
    sudo usermod -aG docker $USER
    # Log out and back in for changes to take effect
    ```
- Hexagon SDK Compatibility
  - Verify you're using exactly version 6.3.0.0 of the SDK
  - Ensure SDK directory permissions allow Docker container access
- Build Failures
  - Check Docker logs for detailed error messages:

    ```bash
    docker-compose -f docker-compose.hexagon.yml logs
    ```
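If the failure looks like a stale or inconsistent build directory, forcing a clean rebuild with the `-r` flag from the parameter table above is often the quickest fix:

```bash
# Force a full rebuild of the project
./docker/docker_compose_compile.sh -r
```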