How to Build
This guide describes how to build Android and Windows versions of the QNN backend for llama.cpp, enabling efficient inference on Qualcomm hardware.
- Docker Engine
  - Install following the official Docker guide
  - Ensure Docker Compose is included with your installation (a quick verification sketch follows this list)
- Source Code
  - Clone the repository:

    ```bash
    git clone https://github.com/chraac/llama-cpp-qnn-builder.git
    cd llama-cpp-qnn-builder
    ```

  Note: Use the latest `main` branch, as the build relies on NDK r27c with important optimization flags for Release builds.
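Before building, a quick sanity check can save time. This is a minimal sketch; it only assumes a standard Docker installation and the clone from above:

```bash
# Confirm Docker Engine and the Compose plugin are installed and the daemon is reachable
docker --version
docker compose version          # or: docker-compose --version (standalone binary)
docker info > /dev/null && echo "Docker daemon reachable"

# From inside the cloned repository, confirm you are on the latest main branch
git branch --show-current
git pull origin main
```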
- Basic Build
  - Navigate to the project root directory and run:

    ```bash
    ./docker/docker_compose_compile.sh
    ```
- Build Output
  - Executables will be in `build_qnn_arm64-v8a/bin/`
  - The console will show build progress and completion status
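A minimal sketch for inspecting the build output and pushing a binary to a connected device over adb. The `llama-cli` binary name and the on-device path are illustrative; adjust them to your setup, and note that shared-library builds may also require pushing the accompanying `.so` files:

```bash
# List the Android binaries produced by the build
ls build_qnn_arm64-v8a/bin/

# Optionally copy one to a connected device for a quick smoke test (illustrative paths)
adb push build_qnn_arm64-v8a/bin/llama-cli /data/local/tmp/
adb shell chmod +x /data/local/tmp/llama-cli
```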
Parameter | Short | Description | Default |
---|---|---|---|
--rebuild | -r | Force rebuild of the project | false |
--repo-dir | | Specify llama.cpp repository directory | ../llama.cpp |
--debug | -d | Build in Debug mode | Release |
--asan | | Enable AddressSanitizer | false |
--build-linux-x64 | | Build for Linux x86_64 platform | android arm64-v8a |
--perf-log | | Enable Hexagon performance tracking | false |
--enable-hexagon-backend | | Enable Hexagon backend support | false |
--hexagon-npu-only | | Build Hexagon NPU backend only | false |
--disable-hexagon-and-qnn | | Disable both Hexagon and QNN backends | false |
--qnn-only | | Build QNN backend only | false |
--enable-dequant | | Enable quantized tensor support in Hexagon | false |
```bash
# Basic build (default: Release mode, QNN + Hexagon backends)
./docker/docker_compose_compile.sh

# Debug build with Hexagon NPU backend
./docker/docker_compose_compile.sh -d --enable-hexagon-backend

# Debug build with Hexagon NPU backend only
./docker/docker_compose_compile.sh -d --hexagon-npu-only

# Debug build with Hexagon NPU backend and quantized tensor support
./docker/docker_compose_compile.sh -d --hexagon-npu-only --enable-dequant

# QNN-only build with performance logging
./docker/docker_compose_compile.sh --qnn-only --perf-log

# Force rebuild with debug symbols
./docker/docker_compose_compile.sh -r -d
```
To build with Hexagon NPU backend support, you need to create a Docker image that includes the Hexagon SDK.
- Hexagon SDK
  - Option 1: Download the SDK from Hexagon NPU SDK - Getting started (version 6.3.0.0 for Linux)
  - Option 2: Use an existing SDK installation
- Base Docker Image
  - Required image: `chraac/llama-cpp-qnn-builder:2.36.0.250627-ndk-r27`
  - Contains Android NDK r27c and build tools
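If you want to fetch the base image ahead of time, it can be pulled directly (a small sketch using the tag listed above):

```bash
# Pre-pull the base builder image (contains Android NDK r27c and build tools)
docker pull chraac/llama-cpp-qnn-builder:2.36.0.250627-ndk-r27
```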
If you already have the Hexagon SDK extracted on your machine:
- Create Dockerfile (save as `Dockerfile.hexagon_sdk.local`):

  ```dockerfile
  FROM chraac/llama-cpp-qnn-builder:2.36.0.250627-ndk-r27

  ENV HEXAGON_SDK_VERSION='6.3.0.0'
  ENV HEXAGON_SDK_BASE=/local/mnt/workspace/Qualcomm/Hexagon_SDK
  ENV HEXAGON_SDK_PATH=${HEXAGON_SDK_BASE}/${HEXAGON_SDK_VERSION}
  ENV ANDROID_NDK_HOME=/android-ndk/android-ndk-r27c
  ENV ANDROID_ROOT_DIR=${ANDROID_NDK_HOME}/

  RUN mkdir -p ${HEXAGON_SDK_PATH}

  ARG LOCAL_SDK_PATH
  ADD ${LOCAL_SDK_PATH} ${HEXAGON_SDK_PATH}/6.3.0.0

  # Install required dependencies
  RUN apt update && apt install -y \
      python-is-python3 \
      libncurses5 \
      lsb-base \
      lsb-release \
      sqlite3 \
      rsync \
      git \
      build-essential \
      libc++-dev \
      clang \
      cmake

  # Dummy version info for hexagon-sdk
  RUN echo 'VERSION_ID="20.04"' > /etc/os-release
  ```
- Create Setup Script (save as `docker_compose_hexagon_local.sh`):

  ```bash
  #!/bin/bash

  # Check if SDK path is provided
  if [ -z "$1" ]; then
      echo "Usage: $0 /path/to/hexagon/sdk/6.3.0.0"
      exit 1
  fi

  SDK_PATH="$1"

  # Check if SDK path exists
  if [ ! -d "$SDK_PATH" ]; then
      echo "Error: SDK path does not exist: $SDK_PATH"
      exit 1
  fi

  # Build the Docker image with SDK embedded
  docker build -f Dockerfile.hexagon_sdk.local --build-arg LOCAL_SDK_PATH="$SDK_PATH" -t llama-cpp-qnn-hexagon:embedded .

  # Create a Docker Compose configuration file
  cat > docker-compose.hexagon.yml << EOF
  version: '3'
  services:
    hexagon-builder:
      image: llama-cpp-qnn-hexagon:embedded
      volumes:
        - ./:/workspace
      working_dir: /workspace
  EOF

  echo "Setup complete! Use the following command to compile with Hexagon support:"
  echo "./docker/docker_compose_compile.sh --enable-hexagon-backend"
  ```
- Run Setup:

  ```bash
  chmod +x docker_compose_hexagon_local.sh
  ./docker_compose_hexagon_local.sh /path/to/your/Hexagon_SDK/6.3.0.0
  ```
- Build with Hexagon Support:

  ```bash
  # Enable Hexagon NPU backend
  ./docker/docker_compose_compile.sh --enable-hexagon-backend

  # Or build with Hexagon NPU backend only
  ./docker/docker_compose_compile.sh --hexagon-npu-only

  # Access container shell for manual builds
  docker-compose -f docker-compose.hexagon.yml run --rm hexagon-builder bash
  ```
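To confirm the SDK was embedded correctly, you can list the SDK path inside the container. This is a minimal check; `HEXAGON_SDK_PATH` is the environment variable defined in the Dockerfile above:

```bash
# The listing should show the Hexagon SDK contents baked into the image
docker-compose -f docker-compose.hexagon.yml run --rm hexagon-builder \
  bash -c 'ls "$HEXAGON_SDK_PATH"'
```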
- Qualcomm AI Engine Direct SDK
  - Download from the Qualcomm Developer Portal
  - Extract to a folder (example: `C:/ml/qnn_sdk/qairt/2.31.0.250130/`)
- Visual Studio 2022
  - Required components:
    - Clang toolchain for ARM64 compilation
    - CMake tools for Visual Studio
- Hexagon SDK (optional, only for the Hexagon NPU backend)
  - Follow Hexagon NPU SDK - Getting started
  - Install Qualcomm Package Manager (QPM) first
  - Use QPM to install the Hexagon SDK
  - Set the environment variable `HEXAGON_SDK_ROOT` to your installation directory
- Open Project
  - Launch Visual Studio 2022
  - Click `Continue without code`
  - Navigate to `File` → `Open` → `CMake`
  - Select `CMakeLists.txt` in the llama.cpp root directory
- Configure CMake

  Edit `llama.cpp/CMakePresets.json` to modify the `arm64-windows-llvm` configuration:

  ```diff
  {
      "name": "arm64-windows-llvm",
      "hidden": true,
      "architecture": { "value": "arm64", "strategy": "external" },
      "toolset": { "value": "host=x64", "strategy": "external" },
      "cacheVariables": {
  -        "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake"
  +        "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake",
  +        "GGML_QNN": "ON",
  +        "GGML_QNN_SDK_PATH": "C:/ml/qnn_sdk/qairt/2.31.0.250130/",
  +        "BUILD_SHARED_LIBS": "OFF"
      }
  },
  ```

  Important: Replace the QNN SDK path with your actual installation path. (A command-line alternative using the same preset is sketched after these steps.)
- Select Configuration
  - Choose the `arm64-windows-llvm-debug` configuration from the dropdown menu
- Build
  - Select `Build` → `Build All`
  - Output will be in `build-arm64-windows-llvm-debug/bin/`
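If you prefer the command line to the Visual Studio UI, the same presets can in principle be driven with CMake directly. This is a sketch only; it assumes `cmake` and the LLVM ARM64 toolchain are on your PATH and that `CMakePresets.json` was edited as above:

```bash
# Configure and build using the edited preset (run from the llama.cpp root)
cmake --preset arm64-windows-llvm-debug
cmake --build build-arm64-windows-llvm-debug
```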
After successful compilation, you'll have these executables:

- `llama-cli.exe` - Main inference executable
- `llama-bench.exe` - Benchmarking tool
- `test-backend-ops.exe` - Backend operation tests
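A minimal smoke test of the resulting binaries (the model path is illustrative; point `-m` at any GGUF model available locally):

```bash
# Run the backend operation tests
build-arm64-windows-llvm-debug/bin/test-backend-ops.exe

# Generate a few tokens with llama-cli (illustrative model path)
build-arm64-windows-llvm-debug/bin/llama-cli.exe -m C:/ml/models/model.gguf -p "Hello" -n 32
```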
- Docker Permission Issues
  - Add your user to the docker group:

    ```bash
    sudo usermod -aG docker $USER
    # Log out and back in for changes to take effect
    ```
- Hexagon SDK Compatibility
  - Verify you're using exactly version 6.3.0.0 of the SDK
  - Ensure SDK directory permissions allow Docker container access
- Build Failures
  - Check Docker logs for detailed error messages:

    ```bash
    docker-compose -f docker-compose.hexagon.yml logs
    ```
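If the failure looks like a stale or inconsistent build directory, forcing a clean rebuild with the `-r` flag from the parameter table above is often the quickest fix:

```bash
# Force a full rebuild of the project
./docker/docker_compose_compile.sh -r
```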