Support for arm64 wheels and CPU Features #1342

Open
gaby opened this issue Apr 13, 2024 · 11 comments

@gaby

gaby commented Apr 13, 2024

@abetlen Thank you for the new efforts to start publishing wheels for CUDA, etc.

I noticed that the METAL wheels only work on the darwin platform; when using Docker on macOS, the platform is linux/arm64, not darwin.

I have a repo where I was building arm64/wheels that could probably be integrated into your workflows: https://github.com/gaby/arm64-wheels

TL;DR:

    steps:
      - name: Checkout abetlen/llama-cpp-python
        uses: actions/checkout@v4
        with:
          repository: 'abetlen/llama-cpp-python'
          ref: '${{ matrix.version }}'
          submodules: 'recursive'

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
        with:
          platforms: linux/arm64

      - name: Build wheels
        uses: pypa/[email protected]
        env:
          CIBW_SKIP: "*musllinux* pp*"     # skip musllinux and PyPy builds
          CIBW_REPAIR_WHEEL_COMMAND: ""    # disable the auditwheel repair step
          CIBW_ARCHS: "aarch64"            # build arm64 (aarch64) wheels
          CIBW_BUILD: "cp311-*"            # CPython 3.11 only for now
        with:
          output-dir: wheelhouse/

      - name: Upload wheels as artifacts
        uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ matrix.version }}
          path: wheelhouse/*.whl

This would need to be expanded to support other Python versions and PyPy.

I also noticed that the CPU wheels don't have variants for AVX, AVX2, or AVX512. Are there plans to add support for those?

@abetlen
Owner

abetlen commented Apr 14, 2024

Hey @gaby, thank you, I'll add support for that soon. Do you mind giving me a hand with testing when the PR is ready?

With regard to the CPU wheels, I'm conflicted, because it really does blow up the matrix of builds that have to run for each release. My thinking for the wheels is to build something that works okay for most people; if you want it to run quickly, you should build from source.

My current position is that I'm willing to expand the number of builds if we also implement some optimizations each time to mitigate the combinatorial explosion.

Some thoughts I have for long-term solutions:

  • At the moment we're building llama.cpp for each (platform, Python version) pair. However, we don't actually bind to the Python API, so this isn't necessary: we could build once per platform and copy in the shared library and other required files. This would involve modifying the build process but should offer a significant speedup for CI runs.
  • For AVX, AVX2, and AVX512 we could experiment with shipping all of the variants in llama-cpp-python: load the basic shared library, check the CPU flags available via ggml, and then load the appropriately optimized shared library (see the sketch after this list). The first optimization would likely be a prerequisite for this, but I think it's a valuable speedup.
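
A minimal sketch of what that dispatch could look like on x86 Linux. The per-variant library names (libllama_basic.so, libllama_avx2.so, and so on) are made up for illustration, and this sketch reads /proc/cpuinfo rather than querying ggml:

    # Hypothetical runtime dispatch: pick the most optimized shared library
    # the CPU supports. All file names here are illustrative only; the
    # project does not ship per-variant libraries today.
    import ctypes
    import pathlib

    LIB_DIR = pathlib.Path(__file__).parent / "lib"

    def _cpu_flags() -> set[str]:
        # On x86 Linux, /proc/cpuinfo lists the supported instruction-set
        # extensions on the "flags" lines.
        flags: set[str] = set()
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    flags.update(line.split(":", 1)[1].split())
        return flags

    def load_best_library() -> ctypes.CDLL:
        flags = _cpu_flags()
        # Try the most specific variant first, then fall back to the baseline.
        for variant, needed in [
            ("avx512", {"avx512f"}),
            ("avx2", {"avx2"}),
            ("avx", {"avx"}),
        ]:
            if needed <= flags:
                candidate = LIB_DIR / f"libllama_{variant}.so"
                if candidate.exists():
                    return ctypes.CDLL(str(candidate))
        return ctypes.CDLL(str(LIB_DIR / "libllama_basic.so"))

In a real implementation, the baseline library's own feature detection (ggml exposes helpers along the lines of ggml_cpu_has_avx()) would presumably replace the /proc/cpuinfo parsing.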

@gaby
Author

gaby commented Apr 14, 2024

Yeah, I can definitely test the linux/arm64 wheels on a Raspberry Pi.

I was using the wheels from https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels for the longest time, but yes, it does blow up the number of CI builds/jobs that get created. I do like your idea of having ggml check the CPU flags to determine which variant to use.

@abetlen
Owner

abetlen commented Apr 17, 2024

@gaby sorry, I was just re-reviewing this. So currently the wheels that end in _arm64.whl don't work inside Docker on macOS, and we should replace them with the wheels built using the cibuildwheel process from your repo?

@gaby
Author

gaby commented Apr 17, 2024

@abetlen If you install the package on macOS directly, the platform is darwin/arm64, which you already have wheels for. If you install the package on macOS through Docker, the platform inside the container is linux/arm64. This is because Docker on macOS runs containers in a Linux VM (QEMU).

The linux/arm64 platform would also benefit Raspberry Pi users, especially on the Pi 4/Pi 5.
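
As a quick illustration of the difference, the same machine reports different platforms depending on where Python runs (a diagnostic snippet, not part of the project):

    # Print what this interpreter considers the current platform.
    # Directly on an Apple Silicon Mac: Darwin / arm64 / macosx-*-arm64.
    # Inside a Linux container on that same Mac: Linux / aarch64 / linux-aarch64.
    import platform
    import sysconfig

    print(platform.system())         # OS kernel name
    print(platform.machine())        # CPU architecture label
    print(sysconfig.get_platform())  # basis for the platform part of wheel tags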

@abetlen
Owner

abetlen commented Apr 23, 2024

@gaby thank you, I've added your provided code to the release workflow for Python versions 3.8-3.12. Can you let me know if it works correctly?

@gaby
Author

gaby commented Apr 24, 2024

@abetlen I don't see any arm64 wheels here https://abetlen.github.io/llama-cpp-python/whl/cpu/llama-cpp-python/

Running pip install confirms no matching wheel is found, so pip falls back to building from source, which then fails because the container has no C compiler:

Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [24 lines of output]
      *** scikit-build-core 0.9.2 using CMake 3.29.2 (wheel)
      *** Configuring CMake...
      loading initial cache file /tmp/tmpwa68rmyp/build/CMakeInit.txt
      -- The C compiler identification is unknown
      -- The CXX compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (project):
        No CMAKE_C_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
        the compiler, or to the compiler name if it is in the PATH.


      CMake Error at CMakeLists.txt:3 (project):
        No CMAKE_CXX_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
        to the compiler, or to the compiler name if it is in the PATH.


      -- Configuring incomplete, errors occurred!

      *** CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

[notice] A new release of pip is available: 23.0.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
root@eb7b81f37400:/# uname -m
aarch64

I think it's related to this line in the CI https://github.com/abetlen/llama-cpp-python/blob/main/.github/workflows/build-and-release.yaml#L70
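
For reference, the wheel tags an interpreter will accept can be listed with the packaging library (pip vendors the same logic); a wheel on the index has to match one of these tags, or pip falls back to the sdist and produces exactly the compiler error above:

    # List accepted wheel tags, most preferred first, e.g.
    # cp311-cp311-manylinux_2_17_aarch64 on a Raspberry Pi running CPython 3.11.
    from packaging.tags import sys_tags

    for tag in list(sys_tags())[:10]:  # the first few are the most specific
        print(tag)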

@abetlen abetlen reopened this Apr 29, 2024
@abetlen
Owner

abetlen commented Apr 29, 2024

@gaby looks like they were built and uploaded as artifacts but not added to the release(?)

I'll take a look later, but this is the last workflow run file if you can spot what I'm doing wrong.

@Smartappli
Contributor

Smartappli commented Apr 30, 2024

> @gaby looks like they were built and uploaded as artifacts but not added to the release(?)
>
> I'll take a look later but this is the last workflow run file if you can spot what I'm doing wrong.

@gaby @abetlen Fixed here: https://github.com/abetlen/llama-cpp-python/pull/1392/files

@Smartappli
Contributor

@abetlen Test: https://github.com/Smartappli/llama-cpp-python/releases/tag/test2

@abetlen
Copy link
Owner

abetlen commented Apr 30, 2024

@Smartappli wow thank you so much!

xhedit pushed a commit to xhedit/llama-cpp-conv that referenced this issue Apr 30, 2024
@asterbini

It would be nice to have updated arm64 builds, as the last conda package has no support for many model types.
