Support for arm64 wheels and CPU Features #1342

Open
gaby opened this issue Apr 13, 2024 · 11 comments

@gaby

gaby commented Apr 13, 2024

@abetlen Thank you for the new efforts to start publishing wheels for CUDA, etc.

I noticed that the METAL wheels only work on the darwin platform; when using Docker on macOS, the platform is linux/arm64, not darwin.

I have a repo where I was building arm64/wheels that could probably be integrated into your workflows: https://github.com/gaby/arm64-wheels

TL;DR:

    steps:
      - name: Checkout abetlen/llama-cpp-python
        uses: actions/checkout@v4
        with:
          repository: 'abetlen/llama-cpp-python'
          ref: '${{ matrix.version }}'
          submodules: 'recursive'

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3
        with:
          platforms: linux/arm64

      - name: Build wheels
        uses: pypa/[email protected]
        env:
          CIBW_SKIP: "*musllinux* pp*"     # skip musllinux and PyPy builds
          CIBW_REPAIR_WHEEL_COMMAND: ""    # disable the auditwheel repair step
          CIBW_ARCHS: "aarch64"            # build arm64 (aarch64) wheels
          CIBW_BUILD: "cp311-*"            # CPython 3.11 only for now
        with:
          output-dir: wheelhouse/

      - name: Upload wheels as artifacts
        uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ matrix.version }}
          path: wheelhouse/*.whl

This would need to be expanded to support other Python versions and PyPy.

I also noticed that the CPU wheels don't have variants for AVX, AVX2, or AVX512. Are there plans to add support for those?

@abetlen
Owner

abetlen commented Apr 14, 2024

Hey @gaby, thank you, I'll add support for that soon. Do you mind giving me a hand with testing when the PR is ready?

With regard to the CPU wheels, I'm conflicted, because it really does blow up the matrix of builds that have to run for each release. My thinking for the wheels is to build something that works okay for most people; if you want it to run quickly, you should build from source.

My current position is that I'm willing to expand the number of builds if we also implement some optimizations each time to mitigate the combinatorial explosion.

Some thoughts I have for long-term solutions:

  • At the moment we're building llama.cpp for each (platform, Python version) pair. However, we don't actually bind to the Python API, so this isn't necessary: we could build once per platform and copy in the shared library and other required files. This would involve modifying the build process but should offer a significant speedup for CI runs.
  • For AVX, AVX2, and AVX512 we could experiment with shipping all of the variants in llama-cpp-python: load the basic shared library, check the CPU flags available via ggml, and then load the appropriately optimized shared library (see the sketch after this list). The first optimization would likely be a prerequisite for this, but I think it's a valuable speedup.
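
A minimal sketch of what that dispatch could look like on x86 Linux. The per-variant library names (libllama_basic.so, libllama_avx2.so, and so on) are made up for illustration, and this sketch reads /proc/cpuinfo rather than querying ggml:

    # Hypothetical runtime dispatch: pick the most optimized shared library
    # the CPU supports. All file names here are illustrative only; the
    # project does not ship per-variant libraries today.
    import ctypes
    import pathlib

    LIB_DIR = pathlib.Path(__file__).parent / "lib"

    def _cpu_flags() -> set[str]:
        # On x86 Linux, /proc/cpuinfo lists the supported instruction-set
        # extensions on the "flags" lines.
        flags: set[str] = set()
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    flags.update(line.split(":", 1)[1].split())
        return flags

    def load_best_library() -> ctypes.CDLL:
        flags = _cpu_flags()
        # Try the most specific variant first, then fall back to the baseline.
        for variant, needed in [
            ("avx512", {"avx512f"}),
            ("avx2", {"avx2"}),
            ("avx", {"avx"}),
        ]:
            if needed <= flags:
                candidate = LIB_DIR / f"libllama_{variant}.so"
                if candidate.exists():
                    return ctypes.CDLL(str(candidate))
        return ctypes.CDLL(str(LIB_DIR / "libllama_basic.so"))

In a real implementation, the baseline library's own feature detection (ggml exposes helpers along the lines of ggml_cpu_has_avx()) would presumably replace the /proc/cpuinfo parsing.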

@gaby
Author

gaby commented Apr 14, 2024

Yeah, I can definitely test the linux/arm64 wheels on a Raspberry Pi.

I was using the wheels from https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels for the longest time, but yes, it does blow up the number of CI builds/jobs that get created. I do like your idea of having ggml check the CPU flags to determine which variant to use.

@abetlen
Owner

abetlen commented Apr 17, 2024

@gaby sorry, I was just re-reviewing this. So currently the wheels that end in _arm64.whl don't work inside Docker on macOS, and we should replace them with the wheels built using the cibuildwheel process from your repo?

@gaby
Author

gaby commented Apr 17, 2024

@abetlen If you install the package on macOS directly, the platform is darwin/arm64, which you already have wheels for. If you install the package on macOS through Docker, the platform inside the container is linux/arm64. This is because Docker on macOS runs containers in a Linux VM (QEMU).

The linux/arm64 platform would also benefit Raspberry Pi users, especially on the Pi 4/Pi 5.
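
As a quick illustration of the difference, the same machine reports different platforms depending on where Python runs (a diagnostic snippet, not part of the project):

    # Print what this interpreter considers the current platform.
    # Directly on an Apple Silicon Mac: Darwin / arm64 / macosx-*-arm64.
    # Inside a Linux container on that same Mac: Linux / aarch64 / linux-aarch64.
    import platform
    import sysconfig

    print(platform.system())         # OS kernel name
    print(platform.machine())        # CPU architecture label
    print(sysconfig.get_platform())  # basis for the platform part of wheel tags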

@abetlen
Owner

abetlen commented Apr 23, 2024

@gaby thank you, I've added your provided code to the release workflow for Python versions 3.8-3.12. Can you let me know if it works correctly?

@gaby
Author

gaby commented Apr 24, 2024

@abetlen I don't see any arm64 wheels here https://abetlen.github.io/llama-cpp-python/whl/cpu/llama-cpp-python/

Running pip install confirms no matching wheel is found, so pip falls back to building from source, which then fails because the container has no C compiler:

Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [24 lines of output]
      *** scikit-build-core 0.9.2 using CMake 3.29.2 (wheel)
      *** Configuring CMake...
      loading initial cache file /tmp/tmpwa68rmyp/build/CMakeInit.txt
      -- The C compiler identification is unknown
      -- The CXX compiler identification is unknown
      CMake Error at CMakeLists.txt:3 (project):
        No CMAKE_C_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
        the compiler, or to the compiler name if it is in the PATH.


      CMake Error at CMakeLists.txt:3 (project):
        No CMAKE_CXX_COMPILER could be found.

        Tell CMake where to find the compiler by setting either the environment
        variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
        to the compiler, or to the compiler name if it is in the PATH.


      -- Configuring incomplete, errors occurred!

      *** CMake configuration failed
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

[notice] A new release of pip is available: 23.0.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
root@eb7b81f37400:/# uname -m
aarch64

I think it's related to this line in the CI https://github.com/abetlen/llama-cpp-python/blob/main/.github/workflows/build-and-release.yaml#L70
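
For reference, the wheel tags an interpreter will accept can be listed with the packaging library (pip vendors the same logic); a wheel on the index has to match one of these tags, or pip falls back to the sdist and produces exactly the compiler error above:

    # List accepted wheel tags, most preferred first, e.g.
    # cp311-cp311-manylinux_2_17_aarch64 on a Raspberry Pi running CPython 3.11.
    from packaging.tags import sys_tags

    for tag in list(sys_tags())[:10]:  # the first few are the most specific
        print(tag)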

@abetlen abetlen reopened this Apr 29, 2024
@abetlen
Owner

abetlen commented Apr 29, 2024

@gaby looks like they were built and uploaded as artifacts but not added to the release(?)

I'll take a look later, but this is the last workflow run file if you can spot what I'm doing wrong.

@Smartappli
Contributor

Smartappli commented Apr 30, 2024

> @gaby looks like they were built and uploaded as artifacts but not added to the release(?)
>
> I'll take a look later but this is the last workflow run file if you can spot what I'm doing wrong.

@gaby @abetlen Fixed here: https://github.com/abetlen/llama-cpp-python/pull/1392/files

@Smartappli
Contributor

@abetlen Test: https://github.com/Smartappli/llama-cpp-python/releases/tag/test2

@abetlen
Copy link
Owner

abetlen commented Apr 30, 2024

@Smartappli wow thank you so much!

xhedit pushed a commit to xhedit/llama-cpp-conv that referenced this issue Apr 30, 2024
@asterbini

It would be nice to have updated arm64 builds, as the last conda package has no support for many model types.
