Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
I am running llama-cpp-python on an M1 Mac, installed with the Metal CMake flag as described in the README. I expected inference to run on the GPU.
How do I verify that llama-cpp-python is using the GPU on an M1 Mac?
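For context, here is a minimal sketch of how I am loading the model. I am guessing that `n_gpu_layers` is the parameter that controls Metal offload and that the verbose startup log should mention Metal if it is active, but I have not confirmed either; the model path is a placeholder:

```python
from llama_cpp import Llama

# Installed per the README with:
#   CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

llm = Llama(
    model_path="./models/llama-2-7b.Q4_0.gguf",  # placeholder path
    n_gpu_layers=1,  # my guess at the parameter that enables Metal offload
    verbose=True,    # expecting Metal-related lines (e.g. ggml_metal_init) in the log
)
```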
Current Behavior
Inference works, but it appears to run mostly on the CPU rather than the GPU.
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
- Hardware: Apple M1 Mac
- llama-cpp-python: 0.2.6
- Python: 3.9.0
- macOS: 13.5.2
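For completeness, the version info above can be collected with standard library calls (nothing here is specific to my setup; the printed values are what I see on my machine):

```python
import platform
from importlib.metadata import version

print(platform.machine())           # arm64 on Apple silicon
print(platform.mac_ver()[0])        # 13.5.2
print(platform.python_version())    # 3.9.0
print(version("llama-cpp-python"))  # 0.2.6
```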
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
- Load a Llama 2 7B model with 4-bit quantization (see the minimal script below)
- Run inference and observe CPU/GPU usage (e.g. in Activity Monitor)
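A minimal script that reproduces what I am seeing, assuming a local 4-bit GGUF file (path and prompt are placeholders):

```python
from llama_cpp import Llama

# Placeholder path to a 4-bit quantized Llama 2 7B model (GGUF).
llm = Llama(model_path="./models/llama-2-7b.Q4_0.gguf")

# Run a short completion; while this executes, Activity Monitor shows
# high CPU usage and little to no GPU usage.
output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(output["choices"][0]["text"])
```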