Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
I am running llama-cpp-python on an M1 Mac, installed with the Metal CMake flag as described in the README. I expected inference to run on the GPU.
How do I verify that llama-cpp-python is using the GPU on an M1 Mac?
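For context, here is a minimal sketch of how I am loading the model. I am guessing that `n_gpu_layers` is the parameter that controls Metal offload and that the verbose startup log should mention Metal if it is active, but I have not confirmed either; the model path is a placeholder:

```python
from llama_cpp import Llama

# Installed per the README with:
#   CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

llm = Llama(
    model_path="./models/llama-2-7b.Q4_0.gguf",  # placeholder path
    n_gpu_layers=1,  # my guess at the parameter that enables Metal offload
    verbose=True,    # expecting Metal-related lines (e.g. ggml_metal_init) in the log
)
```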
Current Behavior
Inference works, but it appears to run mostly on the CPU rather than the GPU.
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
- Hardware: Apple M1 Mac
- llama-cpp-python: 0.2.6
- Python: 3.9.0
- macOS: 13.5.2
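For completeness, the version info above can be collected with standard library calls (nothing here is specific to my setup; the printed values are what I see on my machine):

```python
import platform
from importlib.metadata import version

print(platform.machine())           # arm64 on Apple silicon
print(platform.mac_ver()[0])        # 13.5.2
print(platform.python_version())    # 3.9.0
print(version("llama-cpp-python"))  # 0.2.6
```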
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
- Load a Llama 2 7B model with 4-bit quantization (see the minimal script below)
- Run inference and observe CPU/GPU usage (e.g. in Activity Monitor)
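A minimal script that reproduces what I am seeing, assuming a local 4-bit GGUF file (path and prompt are placeholders):

```python
from llama_cpp import Llama

# Placeholder path to a 4-bit quantized Llama 2 7B model (GGUF).
llm = Llama(model_path="./models/llama-2-7b.Q4_0.gguf")

# Run a short completion; while this executes, Activity Monitor shows
# high CPU usage and little to no GPU usage.
output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(output["choices"][0]["text"])
```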