violent crash on Mac Mini M2 8GB RAM when trying to use GPU #2141


Closed
siddhsql opened this issue Jul 7, 2023 · 6 comments

siddhsql commented Jul 7, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I have an M2 Mac Mini with 8 GB unified memory. I tried to run llama.cpp as explained here: #1642

Current Behavior

My computer froze and rebooted after some time. I got a brief flash of a pink screen of death. I retried several times and got the same behavior. Once, instead of crashing, I got an assert in the following code:

for (int i = 0; i < n_cb; i++) {
    MTLCommandBufferStatus status = (MTLCommandBufferStatus) [command_buffers[i] status];
    if (status != MTLCommandBufferStatusCompleted) {
        fprintf(stderr, "%s: command buffer %d failed with status %lu\n", __func__, i, status);
        GGML_ASSERT(false);
    }
}

Each time I was able to see console output saying it's trying to load GPU buffers, similar to what we see in the video on #1642.

The model I was trying is gpt4-x-vicuna-13B.ggmlv3.q5_K_M.bin

MODEL=gpt4-x-vicuna-13B.ggmlv3.q5_K_M.bin
CONTEXT_SIZE=2048
PROMPT="$SCRIPT_DIR/../prompts/chat-with-bob.txt"

cd "$SCRIPT_DIR/../build/release/bin"
set -x
./main \
  -m "$MODEL" \
  -c $CONTEXT_SIZE \
  --repeat_penalty 1.0 \
  --color \
  -i \
  -r "User:" \
  --in-prefix " " \
  -f "$PROMPT"

Over here I see a funny comment:

yes - this is fixed now that this crashes instead of giving bad output

How is crashing acceptable behaviour?

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

M2 Mac Mini w/ 8 GB memory

  • Operating System, e.g. for Linux:
± uname -a
Darwin 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun  8 22:21:34 PDT 2023; root:xnu-8796.121.3~7/RELEASE_ARM64_T8112 arm64
  • SDK version, e.g. for Linux:
$ python3 --version
$ make --version
$ g++ --version

Failure Information (for bugs)

Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.

Steps to Reproduce

See above.

Failure Logs

@siddhsql siddhsql changed the title violent crash on Mac Mini M2 8GB RAM violent crash on Mac Mini M2 8GB RAM when trying to use GPU Jul 7, 2023
@BarfingLemurs
Contributor

@siddhsql you have a simple problem: you don't have enough RAM (only 8 GB) to run 13B models. See the sizes in the README. You can run a 7B model fine.

@siddhsql
Author

siddhsql commented Jul 8, 2023 via email

@philipturner

This behavior is typical of Apple Mac GPUs whenever there's an infinite loop: the GPU freezes and won't exit the command. After you reset the computer nothing is broken; it looks much scarier than it is.

@siddhsql
Author

siddhsql commented Aug 2, 2023 via email

@philipturner

It doesn't have to be an infinite loop. Sometimes giving the GPU too heavy a compute workload causes it to go rogue as well. Often some kind of fault happened internally, for example an out-of-bounds memory access.

On iOS, the GPU goes rogue less often, because a watchdog aborts very long command buffers. On Mac, you have to restart the entire computer. Mac is probably this way to give more flexibility (you don't have to actively check whether all your command buffers will fall under 100 ms).

@github-actions github-actions bot added the stale label Mar 25, 2024

github-actions bot commented Apr 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 9, 2024