
Eval bug: On macOS 12 / 13 Metal crashes after commit 0f0a3c28 #16266

@mchiang0610

Description


Name and Version

When running tests against the latest llama.cpp, I noticed crashes on both macOS 12 (Monterey) and macOS 13 (Ventura).

The cause appears to be the older macOS version, limited VRAM, or possibly both. To pinpoint the exact location of the failure and the resulting crash, I added the following diff:

diff --git a/ggml/src/ggml-metal/ggml-metal-context.m b/ggml/src/ggml-metal/ggml-metal-context.m
index af9ff2143..e327fc152 100644
--- a/ggml/src/ggml-metal/ggml-metal-context.m
+++ b/ggml/src/ggml-metal/ggml-metal-context.m
@@ -294,10 +294,12 @@ void ggml_metal_set_tensor_async(ggml_metal_t ctx, struct ggml_tensor * tensor,
 
 void ggml_metal_get_tensor_async(ggml_metal_t ctx, const struct ggml_tensor * tensor, void * data, size_t offset, size_t size) {
     @autoreleasepool {
+        GGML_LOG_INFO("%s XXX calling newBufferWithBytesNoCopy data:%p size:%llu\n", __func__, data, size);
         id<MTLBuffer> buf_dst = [ctx->device newBufferWithBytesNoCopy:data
                                                                length:size
                                                               options:MTLResourceStorageModeShared
                                                           deallocator:nil];
+        GGML_ASSERT(buf_dst != nil && "newBufferWithBytesNoCopy failed");
 
         struct ggml_metal_buffer_id bid_src = ggml_metal_get_buffer_id(tensor);
         if (bid_src.metal == nil) {

To build a version compatible with older macOS releases:

export SDKROOT=/Applications/Xcode_14.1.0.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
export DEVELOPER_DIR=/Applications/Xcode_14.1.0.app/Contents/Developer

then build with

cmake -B build -DCMAKE_OSX_DEPLOYMENT_TARGET=12.0
cmake --build build --parallel 8

Copy the binaries to a macOS 12 or 13 system with 16 GB (or 8 GB) of RAM:

./llama-cli -m <path to llama3.2 or qwen3>
...
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
ggml_metal_get_tensor_async XXX calling newBufferWithBytesNoCopy data:0x10a910000 size:607744
/Users/ollama/code/llama.cpp/ggml/src/ggml-metal/ggml-metal-context.m:302: GGML_ASSERT(buf_dst != nil && "newBufferWithBytesNoCopy failed") failed

If you pass --gpu-layers XX with one fewer layer than a full offload, the ggml_metal_get_tensor_async code path is not taken, there is no crash, and the model works properly.

Operating systems

Mac

GGML backends

Metal

Hardware

tested on an Apple M1 Mac mini

Models

tested on llama3.2, qwen3

Problem description & steps to reproduce

./llama-cli -m <path to llama3.2 or qwen3>
...
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
ggml_metal_get_tensor_async XXX calling newBufferWithBytesNoCopy data:0x10a910000 size:607744
/Users/ollama/code/llama.cpp/ggml/src/ggml-metal/ggml-metal-context.m:302: GGML_ASSERT(buf_dst != nil && "newBufferWithBytesNoCopy failed") failed

First Bad Commit

0f0a3c2

Relevant log output

/Users/ollama/code/llama.cpp/ggml/src/ggml-metal/ggml-metal-context.m:302: GGML_ASSERT(buf_dst != nil && "newBufferWithBytesNoCopy failed") failed
