Name and Version
When running tests against the latest llama.cpp, I noticed crashes on both macOS 12 (Monterey) and 13 (Ventura).
The problem seems to be triggered by the older macOS version, the small amount of VRAM, or possibly both. To pinpoint the exact location of the failure and the resulting crash, I added the following diff:
diff --git a/ggml/src/ggml-metal/ggml-metal-context.m b/ggml/src/ggml-metal/ggml-metal-context.m
index af9ff2143..e327fc152 100644
--- a/ggml/src/ggml-metal/ggml-metal-context.m
+++ b/ggml/src/ggml-metal/ggml-metal-context.m
@@ -294,10 +294,12 @@ void ggml_metal_set_tensor_async(ggml_metal_t ctx, struct ggml_tensor * tensor,
void ggml_metal_get_tensor_async(ggml_metal_t ctx, const struct ggml_tensor * tensor, void * data, size_t offset, size_t size) {
@autoreleasepool {
+ GGML_LOG_INFO("%s XXX calling newBufferWithBytesNoCopy data:%p size:%llu\n", __func__, data, size);
id<MTLBuffer> buf_dst = [ctx->device newBufferWithBytesNoCopy:data
length:size
options:MTLResourceStorageModeShared
deallocator:nil];
+ GGML_ASSERT(buf_dst != nil && "newBufferWithBytesNoCopy failed");
struct ggml_metal_buffer_id bid_src = ggml_metal_get_buffer_id(tensor);
if (bid_src.metal == nil) {
To build a version compatible with older macOS, set
export SDKROOT=/Applications/Xcode_14.1.0.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
export DEVELOPER_DIR=/Applications/Xcode_14.1.0.app/Contents/Developer
then build with
cmake -B build -DCMAKE_OSX_DEPLOYMENT_TARGET=12.0
cmake --build build --parallel 8
Copy the binaries to a macOS 12 or 13 system with 16 GB (or 8 GB) of RAM, then run:
./llama-cli -m <path to llama3.2 or qwen3>
...
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
ggml_metal_get_tensor_async XXX calling newBufferWithBytesNoCopy data:0x10a910000 size:607744
/Users/ollama/code/llama.cpp/ggml/src/ggml-metal/ggml-metal-context.m:302: GGML_ASSERT(buf_dst != nil && "newBufferWithBytesNoCopy failed") failed
If you add --gpu-layers XX with one layer fewer than a full offload, the ggml_metal_get_tensor_async code path is never exercised, the crash does not occur, and the model works properly.
Operating systems
Mac
GGML backends
Metal
Hardware
tested on Apple M1 Mac mini
Models
tested on llama3.2, qwen3
Problem description & steps to reproduce
./llama-cli -m <path to llama3.2 or qwen3>
...
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
ggml_metal_get_tensor_async XXX calling newBufferWithBytesNoCopy data:0x10a910000 size:607744
/Users/ollama/code/llama.cpp/ggml/src/ggml-metal/ggml-metal-context.m:302: GGML_ASSERT(buf_dst != nil && "newBufferWithBytesNoCopy failed") failed
First Bad Commit
Relevant log output
/Users/ollama/code/llama.cpp/ggml/src/ggml-metal/ggml-metal-context.m:302: GGML_ASSERT(buf_dst != nil && "newBufferWithBytesNoCopy failed") failed