metal : add memory pool for temp allocs #12850

Merged
merged 15 commits into master from gg/metal-heap on Apr 22, 2025

Conversation

ggerganov (Member) commented on Apr 9, 2025

ref ggml-org/ggml#1152 (comment)

The goal is to introduce a mechanism for allocating temporary buffers in the Metal backend that can be used to store intermediate results. This is needed for some composite operations (such as convolution expressed as im2col + mul_mat) and for rearranging or padding data on the fly. It is similar to the ggml_cuda_pool_alloc functionality in the CUDA backend.
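
For illustration, here is a minimal sketch of what such a pool could look like: a helper that hands out temporary MTLBuffer objects backed by a MTLHeap. The names (ggml_metal_mem_pool, ggml_metal_mem_pool_alloc) and the layout are assumptions for the sketch, not necessarily the final API:

```objc
// Illustrative sketch only - names and layout are assumptions, not the merged API.
#import <Metal/Metal.h>

typedef struct {
    id<MTLHeap> heap; // backing heap for the temporary buffers of the current graph
} ggml_metal_mem_pool;

// Hand out a temporary buffer that holds intermediate results of a composite op.
static id<MTLBuffer> ggml_metal_mem_pool_alloc(ggml_metal_mem_pool * pool, size_t size) {
    if ([pool->heap maxAvailableSizeWithAlignment:16] < size) {
        return nil; // heap too small - it has to be grown before encoding the next graph
    }

    // the buffer aliases heap memory, so no new system allocation is performed
    return [pool->heap newBufferWithLength:size options:MTLResourceStorageModePrivate];
}

// Once the command buffer that used the temporaries has completed, the buffers can be
// released (and/or marked aliasable) so their heap memory becomes available again.
```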

For testing, the SOFT_MAX operation is currently used: an intermediate step first copies the input data to a temporary buffer, and the softmax kernel then runs on that buffer instead of on the original input.

make -j && MTL_DEBUG_LAYER=1 ./bin/test-backend-ops -b Metal -o SOFT_MAX

TODO:

  • Figure out how to create an MTLHeap and allocate buffers from it (a rough sketch follows after this list)
  • How to release the buffers
  • Create per-command-buffer heaps
  • How to dynamically resize the heap based on the memory needs of the graph
  • Start using MTLHeapTypePlacement to be able to reuse heap memory from previous nodes
  • Un-encode the failed encoder - how? Maybe recreate the command buffer?
  • Check for memory leaks
  • Try to allocate the MTLHeaps dynamically in order to avoid the extra loop over the nodes.
  • Add comments
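
A rough sketch of the heap setup these items point at, assuming one placement heap per command buffer; the function names and the sizing are placeholders, not the merged implementation:

```objc
// Rough sketch, not the merged implementation: one MTLHeapTypePlacement heap per
// command buffer, sized from an estimate of the graph's temporary memory needs.
#import <Metal/Metal.h>

static id<MTLHeap> create_cmd_buf_heap(id<MTLDevice> device, size_t size_estimate) {
    MTLHeapDescriptor * desc = [[MTLHeapDescriptor alloc] init];

    desc.storageMode = MTLStorageModePrivate;
    desc.type        = MTLHeapTypePlacement; // offsets are chosen manually, so memory
                                             // used by earlier nodes can be reused
    desc.size        = size_estimate;

    return [device newHeapWithDescriptor:desc];
}

// With a placement heap, the same underlying memory can back several short-lived
// buffers by handing out explicit offsets:
static id<MTLBuffer> place_buffer(id<MTLHeap> heap, size_t size, NSUInteger offset) {
    return [heap newBufferWithLength:size
                             options:MTLResourceStorageModePrivate
                              offset:offset];
}
```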

Next PRs:

  • Use this new functionality to add F16 x F16 MUL_MAT support by casting src1 from F32 to F16 (a rough sketch follows below this list)
  • Implement im2col + mul_mat for GGML_OP_CONV_XXX
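
As an illustration of the first item, a hedged sketch of how the pool could be used to cast src1 to F16 before the matrix multiplication. It reuses the hypothetical pool helper from the earlier sketch; the pipeline names and dispatch sizes are placeholders and the actual encoding details will differ:

```objc
// Hedged sketch of the intended usage pattern (placeholder names throughout):
// cast src1 from F32 to F16 into a pool-allocated temporary, then run the
// F16 x F16 mul_mat kernel on the temporary instead of on the original src1.
static void encode_mul_mat_f16(id<MTLComputeCommandEncoder> encoder,
                               ggml_metal_mem_pool * mem_pool,
                               id<MTLComputePipelineState> pipeline_cpy_f32_f16,
                               id<MTLComputePipelineState> pipeline_mul_mat_f16_f16,
                               id<MTLBuffer> src0_f16,
                               id<MTLBuffer> src1_f32,
                               id<MTLBuffer> dst,
                               NSUInteger n_src1) {
    // temporary F16 copy of src1, alive only for the duration of this command buffer
    id<MTLBuffer> src1_f16 = ggml_metal_mem_pool_alloc(mem_pool, n_src1*sizeof(uint16_t));

    [encoder setComputePipelineState:pipeline_cpy_f32_f16];
    [encoder setBuffer:src1_f32 offset:0 atIndex:0];
    [encoder setBuffer:src1_f16 offset:0 atIndex:1];
    [encoder dispatchThreadgroups:MTLSizeMake((n_src1 + 63)/64, 1, 1)
            threadsPerThreadgroup:MTLSizeMake(64, 1, 1)];

    [encoder setComputePipelineState:pipeline_mul_mat_f16_f16];
    [encoder setBuffer:src0_f16 offset:0 atIndex:0];
    [encoder setBuffer:src1_f16 offset:0 atIndex:1];
    [encoder setBuffer:dst      offset:0 atIndex:2];
    // ... mul_mat dispatch depends on the actual kernel's threadgroup layout
}
```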

github-actions bot added the labels "ggml" (changes relating to the ggml tensor library for machine learning) and "Apple Metal" (https://en.wikipedia.org/wiki/Metal_(API)) on Apr 9, 2025
ggerganov marked this pull request as ready for review on April 15, 2025 12:03
ggerganov merged commit 7b53389 into master on Apr 22, 2025
47 checks passed
ggerganov deleted the gg/metal-heap branch on April 22, 2025 13:15
pockers21 pushed a commit to pockers21/llama.cpp that referenced this pull request on Apr 28, 2025:
* metal : add memory pool for temp allocs (wip) [no ci]

* cont : free buffers from the heap

* cont : resize heap [no ci]

* cont : refactor heap [no ci]

* cont : heap for each cmd buffer [no ci]

* cont : fix free

* wip

* cont : fix alignment [no ci]

* cont : not working .. [no ci]

* cont : heap allocation now works [no ci]

* cont : use MTLHeapTypePlacement

ggml-ci

* metal : use dynamic MTLHeap allocations

ggml-ci

* metal : add comments

* metal : disable softmax use of mem_pool

ggml-ci

* metal : final touches