metal : add memory pool for temp allocs #12850

Merged
merged 15 commits into master from gg/metal-heap on Apr 22, 2025

Conversation

ggerganov (Member) commented on Apr 9, 2025

ref ggml-org/ggml#1152 (comment)

The goal is to introduce a mechanism for allocating temporary buffers in the Metal backend that can be used to store intermediate results. This is needed for some composite operations (such as convolution expressed as im2col + mul_mat) and for rearranging or padding data on the fly. It is similar to the ggml_cuda_pool_alloc functionality in the CUDA backend.
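
For illustration, here is a minimal sketch of what such a pool could look like: a helper that hands out temporary MTLBuffer objects backed by a MTLHeap. The names (ggml_metal_mem_pool, ggml_metal_mem_pool_alloc) and the layout are assumptions for the sketch, not necessarily the final API:

```objc
// Illustrative sketch only - names and layout are assumptions, not the merged API.
#import <Metal/Metal.h>

typedef struct {
    id<MTLHeap> heap; // backing heap for the temporary buffers of the current graph
} ggml_metal_mem_pool;

// Hand out a temporary buffer that holds intermediate results of a composite op.
static id<MTLBuffer> ggml_metal_mem_pool_alloc(ggml_metal_mem_pool * pool, size_t size) {
    if ([pool->heap maxAvailableSizeWithAlignment:16] < size) {
        return nil; // heap too small - it has to be grown before encoding the next graph
    }

    // the buffer aliases heap memory, so no new system allocation is performed
    return [pool->heap newBufferWithLength:size options:MTLResourceStorageModePrivate];
}

// Once the command buffer that used the temporaries has completed, the buffers can be
// released (and/or marked aliasable) so their heap memory becomes available again.
```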

For testing, the SOFT_MAX operation is currently used: an intermediate step first copies the input data to a temporary buffer, and the softmax kernel then runs on that buffer instead of on the original input.

make -j && MTL_DEBUG_LAYER=1 ./bin/test-backend-ops -b Metal -o SOFT_MAX

TODO:

  • Figure out how to create an MTLHeap and allocate buffers from it (a rough sketch follows after this list)
  • How to release the buffers
  • Create per-command-buffer heaps
  • How to dynamically resize the heap based on the memory needs of the graph
  • Start using MTLHeapTypePlacement to be able to reuse heap memory from previous nodes
  • Un-encode the failed encoder - how? Maybe recreate the command buffer?
  • Check for memory leaks
  • Try to allocate the MTLHeaps dynamically in order to avoid the extra loop over the nodes.
  • Add comments
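
A rough sketch of the heap setup these items point at, assuming one placement heap per command buffer; the function names and the sizing are placeholders, not the merged implementation:

```objc
// Rough sketch, not the merged implementation: one MTLHeapTypePlacement heap per
// command buffer, sized from an estimate of the graph's temporary memory needs.
#import <Metal/Metal.h>

static id<MTLHeap> create_cmd_buf_heap(id<MTLDevice> device, size_t size_estimate) {
    MTLHeapDescriptor * desc = [[MTLHeapDescriptor alloc] init];

    desc.storageMode = MTLStorageModePrivate;
    desc.type        = MTLHeapTypePlacement; // offsets are chosen manually, so memory
                                             // used by earlier nodes can be reused
    desc.size        = size_estimate;

    return [device newHeapWithDescriptor:desc];
}

// With a placement heap, the same underlying memory can back several short-lived
// buffers by handing out explicit offsets:
static id<MTLBuffer> place_buffer(id<MTLHeap> heap, size_t size, NSUInteger offset) {
    return [heap newBufferWithLength:size
                             options:MTLResourceStorageModePrivate
                              offset:offset];
}
```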

Next PRs:

  • Use this new functionality to add F16 x F16 MUL_MAT support by casting src1 from F32 to F16 (a rough sketch follows below this list)
  • Implement im2col + mul_mat for GGML_OP_CONV_XXX
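
As an illustration of the first item, a hedged sketch of how the pool could be used to cast src1 to F16 before the matrix multiplication. It reuses the hypothetical pool helper from the earlier sketch; the pipeline names and dispatch sizes are placeholders and the actual encoding details will differ:

```objc
// Hedged sketch of the intended usage pattern (placeholder names throughout):
// cast src1 from F32 to F16 into a pool-allocated temporary, then run the
// F16 x F16 mul_mat kernel on the temporary instead of on the original src1.
static void encode_mul_mat_f16(id<MTLComputeCommandEncoder> encoder,
                               ggml_metal_mem_pool * mem_pool,
                               id<MTLComputePipelineState> pipeline_cpy_f32_f16,
                               id<MTLComputePipelineState> pipeline_mul_mat_f16_f16,
                               id<MTLBuffer> src0_f16,
                               id<MTLBuffer> src1_f32,
                               id<MTLBuffer> dst,
                               NSUInteger n_src1) {
    // temporary F16 copy of src1, alive only for the duration of this command buffer
    id<MTLBuffer> src1_f16 = ggml_metal_mem_pool_alloc(mem_pool, n_src1*sizeof(uint16_t));

    [encoder setComputePipelineState:pipeline_cpy_f32_f16];
    [encoder setBuffer:src1_f32 offset:0 atIndex:0];
    [encoder setBuffer:src1_f16 offset:0 atIndex:1];
    [encoder dispatchThreadgroups:MTLSizeMake((n_src1 + 63)/64, 1, 1)
            threadsPerThreadgroup:MTLSizeMake(64, 1, 1)];

    [encoder setComputePipelineState:pipeline_mul_mat_f16_f16];
    [encoder setBuffer:src0_f16 offset:0 atIndex:0];
    [encoder setBuffer:src1_f16 offset:0 atIndex:1];
    [encoder setBuffer:dst      offset:0 atIndex:2];
    // ... mul_mat dispatch depends on the actual kernel's threadgroup layout
}
```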

github-actions bot added the labels "ggml" (changes relating to the ggml tensor library for machine learning) and "Apple Metal" (https://en.wikipedia.org/wiki/Metal_(API)) on Apr 9, 2025
ggerganov marked this pull request as ready for review on April 15, 2025 12:03
ggerganov merged commit 7b53389 into master on Apr 22, 2025
47 checks passed
ggerganov deleted the gg/metal-heap branch on April 22, 2025 13:15
pockers21 pushed a commit to pockers21/llama.cpp that referenced this pull request on Apr 28, 2025:
* metal : add memory pool for temp allocs (wip) [no ci]

* cont : free buffers from the heap

* cont : resize heap [no ci]

* cont : refactor heap [no ci]

* cont : heap for each cmd buffer [no ci]

* cont : fix free

* wip

* cont : fix alignment [no ci]

* cont : not working .. [no ci]

* cont : heap allocation now works [no ci]

* cont : use MTLHeapTypePlacement

ggml-ci

* metal : use dynamic MTLHeap allocations

ggml-ci

* metal : add comments

* metal : disable softmax use of mem_pool

ggml-ci

* metal : final touches