Add metal count equal op #18314

gatbontonpc · 2025-12-23T06:02:26Z


This PR ports `count_equal`, which is already implemented on the CPU, to Metal.

The current implementation dispatches a single threadgroup, but the kernel also supports multiple threadgroups in case that changes. Like the CPU and CUDA implementations, it only accepts int32 for src0 and src1. The kernel accumulates with `atomic_fetch_add_explicit`, which only supports 32-bit integer adds, similar to the CUDA implementation. This limits the input size to 2^31 - 1 elements.

The docs have been updated.

Codex-generated summary:

Summary

This PR introduces a Metal implementation for COUNT_EQUAL on int32 tensors that uses SIMD-group reduction to efficiently compute per-threadgroup partial counts and accumulate the result into the destination buffer using atomic operations.

The change improves parallel efficiency over a naïve per-element atomic approach by:

- Performing the equality comparison per thread
- Reducing results within a SIMD group via `simd_sum`
- Emitting a single atomic update per SIMD group
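The following CPU-side model of this reduction pattern, written in C++ rather than Metal, shows where the savings come from. It assumes a SIMD-group width of 32, emulates Metal's `simd_sum` with a plain loop, and uses `std::atomic` in place of `atomic_fetch_add_explicit`. All function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed SIMD-group width (32 on current Apple GPUs).
constexpr int kSimdWidth = 32;

// Emulates simd_sum: the sum of one value per lane across a SIMD group.
int32_t simd_sum_emulated(const std::vector<int32_t>& lanes) {
    int32_t sum = 0;
    for (int32_t v : lanes) sum += v;
    return sum;
}

// Processes src0/src1 one SIMD group (32 consecutive elements) at a time and
// adds each group's total to dst with a single relaxed atomic add.
// Returns the number of atomic updates issued.
int count_equal_simd(const std::vector<int32_t>& src0,
                     const std::vector<int32_t>& src1,
                     std::atomic<int32_t>& dst) {
    assert(src0.size() == src1.size());
    const int n = static_cast<int>(src0.size());
    int n_atomics = 0;
    for (int base = 0; base < n; base += kSimdWidth) {
        std::vector<int32_t> lanes(kSimdWidth, 0);
        for (int lane = 0; lane < kSimdWidth; ++lane) {
            const int i = base + lane;
            // Per-thread equality test; lanes past the end contribute 0.
            lanes[lane] = (i < n && src0[i] == src1[i]) ? 1 : 0;
        }
        const int32_t group_sum = simd_sum_emulated(lanes);
        // On the GPU only one lane (e.g. lane 0) would issue this update.
        dst.fetch_add(group_sum, std::memory_order_relaxed);
        ++n_atomics;
    }
    return n_atomics;
}
```

For a 100-element input, the model issues ceil(100 / 32) = 4 atomic updates. A per-element approach would issue one atomic update for every matching element.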

Key Changes

- Added a templated Metal kernel `kernel_count_equal<int32_t>`
- Uses shared memory (`shmem_i32`) and SIMD intrinsics (`simd_sum`) to aggregate counts
- Emits a single `atomic_fetch_add_explicit` per SIMD group
- Registers the kernel under the exported symbol:
```
kernel_count_equal_i32
```
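The PR description notes that the kernel is templated and already handles more than one threadgroup. The following hypothetical C++ model illustrates that split. Each "threadgroup" strides over the input and reduces its own partial count (the in-group SIMD reduction is collapsed into a plain loop here). It then adds that count atomically to a zeroed destination, so the result does not depend on how many threadgroups are dispatched. `count_equal_tg` and `run_count_equal` are illustrative names. The striding scheme is an assumption and not necessarily how the Metal kernel indexes its input.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// One "threadgroup" tg out of n_tg handles elements tg, tg + n_tg, ...
// and contributes a single partial count to dst.
template <typename T>
void count_equal_tg(const T* src0, const T* src1, int64_t n,
                    int tg, int n_tg, std::atomic<int32_t>& dst) {
    int32_t partial = 0;
    for (int64_t i = tg; i < n; i += n_tg) {
        partial += (src0[i] == src1[i]) ? 1 : 0;
    }
    dst.fetch_add(partial, std::memory_order_relaxed);
}

// Explicit instantiation for int32_t, the only type the PR supports.
template void count_equal_tg<int32_t>(const int32_t*, const int32_t*, int64_t,
                                      int, int, std::atomic<int32_t>&);

// Runs all threadgroups sequentially; dst must start at zero.
template <typename T>
int32_t run_count_equal(const std::vector<T>& a, const std::vector<T>& b, int n_tg) {
    assert(a.size() == b.size());
    std::atomic<int32_t> dst{0};
    for (int tg = 0; tg < n_tg; ++tg) {
        count_equal_tg(a.data(), b.data(), static_cast<int64_t>(a.size()), tg, n_tg, dst);
    }
    return dst.load();
}
```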

ggml/src/ggml-metal/ggml-metal.metal

ggerganov · 2025-12-23T07:09:42Z

ggml/src/ggml-metal/ggml-metal-ops.cpp

```diff
+    const size_t smem = pipeline.smem;
+    int64_t      z    = 0;
+    ggml_backend_tensor_set(op, &z, 0, sizeof(z));
+
```


This does not work. You need to call a separate kernel that fills the buffer with zeros.

Added a new kernel that memsets a buffer to a given value. It is similar to fill, but uses a simpler pipeline and only takes the buffer and the value.
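Because the count kernel accumulates into dst with atomic adds, dst must be zero before the count kernel runs. That zeroing must also be ordered with the other GPU work. The small CPU model below illustrates this. `memset_kernel` and `CommandQueue` are hypothetical names, not the ggml-metal API. The queue only models the fact that encoded GPU work runs later and in order, whereas `ggml_backend_tensor_set` called during encoding writes immediately from the host. The failure mode shown, where a pending earlier op that reuses the same memory overwrites the host-written zero, is one plausible reading of the review comment. The review does not spell out the reason.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical memset-style kernel: "thread" tid of n_threads writes value to
// every n_threads-th element of dst, starting at tid. Like the kernel described
// above, it takes only the buffer and the value.
template <typename T>
void memset_kernel(T* dst, int64_t n, T value, int64_t tid, int64_t n_threads) {
    assert(n_threads > 0);
    for (int64_t i = tid; i < n; i += n_threads) {
        dst[i] = value;
    }
}

// Toy command queue: encoded work runs later, strictly in encode order, when
// commit() is called. This stands in for GPU command buffers.
struct CommandQueue {
    std::vector<std::function<void()>> cmds;
    void encode(std::function<void()> f) { cmds.push_back(std::move(f)); }
    void commit() {
        for (auto& f : cmds) f();
        cmds.clear();
    }
};
```

In this model, encoding a memset before the count leaves dst at the expected count. Writing the zero from the host at encode time lets the pending earlier op clobber it, so the count is added on top of a stale value.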

ggml/src/ggml-metal/ggml-metal-impl.h

Co-authored-by: Georgi Gerganov <[email protected]>

gatbontonpc added 6 commits December 22, 2025 19:13

- `ca72e2d` add count equal for metal
- `86cc8ab` remove trailing whitespace
- `55349e5` updated doc ops table
- `3dcf719` changed shmem to i32
- `bcb5cd1` added multi tg and templating
- `89e3473` Merge branch 'ggml-org:master' into add_metal_count_equal

gatbontonpc requested a review from ggerganov as a code owner December 23, 2025 06:02

github-actions bot added the documentation, ggml, and Apple Metal labels Dec 23, 2025

- `84d7c9b` removed BLAS support from Metal docs

ggerganov reviewed Dec 23, 2025


gatbontonpc and others added 2 commits December 22, 2025 23:18

- `0e9b0d6` Apply suggestions from code review
  Co-authored-by: Georgi Gerganov <[email protected]>
- `287fff7` add memset to set dst to 0

gatbontonpc requested a review from ggerganov December 23, 2025 16:55
