Bug: Gemma 2 slower with FA #9243

@Azirine

What happened?

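Gemma 2 is slower with flash attention (FA) enabled on Apple Silicon (M3 Max). In the benchmark table below, prompt processing (pp512) drops by roughly 35-37% and token generation (tg64) by roughly 30-40% when `fa` is set to 1.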

Name and Version

version: 3642 (1d1ccce)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0

What operating system are you seeing the problem on?

Mac

Relevant log output

| model                          |       size |     params | backend    | ngl | fa | mmap |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | ---: | ------------: | ---------------: |
| gemma2 2B Q8_0                 |   3.17 GiB |     3.20 B | Metal      |  99 |  0 |    0 |         pp512 |   2360.42 ± 3.71 |
| gemma2 2B Q8_0                 |   3.17 GiB |     3.20 B | Metal      |  99 |  0 |    0 |          tg64 |     85.54 ± 0.05 |
| gemma2 2B Q8_0                 |   3.17 GiB |     3.20 B | Metal      |  99 |  1 |    0 |         pp512 |   1487.45 ± 3.27 |
| gemma2 2B Q8_0                 |   3.17 GiB |     3.20 B | Metal      |  99 |  1 |    0 |          tg64 |     50.99 ± 0.17 |
| gemma2 9B Q8_0                 |  10.05 GiB |    10.16 B | Metal      |  99 |  0 |    0 |         pp512 |    608.84 ± 0.96 |
| gemma2 9B Q8_0                 |  10.05 GiB |    10.16 B | Metal      |  99 |  0 |    0 |          tg64 |     30.29 ± 0.04 |
| gemma2 9B Q8_0                 |  10.05 GiB |    10.16 B | Metal      |  99 |  1 |    0 |         pp512 |   397.25 ± 23.27 |
| gemma2 9B Q8_0                 |  10.05 GiB |    10.16 B | Metal      |  99 |  1 |    0 |          tg64 |     21.33 ± 0.01 |

build: 1d1ccce6 (3642)
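
To make the size of the regression easier to read, here is a minimal sketch that computes the relative slowdown for each row pair. The throughput values are copied from the table above; the `results` dictionary and the `slowdown_percent` helper are hypothetical names used only for illustration and are not part of llama.cpp.

```python
# Throughput in tokens/s, copied from the benchmark table:
# (model, test) -> (fa=0 result, fa=1 result)
results = {
    ("gemma2 2B Q8_0", "pp512"): (2360.42, 1487.45),
    ("gemma2 2B Q8_0", "tg64"): (85.54, 50.99),
    ("gemma2 9B Q8_0", "pp512"): (608.84, 397.25),
    ("gemma2 9B Q8_0", "tg64"): (30.29, 21.33),
}


def slowdown_percent(without_fa: float, with_fa: float) -> float:
    """Hypothetical helper: percentage of throughput lost when FA is enabled."""
    return (1.0 - with_fa / without_fa) * 100.0


# Relative slowdown per (model, test) pair
slowdowns = {key: slowdown_percent(*pair) for key, pair in results.items()}

for (model, test), pct in slowdowns.items():
    print(f"{model:16s} {test:6s} {pct:5.1f}% slower with FA")
```

This only compares the mean values and ignores the reported ± spread, which is largest for the 9B pp512 run with FA (± 23.27).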

Labels: Apple Metal, bug-unconfirmed, medium severity, stale
