metal : reorder write loop in mul mat kernel + style #10231

Merged: ggerganov merged 2 commits into master on Nov 9, 2024

@ggerganov (Member) commented Nov 9, 2024

Minor optimization for batch sizes greater than 1 (BS > 1): the results are written using float4 instead of float.

```sh
./scripts/compare-commits.sh master gg/metal-mul-mat-write-opt -m ./models/llama-3.2-3b-instruct/ggml-model-q4_0.gguf -m ./models/llama-3.1-8b/ggml-model-q4_k.gguf -m models/llama-3.2-1b-instruct/ggml-model-q8_0.gguf -m models/qwen2.5-7b-coder/ggml-model-q8_0.gguf -m models/qwen2.5-1.5b-coder/ggml-model-f16.gguf -fa 1 -p 511,512,1,2,3,4,5,6,7,8 -n 128
```
| CPU | Model | Test | t/s master | t/s gg/metal-mul-mat-write-opt | Speedup |
| :--- | :--- | :--- | ---: | ---: | ---: |
| M2 Ultra | llama 1B Q8_0 | pp1 | 232.67 | 230.65 | 0.99 |
| M2 Ultra | llama 1B Q8_0 | pp2 | 125.45 | 133.59 | 1.06 |
| M2 Ultra | llama 1B Q8_0 | pp3 | 188.18 | 198.16 | 1.05 |
| M2 Ultra | llama 1B Q8_0 | pp4 | 250.57 | 265.69 | 1.06 |
| M2 Ultra | llama 1B Q8_0 | pp5 | 313.60 | 331.30 | 1.06 |
| M2 Ultra | llama 1B Q8_0 | pp6 | 376.26 | 397.60 | 1.06 |
| M2 Ultra | llama 1B Q8_0 | pp7 | 434.15 | 460.83 | 1.06 |
| M2 Ultra | llama 1B Q8_0 | pp8 | 496.23 | 531.32 | 1.07 |
| M2 Ultra | llama 1B Q8_0 | pp511 | 7650.92 | 7744.50 | 1.01 |
| M2 Ultra | llama 1B Q8_0 | pp512 | 7791.42 | 7801.47 | 1.00 |
| M2 Ultra | llama 1B Q8_0 | tg128 | 230.77 | 231.00 | 1.00 |
| M2 Ultra | llama 3B Q4_0 | pp1 | 155.76 | 155.40 | 1.00 |
| M2 Ultra | llama 3B Q4_0 | pp2 | 61.35 | 64.07 | 1.04 |
| M2 Ultra | llama 3B Q4_0 | pp3 | 91.48 | 95.32 | 1.04 |
| M2 Ultra | llama 3B Q4_0 | pp4 | 118.39 | 124.69 | 1.05 |
| M2 Ultra | llama 3B Q4_0 | pp5 | 147.37 | 155.11 | 1.05 |
| M2 Ultra | llama 3B Q4_0 | pp6 | 177.50 | 185.52 | 1.05 |
| M2 Ultra | llama 3B Q4_0 | pp7 | 205.60 | 215.63 | 1.05 |
| M2 Ultra | llama 3B Q4_0 | pp8 | 234.92 | 247.89 | 1.06 |
| M2 Ultra | llama 3B Q4_0 | pp511 | 2863.44 | 2962.87 | 1.03 |
| M2 Ultra | llama 3B Q4_0 | pp512 | 2908.56 | 2988.59 | 1.03 |
| M2 Ultra | llama 3B Q4_0 | tg128 | 155.74 | 155.60 | 1.00 |
| M2 Ultra | llama 8B Q4_K_M | pp1 | 85.79 | 85.90 | 1.00 |
| M2 Ultra | llama 8B Q4_K_M | pp2 | 28.87 | 29.79 | 1.03 |
| M2 Ultra | llama 8B Q4_K_M | pp3 | 43.22 | 44.40 | 1.03 |
| M2 Ultra | llama 8B Q4_K_M | pp4 | 56.79 | 58.38 | 1.03 |
| M2 Ultra | llama 8B Q4_K_M | pp5 | 71.16 | 72.81 | 1.02 |
| M2 Ultra | llama 8B Q4_K_M | pp6 | 85.22 | 87.56 | 1.03 |
| M2 Ultra | llama 8B Q4_K_M | pp7 | 98.90 | 101.67 | 1.03 |
| M2 Ultra | llama 8B Q4_K_M | pp8 | 113.33 | 116.40 | 1.03 |
| M2 Ultra | llama 8B Q4_K_M | pp511 | 1114.44 | 1118.62 | 1.00 |
| M2 Ultra | llama 8B Q4_K_M | pp512 | 1128.36 | 1124.39 | 1.00 |
| M2 Ultra | llama 8B Q4_K_M | tg128 | 85.86 | 85.52 | 1.00 |
| M2 Ultra | qwen2 1.5B F16 | pp1 | 112.38 | 112.68 | 1.00 |
| M2 Ultra | qwen2 1.5B F16 | pp2 | 81.90 | 88.56 | 1.08 |
| M2 Ultra | qwen2 1.5B F16 | pp3 | 122.75 | 130.37 | 1.06 |
| M2 Ultra | qwen2 1.5B F16 | pp4 | 158.62 | 169.50 | 1.07 |
| M2 Ultra | qwen2 1.5B F16 | pp5 | 197.93 | 212.72 | 1.07 |
| M2 Ultra | qwen2 1.5B F16 | pp6 | 241.20 | 255.49 | 1.06 |
| M2 Ultra | qwen2 1.5B F16 | pp7 | 280.54 | 298.75 | 1.06 |
| M2 Ultra | qwen2 1.5B F16 | pp8 | 321.05 | 336.54 | 1.05 |
| M2 Ultra | qwen2 1.5B F16 | pp511 | 6035.39 | 6208.76 | 1.03 |
| M2 Ultra | qwen2 1.5B F16 | pp512 | 6251.98 | 6257.90 | 1.00 |
| M2 Ultra | qwen2 1.5B F16 | tg128 | 110.76 | 110.69 | 1.00 |
| M2 Ultra | qwen2 7B Q8_0 | pp1 | 67.37 | 68.35 | 1.01 |
| M2 Ultra | qwen2 7B Q8_0 | pp2 | 35.29 | 36.57 | 1.04 |
| M2 Ultra | qwen2 7B Q8_0 | pp3 | 53.36 | 54.62 | 1.02 |
| M2 Ultra | qwen2 7B Q8_0 | pp4 | 69.73 | 71.91 | 1.03 |
| M2 Ultra | qwen2 7B Q8_0 | pp5 | 86.87 | 89.66 | 1.03 |
| M2 Ultra | qwen2 7B Q8_0 | pp6 | 104.37 | 107.63 | 1.03 |
| M2 Ultra | qwen2 7B Q8_0 | pp7 | 121.80 | 125.39 | 1.03 |
| M2 Ultra | qwen2 7B Q8_0 | pp8 | 139.49 | 143.51 | 1.03 |
| M2 Ultra | qwen2 7B Q8_0 | pp511 | 1347.18 | 1355.80 | 1.01 |
| M2 Ultra | qwen2 7B Q8_0 | pp512 | 1358.74 | 1360.51 | 1.00 |
| M2 Ultra | qwen2 7B Q8_0 | tg128 | 67.48 | 67.62 | 1.00 |
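To illustrate the general idea behind reordering a write loop so that float4 stores become possible, here is a minimal C++ sketch. It is not the actual Metal kernel from this PR: the function names `write_rows_outer` and `write_cols_outer_vec4`, the staging-buffer layout (`tile` with column stride `tile_ld`, standing in for threadgroup memory), and the `float4` struct are hypothetical stand-ins assumed for illustration. A 16-byte `std::memcpy` plays the role of a Metal `float4` store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for Metal's float4: a 16-byte vector of four floats.
struct alignas(16) float4 { float x, y, z, w; };

// `tile` holds n_cols columns of n_rows results (column stride tile_ld);
// `dst` is the output matrix with column stride ne0.

// Scalar variant: row-outer loop, one 4-byte store per result; consecutive
// stores in the inner loop are ne0 floats apart.
void write_rows_outer(float* dst, std::size_t ne0, const float* tile,
                      std::size_t tile_ld, int n_rows, int n_cols) {
    for (int i = 0; i < n_rows; ++i)
        for (int j = 0; j < n_cols; ++j)
            dst[i + j * ne0] = tile[i + j * tile_ld];
}

// Reordered variant: column-outer loop, so the inner loop walks contiguous
// memory and can move four results per 16-byte copy, plus a scalar tail.
void write_cols_outer_vec4(float* dst, std::size_t ne0, const float* tile,
                           std::size_t tile_ld, int n_rows, int n_cols) {
    assert(n_rows >= 0 && n_cols >= 0);
    for (int j = 0; j < n_cols; ++j) {
        float* d = dst + j * ne0;
        const float* c = tile + j * tile_ld;
        int i = 0;
        for (; i + 4 <= n_rows; i += 4) {
            float4 v;
            std::memcpy(&v, c + i, sizeof v);  // read four results as one chunk
            std::memcpy(d + i, &v, sizeof v);  // write them as one 16-byte chunk
        }
        for (; i < n_rows; ++i) d[i] = c[i];   // leftover rows (n_rows % 4)
    }
}
```

Both functions produce the same output; the reordered one issues floor(n_rows / 4) 16-byte copies plus at most three scalar stores per column instead of n_rows scalar stores. In the benchmark above, the gains are concentrated in the small-batch pp2 to pp8 runs (about 1.02x to 1.08x), while pp1 and tg128 are essentially unchanged.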

@ggerganov merged commit 6423c65 into master on Nov 9, 2024 (56 checks passed).
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* metal : reorder write loop

* metal : int -> short, style

ggml-ci
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024