ggml : support `bs > 512` for Metal `ggml_mul_mat_id`

Running Mixtral models on the Metal GPU with a batch size > 512 triggers a GGML_ASSERT. Models such as llama-2-7b-chat.Q5_K_M.gguf are not affected.

Hardware: Apple M2 Ultra
RAM: 192GB
llama.cpp: latest as of 2024-01-21 (commit 504dc37be8446fb09b1ede70300250ad41be32a2)

`./main -f /tmp/prompt1k -m models/mixtral-8x7b-instruct-v0.1.Q6_K.gguf -c 4096 -b 512` (OK)
`./main -f /tmp/prompt1k -m models/mixtral-8x7b-instruct-v0.1.Q6_K.gguf -c 4096 -b 4096` (FAIL)

```
### Assistant:GGML_ASSERT: ggml-metal.m:1511: ne11 <= 512
```

`./main -f /tmp/prompt1k -m models/mixtral-8x7b-instruct-v0.1.Q6_K.gguf -c 4096 -b 4096 -ngl 0` (OK, but very slow, since no layers are offloaded to the GPU)

Issue: #5070

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

ggml : support bs > 512 for Metal ggml_mul_mat_id #5070

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

ggml : support `bs > 512` for Metal `ggml_mul_mat_id` #5070