ggml : support bs > 512 for Metal ggml_mul_mat_id #5070

Closed
@stewartoallen

Description

Mixtral models + Metal GPU + batch size > 512 = GGML_ASSERT failure. This does not affect models such as llama-2-7b-chat.Q5_K_M.gguf.

Hardware: Apple M2 Ultra
RAM: 192GB
llama.cpp current version as of 2024-01-21 (504dc37)

./main -f /tmp/prompt1k -m models/mixtral-8x7b-instruct-v0.1.Q6_K.gguf -c 4096 -b 512 << OK
./main -f /tmp/prompt1k -m models/mixtral-8x7b-instruct-v0.1.Q6_K.gguf -c 4096 -b 4096 << FAIL

### Assistant:GGML_ASSERT: ggml-metal.m:1511: ne11 <= 512

./main -f /tmp/prompt1k -m models/mixtral-8x7b-instruct-v0.1.Q6_K.gguf -c 4096 -b 4096 -ngl 0 << OK

but takes forever
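
For illustration only: the assert suggests the Metal `ggml_mul_mat_id` path accepts at most 512 rows per call, and `ne11` here is assumed to be the number of tokens in the batch. Below is a minimal sketch of how a larger batch could be cut into pieces that satisfy that limit. `split_batch` and `METAL_MUL_MAT_ID_MAX_ROWS` are hypothetical names and are not part of llama.cpp or ggml.

```python
# Hypothetical illustration; none of these names exist in llama.cpp or ggml.
# The limit of 512 is taken from the assert message `ne11 <= 512`.
METAL_MUL_MAT_ID_MAX_ROWS = 512


def split_batch(n_tokens, max_rows=METAL_MUL_MAT_ID_MAX_ROWS):
    """Return (start, length) chunks covering n_tokens, each at most max_rows long."""
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    return [
        (start, min(max_rows, n_tokens - start))
        for start in range(0, n_tokens, max_rows)
    ]


if __name__ == "__main__":
    # A 1000-token prompt would be processed as two sub-batches.
    for start, length in split_batch(1000):
        print(f"tokens [{start}, {start + length})")
```

This is the same effect as passing `-b 512` on the command line, just applied per call instead of to the whole run.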

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), good first issue (Good for newcomers), macos (Issues specific to macOS)
