Low GPU usage of quantized Mixtral 8x22B for prompt processing on Metal #6642

Closed
beebopkim opened this issue Apr 12, 2024 · 2 comments
Comments

beebopkim commented Apr 12, 2024

My computer is an M1 Max Mac Studio with a 32-core GPU and 64 GB of RAM, running macOS Sonoma 14.4.1.

I ran llama-bench from commit 4cc120c, and it shows low GPU usage during prompt processing. Inference with main and server shows the same low GPU usage.

[Screenshot: llama-bench results, 2024-04-13]

The image above shows benchmark runs for IQ2_XXS, IQ2_XS, IQ2_S, IQ2_M, and Q2_K_S; the IQ1_S and IQ1_M quants from https://huggingface.co/MaziyarPanahi/Mixtral-8x22B-v0.1-GGUF show the same low GPU usage.
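For reference, a run like the one in the screenshot can be reproduced roughly as follows. This is a sketch: the model filename is a placeholder, and the `-p`/`-n` lengths are llama-bench's common defaults, not values taken from the report.

```shell
# Hypothetical invocation; the model path is a placeholder.
# -p sets the prompt-processing test length (where the low GPU usage
# was observed), -n the token-generation test length.
./llama-bench -m models/Mixtral-8x22B-v0.1.IQ2_XS.gguf -p 512 -n 128
```

On Apple Silicon the Metal backend is enabled by default, so GPU utilization during the `pp512` phase can be watched in Activity Monitor or with `sudo powermetrics` while the benchmark runs.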

@stefanvarunix commented

#6740

@github-actions github-actions bot added the stale label May 20, 2024

github-actions bot commented Jun 4, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Jun 4, 2024

2 participants