Describe the bug
I created a GPU cluster from two Mac Studio M2 Ultra machines, one with 192 GB and one with 64 GB of memory. The resources view shows both workers as ready. I then deployed DeepSeek-R1-UD-IQ1_S.gguf (131 GB), stored locally as a single large file, with the following distribution configuration:


Result
Inference is very slow: 0.69 tokens/s.

Expected behavior
The same hardware typically delivers 17 tokens/s with the Ollama or llama.cpp backend. GPUStack should be able to reach comparable throughput.
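
To put the gap in perspective, here is a minimal sketch of the arithmetic using only the two throughput figures reported in this issue. The 500-token response length and the helper names are assumptions chosen for illustration, not measurements.

```python
# Throughput figures taken from this report (tokens per second).
OBSERVED_TPS = 0.69   # GPUStack, distributed across the two Mac Studios
EXPECTED_TPS = 17.0   # figure reported for Ollama / llama.cpp on the same hardware


def slowdown(expected_tps: float, observed_tps: float) -> float:
    """Return how many times slower the observed rate is than the expected one."""
    return expected_tps / observed_tps


def generation_seconds(num_tokens: int, tps: float) -> float:
    """Return the time needed to generate num_tokens at a steady rate of tps."""
    return num_tokens / tps


if __name__ == "__main__":
    tokens = 500  # hypothetical response length, for illustration only
    print(f"slowdown: {slowdown(EXPECTED_TPS, OBSERVED_TPS):.1f}x")
    print(f"{tokens} tokens at observed rate: {generation_seconds(tokens, OBSERVED_TPS):.0f} s")
    print(f"{tokens} tokens at expected rate: {generation_seconds(tokens, EXPECTED_TPS):.0f} s")
```

Under these numbers the distributed deployment is roughly 25 times slower, so a response that would take about half a minute takes around twelve minutes.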
Environment
- GPUStack version: 0.5.1
- OS: macOS 14/15
- GPU: Mac Studio M2 Ultra