
PowerPC: Enable MMA for BF16 in llamafile_sgemm #13148


Merged
1 commit merged into ggml-org:master from main_bf16_sgemm on May 2, 2025

Conversation

shalinib-ibm (Contributor)

This patch upstreams llamafile's CPU matrix-multiplication kernels for ppc64le, using MMA builtins for the BF16 data type.

This change yields 9x to 40x gains in total speed S t/s (i.e., all tokens / total time) across the various batch sizes tested with the llama-batched-bench benchmark.

The patch was tested with the Meta-Llama-3-8B and Mistral-7B models (BF16 models generated with llama-quantize from the corresponding FP32 models) on an IBM POWER10 machine.
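For readers without a POWER10 machine, here is a minimal sketch of the builtin sequence such a kernel relies on. It is not the code added by this PR; the helper name `mma_bf16_tile`, its signature, and the tile layout are illustrative assumptions, while the builtins themselves (`__builtin_mma_xxsetaccz`, `__builtin_mma_xvbf16ger2pp`, `__builtin_mma_disassemble_acc`) are the documented POWER10 MMA API.

```cpp
// Minimal sketch of one BF16 tile update with POWER10 MMA builtins.
// Illustrative only (not the kernel added by this PR).
// Build with: g++ -O2 -mcpu=power10 ...
#include <altivec.h>
#include <cstring>

typedef __vector unsigned char vec_t;  // generic 16-byte VSX vector

// 'a' packs a 4x2 BF16 sub-block of A and 'b' a 2x4 BF16 sub-block of B,
// as required by the xvbf16ger2 rank-2 update; C is a 4x4 fp32 tile.
static void mma_bf16_tile(float C[4][4], vec_t a, vec_t b) {
    __vector_quad acc;                          // 512-bit accumulator (4x4 fp32)
    __builtin_mma_xxsetaccz(&acc);              // zero the accumulator
    __builtin_mma_xvbf16ger2pp(&acc, a, b);     // acc += a (4x2) * b (2x4)

    vec_t rows[4];
    __builtin_mma_disassemble_acc(rows, &acc);  // spill acc into 4 vectors
    for (int i = 0; i < 4; ++i) {
        float r[4];
        std::memcpy(r, &rows[i], sizeof(r));    // one row of the 4x4 tile
        for (int j = 0; j < 4; ++j) {
            C[i][j] += r[j];
        }
    }
}
```

A real kernel would tile over larger blocks and stream many such rank-2 updates into the accumulator before disassembling it; the sketch only shows how a single BF16 outer-product update lands in a 4x4 float tile.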


github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Apr 28, 2025
shalinib-ibm (Contributor, Author)

@ggerganov Can you please review this PR and provide your comments?

ggerganov (Member) left a comment


I don't have a machine to test this, but at least fix the indentation of the code and we can merge it.

shalinib-ibm force-pushed the main_bf16_sgemm branch 3 times, most recently from c6c14fa to b9c6af2, on May 2, 2025 at 06:20

Signed-off-by: Shalini Salomi Bodapati <[email protected]>
shalinib-ibm (Contributor, Author)

> I don't have a machine to test this, but at least fix the indentation of the code and we can merge it.

Thank you @ggerganov. I have fixed the code indentation. Can you please review?

shalinib-ibm requested a review from ggerganov on May 2, 2025
ggerganov merged commit 3f3769b into ggml-org:master on May 2, 2025
51 checks passed