Extend GGML_HIP_ROCWMMA_FATTN to support CDNA warp size 64 #12156
Extends PR #12032 based on comments from @IMbackK to support CDNA, which uses warp size 64.
The first commit changes all relevant uses of the `WARP_SIZE` define to use the warp size of the current device. Once I did this properly, the changes to the heuristics I mentioned in the other PR were not necessary. I added one assert on the total `T_BLOCK_X` / `T_BLOCK_Y` size, based on the AMD docs. I fenced it for AMD, but someone else may know whether the same limit (basically, don't exceed 4 * warp_size threads in total) applies to other architectures.
The second commit removes a fence that prevented `__launch_bounds__` from applying on HIP. This is a significant performance improvement for prompt processing (>15% at larger sizes) but a slight (<5%) penalty to token generation. I don't know enough about the multiple layers of kernel-sizing heuristics to dig into this further right now.

This passes `test-backend-ops` on an MI100 (gfx908, CDNA) and a 3090.