Extend GGML_HIP_ROCWMMA_FATTN to support CDNA warp size 64 #12156


Closed
wants to merge 13 commits into from

Conversation

bjj
Contributor

@bjj bjj commented Mar 3, 2025

Extends PR #12032 based on comments from @IMbackK to support CDNA, which uses warp size 64.

The first commit changes all relevant uses of the WARP_SIZE define to use the current device's warp size. Once I did this properly, the heuristic changes I mentioned in the other PR were not necessary.

I added one assert on the total T_BLOCK_X / T_BLOCK_Y size, based on the AMD docs. I fenced it for AMD, but someone else may know whether the same limit (basically: don't exceed 4*warp_size threads total) applies to other architectures.

The second commit removes a fence that prevented __launch_bounds__ from applying on HIP. This is a significant performance improvement for prompt processing (>15% at larger sizes) but a slight (<5%) penalty to token generation. I don't know enough about the multiple layers of kernel sizing heuristics to dig into this more right now.

This passes test-backend-ops on an MI100 (gfx908, CDNA) and a 3090.

@bjj bjj requested a review from JohannesGaessler as a code owner March 3, 2025 03:51
@github-actions github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Mar 3, 2025
@sorasoras

Would it be faster on RDNA3 with a warp size of 64?

@IMbackK
Collaborator

IMbackK commented Mar 3, 2025

RDNA3 does not support warp64 in HIP, as some instructions only work in warp32 mode.

@IMbackK IMbackK closed this Mar 3, 2025
4 participants