
Allow all RDNA2 archs to use sdot4 intrinsic #8629


Merged 1 commit on Jul 23, 2024

Conversation

jeroen-mostert
Contributor

@jeroen-mostert jeroen-mostert commented Jul 22, 2024

The check gating the use of `__builtin_amdgcn_sdot4` tests specifically for gfx1030. This causes a severe performance regression on any gfx103? target that is not gfx1030 and is not run with `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, so let's use it.

With this change my custom ROCm build that includes gfx1036 support (and uses the gfx1030 kernels) performs identically with or without HSA_OVERRIDE_GFX_VERSION.
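To illustrate the kind of gate described above, here is a minimal sketch rather than the actual diff. The names `dot4_i8` and `dot4_i8_fallback` are hypothetical, the `RDNA2` grouping macro is an assumption modeled on the "generic RDNA2 define" mentioned in the description, and in real HIP code the function would also be marked `__device__`. On a non-AMD host compiler only the portable fallback is compiled.

```cpp
#include <cassert>
#include <cstdint>

// Assumed grouping macro: any gfx103x compile target counts as RDNA2.
#if defined(__gfx1030__) || defined(__gfx1031__) || defined(__gfx1032__) || \
    defined(__gfx1033__) || defined(__gfx1034__) || defined(__gfx1035__) || \
    defined(__gfx1036__) || defined(__gfx1037__)
#define RDNA2
#endif

// Portable reference: treat a and b as four packed signed 8-bit lanes,
// multiply them lane-wise, and add the four products to the accumulator c.
static inline int dot4_i8_fallback(int a, int b, int c) {
    for (int i = 0; i < 4; ++i) {
        const int8_t ai = static_cast<int8_t>((static_cast<uint32_t>(a) >> (8 * i)) & 0xFF);
        const int8_t bi = static_cast<int8_t>((static_cast<uint32_t>(b) >> (8 * i)) & 0xFF);
        c += ai * bi;
    }
    return c;
}

// Hypothetical wrapper showing the gate. Testing only __gfx1030__ here would
// send gfx1031..gfx1037 targets down the slow path; testing RDNA2 covers them all.
static inline int dot4_i8(int a, int b, int c) {
#if defined(RDNA2)
    return __builtin_amdgcn_sdot4(a, b, c, false);
#else
    return dot4_i8_fallback(a, b, c);
#endif
}
```

Because the target macro is fixed at compile time, a kernel built for gfx1036 never sees `__gfx1030__` defined, which is why the narrower check only took the fast path when the device was made to run gfx1030 kernels via `HSA_OVERRIDE_GFX_VERSION`.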

@github-actions github-actions bot added the "Nvidia GPU" label (Issues specific to Nvidia GPUs) Jul 22, 2024
@jeroen-mostert
Contributor Author

Original discussion (including repro) is at lamikr/rocm_sdk_builder#114 (comment). There I incorrectly stated that there was no arch-specific logic, since I missed it on first read.

Collaborator

@JohannesGaessler JohannesGaessler left a comment


I am not particularly knowledgeable when it comes to ROCm but this looks correct to me.

@JohannesGaessler JohannesGaessler merged commit 46e4741 into ggml-org:master Jul 23, 2024
49 checks passed
@jeroen-mostert jeroen-mostert deleted the patch-2 branch July 23, 2024 10:13
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 27, 2024