am17an (Collaborator) commented Oct 18, 2025

While looking at this kernel, I realized that it is relatively easy to extend it to gpt-oss, which applies the softmax after the top-k selection.
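To make the ordering difference concrete, here is a minimal CPU sketch in plain Python, not the CUDA kernel from this PR. The function names and the example logits are hypothetical. It contrasts the gpt-oss order (top-k over the raw router logits, then softmax over only the selected k) with a softmax-first order (softmax over all experts, then top-k). Because softmax is monotonic, both pick the same experts; the weights agree only if the softmax-first weights are renormalized over the selected k.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def topk_indices(xs, k):
    """Indices of the k largest values, largest first (ties go to the lower index)."""
    return sorted(range(len(xs)), key=lambda i: (-xs[i], i))[:k]

def route_softmax_then_topk(logits, k):
    """Softmax over all experts, then keep the k largest probabilities (no renormalization)."""
    probs = softmax(logits)
    idx = topk_indices(probs, k)
    return idx, [probs[i] for i in idx]

def route_topk_then_softmax(logits, k):
    """gpt-oss order: select the k largest logits, then softmax over only those k."""
    idx = topk_indices(logits, k)
    return idx, softmax([logits[i] for i in idx])

if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical router logits for 4 experts
    for route in (route_softmax_then_topk, route_topk_then_softmax):
        idx, w = route(logits, k=2)
        print(route.__name__, idx, [round(x, 4) for x in w], "sum =", round(sum(w), 4))
```

With these logits both routes select experts 0 and 1, but only the top-k-then-softmax weights sum to 1.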

Performance on an RTX 4090:

| Model | Test | t/s master | t/s cuda_gpt_oss_opt | Speedup |
|---|---|---:|---:|---:|
| gpt-oss 20B MXFP4 MoE | tg32 | 170.99 | 177.68 | 1.04 |
| gpt-oss 20B MXFP4 MoE | tg64 | 168.75 | 175.36 | 1.04 |
| gpt-oss 20B MXFP4 MoE | tg128 | 167.01 | 173.33 | 1.04 |
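The Speedup column is the ratio of the two throughput columns. A small sketch that recomputes it from the numbers in the table follows; the names `ROWS` and `speedup` are illustrative and not part of llama.cpp.

```python
# Token-generation throughput (t/s) copied from the table: (test, master, cuda_gpt_oss_opt)
ROWS = [
    ("tg32", 170.99, 177.68),
    ("tg64", 168.75, 175.36),
    ("tg128", 167.01, 173.33),
]

def speedup(baseline_tps, new_tps):
    """Throughput ratio; values above 1.0 mean the new build generates tokens faster."""
    return new_tps / baseline_tps

for test, master, opt in ROWS:
    print(f"{test}: {speedup(master, opt):.3f}x")
```

At three decimals the ratios are 1.039, 1.039 and 1.038, i.e. roughly a 4% gain, which rounds to the 1.04 shown.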

am17an requested a review from slaren as a code owner October 18, 2025 11:24
github-actions bot added the testing, Nvidia GPU, and ggml labels Oct 18, 2025
jeffbolznv added a commit to jeffbolznv/llama.cpp that referenced this pull request Oct 18, 2025
jeffbolznv added a commit to jeffbolznv/llama.cpp that referenced this pull request Oct 21, 2025
am17an force-pushed the cuda_topk_moe_gpt_oss branch from 49a541e to 17c3927 October 21, 2025 11:53
jeffbolznv added a commit to jeffbolznv/llama.cpp that referenced this pull request Oct 21, 2025
am17an merged commit 03792ad into ggml-org:master Oct 21, 2025 (70 checks passed)
am17an deleted the cuda_topk_moe_gpt_oss branch October 21, 2025 15:21
ye-NX pushed a commit to ye-NX/llama.cpp that referenced this pull request Oct 21, 2025
avidwriter commented

How do I use this?

am17an (Collaborator, Author) commented Oct 22, 2025

@avidwriter If you are using the CUDA backend, it should already be included in the latest master.

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 22, 2025
FMayran pushed a commit to FMayran/llama.cpp that referenced this pull request Oct 23, 2025
pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025
jeffbolznv added a commit to jeffbolznv/llama.cpp that referenced this pull request Oct 26, 2025

Labels

ggml: changes relating to the ggml tensor library for machine learning
Nvidia GPU: issues specific to Nvidia GPUs
testing: everything test related


3 participants