Conversation

ggerganov (Member) commented Aug 31, 2025

cont #15665

When the dist sampler runs on the full vocabulary (e.g. top-p = 1, top-k = 0, min-p = 0), the sampling time can become a non-negligible fraction of the total generation time. This PR fuses the probability normalization loop with the inverse transform sampling, so the distribution is sampled in a single pass over the candidates.

On M2 Ultra, the text-generation rate (tg) with gpt-oss-120b goes from ~70 t/s to ~78 t/s at empty context.

@ggerganov ggerganov merged commit cdedb70 into master Sep 3, 2025
54 of 56 checks passed
@ggerganov ggerganov deleted the gg/sampling-dist-opt branch September 3, 2025 15:16
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 4, 2025
…upport

* origin/master: (72 commits)
metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
server: add exceed_context_size_error type (ggml-org#15780)
Document the new max GPU layers default in help (ggml-org#15771)
ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
opencl: add hs=40 to FA (ggml-org#15758)
CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
vulkan: fix mmv subgroup16 selection (ggml-org#15775)
vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
vulkan : update ggml_vk_instance_validation_ext_available (ggml-org#15666)
ggml vulkan: add hardsigmoid and hardswish operations (ggml-org#15762)
CUDA: Optimize `rms_norm_f32` kernel and its fused variants, giving 1-6% perf E2E (ggml-org#15715)
model-conversion : fix pyright errors (ggml-org#15770)
sampling : optimize dist sampler (ggml-org#15704)
llama : fix incorrect model type for Gemma 270M (ggml-org#15764)
model-conversion : remove hardcoded /bin/bash shebangs [no ci] (ggml-org#15765)
CANN: Add RoPE contiguous check for 310I DUP device (ggml-org#15735)
ggml-cpu : optimize RVV kernels (ggml-org#15720)
...
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025