While experimenting with GEMM kernels (CUDA), I introduced a bug. The perplexing part is that all the unit tests (`test-backend-ops`) were passing and the PPL score was exactly the same as before the change that introduced the bug; the only way I noticed something was wrong was by playing with `llama-cli`.

The unit-test gap is understandable, since coverage is not 100%, and the only case affected was an fp16 `MUL_MAT` with non-contiguous tensors of "irregular" dimensions. I assumed that unit tests plus the PPL score would catch regressions with high confidence, but that doesn't appear to be the case.

What would be a better way to catch functional regressions?
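One candidate answer is randomized differential testing: instead of relying on a fixed case list, fuzz the shape/stride space and compare the kernel under test against a slow double-precision reference. Below is a minimal standalone sketch of that idea, not ggml's actual test harness: `matmul_under_test` is a plain CPU loop standing in for the CUDA kernel, and the dimension ranges and NMSE threshold are illustrative assumptions.

```cpp
// Randomized differential test for a strided matmul, in the spirit of
// test-backend-ops' backend-vs-reference comparison, but over fuzzed
// "irregular" shapes and non-contiguous (row-strided) operands.
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

// Kernel under test: C[m,n] = sum_k A[m,k] * B[k,n]. The row strides lda/ldb
// may exceed K/N, i.e. A and B can be non-contiguous views of larger buffers.
// In a real test this would invoke the CUDA kernel instead of a CPU loop.
static void matmul_under_test(const float *A, size_t lda,
                              const float *B, size_t ldb,
                              float *C, size_t ldc,
                              int M, int N, int K) {
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[m * lda + k] * B[k * ldb + n];
            C[m * ldc + n] = acc;
        }
}

// Normalized mean squared error against a double-precision reference.
static double nmse(const std::vector<float> &y, const std::vector<double> &ref) {
    double err = 0.0, den = 0.0;
    for (size_t i = 0; i < y.size(); ++i) {
        double d = y[i] - ref[i];
        err += d * d;
        den += ref[i] * ref[i];
    }
    return den > 0.0 ? err / den : err;
}

int main() {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int>    dim(1, 257); // deliberately irregular sizes
    std::uniform_int_distribution<int>    pad(0, 13);  // extra stride -> non-contiguous
    std::uniform_real_distribution<float> val(-1.0f, 1.0f);

    for (int iter = 0; iter < 1000; ++iter) {
        int M = dim(rng), N = dim(rng), K = dim(rng);
        size_t lda = K + pad(rng), ldb = N + pad(rng), ldc = N;

        std::vector<float> A(M * lda), B(K * ldb), C(M * ldc);
        for (float &x : A) x = val(rng);
        for (float &x : B) x = val(rng);

        matmul_under_test(A.data(), lda, B.data(), ldb, C.data(), ldc, M, N, K);

        // Double-precision reference over the same strided views.
        std::vector<double> ref(M * ldc);
        for (int m = 0; m < M; ++m)
            for (int n = 0; n < N; ++n) {
                double acc = 0.0;
                for (int k = 0; k < K; ++k)
                    acc += (double)A[m * lda + k] * (double)B[k * ldb + n];
                ref[m * ldc + n] = acc;
            }

        double e = nmse(C, ref);
        if (e > 5e-4) { // threshold is illustrative, not ggml's exact value
            std::printf("FAIL iter=%d M=%d N=%d K=%d lda=%zu ldb=%zu nmse=%g\n",
                        iter, M, N, K, lda, ldb, e);
            return 1;
        }
    }
    std::printf("OK: 1000 randomized strided cases passed\n");
    return 0;
}
```

In `test-backend-ops` terms, the narrower fix is presumably to add `MUL_MAT` cases with odd dimensions and viewed/permuted (non-contiguous) operands, so the affected code path is actually exercised.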
Replies: 2 comments

- Without knowing what you changed: as I recently found out, PPL tests are impervious to batch_size = 1 bugs.
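To make that failure mode concrete: PPL evaluates the text in large batches, so a kernel selected only at batch size 1 (e.g. a GEMV path) never runs during the PPL computation, yet runs on every token during `llama-cli` generation. Below is a toy sketch of a cross-batch-size consistency check that would catch it; `eval` is a hypothetical stand-in for `llama_decode` + `llama_get_logits`, and the planted `bug` term models a corrupted single-token path.

```cpp
// Cross-batch-size consistency check: the same tokens, evaluated batched and
// one at a time, must produce (numerically) matching logits.
#include <cmath>
#include <cstdio>
#include <vector>

// Toy model with an optional bug that only affects the batch-size-1 path,
// mimicking a broken GEMV kernel sitting next to a correct GEMM kernel.
static std::vector<float> eval(const std::vector<int> &tokens, int n_past, bool bug) {
    std::vector<float> out;
    for (size_t i = 0; i < tokens.size(); ++i) {
        float logit = 0.1f * tokens[i] + 0.01f * (n_past + (int)i);
        if (bug && tokens.size() == 1) logit += 0.05f; // bs=1-only corruption
        out.push_back(logit);
    }
    return out;
}

int main() {
    std::vector<int> tokens = {3, 1, 4, 1, 5, 9, 2, 6};

    // Path 1: evaluate everything in one batch (the path PPL exercises).
    std::vector<float> batched = eval(tokens, 0, /*bug=*/true);

    // Path 2: evaluate one token at a time (the path llama-cli exercises).
    std::vector<float> single;
    for (size_t i = 0; i < tokens.size(); ++i)
        single.push_back(eval({tokens[i]}, (int)i, /*bug=*/true)[0]);

    // Element-wise comparison flags the divergence immediately, whereas an
    // aggregate score computed only over the batched path never sees it.
    for (size_t i = 0; i < tokens.size(); ++i) {
        if (std::fabs(batched[i] - single[i]) > 1e-5f) {
            std::printf("mismatch at token %zu: batched=%f single=%f\n",
                        i, batched[i], single[i]);
            return 1;
        }
    }
    std::printf("batched and single-token paths agree\n");
    return 0;
}
```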
- Thanks @am17an, I'll keep the …