[Benchmark] gather_gemv kernel and test #635
Conversation
stack-info: PR: #635, branch: Sibylau/stack/3
Force-pushed from f0765bb to d64b898.
yf225 left a comment:
Thanks @Sibylau! Left some nit comments, and you might also need to rebase to fix conflicts with the main branch.
examples/gather_gemv.py
(Outdated)

    def baseline_gather_gemv(w: Tensor, idx: Tensor, x: Tensor) -> Tensor:
        """PyTorch baseline implementation."""
        # A hard-wired fix for tritonbench baseline: w[idx].to(x.dtype) @ x
maybe can remove this comment
examples/gather_gemv.py
(Outdated)

        for idx_val in idx.tolist():
            outputs.append(w[idx_val].to(x.dtype) @ x)
        return torch.stack(outputs, dim=0)
        # return torch.stack([w[idx[0]].to(x.dtype) @ x, w[idx[1]].to(x.dtype) @ x])
maybe remove?
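For context on what the baseline computes: gather_gemv selects rows (or row blocks) of `w` by index and multiplies each with `x`. A minimal NumPy sketch (illustrative only; the shapes, seed, and variable names here are assumptions, not taken from the PR) shows that the loop form in the snippet above is equivalent to a single fancy-indexing gather:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 4)).astype(np.float16)  # hypothetical stacked weight matrices
idx = np.array([3, 5])                                 # gather indices
x = rng.standard_normal(4).astype(np.float32)

# Loop form, mirroring the baseline in the PR: gather one matrix per
# index, cast to x's dtype, multiply, then stack the results.
outputs = [w[i].astype(x.dtype) @ x for i in idx.tolist()]
looped = np.stack(outputs, axis=0)

# Vectorized form: one gather, one batched matmul.
vectorized = w[idx].astype(x.dtype) @ x

assert np.allclose(looped, vectorized)
```

This is why the commented-out one-liner and the loop produce the same result; the loop just avoids hard-coding the number of indices.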
test/test_examples.py
(Outdated)

        args,
        expected(*args),
        fn_name="gather_gemv",
        block_sizes=[64, 64],
Not exactly sure why the AMD job fails, but I suspect changing the block sizes to some smaller value might help.
@yf225 The CI test on ROCm fails due to a code mismatch. Do you know why the generated code for AMD is different? Can I put @skipIfRocm on this kernel test?
test/test_examples.py
(Outdated)

        )
    )

    @skipIfRocm("failure on rocm")
Instead of using skipIfRocm, which skips the whole test including the output equality check, maybe we can add a skip_rocm: bool arg to def assertExpectedJournal that skips the journal check if the device is ROCm. And to check the device is ROCm, we can add something like this to _testing.py:

    def is_rocm() -> bool:
        """Return True if running on ROCm (AMD GPU)."""
        return (
            triton.runtime.driver.active.get_current_target().backend == "hip"
            and DEVICE.type == "cuda"
        )

(Please feel free to do this in a follow-up PR. Thanks!)
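To make the suggested shape concrete, here is a hypothetical sketch of that pattern (the function names and signature are assumptions for illustration, not the repo's actual helpers; the real `is_rocm` would query Triton's driver as shown above). Only the generated-code comparison is gated; any numerical checks the caller runs are unaffected:

```python
def is_rocm() -> bool:
    """Stubbed backend probe; the real version would query Triton's active driver."""
    return False  # pretend we are on a non-ROCm device for this sketch

def assert_expected_journal(actual: str, expected: str, skip_rocm: bool = False) -> None:
    """Compare generated code against the journal, unless skipped on ROCm."""
    if skip_rocm and is_rocm():
        return  # journal check skipped; output equality checks still ran elsewhere
    assert actual == expected, "generated code does not match journal"

# Usage: the journal comparison is conditional, so a backend with
# different codegen can opt out without skipping the whole test.
assert_expected_journal("kernel_v1", "kernel_v1", skip_rocm=True)
```

The design point is that skipping happens inside the journal assertion rather than at the test-decorator level, so ROCm still exercises the kernel and verifies correctness.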
stack-info: PR: #635, branch: Sibylau/stack/3
Force-pushed from 78a4c22 to c8421c3.
Stacked PRs:
[Benchmark] gather_gemv kernel and test