
Conversation

Contributor

@Sibylau Sibylau commented Sep 10, 2025

[Benchmark] add geglu example and test

For the Triton kernel benchmarking issue #234.
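
For context, a minimal PyTorch sketch of the GEGLU computation being benchmarked (a reference definition only, not this PR's Helion kernel; the function and weight names here are illustrative):

import torch
import torch.nn.functional as F

# GEGLU reference: the gate projection goes through GELU and scales the
# up projection elementwise (per "GLU Variants Improve Transformer").
def geglu_reference(
    x: torch.Tensor, w_gate: torch.Tensor, w_up: torch.Tensor
) -> torch.Tensor:
    return F.gelu(x @ w_gate) * (x @ w_up)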

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 10, 2025
@Sibylau Sibylau requested review from oulgen and yf225 September 10, 2025 00:51
Contributor

@oulgen oulgen left a comment

If we now need transformers, you're gonna need to update the CI to handle it.

@oulgen
Copy link
Contributor

oulgen commented Sep 10, 2025

@Sibylau not needing transformers would be better if you want to write baselines.

@@ -0,0 +1,285 @@
"""
Contributor

I wonder, does python benchmarks/run.py --op geglu --metrics accuracy pass (i.e., show accuracy check = 1 for all backends)?

Contributor Author

Yes, it passes. Maybe it would be good to post the accuracy-pass info in each PR and document the performance.

Contributor Author

But torch.compile for the geglu kernel seems to have accuracy issues:
[screenshot: tritonbench accuracy output showing the torch.compile failure]

Contributor

@yf225 yf225 Sep 12, 2025

@Sibylau I just merged #596 to allow passing the TB operator instance as the first argument to the Helion integration wrapper geglu_tritonbench - now we should be able to access the TB baseline's model weights in the helion tritonbench wrapper and copy the weights into the helion MLP.

It would be great to run the tritonbench accuracy check again to confirm it passes, thanks!
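
A minimal sketch of that weight-copying step, assuming LLaMA-style projection attributes (gate_proj, up_proj, down_proj and the function name are assumptions, not necessarily the PR's actual code):

import torch
from torch import nn

# Copy the TB baseline's projection weights into the Helion-side MLP so
# both backends compute with identical parameters (attribute names assumed).
def copy_baseline_weights(helion_mlp: nn.Module, baseline_mlp: nn.Module) -> None:
    with torch.no_grad():
        for name in ("gate_proj", "up_proj", "down_proj"):
            getattr(helion_mlp, name).weight.copy_(getattr(baseline_mlp, name).weight)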

Contributor Author

Thank you! The new commit copies the weights, and the accuracy matches:
[screenshot: tritonbench accuracy check passing]

- torch.backends.cuda.matmul.fp32_precision = "tf32"
- torch.backends.cudnn.conv.fp32_precision = "tf32"
+ # torch.backends.cuda.matmul.fp32_precision = "tf32"
+ # torch.backends.cudnn.conv.fp32_precision = "tf32"
Contributor

We shouldn't be commenting out these fp32 precision lines - could you share the errors you were seeing?

Contributor Author

Sorry, I should've uncommented these before pushing the code. But I ran into this error when using these fp32 precision settings:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/jieeliu/envs/conda/envs/helion/lib/python3.11/unittest/__main__.py", line 18, in <module>
    main(module=None)
  File "/home/jieeliu/envs/conda/envs/helion/lib/python3.11/unittest/main.py", line 101, in __init__
    self.parseArgs(argv)
  File "/home/jieeliu/envs/conda/envs/helion/lib/python3.11/unittest/main.py", line 150, in parseArgs
    self.createTests()
  File "/home/jieeliu/envs/conda/envs/helion/lib/python3.11/unittest/main.py", line 161, in createTests
    self.test = self.testLoader.loadTestsFromNames(self.testNames,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jieeliu/envs/conda/envs/helion/lib/python3.11/unittest/loader.py", line 232, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jieeliu/envs/conda/envs/helion/lib/python3.11/unittest/loader.py", line 232, in <listcomp>
    suites = [self.loadTestsFromName(name, module) for name in names]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jieeliu/envs/conda/envs/helion/lib/python3.11/unittest/loader.py", line 162, in loadTestsFromName
    module = __import__(module_name)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jieeliu/workspace/helion/test/test_examples.py", line 17, in <module>
    torch.backends.cuda.matmul.fp32_precision = "tf32"
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jieeliu/envs/conda/envs/helion/lib/python3.11/site-packages/torch/backends/cuda/__init__.py", line 149, in __setattr__
    raise AttributeError("Unknown attribute " + name)
AttributeError: Unknown attribute fp32_precision

Contributor

Hmm, I wonder: are you using the latest PyTorch nightly?

Contributor Author

My torch version is:

__all__ = ['__version__', 'debug', 'cuda', 'git_version', 'hip', 'xpu']
__version__ = '2.8.0+cu128'
debug = False
cuda: Optional[str] = '12.8'
git_version = 'a1cb3cc05d46d198467bebbb6e8fba50a325d4e7'
hip: Optional[str] = None
xpu: Optional[str] = None

Yeah, I remember it was PyTorch 2.8.0 nightly. I can try different versions too.
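
If version skew is the cause, one sketch of a guarded setup, assuming the older boolean allow_tf32 flags are an acceptable fallback on builds without the fp32_precision API:

import torch

try:
    # newer PyTorch exposes the string-valued fp32_precision API
    torch.backends.cuda.matmul.fp32_precision = "tf32"
    torch.backends.cudnn.conv.fp32_precision = "tf32"
except AttributeError:
    # older builds (e.g. the 2.8.0 release above) only have the boolean flags
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True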

down_proj: nn.Linear


class TritonBenchOperator(Protocol):
Contributor Author

The BaselineModel and TritonBenchOperator classes are type annotations to pass pyright type checking. If there are better ways to write this, please suggest them.

Contributor

@yf225 yf225 Sep 15, 2025

To keep it simple, maybe we can just use object as the type for tb_op (similar to

tb_op: object, x: torch.Tensor, B: int, M: int, seqlen: int, sparsity: float

), and add a type ignore there as necessary, so that we don't need these classes.
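
A sketch of what that simplification could look like (the body, the return type, and the baseline_model attribute are hypothetical, for illustration only):

from typing import Callable

import torch

def geglu_tritonbench(tb_op: object, x: torch.Tensor) -> Callable[[], torch.Tensor]:
    # pyright doesn't know TritonBench's attributes on a plain object,
    # so suppress the attribute-access check where needed.
    baseline = tb_op.baseline_model  # type: ignore[attr-defined]
    return lambda: baseline(x)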

@Sibylau Sibylau requested a review from yf225 September 14, 2025 23:53
@Sibylau Sibylau requested a review from yf225 September 15, 2025 21:24
Contributor

@yf225 yf225 left a comment

thanks @Sibylau !

@Sibylau Sibylau merged commit 7cfa568 into main Sep 15, 2025
12 of 13 checks passed
@Sibylau Sibylau deleted the jieeliu/stack/2 branch September 15, 2025 22:26
lolpack pushed a commit to lolpack/helion that referenced this pull request Oct 13, 2025