Merge OpenAI Triton commit 9a49104 #3316

Merged
merged 29 commits into main from whitneywhtsang/merge on Jan 31, 2025

Conversation

whitneywhtsang
Contributor

@whitneywhtsang commented Jan 30, 2025

This PR changes the Triton base from ac61cb0 to 9a49104 (Jan 29).
Pass rate: 98.19%

Please do not squash and merge this PR.

peterbell10 and others added 20 commits January 28, 2025 17:53
The `strip` method removes leading and trailing characters that appear in its argument; it does not remove substrings that match the full argument. For example, previously, `"num_samples (inc)".strip(" (inc)")` would result in `um_samples`, because the character `n` is part of the argument's character set. Instead of using `strip`, the `replace` method could be used. However, in this case, we can simply output the original metric name (i.e., "num_samples (inc)"), since it's more useful to know whether the metric is inclusive or exclusive.
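A quick repro of the pitfall in plain Python (matching the example above):

```python
name = "num_samples (inc)"
# strip() treats its argument as a set of characters to remove from both
# ends, so the leading 'n' of "num_samples" is eaten along with the suffix:
print(name.strip(" (inc)"))        # -> 'um_samples'
# replace() removes the exact substring instead:
print(name.replace(" (inc)", ""))  # -> 'num_samples'
```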
This PR allows a call to a JITFunction to pass another JITFunction as an
argument.

For example:

```python
@triton.jit
def fn_a(x):
    ...

@triton.jit
def fn_b(x, fn):
    ...

@triton.jit
def fn_c(x):
    return fn_b(x, fn_a)  # fn_a (a JITFunction) is passed as an argument to fn_b (another JITFunction)
```

Prior to #5220, this worked. After #5220 (and until this PR), the user had to
manually annotate such arguments with `triton.constexpr`, as sketched below.
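For reference, a minimal sketch of what that manual workaround looked like (illustrative function names, using the `tl.constexpr` annotation):

```python
import triton
import triton.language as tl

@triton.jit
def fn_a(x):
    return x + 1

@triton.jit
def fn_b(x, fn: tl.constexpr):  # manual constexpr annotation on the function-valued argument
    return fn(x)
```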

Use case: Inductor has some generic helper functions for implementing
scans (e.g. exclusive_scan_decoupled_lookback) which take a `combine_fn`
implementing the combination function (similar to tl.reduce). These
helper functions stopped working after #5220.


https://github.com/pytorch/pytorch/blob/01a4d86b31365cfb484dc17885c9a7ee09c235ab/torch/_inductor/runtime/triton_helpers.py#L321
Disabling lineinfo in ptxas can be somewhat tricky. If the input PTX
file contains lineinfo, the generated CUBIN will include lineinfo by
default, regardless of whether the `-lineinfo` option is used. To
disable lineinfo in the generated CUBINs, the `-suppress-debug-info`
option must be used in conjunction with `-lineinfo`.

Attached is a
[test.txt](https://github.com/user-attachments/files/18550903/test.txt)
file; please rename it to `test.ptx`.

```
ptxas -lineinfo --gpu-name=sm_86 -o test.cubin ./test.ptx
nvdisasm -g ./test.cubin # you will still see the lineinfo
```

```
ptxas -suppress-debug-info -lineinfo --gpu-name=sm_86 -o test.cubin ./test.ptx
nvdisasm -g ./test.cubin # lineinfo is gone
```
This way, users can open the file directly in IDEs like VS Code and jump
to the corresponding line by holding the `ctrl` key and clicking the path
in the terminal.
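A minimal sketch of the format being emitted (hypothetical helper; the point is just the `<file>:<line>` shape that terminals linkify):

```python
import inspect

def clickable_location(fn) -> str:
    # "<file>:<line>" is the format most terminals (including VS Code's
    # integrated terminal) turn into a ctrl+clickable link.
    src_file = inspect.getsourcefile(fn)
    _, first_line = inspect.getsourcelines(fn)
    return f"{src_file}:{first_line}"

print(clickable_location(clickable_location))  # e.g. /path/to/script.py:3
```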

Also, this PR emits an error instead of using workarounds for CUPTI
compatibility problems, and adds more instructions for using PC sampling
to the tutorial.


![image](https://github.com/user-attachments/assets/9ecb90c3-0953-43bd-8db4-605dc13c38a2)
The `filter_traceback` call was commented out during the tuple PR. This
just restores it and adds a check in the relevant tests.
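A generic sketch of what such traceback filtering does (not Triton's actual implementation):

```python
def filter_traceback(tb):
    """Drop leading frames that live inside the triton package (sketch),
    so the re-raised error points at user code rather than internals."""
    while tb is not None and "/triton/" in tb.tb_frame.f_code.co_filename:
        tb = tb.tb_next
    return tb

# Usage sketch:
#     except Exception as e:
#         raise e.with_traceback(filter_traceback(e.__traceback__))
```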
Taking over triton-lang/triton#4914 due to inactivity.

As discussed there, when there are multiple entries with contiguity 1 in the
`contiguity` array, doing an argsort on it means that the resulting `order`
becomes ascending for those elements. In the unit test, `order = [2, 1,
0]` becomes `[0, 1, 2]`, which is odd. This convention seems arbitrary,
so it is better to pick the row-major ordering by default in such cases,
to be consistent with the rest of the code.

The current convention is "correct", but we get an additional
`convert_layout`. Moreover, this order is inherited by the SMEM
allocated during SWP, which could be problematic for other ops. For
example, in my case I was getting the order `[4, 0, 1, 2, 3]` in SMEM
for 5D blocked scales, because only the innermost axis had a contiguity
of 4 while the rest had 1.
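A minimal NumPy sketch of the tie-breaking issue (illustrative only; the actual logic lives in the C++ pass):

```python
import numpy as np

contiguity = np.array([1, 1, 1])  # every axis ties

# Plain descending argsort breaks ties in ascending axis order,
# turning a row-major order like [2, 1, 0] into [0, 1, 2]:
naive_order = list(np.argsort(-contiguity, kind="stable"))  # [0, 1, 2]

# Preferring higher (inner) axis indices on ties keeps the
# row-major default:
row_major_order = sorted(range(len(contiguity)),
                         key=lambda d: (contiguity[d], d),
                         reverse=True)                       # [2, 1, 0]
```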

@ThomasRaoux @pawelszczerbuk @Jokeren @rawnhenry

---------

Co-authored-by: Rawn Henry <[email protected]>
Co-authored-by: Masahiro Masuda <[email protected]>
The custom fma codegen for Ampere has been upstreamed to NVPTX, so we no longer need custom conversion code. As a bonus, we now codegen vectorized bf16 ops for free.
For the most part, I expect this will just be a minor cleanup. It is
vaguely possible that this might enable some better codegen.
…ds (#5739)

It will be more consistent since the CUPTI header and library can use
the same version.
For roctracer, we don't package the library with triton, so we need to
fall back to the system library.
This doesn't break functional backward compatibility, as the new semantics
are a subset of what was allowed before, but it would break performance
backward compatibility.
This makes it less error-prone.
Follow-up to #5733, which somehow passed CI and auto-merged even with
this bug. TMA should be using `bulk.commit.group` rather than
`commit.group`.
Reverts triton-lang/triton#5707

This causes some functional changes that I need to investigate.
…r backends" (#5749)

Reverts triton-lang/triton#5739

This causes problems on setups where the wheel is used on a remote
machine. @Jokeren is working on a different fix.
The pingpong pass only expects to handle local load ops as A/B.
Avoid using the transform when a different op is detected;
enabling those cases is the next step.
When reducing a 1D tensor, the order of elements doesn't matter. This
allows us to use a more relaxed version of reshape, as sketched below.
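A hedged sketch of the idea (the kernel and shapes are illustrative, not from the PR): a full 1D reduction is insensitive to element order, so a reshape feeding it may freely reorder elements via `can_reorder=True`.

```python
import triton
import triton.language as tl

@triton.jit
def sum_kernel(x_ptr, out_ptr, BLOCK: tl.constexpr):
    x = tl.load(x_ptr + tl.arange(0, BLOCK))
    # Order-insensitive consumer: the relaxed (reordering) reshape is safe.
    x2d = tl.reshape(x, (BLOCK // 64, 64), can_reorder=True)
    tl.store(out_ptr, tl.sum(x2d))  # sums over all axes
```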
@whitneywhtsang self-assigned this Jan 30, 2025
@whitneywhtsang changed the title Merge OpenAI Triton commit bc4675a Merge OpenAI Triton commit 64fff02 Jan 31, 2025
@whitneywhtsang marked this pull request as ready for review January 31, 2025 02:54
@whitneywhtsang changed the title Merge OpenAI Triton commit 64fff02 Merge OpenAI Triton commit c2c193a Jan 31, 2025
@whitneywhtsang changed the title Merge OpenAI Triton commit c2c193a Merge OpenAI Triton commit 9a49104 Jan 31, 2025
@whitneywhtsang merged commit 4a99671 into main Jan 31, 2025
5 checks passed
@whitneywhtsang deleted the whitneywhtsang/merge branch January 31, 2025 05:20