Merge OpenAI Triton commit 9a49104
#3316
Merged
Conversation
The `strip` method treats its argument as a set of characters and removes matching characters from both ends of the string; it does not remove a matching substring. For example, stripping `" (inc)"` from `"num_samples (inc)"` yields `um_samples`, because the leading character `n` is also part of the character set. Instead of using `strip`, the `replace` method can be used. However, in this case, we can simply output the original metric name (i.e., "num_samples (inc)"), since it’s more useful to know whether the metric is inclusive or exclusive.
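A minimal standalone sketch of the pitfall and the alternatives (plain Python, not the actual profiler code):

```python
metric = "num_samples (inc)"

# strip() treats its argument as a *set* of characters and trims them
# from both ends, so the leading 'n' of "num_samples" is removed too.
print(metric.strip(" (inc)"))        # "um_samples"

# replace() removes the exact substring instead.
print(metric.replace(" (inc)", ""))  # "num_samples"

# The change described here simply keeps the original name, which also
# preserves the inclusive/exclusive marker.
print(metric)                        # "num_samples (inc)"
```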
This PR allows a call to a JITFunction to pass another JITFunction as an argument. For example:

```python
@triton.jit
def fn_a(x):
    ...

@triton.jit
def fn_b(x, fn):
    ...

@triton.jit
def fn_c(x):
    return fn_b(x, fn_a)  # fn_a (a JITFunction) is passed as an argument to fn_b (another JITFunction)
```

Prior to #5220, this worked. After #5220, the user needs to annotate the JITFunctions with @triton.constexpr manually (until this PR). Use case: Inductor has some generic helper functions for implementing scans (e.g. exclusive_scan_decoupled_lookback) which take a `combine_fn` to implement the combination function (similar to tl.reduce). These helper functions have stopped working after #5220. https://github.com/pytorch/pytorch/blob/01a4d86b31365cfb484dc17885c9a7ee09c235ab/torch/_inductor/runtime/triton_helpers.py#L321
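For reference, the post-#5220 workaround presumably looks something like the following, with the callback parameter annotated as a constexpr (a hedged sketch; function names and bodies are illustrative, not from the PR):

```python
import triton
import triton.language as tl

@triton.jit
def fn_a(x):
    return x + 1

@triton.jit
def fn_b(x, fn: tl.constexpr):  # explicit constexpr annotation required after #5220
    return fn(x)

@triton.jit
def fn_c(x):
    return fn_b(x, fn_a)
```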
Disabling lineinfo in ptxas can be somewhat tricky. If the input PTX file contains lineinfo, the generated CUBIN will include lineinfo by default, regardless of whether the `-lineinfo` option is used. To disable lineinfo in the generated CUBINs, the `-suppress-debug-info` option must be used in conjunction with `-lineinfo`. A [test.txt](https://github.com/user-attachments/files/18550903/test.txt) file is attached; please rename it to `test.ptx`.

```
ptxas -lineinfo --gpu-name=sm_86 -o ./test.cubin ./test.ptx
nvdisasm -g ./test.cubin  # you will still see the lineinfo
```

```
ptxas -suppress-debug-info -lineinfo --gpu-name=sm_86 -o ./test.cubin ./test.ptx
nvdisasm -g ./test.cubin  # lineinfo is gone
```
This way, users can directly open the file in IDEs like VS Code and jump to the corresponding line by holding the `ctrl` key and clicking the line in the terminal. Also, this PR emits an error instead of using workarounds for CUPTI compatibility problems, and adds more instructions for using PC sampling to the tutorial.
…er (#5717) Fixes #5691 and unblocks #5684
The `filter_traceback` call was commented out during the tuple PR. This just restores it and adds a check in the relevant tests.
Taking over triton-lang/triton#4914 due to inactivity. As discussed there, when there are multiple "contiguity of 1" entries in the `contiguity` array, doing an argsort on it means that the resulting `order` becomes ascending for those elements. In the unit test, `order = [2, 1, 0]` becomes `[0, 1, 2]`, which is odd. This convention seems arbitrary, so it is better to pick the row-major ordering by default in such cases, to be consistent with the rest of the code. The current convention is "correct", but we get an additional `convert_layout`. Moreover, this order is inherited by the SMEM allocated during SWP, which could be problematic for other ops. For example, in my case I was getting the order `[4, 0, 1, 2, 3]` in SMEM for 5D blocked scales because only the innermost axis had a contiguity of 4 while the rest were 1. @ThomasRaoux @pawelszczerbuk @Jokeren @rawnhenry

---------

Co-authored-by: Rawn Henry <[email protected]>
Co-authored-by: Masahiro Masuda <[email protected]>
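A quick numpy illustration of the tie-breaking behavior described above (standalone sketch, not the actual compiler code):

```python
import numpy as np

# 5D case like the one described: only the innermost axis has contiguity > 1.
contiguity = np.array([1, 1, 1, 1, 4])

# A plain descending argsort breaks the ties among the "contiguity of 1"
# axes in ascending index order, producing [4, 0, 1, 2, 3].
order = list(np.argsort(-contiguity, kind="stable"))
print(order)  # [4, 0, 1, 2, 3]

# Picking the row-major ordering for the tied axes instead gives
# [4, 3, 2, 1, 0], consistent with the default convention elsewhere.
```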
The custom fma codegen for Ampere has been upstreamed to NVPTX, so we no longer need custom conversion code. As a bonus, we now codegen vectorized bf16 ops for free.
For the most part, I expect this will just be a minor cleanup. It is vaguely possible that this might enable some better codegen.
…ds (#5739) It will be more consistent since the CUPTI header and library can use the same version. For roctracer, we don't package the library with triton, so we need to fall back to the system library.
This doesn't break functional backward compatibility, as the new semantics are a subset of what was allowed before, but it would break performance backward compatibility. This makes it less error prone.
Follow-up to #5733, which somehow passed CI and auto-merged even with this bug. TMA should be using `bulk.commit.group` rather than `commit.group`.
Reverts triton-lang/triton#5707. This causes some functional changes that I need to investigate.
…r backends" (#5749) Reverts triton-lang/triton#5739. This causes problems on setups where the wheel is used on a remote machine. @Jokeren is working on a different fix.
The pingpong pass only expects to handle local load ops as A/B. Avoid applying the transform when a different op is detected. Enabling it is the next step.
When reducing a 1D tensor, the order of elements doesn't matter. This allows us to use a more relaxed version of reshape.
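For context, a minimal Triton kernel of the kind this affects, a full reduction over a 1D tensor (illustrative sketch only; the kernel and names are not from the PR):

```python
import triton
import triton.language as tl

@triton.jit
def sum_kernel(x_ptr, out_ptr, BLOCK: tl.constexpr):
    # A full reduction of a 1D tensor: the result does not depend on the
    # order in which elements are combined (up to floating-point rounding),
    # which is what allows the compiler to reshape the tensor in a relaxed,
    # order-agnostic way internally.
    offs = tl.arange(0, BLOCK)
    x = tl.load(x_ptr + offs)
    tl.store(out_ptr, tl.sum(x, axis=0))
```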
Pulls in llvm/llvm-project#123200, which is useful, and also handles triton-lang/triton#5664. Integration changes were required due to llvm/llvm-project#123026, llvm/llvm-project#123321, and llvm/llvm-project#123326. Also closes triton-lang/triton#5685.
…lder Signed-off-by: Whitney Tsang <[email protected]>
pbchekin approved these changes Jan 30, 2025
This reverts commit 0ba4205.
This PR changes the Triton base from ac61cb0 to 9a49104 (Jan 29).
Pass rate: 98.19%
Please do not squash and merge this PR.