IFU-master-2023-03-01 #1194
Conversation
Rolling back the default change for Adam and rectifying the docs to reflect that AdamW never defaulted to fused. Since our fused implementations are relatively new, let's give them a longer bake-in time before flipping the switch for every user. Pull Request resolved: pytorch#95241 Approved by: https://github.com/ngimel
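As an aside, a minimal sketch of opting into the fused implementation explicitly now that it is no longer the default (the `fused=True` keyword is the documented opt-in; the rest of this snippet is illustrative):

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
# With the default rolled back, neither Adam nor AdamW picks the fused CUDA
# implementation on its own; users opt in explicitly while it bakes in.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, fused=True)

loss = model(torch.randn(4, 8, device="cuda")).sum()
loss.backward()
opt.step()
```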
Running an operator registered in Python that returns a SymInt results in the following error:

```
RuntimeError: Unable to cast Python instance of type <class 'torch.SymInt'> to C++ type 'long'
```

Two things interact to trigger the issue:
- We use a boxed kernel here. For boxed kernels, we need to convert py::object to IValue in torch/csrc/autograd/python_variable.cpp pushPyOutToStack.
- In the schema parsing code in torch/csrc/jit/frontend/schema_type_parser.cpp SchemaTypeParser::parseFakeAndRealType, if a SymInt is found, we register an Int type instead (not sure why we do this) and register SymInt as the real type.

The result is that we would convert a SymInt to an int in pushPyOutToStack and cause the issue. The fix is to use the real type when we convert py::object to IValue. BTW, registering the same op using the C++ API does not trigger the issue:

```
TORCH_LIBRARY(clib, m) {
  m.def("sqsum(SymInt a, SymInt b) -> SymInt",
        [](SymInt a, SymInt b) -> SymInt { return a * a + b * b; });
}
```

The reason is that the kernel registered in C++ is an unboxed kernel, so it does not go through the code path above that converts a py::object to an IValue.

Pull Request resolved: pytorch#95240 Approved by: https://github.com/larryliu0820, https://github.com/ezyang
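For illustration, a hedged sketch of registering a similar operator from Python via `torch.library` (the `clib_py` namespace and the `CompositeExplicitAutograd` dispatch key are assumptions for this example, not taken from the PR):

```python
import torch
from torch.library import Library

# Hypothetical namespace, used only for this sketch.
lib = Library("clib_py", "DEF")
lib.define("sqsum(SymInt a, SymInt b) -> SymInt")

def sqsum(a, b):
    # With the fix above, the returned value is converted to an IValue using the
    # real type (SymInt) rather than the registered fake type (Int).
    return a * a + b * b

lib.impl("sqsum", sqsum, "CompositeExplicitAutograd")
```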
Simply pipes the arg to the existing torch.cuda API by the same name. Useful for locally debugging OOMs that happened on a smaller GPU. Pull Request resolved: pytorch#95260 Approved by: https://github.com/davidberard98
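The torch.cuda API referred to is presumably `torch.cuda.set_per_process_memory_fraction`; a hedged example of calling it directly to emulate a smaller GPU:

```python
import torch

# Cap this process at roughly half of the device's memory so OOMs seen on a
# smaller GPU can be reproduced locally (assumes the new arg forwards here).
torch.cuda.set_per_process_memory_fraction(0.5, device=0)
```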
Summary: attempt two at enabling search of global/local cache, regardless of `max_autotune`, by default. the main problem is that triton template generation seems to be broken in some cases for CI tests (maybe dynamic shapes), but this is going to take more time to figure out. for now, we can just cancel template generation instead of raising an assertion error and filter out those failed templates. Test Plan: sandcastle + CI Differential Revision: D43424922 Pull Request resolved: pytorch#95134 Approved by: https://github.com/jansel
…4970)" This reverts commit 5d2eb6d. Reverted pytorch#94970 on behalf of https://github.com/jeanschmidt due to Requires codev to land internal test changes
- give warnings of converting int64 for reduction ops
- use cast tensor for reduction sum on trace
- unblock trace from running

Pull Request resolved: pytorch#95231 Approved by: https://github.com/razarmehr
Pull Request resolved: pytorch#94970 Approved by: https://github.com/ezyang
…#95272) Fixes #ISSUE_NUMBER Pull Request resolved: pytorch#95272 Approved by: https://github.com/DenisVieriu97
…ch#95078)
- Fixes convolution crashes in backward with weights
- Removes unnecessary contiguous calls

Pull Request resolved: pytorch#95078 Approved by: https://github.com/kulinseth
This would fix the issue with `__rdiv__` with float16 Pull Request resolved: pytorch#94952 Approved by: https://github.com/kulinseth
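A tiny hedged example of the pattern this affects, i.e. reflected division with a Python scalar on a half-precision MPS tensor:

```python
import torch

# Assumes an MPS-capable machine; the scalar on the left routes through __rtruediv__.
x = torch.arange(1, 5, dtype=torch.float16, device="mps")
y = 2.0 / x
```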
Fixes formatting so that the merge rule shows up on a different line than the "Raised by" text Follow up to pytorch#94932 New version <img width="433" alt="image" src="https://user-images.githubusercontent.com/4468967/220441349-ac99096d-590a-42c1-b995-4a23b2d9b810.png"> Pull Request resolved: pytorch#95234 Approved by: https://github.com/huydhn
…" (pytorch#95209) This reverts commit 4e88547. Pull Request resolved: pytorch#95209 Approved by: https://github.com/albanD
pytorch#95264) Pull Request resolved: pytorch#95264 Approved by: https://github.com/rohan-varma
Fixes #ISSUE_NUMBER Pull Request resolved: pytorch#95213 Approved by: https://github.com/kulinseth, https://github.com/soulitzer
Remove mps specialized path in BCE backward as `logit` op has been implemented for mps. Pull Request resolved: pytorch#95220 Approved by: https://github.com/soulitzer
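For reference, a small hedged check of the op the generic path relies on: `torch.logit` is the inverse of `torch.sigmoid` (shown on CPU here; per the note above it is now also implemented for MPS):

```python
import torch

p = torch.sigmoid(torch.randn(4))
x = torch.logit(p)
print(torch.allclose(torch.sigmoid(x), p))
```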
Pull Request resolved: pytorch#94714 Approved by: https://github.com/soulitzer, https://github.com/albanD
Summary: nccl backend does not support `tag` as mentioned in pytorch#94819. Adding a note in the documentation for it. Example: <img width="888" alt="image" src="https://user-images.githubusercontent.com/14858254/220464900-094c8063-797a-4bdc-8e25-657f17593fe9.png"> Differential Revision: D43475756 Pull Request resolved: pytorch#95236 Approved by: https://github.com/awgu, https://github.com/rohan-varma
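A hedged illustration of the documented caveat: the point-to-point APIs accept `tag`, but with the NCCL backend it has no effect and messages match only by order:

```python
import torch
import torch.distributed as dist

# Assumes a two-rank process group with backend="nccl" is already initialized.
t = torch.zeros(4, device="cuda")
if dist.get_rank() == 0:
    dist.send(t, dst=1, tag=7)   # tag is accepted but ignored by NCCL
else:
    dist.recv(t, src=0, tag=7)   # no tag matching happens on this backend
```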
…ch#95245) Currently, the transformer creates proxy objects directly for the get_attr method, and node.meta is lost in this step. To keep it, we invoke tracer.create_proxy instead; metadata is copied over in tracer.create_proxy and tracer.create_node. Pull Request resolved: pytorch#95245 Approved by: https://github.com/SherlockNoMad, https://github.com/tugsbayasgalan
I am still reading Dynamo source code... This is an easy PR to simplify `Source.is_nn_module()` to reuse `GuardSource.is_nn_module()` instead of having the `in (...)` check implemented twice. While simplifying that, I thought I might as well add some type annotations for `Source` methods. Pull Request resolved: pytorch#95292 Approved by: https://github.com/ezyang
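A rough sketch of the delegation being described (class and member names simplified; the real definitions live in Dynamo's guard/source modules):

```python
from enum import Enum

class GuardSource(Enum):
    LOCAL = 0
    GLOBAL = 1
    LOCAL_NN_MODULE = 2
    GLOBAL_NN_MODULE = 3

    def is_nn_module(self) -> bool:
        return self in (GuardSource.LOCAL_NN_MODULE, GuardSource.GLOBAL_NN_MODULE)

class Source:
    def guard_source(self) -> GuardSource:
        raise NotImplementedError

    def is_nn_module(self) -> bool:
        # Reuse GuardSource.is_nn_module() instead of repeating the `in (...)` check here.
        return self.guard_source().is_nn_module()
```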
This handles disabling masks if numel is a multiple of BLOCK. It currently introduces a performance regression, but the Triton code it generates does not seem to have any issues: all the change does is cause xmask to be removed from loads/stores in cases where it can safely be removed. The regression seems to be coming from some issue in the Triton optimizer. FWIW, if you try this change with current Triton master (instead of the pinned version) it does _not_ cause a performance regression. However, upgrading to Triton master by itself already causes significant performance regressions, so just bumping up the pin is not an option. I'm going to leave this PR open until we manage to move the Triton pin past the big refactoring. Once we do that I will check whether it still causes a performance regression.

UPDATE: The Triton pin has been moved and I retried this PR. As expected, there's no longer a performance regression for hf_Bert:

```
tspin python benchmarks/dynamo/torchbench.py --performance --backend inductor --float16 --training --batch-size-file $(realpath benchmarks/dynamo/torchbench_models_list.txt) --only hf_Bert -n 5 --diff-branch viable/strict 2> err
batch size: 16 cuda train hf_Bert  numel_BLOCK    1.175x p=0.00
batch size: 16 cuda train hf_Bert  viable/strict  1.161x p=0.00
```

Re-opening this; it should be okay to merge now, I expect.

Pull Request resolved: pytorch#92749 Approved by: https://github.com/jansel
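To make the xmask point concrete, a hedged hand-written sketch (not actual Inductor output) of the same copy kernel with and without the mask; the unmasked variant is only valid when numel is a multiple of XBLOCK:

```python
import triton
import triton.language as tl

@triton.jit
def copy_masked(in_ptr, out_ptr, xnumel, XBLOCK: tl.constexpr):
    xindex = tl.program_id(0) * XBLOCK + tl.arange(0, XBLOCK)
    xmask = xindex < xnumel
    tmp = tl.load(in_ptr + xindex, mask=xmask)
    tl.store(out_ptr + xindex, tmp, mask=xmask)

@triton.jit
def copy_unmasked(in_ptr, out_ptr, xnumel, XBLOCK: tl.constexpr):
    # Safe only when xnumel % XBLOCK == 0: every xindex is in bounds, so the
    # always-true xmask can be dropped, which is what the change above does at codegen time.
    xindex = tl.program_id(0) * XBLOCK + tl.arange(0, XBLOCK)
    tmp = tl.load(in_ptr + xindex)
    tl.store(out_ptr + xindex, tmp)
```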
Signed-off-by: Edward Z. Yang <[email protected]> Pull Request resolved: pytorch#95314 Approved by: https://github.com/yinghai, https://github.com/voznesenskym
Summary: bypass-github-export-checks. Use `dinfo.name` instead of `repr(dinfo)`, as initial results have shown that `dinfo.total_memory` may unexpectedly fluctuate. Test Plan: sandcastle + CI Differential Revision: D43503558 Pull Request resolved: pytorch#95302 Approved by: https://github.com/bertmaher
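A hedged sketch of the idea using the public device-properties API (the cache-key detail is an assumption for illustration):

```python
import torch

props = torch.cuda.get_device_properties(0)
# Key on the stable device name rather than repr(props), since fields such as
# total_memory can fluctuate between runs, per the note above.
cache_key = props.name
```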
…ing (pytorch#95249) Summary: This change adds the input shape when CoreML throws an error. Test Plan: testMCSModelInvalidInputShape tests that the assert throws when invalid input shapes are provided. Differential Revision: D43449112 Pull Request resolved: pytorch#95249 Approved by: https://github.com/mcr229
Signed-off-by: Edward Z. Yang <[email protected]> Pull Request resolved: pytorch#95324 Approved by: https://github.com/bdhirsh
) This PR adds back some explanation for why we have the heuristic to only register the post-backward hook on the first forward in the case of multiple forwards. Pull Request resolved: pytorch#95326 Approved by: https://github.com/fegin
) Pull Request resolved: pytorch#95330 Approved by: https://github.com/drisspg, https://github.com/cpuhrsch
Pull Request resolved: pytorch#95322 Approved by: https://github.com/clee2000
Pull Request resolved: pytorch#95176 Approved by: https://github.com/ezyang
Temporary fix for pytorch#95312. In Triton, 1 warp computes a 16x16 tile of output, so for a 32x32 block we only need 4 warps. With 8 warps we hit an IMA (illegal memory access), which is a bug, but it's not a good config anyway. Triton main is supposed to have better behavior for these pathological cases, but we are not on main yet. Pull Request resolved: pytorch#95339 Approved by: https://github.com/ezyang, https://github.com/Chillee
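Spelling out the warp arithmetic in that note:

$$\text{warps} = \frac{32 \times 32}{16 \times 16} = \frac{1024}{256} = 4$$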
…orch#95311) Fixes pytorch#95266 Pull Request resolved: pytorch#95311 Approved by: https://github.com/cpuhrsch
… CI node (pytorch#95402) Fixes pytorch#95155 which breaks CI and no nvfuser python tests are run on CI nodes. Thanks to @davidberard98 for noticing this. Pull Request resolved: pytorch#95402 Approved by: https://github.com/davidberard98
…torch#95200) Changes:

- => this PR: pytorch#95200
  1. Recognize `.py.in` and `.pyi.in` files as Python in VS Code for a better development experience.
  2. Fix deep setting merge in `tools/vscode_settings.py`.
- pytorch#95267
  3. Use `NamedTuple` rather than `namedtuple + __annotations__` for `torch.nn.utils.rnn.PackedSequence_`:

     `namedtuple + __annotations__`:

     ```python
     PackedSequence_ = namedtuple('PackedSequence_',
                                  ['data', 'batch_sizes', 'sorted_indices', 'unsorted_indices'])

     # type annotation for PackedSequence_ to make it compatible with TorchScript
     PackedSequence_.__annotations__ = {'data': torch.Tensor, 'batch_sizes': torch.Tensor,
                                        'sorted_indices': Optional[torch.Tensor],
                                        'unsorted_indices': Optional[torch.Tensor]}
     ```

     `NamedTuple` (Python 3.6+):

     ```python
     class PackedSequence_(NamedTuple):
         data: torch.Tensor
         batch_sizes: torch.Tensor
         sorted_indices: Optional[torch.Tensor]
         unsorted_indices: Optional[torch.Tensor]
     ```
- pytorch#95268
  4. Sort import statements and remove unnecessary imports in `.pyi`, `.pyi.in` files.
  5. Format `.pyi`, `.pyi.in` files and remove unnecessary ellipsis `...` in type stubs.

Pull Request resolved: pytorch#95200 Approved by: https://github.com/janeyx99
…e annotated `NamedTuple` (pytorch#95267) Changes:

- pytorch#95200
  1. Recognize `.py.in` and `.pyi.in` files as Python in VS Code for a better development experience.
  2. Fix deep setting merge in `tools/vscode_settings.py`.
- => this PR: pytorch#95267
  3. Use `NamedTuple` rather than `namedtuple + __annotations__` for `torch.nn.utils.rnn.PackedSequence_`:

     `namedtuple + __annotations__`:

     ```python
     PackedSequence_ = namedtuple('PackedSequence_',
                                  ['data', 'batch_sizes', 'sorted_indices', 'unsorted_indices'])

     # type annotation for PackedSequence_ to make it compatible with TorchScript
     PackedSequence_.__annotations__ = {'data': torch.Tensor, 'batch_sizes': torch.Tensor,
                                        'sorted_indices': Optional[torch.Tensor],
                                        'unsorted_indices': Optional[torch.Tensor]}
     ```

     `NamedTuple` (Python 3.6+):

     ```python
     class PackedSequence_(NamedTuple):
         data: torch.Tensor
         batch_sizes: torch.Tensor
         sorted_indices: Optional[torch.Tensor]
         unsorted_indices: Optional[torch.Tensor]
     ```
- pytorch#95268
  4. Sort import statements and remove unnecessary imports in `.pyi`, `.pyi.in` files.
  5. Format `.pyi`, `.pyi.in` files and remove unnecessary ellipsis `...` in type stubs.

Pull Request resolved: pytorch#95267 Approved by: https://github.com/janeyx99
Summary: xcit_large_24_p8_224 occasionally hits TIMEOUT on CI. Bump up the limit to reduce flakiness. Pull Request resolved: pytorch#95787 Approved by: https://github.com/ezyang, https://github.com/ZainRizvi
…5792) Signed-off-by: Edward Z. Yang <[email protected]> Pull Request resolved: pytorch#95792 Approved by: https://github.com/Skylion007
Continuation of PR pytorch#93153 where I implemented logaddexp for complex, but didn't expose it to `torch.logaddexp`. So this PR is to expose the complex logaddexp to `torch.logaddexp`. Pull Request resolved: pytorch#95717 Approved by: https://github.com/lezcano
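A small hedged example of the newly exposed behavior, checked against the naive formula (fine here since the magnitudes involved cannot overflow):

```python
import torch

a = torch.tensor([1.0 + 2.0j, -3.0 + 0.5j], dtype=torch.complex64)
b = torch.tensor([0.5 - 1.0j,  2.0 + 1.0j], dtype=torch.complex64)

out = torch.logaddexp(a, b)                     # complex inputs, per the PR above
ref = torch.log(torch.exp(a) + torch.exp(b))    # naive reference
print(torch.allclose(out, ref))
```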
Fixes pytorch#88098 This is the rebased and retry merging branch of the reverted PR: pytorch#94597 Pull Request resolved: pytorch#94899 Approved by: https://github.com/kit1980
jenkins retest this please
3 similar comments
jenkins retest this please
jenkins retest this please
jenkins retest this please
http://rocmhead:8080/job/pytorch/job/pytorch-ci/625/:
jenkins retest this please
http://rocmhead:8080/job/pytorch/job/pytorch-ci/626/:
jenkins retest this please (with rocAutomation scripts updated to pip install requirements-ci.txt)
arrgh: http://rocmhead:8080/job/pytorch/job/pytorch-ci/641/:
http://rocmhead:8080/job/pytorch/job/pytorch-ci-multibranch/job/PR-1194/1/display/redirect:
Re-disabled this test by reopening issue pytorch#93045, which got closed automatically by a bot since no failures were seen upstream in 200 runs. But local testing is able to reproduce these failures consistently, per @jaglinux's and my observations.
jenkins notest pytorch
1 similar comment
jenkins notest pytorch
http://rocmhead:8080/job/pytorch/job/pytorch-ci/653/
These tests are skipped upstream though, e.g. https://ossci-raw-job-status.s3.amazonaws.com/log/pytorch/vision/12126909908
Need to figure out why these tests are not being skipped in our runs, but it doesn't seem to be a blocker for IFU. cc @lcskrishna any ideas?
@jithunnair-amd In the latest release of torchvision, the new transforms v2 API is introduced, and a few of the old APIs, like torchvision.transforms.functional_pil, are being deprecated and will become private in 0.17.
conflicts.txt