Commit 43f36d7

JakubPietrakIntel
Squashed commit of the following:
commit 63ebc8d6a000199e963d29b6c8a0f54d3150872b
Author: Jakub Pietrak <[email protected]>
Date: Thu Dec 1 13:32:03 2022 +0100
rm print
commit 2c8ffeaf1b2168ed9ad4ca6b192a1231fb036760
Author: Jakub Pietrak <[email protected]>
Date: Thu Dec 1 11:35:02 2022 +0100
pytorch_sparse.matmul to torch.sparse.matmul
commit ee0e184a1ce5dc6ad7005a67621fac19d6fdbb0b
Merge: 4562359b9f 3a858ba8e3
Author: Jakub Pietrak <[email protected]>
Date: Mon Nov 28 14:09:42 2022 +0100
Merge branch 'gh/mingfeima/85/head' of https://github.com/pytorch/pytorch into pyg-36
commit 4562359b9fb3de301690334a892d44911eda45c8
Merge: deba083400 b5616cd5f4
Author: Jakub Pietrak <[email protected]>
Date: Mon Nov 28 12:22:11 2022 +0000
Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36
commit deba0834008ad95af7e3a6603223a0f8a5555967
Merge: 0e1a8522bb a97d0508cb
Author: Jakub Pietrak <[email protected]>
Date: Mon Nov 28 12:19:25 2022 +0000
Merge branch 'pyg-36' of https://github.com/JakubPietrakIntel/pytorch into pyg-36
commit 0e1a8522bb695387816a29bbfcf182962429b3ab
Merge: 059a238619 75bfbc35ca
Author: Jakub Pietrak <[email protected]>
Date: Mon Nov 28 12:16:35 2022 +0000
Merge remote-tracking branch 'origin/gh/mingfeima/85/head' into pyg-36
commit b5616cd5f4fc150138b79d3396a603eda6a7a8a8
Author: Michael Voznesensky <[email protected]>
Date: Mon Nov 28 05:12:37 2022 +0000
Add simple assert to detect fake tensors on modules (#89723)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89723
Approved by: https://github.com/ezyang
commit db1f1144f1303db45e0b9d96e4bb6bdd87c80e5a
Author: Edward Z. Yang <[email protected]>
Date: Sat Nov 26 13:52:28 2022 -0800
Beef up AOTAutograd logging with aot_id and input descriptions (#89710)
A few things in this PR, that I found useful while debugging some
recent issues:
- We now allocate an aot_id to each aot_function/aot_module invocation,
and print it whenever we report error messages and graph output
logging. Check the comment for why this sort of thing is useful,
and also why it's different from nth_graph. This number is now
incorporated into aot_graph_name
- I noticed that nth_graph only gets incremented when backwards is
compiled. Because backwards is compiled lazily, this means that
multiple forward graphs would have gotten the same ID! I change
nth_graph to always increment to avoid confusion here.
- I added a simple describe_input function, which makes use of
num_params_buffers to tell the user if the input index they're
looking at is a param/buffer or an input. With the help of
https://github.com/pytorch/pytorch/pull/89709 we could give
even more detailed information about inputs (we could also
easily give detailed information about parameters if we stored
a mapping of index to parameter name, but I didn't need this
when debugging so I'll let someone else add it if they need
it.)
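A minimal sketch, assuming a flat argument list whose first `num_params_buffers` entries are the lifted parameters/buffers, of what such an input-describing helper can look like (illustrative, not the exact AOTAutograd code):
```python
# Hypothetical helper mirroring the description above: classify a flat input
# index as a lifted parameter/buffer or a real user input.
def describe_input(i: int, num_params_buffers: int) -> str:
    if i < num_params_buffers:
        return f"parameter/buffer {i}"
    return f"input {i - num_params_buffers}"
```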
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89710
Approved by: https://github.com/bdhirsh
commit 5f8848f32901e35cead64d520885f718679c2bbe
Author: Edward Z. Yang <[email protected]>
Date: Thu Nov 24 15:26:55 2022 -0500
Don't suppress log messages for dynamo CI config (#89653)
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89653
Approved by: https://github.com/albanD, https://github.com/kit1980
commit 1a2dd6b15e0089a9e45ba4feb90c2d0dfac19238
Author: Edward Z. Yang <[email protected]>
Date: Sun Nov 27 19:27:45 2022 -0500
Add single process version of dynamo distributed hf_Bert tests (#89721)
It's a lot easier to debug problems in the Dynamo optimization pass if
you aren't actually triggering a multiprocessing run. Keep these tests
around.
I think the other tests can probably get this treatment too, leaving
this to future work.
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89721
Approved by: https://github.com/voznesenskym
commit 0e7c100c9b7417efb1a8f65778a1e3c9ad10ef3e
Author: Edward Z. Yang <[email protected]>
Date: Sat Nov 26 11:25:24 2022 -0800
Add debug asserts to AOTAutograd for input consistency with compilation (#89702)
Fixes https://github.com/pytorch/torchdynamo/issues/1927
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89702
Approved by: https://github.com/bdhirsh
commit 1f95f24d3003a35568a00b5e5e18439846089b0f
Author: Edward Z. Yang <[email protected]>
Date: Sat Nov 26 11:25:24 2022 -0800
Factor input deduplication into a separate function (#89701)
It turns out that instead of having a giant blobby aot_dispatch_autograd
function, we can factor it into a series of wrapper functions, each
of which successively guarantees more invariants on the inner
compilation function until the final inner function is quite trivial.
How exactly you have to wrap the input user functions and the output
compiled functions can be expressed concisely in Haskell, so I've
included the Haskell formulation in code comments.
This PR shows how to do this for input deduplication. Dealing with the
rest of the view handling is left to future work.
This PR should also be a slight performance improvement as deduplicating
is skipped entirely when there are no duplicate inputs.
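A minimal sketch of the wrapper pattern described above, assuming a hypothetical `inner_compile` that takes a flat list of tensors (not the actual aot_dispatch code):
```python
# Hypothetical deduplication wrapper: compile with duplicates removed, then
# drop the duplicate positions again at runtime before calling the result.
def dedupe_wrapper(flat_args, inner_compile):
    seen, keep_idx = set(), []
    for i, a in enumerate(flat_args):
        if id(a) not in seen:
            seen.add(id(a))
            keep_idx.append(i)
    if len(keep_idx) == len(flat_args):
        # Fast path mentioned above: no duplicates, skip wrapping entirely.
        return inner_compile(flat_args)
    compiled = inner_compile([flat_args[i] for i in keep_idx])

    def runtime_wrapper(*args):
        return compiled(*[args[i] for i in keep_idx])

    return runtime_wrapper
```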
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89701
Approved by: https://github.com/bdhirsh
commit dcefc8f90fbc86041a7abcce4f227d15c59bd96c
Author: Edward Z. Yang <[email protected]>
Date: Sat Nov 26 14:28:56 2022 -0500
Implement guard_source on RandomValueSource (#89711)
I audited the pattern matches on the enum and it didn't
look like this one should apply there.
Sorry, no test, I know this matters on symbolic-shapes branch
but I haven't had time to extract out a minimal reproducer.
Take my word for it.
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89711
Approved by: https://github.com/jansel
commit 1da633f98a5da000083c0c47d9e192b2689f867b
Author: Edward Z. Yang <[email protected]>
Date: Thu Nov 24 13:57:17 2022 +0000
Access named parameters/buffers/etc via getattr rather than index (#89625)
I'm not sure why this never caused problems before. The error
manifests as `TypeError: 'MyModule' object is not subscriptable`
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89625
Approved by: https://github.com/albanD
commit e36d68af8885f27d8c0b4727ab078bf53e55e7a0
Author: Horace He <[email protected]>
Date: Thu Nov 24 02:17:37 2022 +0000
Don't allow recomputing a node that *must* be materialized in the backwards pass (#89171)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89171
Approved by: https://github.com/ngimel
commit b709078dc673cbd5025a1df3eae7f5c60acc2698
Author: Taylor Robie <[email protected]>
Date: Sat Nov 26 10:33:21 2022 -0800
[Profiler] Memory profiler part 11: Mark tensors created in the backward pass which don't correspond to parameters. (#88926)
There are various Tensors created in the backward pass which do not correspond to parameters. We don't want to mark these as gradients, but we do still want to convey as much information as possible. Thus, this PR introduces an AUTOGRAD_DETAIL category. (Which can be grouped with GRADIENT in visualization if one wishes to take a coarse grained view of the world.)
Differential Revision: [D40868661](https://our.internmc.facebook.com/intern/diff/D40868661/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88926
Approved by: https://github.com/chaekit
commit 143d2881a844934c95c4ada63b38179d97e65af3
Author: Taylor Robie <[email protected]>
Date: Sat Nov 26 10:33:19 2022 -0800
[Profiler] Memory profiler part 10: Mark optimizer state (#88925)
This is also a fairly simple pass, since we're simply collecting values from the python tracer.
Differential Revision: [D40868664](https://our.internmc.facebook.com/intern/diff/D40868664/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88925
Approved by: https://github.com/chaekit
commit ae725d501e33ed6f823997bea03d99cdc8dae5ff
Author: Taylor Robie <[email protected]>
Date: Sat Nov 26 10:33:18 2022 -0800
[Profiler] Memory profiler part 9: Mark activations (#88924)
This is a fairly straightforward pass: start at inputs and flood fill until we reach the backward pass.
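A minimal flood-fill sketch, assuming a hypothetical adjacency map from tensor id to (node, outputs) consumers (not the profiler's actual data structures):
```python
from collections import deque

# Hypothetical pass: start from the input tensor ids and mark everything
# reachable through forward-pass nodes as an activation, stopping once the
# backward pass is reached.
def mark_activations(input_ids, consumers, is_backward_node):
    activations, queue = set(), deque(input_ids)
    while queue:
        tid = queue.popleft()
        for node, outputs in consumers.get(tid, []):
            if is_backward_node(node):
                continue
            for out in outputs:
                if out not in activations:
                    activations.add(out)
                    queue.append(out)
    return activations
```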
Differential Revision: [D40868662](https://our.internmc.facebook.com/intern/diff/D40868662/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88924
Approved by: https://github.com/chaekit
commit 56e40fe054ecb7700142ea9ae7fe37e77800a2da
Author: Yuxin Wu <[email protected]>
Date: Sun Nov 27 05:55:24 2022 +0000
Let SyncBatchNorm fallback to BN if not using distributed training (#89706)
Fixes #63662
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89706
Approved by: https://github.com/soumith
commit 39449ea61d9a6644731687219282f610cbf7cf54
Author: PyTorch MergeBot <[email protected]>
Date: Sun Nov 27 02:59:04 2022 +0000
[vision hash update] update the pinned vision hash (#89692)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89692
Approved by: https://github.com/pytorchbot
commit 483d3a3d07e6694757c5158bc21f7f757f8c82c3
Author: Taylor Robie <[email protected]>
Date: Sat Nov 26 10:33:16 2022 -0800
[Profiler] E2E expecttests for category assignment (#88653)
Up until now the unit tests for category assignment have been narrowly scoped to specific checks on specific Tensors. However as we start to reach reasonable levels of category assignment it's useful to supplement those tests with higher level summary tests to inspect the larger graph and confirm that it makes sense. (It will also be necessary for some categories like activations where it is tedious to record all relevant Tensors.)
The general structure of these tests is to capture a model invocation with `__torch_dispatch__` and then cross reference those inputs and outputs with the categories assigned by the memory profiler.
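A minimal sketch of the capture side, using `TorchDispatchMode` from `torch.utils._python_dispatch`; the cross-referencing against memory-profiler categories is omitted and the model is illustrative:
```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class RecordOps(TorchDispatchMode):
    """Record every dispatched op together with its inputs and outputs."""
    def __init__(self):
        super().__init__()
        self.calls = []

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        out = func(*args, **(kwargs or {}))
        self.calls.append((func, args, out))
        return out

model = torch.nn.Linear(4, 2)
with RecordOps() as rec:
    model(torch.randn(3, 4))
print(len(rec.calls), "ops captured")
```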
Differential Revision: [D40868659](https://our.internmc.facebook.com/intern/diff/D40868659/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88653
Approved by: https://github.com/chaekit
commit 0435894bb3b2d60e5da9f993c2a56d95fb03a971
Author: Taylor Robie <[email protected]>
Date: Sat Nov 26 10:33:14 2022 -0800
[Profiler] Memory profiler part 8: Mark parameters. (#87568)
Following the pattern of earlier PRs, we use two methods to extract parameters. The primary one is the Python tracer; both nn.Module and optim.Optimizer collect parameters and in most cases that is sufficient. As a fallback we can analyze the data flow graph and deduce likely parameters based on gradient computation and updates.
Parameter identification has a circular interaction with input identification. Inputs are defined as "not part of the core forward-backward-update loop", but we need inputs for the parameter identification fallback to give us a proxy for the forward pass. Thus, we mark parameters from the python tracer which limits which Tensors get marked as inputs. While not necessary, it adds a bit of robustness. (As shown by the strengthening of the input unit tests.)
Differential Revision: [D40238619](https://our.internmc.facebook.com/intern/diff/D40238619/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87568
Approved by: https://github.com/chaekit
commit 17fa6bf1f57cbbe84a14566efcf00f21e1abe489
Author: Taylor Robie <[email protected]>
Date: Sat Nov 26 10:33:13 2022 -0800
[Profiler] Memory profiler part 7: Mark inputs (#87567)
It is surprisingly difficult to identify the leaves of the data flow graph. The issue is that inputs and pre-existing parameters look identical until parameter identification takes place. It's not too bad for training, since Autograd lets us differentiate between them; however, I still want the tool to do something reasonable in inference.
Some of this will be ameliorated when a later PR pulls in parameters from python tracing. The current approach is passable, but I will continue to mull over refinements.
Differential Revision: [D40220388](https://our.internmc.facebook.com/intern/diff/D40220388/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87567
Approved by: https://github.com/chaekit
commit 64c5c77cd47212da719eb29c3b0a2b07cebb3705
Author: Taylor Robie <[email protected]>
Date: Sat Nov 26 10:33:11 2022 -0800
[Profiler] Memory profiler part 6: Mark gradients and temporary intermediates. (#87566)
Semantic assignment will be built up as a series of passes which gradually pin down the regions of a trace. For this reason it is important to be very meticulous in the assignment of categories.
We begin with gradients as they are both straightforward to identify and foundational to subsequent analysis. There are two mechanisms that the profiler can use to tag gradients, each with their own advantages and limitations. The first is direct inspection of the op graph, which is generic but predicated on certain features of the Autograd engine. (And therefore not necessarily exhaustive.) The second approach is direct instrumentation via the python tracer. This method requires that gradients be attached to an nn.Module parameter and can miss corner cases such as `set_to_none=True` due to the cache structure of the python tracer. Combined, these two approaches provide very high coverage.
Temporaries are more straightforward; we can easily add them by trivial local inspection of a data flow node.
Because this is the first PR in the end-to-end section most of the code is building the scaffolding for category bookkeeping and unit testing. (The actual gradient extraction was covered in an earlier PR.)
Differential Revision: [D40220389](https://our.internmc.facebook.com/intern/diff/D40220389/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87566
Approved by: https://github.com/chaekit
commit 5f09a6d573a2a07c00c76c3cbdbffe0fafe2436d
Author: Taylor Robie <[email protected]>
Date: Sat Nov 26 10:33:09 2022 -0800
[Profiler] Memory profiler part 5: Data flow graph (#87006)
The semantic meaning of a Tensor is tightly coupled to its lineage. The data flow graph allows us to identify temporary Tensors, masks, inputs, activations, and more. However one important nuance is that Tensors must be versioned; operations which mutate their inputs can also change the semantic meaning of said inputs.
It is challenging to assemble a complete picture of the data flow in a PyTorch model because ops can, and often do, recursively call into other ops. For the purpose of memory profiling this is an implementation detail, so instead we traverse the op tree to identify top level ops and allocations and then coalesce their children, folding inputs and outputs into the top level Node.
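A minimal sketch of the versioning idea, with hypothetical names (the real graph additionally coalesces child ops into top-level nodes):
```python
from collections import defaultdict

class TensorVersions:
    """Key tensors by (id, version); a mutation bumps the version, so the
    pre- and post-mutation states become distinct data-flow nodes."""
    def __init__(self):
        self._versions = defaultdict(int)

    def key(self, tensor_id):
        return (tensor_id, self._versions[tensor_id])

    def bump(self, tensor_id):
        self._versions[tensor_id] += 1
        return self.key(tensor_id)
```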
Differential Revision: [D40220391](https://our.internmc.facebook.com/intern/diff/D40220391/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87006
Approved by: https://github.com/chaekit
commit c3116dd78b294f1bd3f6424dc1bfb7ff86bb0a66
Author: Taylor Robie <[email protected]>
Date: Sat Nov 26 10:33:08 2022 -0800
[Profiler] Memory profiler part 4: Select top level torch ops (#86880)
In a later PR we will walk the children of these nodes and formulate a node from the entire bundle to build a data flow graph. This PR simply defines what a "top level" op is.
Differential Revision: [D40220387](https://our.internmc.facebook.com/intern/diff/D40220387/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86880
Approved by: https://github.com/chaekit
commit bb77accb4c996e3aab9ae4b665fb8464400c8194
Author: Jiong Gong <[email protected]>
Date: Sat Nov 26 14:06:44 2022 +0000
[Inductor] Record cpp kernel in PyTorch Profiler (#89367)
Add an option `config.cpp.enable_kernel_profile` to record individual cpp kernel time in PyTorch Profiler.
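A hedged usage sketch, assuming the option lives at `torch._inductor.config.cpp.enable_kernel_profile` as the message suggests and compiling through a current `torch.compile` entry point; the model and profiling setup are illustrative:
```python
import torch
import torch._inductor.config as inductor_config

inductor_config.cpp.enable_kernel_profile = True  # option added by this PR

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
compiled = torch.compile(model, backend="inductor")

x = torch.randn(8, 64)
compiled(x)  # warm up / trigger compilation
with torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CPU]) as prof:
    compiled(x)
print(prof.key_averages().table(sort_by="cpu_time_total"))
```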
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89367
Approved by: https://github.com/jansel
commit 36018a6ee63f140b95ad644d09920798b0c624f8
Author: Edward Z. Yang <[email protected]>
Date: Fri Nov 25 13:48:35 2022 -0800
Don't suppress exceptions from backends (#89656)
Taken from voz's https://github.com/pytorch/pytorch/pull/89392
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89656
Approved by: https://github.com/voznesenskym
commit 3e20d023b1f442ebe59e76604395cd8d4abed52a
Author: Natalia Gimelshein <[email protected]>
Date: Sat Nov 26 03:08:23 2022 +0000
put descriptive kernel names behind config (#89697)
Per title, generated kernel names are often long and confusing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89697
Approved by: https://github.com/Chillee
commit 591dfffa38848de54b7f5f4e49260847024c9281
Author: jlukehubbard <[email protected]>
Date: Fri Nov 25 21:31:53 2022 +0000
update docstring for torch.linalg.lstsq (#89383)
Previous documentation lacked details about the handling of over- and underdetermined systems, and made incorrect mention of MAGMA.
Fixes #85021
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89383
Approved by: https://github.com/lezcano
commit c9a0cc86407d7ec20524b0e26305109d0cf2b5c2
Author: Edward Z. Yang <[email protected]>
Date: Fri Nov 25 03:31:20 2022 +0000
Simplify aot_module_simplified by removing top_args/top_kwargs (#89666)
This makes good on Chillee's CR comment at
https://github.com/pytorch/functorch/pull/660/files/af30d351cc93dfafb5a94dbcb32983c5ef65fd6a#r843315222
which was never done in the original PR.
There is no logic change, just unpack the args/kwargs at the top
level and remove the inner function indirection.
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89666
Approved by: https://github.com/voznesenskym
commit 6168f22fae66da5703e087bcd10076921ca157e7
Author: Edward Z. Yang <[email protected]>
Date: Fri Nov 25 03:31:19 2022 +0000
Don't support kwargs at runtime in aot_module_simplified (#89664)
The preexisting logic here added in
https://github.com/pytorch/functorch/pull/970 was very peculiar: if top_kwargs
was non-empty, then the inner compiled function supports kwargs. Naively, this
would leave you to expect that there is some sort of correlation between
top_kwargs and kwargs. But in fact, they're completely unrelated! top_kwargs
is the AOTAutograd configuration knobs (e.g., fw_compiler/bw_compiler), but
kwargs is the RUNTIME kwargs that are to be passed to the compiled function.
But (1) we don't support this (the function to be compiled only takes a list
of tensors) and (2) even if we did support it, conditioning on whether or not
you had passed AOTAutograd configuration kwargs to support kwargs at runtime
is bonkers.
So delete it.
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89664
Approved by: https://github.com/voznesenskym
commit b04dda4291f1d30b064572e4521e82fa2573af77
Author: Edward Z. Yang <[email protected]>
Date: Fri Nov 25 03:31:19 2022 +0000
Delay verify correctness wrapping to call site. (#89662)
There is only one call site for compiler_fn, so we can safely delay
wrapping verify correctness to here. This will help later when we
change the backend compiler calling convention to pass fake tensors
(but I need to pass real tensors here.)
This is adapted from voz's changes at https://github.com/pytorch/pytorch/pull/89392
but with less changes to the substantive logic. I only moved the relevant
inner implementation; there are no changes otherwise.
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89662
Approved by: https://github.com/voznesenskym
commit 61a3fe4b6409965223273c1098f9a77ff071efe1
Author: Natalia Gimelshein <[email protected]>
Date: Fri Nov 25 19:42:38 2022 +0000
make inductor correctly propagate nans for maximum and minimum (#89612)
Partially fixes https://github.com/pytorch/torchdynamo/issues/594
Also, small cleanup for `where` codegen
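For reference, the eager semantics inductor is being made to match (illustrative check, not from the PR):
```python
import torch

a = torch.tensor([1.0, float("nan"), 3.0])
b = torch.tensor([2.0, 2.0, float("nan")])
print(torch.maximum(a, b))  # tensor([2., nan, nan]) -- NaN propagates
print(torch.minimum(a, b))  # tensor([1., nan, nan])
```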
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89612
Approved by: https://github.com/soumith, https://github.com/jansel
commit 70c0a3006ee96b3db1f531109fc383f8159e2d2f
Author: Ikko Ashimine <[email protected]>
Date: Fri Nov 25 19:26:18 2022 +0000
Fix typo in segment_reduction_op_gpu.cu (#89647)
menber -> member
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89647
Approved by: https://github.com/kit1980
commit 2c0bd85c755043d696452ddab354f3ff6775738b
Author: kshitij12345 <[email protected]>
Date: Fri Nov 25 14:53:57 2022 +0000
complex: register c10::complex with py::cast (#89680)
Fixes #77134
TODO:
* [x] Add test (tested locally with script below) (Are there similar tests in the test-suite?)
```c++
namespace py = pybind11;

int main() {
  py::scoped_interpreter guard{}; // start the interpreter

  auto casted_cdouble = py::cast(c10::complex<double>(1.0, 2.0));
  assert(
      (c10::complex<double>(1.0, 2.0) ==
       py::cast<c10::complex<double>>(casted_cdouble)));

  auto casted_cfloat = py::cast(c10::complex<float>(1.0, 2.0));
  assert(
      (c10::complex<double>(1.0, 2.0) ==
       py::cast<c10::complex<double>>(casted_cfloat)));

  auto casted_chalf = py::cast(c10::complex<at::Half>(1.0, 2.0));
  assert(
      (c10::complex<double>(1.0, 2.0) ==
       py::cast<c10::complex<double>>(casted_chalf)));
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89680
Approved by: https://github.com/ezyang
commit a97d0508cb5259951bc48300fb914cebdf322bb9
Merge: 849be586e6 abb446af8c
Author: Jakub Pietrak <[email protected]>
Date: Fri Nov 25 15:24:54 2022 +0100
Merge branch 'master' of https://github.com/pytorch/pytorch into pyg-36
commit 849be586e649421ba58182feb9067a4ac65479e3
Merge: 059a238619 75bfbc35ca
Author: Jakub Pietrak <[email protected]>
Date: Fri Nov 25 14:25:40 2022 +0100
Merge branch 'gh/mingfeima/85/head' into pyg-36
commit abb446af8c65a49bbc3767e14605a73d244c176b
Author: Alvaro Gaona <[email protected]>
Date: Fri Nov 25 11:09:28 2022 +0000
Implement old windows in Python (#87082)
Relates to #85366
- Bartlett, Blackman, Hamming, Hann.
- Except Kaiser which will be in a different PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87082
Approved by: https://github.com/mruberry, https://github.com/lezcano
commit 059a238619b122f922c569c618919a277420e483
Merge: 26ba2e9751 95ea47ef0c
Author: Jakub Pietrak <[email protected]>
Date: Fri Nov 25 10:00:53 2022 +0100
Merge branch 'pytorch:master' into jpietrak/pyg-36
commit 95ea47ef0c1cffe1fe05cc36bdc47c26cc72f13e
Author: Jason Ansel <[email protected]>
Date: Fri Nov 25 04:28:36 2022 +0000
torchdynamo to torch._dynamo in aot_autograd.py (#89385)
Test Plan: Run torchbench models
Differential Revision: D41429573
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89385
Approved by: https://github.com/soumith, https://github.com/malfet
commit 69043247819042db18ac9526c2d747fa61fe8880
Author: Edward Z. Yang <[email protected]>
Date: Thu Nov 24 12:00:13 2022 -0800
Remove fake_tensor_propagation (#89646)
You always have to run dynamo with fake tensors.
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89646
Approved by: https://github.com/soumith
commit 1aa1014b262b75d4269d9a4d8b562c6ee43a0991
Author: Edward Z. Yang <[email protected]>
Date: Thu Nov 24 12:00:12 2022 -0800
xfail maml test, instead of running it without fake tensor prop (#89645)
A previous version of this patch graph breaks when torch.tensor fails, but that causes
```
PYTORCH_TEST_WITH_DYNAMO=1 python test/nn/test_embedding.py -k test_embedding_bag_1D_padding_idx_cpu_float32
```
to start failing. Probably another latent bug that needs investigating.
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89645
Approved by: https://github.com/albanD
commit a048913e2530442360c36a48420079ca9ebca149
Author: PyTorch MergeBot <[email protected]>
Date: Fri Nov 25 03:03:41 2022 +0000
[vision hash update] update the pinned vision hash (#89667)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89667
Approved by: https://github.com/pytorchbot
commit 3b3ebcd031b68762938806f541d7247a1521bb11
Author: XiaobingSuper <[email protected]>
Date: Thu Nov 24 02:33:01 2022 -0500
TorchDynamo: weight prepack for single conv (#89209)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89209
Approved by: https://github.com/jgong5, https://github.com/jansel
commit 0c4f3db7bf24e94125c6802718a1105ee548c953
Author: XiaobingSuper <[email protected]>
Date: Thu Nov 24 02:32:59 2022 -0500
TorchDynamo: weight prepack for mkl linear (#89109)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89109
Approved by: https://github.com/jgong5, https://github.com/jansel
commit 07151a6bd62e308b6b32e2e0edfc4d5f0563576e
Author: XiaobingSuper <[email protected]>
Date: Thu Nov 24 02:32:55 2022 -0500
TorchDynamo: weight prepack for onednn convolution external call (#88988)
This PR enables weight prepacking using the MKLDNN tensor:
1. enable fake tensor mode for MKLDNN tensor input.
2. make the convolution fusion kernel support MKLDNN tensor input.
3. do the weight prepack at the FX fusion step.
For better performance, we always use channels_last for the CPU convolution path: our tests show that the channels_last path performs better than the blocked-input path and avoids the activation's layout conversions (plain to block, block to plain); currently only a plain-to-plain format conversion is needed.
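For context, the channels_last layout mentioned above can be requested explicitly in eager mode (illustrative, unrelated to the prepack code itself):
```python
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3).to(memory_format=torch.channels_last)
x = torch.randn(1, 3, 32, 32).to(memory_format=torch.channels_last)
out = conv(x)
print(out.is_contiguous(memory_format=torch.channels_last))  # True
```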
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88988
Approved by: https://github.com/jgong5, https://github.com/jansel
commit 0884fdaba0280e3f3ad2abc34c0940587f744886
Author: Edward Z. Yang <[email protected]>
Date: Thu Nov 24 14:31:00 2022 -0500
Revert "Dont clone unmutated args in triton autotuning (#89519)" (#89652)
This reverts commit f18f0c70ab10c400947e71be30794e04dcc22acf.
Testing to see if this fixes gmixer_24_224 mixer_b16_224
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89652
Approved by: https://github.com/eellison
commit 4a16f8cdb26be3561742e86f184e59f65418fe63
Author: Edward Z. Yang <[email protected]>
Date: Thu Nov 24 09:00:09 2022 -0800
Reenable fake_tensor_propagation on test_cudnn_rnn (#89644)
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89644
Approved by: https://github.com/anjali411
commit fc7dcb684aa38da5b1534fc701657ee63af8909c
Author: Edward Z. Yang <[email protected]>
Date: Thu Nov 24 09:00:09 2022 -0800
Run optimizer tests with fake tensors (#89643)
This is a slight regression: RAdam and Adagrad don't appear to
trace at all under fake tensors. But I think this is a more accurate
reflection of the current state of affairs.
Along the way fix some problems on the fake tensor path.
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89643
Approved by: https://github.com/anjali411
commit 9b13508ef3a4e858fbbbf068b3a825f1632e8daa
Author: Edward Z. Yang <[email protected]>
Date: Thu Nov 24 09:00:08 2022 -0800
Force test_rng_state to run with fake tensor prop (#89641)
I'm not really sure what desertfire's intended follow up was
on https://github.com/pytorch/pytorch/pull/87490 because when I remove
the unsupported() call, dynamo tests pass. But the change here is
conservative and I think strictly better than the current situation.
The idea is to force fake tensor prop on for the test, and then just
observe that we are doing a graph break. Clearly, export doesn't work,
so I manually xfail it.
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89641
Approved by: https://github.com/anjali411
commit c6be06d93ab911a3fbb185451c8cf42bcedad0c1
Author: Edward Z. Yang <[email protected]>
Date: Thu Nov 24 09:00:08 2022 -0800
Easy: These tests work with fake_tensor_propagation on (#89640)
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89640
Approved by: https://github.com/anjali411, https://github.com/albanD
commit 6fb6eb0a7498839e69302da7bf8c04205c64e0f3
Author: Edward Z. Yang <[email protected]>
Date: Thu Nov 24 08:11:48 2022 -0800
Support unspecialized integers with dynamic shapes (#89639)
Previously, we hackily wrapped unspecialized integers into
tensors and treated them as tensor inputs. Sometimes, downstream
operations would not be able to deal with the tensor input. Now,
we wrap them into SymInt, so more correct overload selection occurs.
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89639
Approved by: https://github.com/anjali411
commit 0c96841a20f0ae9380ef26657914276a42c9c9d7
Author: Edward Z. Yang <[email protected]>
Date: Thu Nov 24 08:11:47 2022 -0800
Cond capture with fake tensors actually works; don't raise in this case (#89638)
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89638
Approved by: https://github.com/anjali411
commit d3c012f409a4e4d5a11070a90b5578da82778030
Author: kshitij12345 <[email protected]>
Date: Thu Nov 24 21:41:20 2022 +0000
[test_nn] split pruning tests from test_nn (#89590)
Ref: https://github.com/pytorch/pytorch/issues/63085
Note: Doesn't need corresponding XLA PR as the migrated tests were not run on XLA (as they weren't in TestNNDeviceType).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89590
Approved by: https://github.com/albanD
commit 83666f167dcf023d301f16fad82b9afb374ad836
Author: Aleksandar Samardžić <[email protected]>
Date: Thu Nov 24 14:44:12 2022 +0000
Added vectorized CPU code for uint8_t datatype. (#89284)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89284
Approved by: https://github.com/lezcano, https://github.com/peterbell10
commit 9497552771ca59c68509398ab3094e590a3047c5
Author: Howard Huang <[email protected]>
Date: Thu Nov 24 19:41:17 2022 +0000
Update SyncBatchNorm _all_gather_base to all_gather_into_tensor (#89521)
Summary: Fixes https://github.com/pytorch/pytorch/issues/88568
`_all_gather_base` is deprecated. So replacing its usage with `all_gather_into_tensor`
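A minimal sketch of the replacement call, assuming an initialized process group; the helper name and shapes are illustrative:
```python
import torch
import torch.distributed as dist

def gather_stats(local_stats: torch.Tensor) -> torch.Tensor:
    # all_gather_into_tensor writes every rank's input into one flat,
    # pre-allocated output tensor.
    world_size = dist.get_world_size()
    out = torch.empty(world_size * local_stats.numel(),
                      dtype=local_stats.dtype, device=local_stats.device)
    dist.all_gather_into_tensor(out, local_stats)
    return out.view(world_size, -1)
```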
Test Plan: CI
Differential Revision: D41479983
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89521
Approved by: https://github.com/wz337
commit 94a88b53ed37854379813abf9641d1637fe2688b
Author: Edward Z. Yang <[email protected]>
Date: Thu Nov 24 08:11:46 2022 -0800
Remove fake_tensors_available (#89637)
As we are one repo now, they are always available.
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89637
Approved by: https://github.com/anjali411
commit 1c8b0779de76d0c76d34835047106ab37b41790b
Author: Emilio Castillo <[email protected]>
Date: Thu Nov 24 18:25:26 2022 +0000
Fix segfault when swapping custom allocator (#89613)
Just screwed it before merging ...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89613
Approved by: https://github.com/albanD
commit fd279fe85b8f5a8e74c615436f0b180621b6ef52
Author: Edward Z. Yang <[email protected]>
Date: Thu Nov 24 09:23:05 2022 -0500
Make pytest work again on test/dynamo (#89631)
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89631
Approved by: https://github.com/anjali411
commit c3e85d879cdbd3973754760c6767c75276b1dca8
Author: albanD <[email protected]>
Date: Thu Nov 24 17:11:42 2022 +0000
Mention discrepency between original impl and our impl of RAdam (#89575)
Fixes https://github.com/pytorch/pytorch/issues/88836
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89575
Approved by: https://github.com/mruberry
commit 860bae49e4925868a0221ec4345d08407280bac7
Author: Edward Z. Yang <[email protected]>
Date: Wed Nov 23 08:04:31 2022 -0800
Suppress guards on as_strided call only. (#89569)
See comment in meta_utils.py for the whole story.
This doesn't have a substantive impact yet, but will in the next
PR on the stack.
Signed-off-by: Edward Z. Yang <[email protected]>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89569
Approved by: https://github.com/albanD
commit 1588ea0dbf16f37ce14cfc8764666985c16ccbf9
Author: mfkasim1 <[email protected]>
Date: Thu Nov 24 11:11:51 2022 +0000
Added log1p for complex in c10 (#89214)
One PR towards #89205.
The content is mostly from PR #38465, but slightly changed the expression to make it faster.
Here are some benchmarking code:
```c++
// main.cc
#include <chrono>
#include <cmath>
#include <complex>
#include <iostream>

template <typename T>
inline std::complex<T> log1p_v0(const std::complex<T>& z) {
  // this PR
  T x = z.real();
  T y = z.imag();
  T theta = std::atan2(y, x + T(1));
  T r = x * (x + T(2)) + y * y;
  return {T(0.5) * std::log1p(r), theta};
}

template <typename T>
inline std::complex<T> log1p_v1(const std::complex<T>& z) {
  // PR #38465
  T x = z.real();
  T y = z.imag();
  std::complex<T> p1 = z + T(1);
  T r = std::abs(p1);
  T a = std::arg(p1);
  T rm1 = (x * x + y * y + x * T(2)) / (r + 1);
  return {std::log1p(rm1), a};
}

template <typename T>
inline std::complex<T> log1p_v2(const std::complex<T>& z) {
  // naive, but numerically inaccurate
  return std::log(T(1) + z);
}

int main() {
  int n = 1000000;
  std::complex<float> res(0.0, 0.0);
  std::complex<float> input(0.5, 2.0);

  auto start = std::chrono::system_clock::now();
  for (int i = 0; i < n; i++) {
    res += log1p_v0(input);
  }
  auto end = std::chrono::system_clock::now();
  auto elapsed = end - start;
  std::cout << "time for v0: " << elapsed.count() << '\n';

  start = std::chrono::system_clock::now();
  for (int i = 0; i < n; i++) {
    res += log1p_v1(input);
  }
  end = std::chrono::system_clock::now();
  elapsed = end - start;
  std::cout << "time for v1: " << elapsed.count() << '\n';

  start = std::chrono::system_clock::now();
  for (int i = 0; i < n; i++) {
    res += log1p_v2(input);
  }
  end = std::chrono::system_clock::now();
  elapsed = end - start;
  std::cout << "time for v2: " << elapsed.count() << '\n';

  std::cout << res << '\n';
}
```
Compiling the script with command `g++ main.cc` produces the following results:
```
time for v0: 237812271
time for v1: 414524941
time for v2: 360585994
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89214
Approved by: https://github.com/lezcano
commit 4f5c4c022a8365d06ac401582958bbf0fd3f8337
Author: Jiewen Tan <[email protected]>
Date: Thu Nov 24 10:57:01 2022 +0000
[LTC] Refine MetricsArena::Reset (#89608)
Summary:
After counters are reset, getters' behaviors are inconsistent. To improve that, here I 1) move the validation of CounterData into CounterData::IsValid such that it's better encapsulated, 2) divide getters into two groups: a) MetricsArena::GetCounter() and b) MetricsArena::ForEachCounter(), and route MetricsArena::GetCounterNames() and CreateMetricReport() to use b.
This is paired with pytorch/xla#4217.
Test Plan:
PJRT_DEVICE=CPU python xla/test/test_metrics.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89608
Approved by: https://github.com/JackCaoG
commit a8629a1c18fd13300ce69c1d6042004038885cf0
Author: Jithun Nair <[email protected]>
Date: Thu Nov 24 10:53:20 2022 +0000
Upgrade nightly wheels to ROCm5.3 (#89101)
Dependent on PR https://github.com/pytorch/builder/pull/1193
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89101
Approved by: https://github.com/kit1980
commit c0d81aa70ce45a0c2e7ced6c9f42a92d15523188
Author: Ivan Yashchuk <[email protected]>
Date: Thu Nov 24 09:37:10 2022 +0000
Use fx.replace_pattern for removing empty_like+fill in nvFuser+PrimTorch execution (#89132)
I learned about `torch.fx.replace_pattern` and it's a cleaner way of removing unnecessary tensor materialization from the graph coming from tracing C++ code `1 - tensor`.
Test:
```
python -m pytest test/test_prims.py -k "test_silu_backward_no_filled_tensor"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89132
Approved by: https://github.com/mruberry, https://github.com/jjsjann123
commit b515c1d96082214e81cc57ce2a1de9164b50206f
Author: Hao Guan <[email protected]>
Date: Thu Nov 24 08:14:24 2022 +0000
[QAT] Check the value of numel to avoid segfault (#81547)
Fixes #78123
Segmentation fault
RuntimeError: numel is out of the bound of input tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81547
Approved by: https://github.com/kit1980
commit 22a1b5e243e852e1c423c697e51975d1545d2a1b
Author: Vasiliy Kuznetsov <[email protected]>
Date: Wed Nov 23 13:01:15 2022 -0800
quantization: deprecate observer compute_dtype and replace with is_dynamic (#85431)
Summary:
This PR deprecates the `compute_dtype` field on observers, and replaces
it with the `is_dynamic` field on observers. This is better aligned
with the reference model spec.
Test plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85431
Approved by: https://github.com/jerryzh168
commit e4ccec6ecab9b48e804d58f60135f0950fca864f
Author: Yanbo Liang <[email protected]>
Date: Thu Nov 24 05:28:58 2022 +0000
[Dynamo] Fix bug of using customized torch.autograd.Function (#89397)
Fixes https://github.com/pytorch/torchdynamo/issues/1899
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89397
Approved by: https://github.com/jansel
commit 903ae4570e401e5c4e42dc4a44cae37f805044a4
Author: Michael Lazos <[email protected]>
Date: Thu Nov 24 04:15:34 2022 +0000
Disable optimizer tracing, enable for tests only (#89500)
Disabling optimizer tracing before launch until it can be added to the benchmark suites without increasing compile times
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89500
Approved by: https://github.com/anijain2305
commit c79489c8e69f965f3e5af8f3f39df78e7d4732ba
Author: albanD <[email protected]>
Date: Thu Nov 24 03:39:55 2022 +0000
Expose to python the backward AD view_func (#89586)
This will be useful for other systems (AOTAutograd) that want to replay autograd views.
FYI @bdhirsh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89586
Approved by: https://github.com/soulitzer
commit 4cb6bbbe27162c7b0835879131991d2155329718
Author: Nikita Karetnikov <[email protected]>
Date: Thu Nov 24 01:02:28 2022 +0100
Symintify `embedding` (#89327)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89327
Approved by: https://github.com/ezyang
commit 9c867eae1a7fffb6f893717073150cff04a923a4
Author: Wu, Chunyuan <[email protected]>
Date: Wed Nov 23 20:10:41 2022 +0000
nnc: fix Store if value is fp32 while buf is bf16 (#86788)
Fixes https://github.com/pytorch/pytorch/issues/86533.
For the below graph:
```bash
[DUMP kernel.cpp:1690] TensorExprKernel graph:
[DUMP kernel.cpp:1690] graph(%x.1 : BFloat16(10, strides=[1], requires_grad=0, device=cpu)):
[DUMP kernel.cpp:1690] %1 : int = prim::Constant[value=0]()
[DUMP kernel.cpp:1690] %2 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::pow(%x.1, %1) # test/test_tensorexpr.py:1330:29
[DUMP kernel.cpp:1690] %3 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::sin(%2) # test/test_tensorexpr.py:1330:19
[DUMP kernel.cpp:1690] return (%3)
```
**Loop stmt before the fix:**
The store value `0.8414709568023682f` is float while the scalar_type of the store buf `aten_sin` is bf16.
```bash
[DEBUG llvm_codegen.cpp:489] After HalfRewriter {
[DEBUG llvm_codegen.cpp:489] aten_sin[Ramp(0ll, 1ll, 8)] = Broadcast(0.8414709568023682f, 8);
[DEBUG llvm_codegen.cpp:489] for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
[DEBUG llvm_codegen.cpp:489] aten_sin[i_1_tail_tail + 8ll] = 0.8414709568023682f;
[DEBUG llvm_codegen.cpp:489] }
[DEBUG llvm_codegen.cpp:489] }
```
**Loop stmt after the fix:**
```bash
[DEBUG llvm_codegen.cpp:489] After HalfRewriter {
[DEBUG llvm_codegen.cpp:489] aten_sin[Ramp(0ll, 1ll, 8)] = bfloat16(Broadcast(0.8414709568023682f, 8));
[DEBUG llvm_codegen.cpp:489] for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
[DEBUG llvm_codegen.cpp:489] aten_sin[i_1_tail_tail + 8ll] = bfloat16(0.8414709568023682f);
[DEBUG llvm_codegen.cpp:489] }
[DEBUG llvm_codegen.cpp:489] }
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86788
Approved by: https://github.com/EikanWang, https://github.com/kit1980
commit f0e5bc4b9f231b438f76ddd13b2c21b7cb8a09ac
Author: Zhijing Li (Accelerator Enablement) <[email protected]>
Date: Thu Nov 24 02:18:32 2022 +0000
Symintified layer_norm (#89466)
Summary: As titled.
Test Plan:
```
buck2 run mode/opt scripts/wwei6:test_executorch
```
Differential Revision: D41451390
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89466
Approved by: https://github.com/frank-wei, https://github.com/ezyang
commit fdb2dd113d3aec0acb2a473de6be49940ab6a115
Author: Alexander Grund <[email protected]>
Date: Thu Nov 24 01:52:11 2022 +0000
Install missing VSX headers (POWER) (#85547)
E.g. `test_cpp_extensions_aot_ninja` fails as it includes `vec.h` which requires the vec/vsx/* headers and `sleef.h`. The latter is also required for AVX512 builds on non MSVC compilers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85547
Approved by: https://github.com/kit1980
commit e922bd4e523b0a30f6607f6497ac458571e00131
Author: Wei-Sheng Chin <[email protected]>
Date: Thu Nov 24 01:30:09 2022 +0000
[ONNX] Move two headers from .h to .cc (#86852)
As title. Header dependency should be as small as possible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86852
Approved by: https://github.com/titaiwangms, https://github.com/BowenBao
commit 23fe2ff910fd1577281a2210d1184aff705191b8
Author: Shunting Zhang <[email protected]>
Date: Thu Nov 24 01:28:10 2022 +0000
verify the number of outputs of xla graph (#89536)
This PR adds tests to verify the number of outputs returned by an XLA graph. The understanding from this PR will help us fix https://github.com/pytorch/torchdynamo/issues/1908 and eventually enable training for the dynamo/torchxla integration. Sending this PR separately so Jack could help verify whether the behavior is expected and play with it.
List some code snippets here since their behavior is not straightforward at a first glance:
```
def forward(self, a, b, c):
"""
The XLA graph will only return the first 2 items
"""
return a + b, a + c, b
```
```
def forward(self, a, b, c):
"""
Inplace update on b cause it to be returned in XLA graph
"""
b.zero_()
return a + b, a + c, b
```
```
def forward(self, a, b, c):
"""
Even if we return b twice, the XLA graph only return b once.
"""
b.zero_()
return a + b, a + c, b, b
```
Here are what observed by the added tests:
1. XLA does not return outputs that are also inputs, as long as the tensor is not updated in place. At first glance it may seem odd to consider this kind of 'non-realistic' corner case, but such graphs do show up in AOTAutograd: AOTAutograd lifts all model parameters/buffers as graph inputs and may return some of them. Check ***test_direct_return***
2. If a tensor is updated in place, XLA will still return it as a graph output even if it's also an input. The only difference compared to item 1 is that the in-place update causes it to be returned. This happens for BatchNorm2d, since the running_mean/variance tensors are updated in place during training. Check ***test_direct_return_with_inplace_update***
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89536
Approved by: https://github.com/jansel
commit 0bde5149819e9854bca1363aa6c9f52f7db2496e
Author: Nikita Shulga <[email protected]>
Date: Thu Nov 24 00:57:17 2022 +0000
Add `c10::` namespace in front of `optional` (#89605)
Prep change for moving the codebase to C++17 standard
Was part of https://github.com/pytorch/pytorch/pull/85969
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89605
Approved by: https://github.com/weiwangmeta, https://github.com/kit1980
commit e19a7165fd1a9a35fcac42706c20e658776c10ab
Author: foram-chandra <[email protected]>
Date: Thu Nov 24 00:34:26 2022 +0000
[nn] Remove deprecation warning from nn.functional.{tanh, sigmoid} (#86905)
Fixes #65909
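Illustrative only: after this change these calls no longer emit a deprecation warning.
```python
import torch
import torch.nn.functional as F

x = torch.randn(3)
print(F.tanh(x), F.sigmoid(x))  # no deprecation warning anymore
```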
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86905
Approved by: https://github.com/albanD, https://github.com/kit1980
commit a00bd6f686d7a485f7bea5f971b7e793118842b8
Author: clee2000 <[email protected]>
Date: Wed Nov 23 23:48:32 2022 +0000
Don't run auto request review on forked PRs (#89583)
tested on https://github.com/pytorch/pytorch/pull/89581
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89583
Approved by: https://github.com/albanD, https://github.com/malfet
commit 0a1a53083e331b3648ad4cb6f750d130e3530731
Author: Nikita Karetnikov <[email protected]>
Date: Wed Nov 23 20:42:55 2022 +0000
[primTorch] Enable regex error testing for some refs (#87765)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87765
Approved by: https://github.com/mruberry
commit 3ad2a032f4924d58c556b80840f6d51aa8a4472b
Author: Nikita Shulga <[email protected]>
Date: Wed Nov 23 23:23:24 2022 +0000
Update default cmake to 3.18 (#89570)
Set `cmake.dir` to `/usr/local` in `.circleci/scripts/build_android_gradle.sh `
Prep change for raising compiler standard to C++17: cmake-3.18 is the first one to support CUDA17 language
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89570
Approved by: https://github.com/atalman
commit 8695f0cced016d43298b43a4baf30315061fdacd
Author: Jane Xu <[email protected]>
Date: Wed Nov 23 23:23:17 2022 +0000
Rectify `native_batch_norm` schema by splitting it into two legit schemas (#88697)
Using the same repro from the issue (but with BatchNorm2D)
Rectifies native_batch_norm schema by splitting the schema into 2:
1. one will have NON-optional alias-able running_mean and running_var inputs
2. the other will just not have those parameters at all (no_stats variation)
**Calling for name suggestions!**
I've added tests in test_functionalization.py as well as an entry in common_method_invocations.py for `native_batch_norm_legit`
CI should pass.
For backward/forward compatibility (bc/fc) reasons, we reroute native_batch_norm to call our new schemas ONLY through the python dispatcher, but in 2 weeks or so we should make `native_batch_norm_legit` the official batch_norm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88697
Approved by: https://github.com/albanD
commit a00efe55c3790789b967facf10c3f426faa98155
Author: Everton Constantino <[email protected]>
Date: Wed Nov 23 22:46:29 2022 +0000
Fix CheckOutputStreamSetting on JitLoggingTest as it failed if logging wasn't enabled. (#82722)
`JIT_LOG` checks whether logging was enabled for that particular file, and when it isn't it doesn't output anything. Since the test checks the size of `test_stream`, it fails. I believe forcing the file to have logging enabled just to see if the stream is being correctly set during the test makes no sense, so this patch simply forces output and checks that it worked.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82722
Approved by: https://github.com/davidberard98
commit b8d3afd88665de5f01f696333d0ff291bd94a57b
Author: Huy Do <[email protected]>
Date: Wed Nov 23 22:39:36 2022 +0000
Skip upload test stats for test reports from rerun disabled tests workflow (#89548)
I have found the reason why uploading tests stats fails for rerun disabled workflow, for example https://github.com/pytorch/pytorch/actions/runs/3522896778/jobs/5917765699. The problem is that the pytest XML file is now too big to be processed quickly (x50 bigger). Unlike unittest, `pytest-flakefinder` used by rerun disabled tests for test_ops includes skipped messages multiple times (50 times by default, retrying and skipping). This slows down the upload test stats script too much (O(n)) because it tries to gather all the stats. On the other hand, `check_disabled_tests` doesn't suffer from the same issue because it ignores all these skipped messages.
This is a quick fix to skip test reports from rerun disabled tests workflow when trying to upload test stats.
I'll try to fix this properly later in the way we use pytest-flakefinder. From what I see, a zipped test report from rerun disabled tests is only a few MB ([example](https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3521687954/1/artifact/test-reports-test-default-1-2-linux.2xlarge_9636028803.zip)), but it balloons into a much bigger XML file after extraction, from a dozen to a few hundred MB of text. The size of the zipped file is not a big immediate problem.
[3521687954](https://github.com/pytorch/pytorch/actions/runs/3521687954) is an example workflow with rerun disabled tests and mem leak check. The script can now finish when running locally:
* `upload_test_stats` finishes around 3+ minutes
```
time python -m tools.stats.upload_test_stats --workflow-run-id 3521687954 --workflow-run-attempt 1 --head-branch master
...
Writing 8925 documents to S3
Done!
Writing 1760 documents to S3
Done!
Writing 1675249 documents to S3
Done!
python3 -m tools.stats.upload_test_stats --workflow-run-id 3521687954 1 185.69s user 12.89s system 75% cpu 4:22.82 total
```
* `check_disabled_tests` finishes within 3 minutes
```
time python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 --workflow-run-attempt 1 --repo pytorch/pytorch
...
python -m tools.stats.check_disabled_tests --workflow-run-id 3521687954 1 154.19s user 4.17s system 97% cpu 2:42.50 total
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89548
Approved by: https://github.com/clee2000
commit f18f0c70ab10c400947e71be30794e04dcc22acf
Author: Elias Ellison <[email protected]>
Date: Wed Nov 23 19:02:51 2022 +0000
Dont clone unmutated args in triton autotuning (#89519)
Improves first memory compression on pytorch struct from .55 -> .73. However, it doesn't totally eliminate the overhead from autotuning. Any other pointers on where the overhead is coming from in autotuning would be great.
Edit: i think it's just the triton cache clearing https://github.com/openai/triton/blob/44f577984d28ee979f704e2c28a1dcbac9639840/python/triton/testing.py#L159
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89519
Approved by: https://github.com/ngimel, https://github.com/jansel
commit ac19c5be82febc2140d4601c98daf45646a399ab
Author: Peter Bell <[email protected]>
Date: Tue Nov 22 22:26:21 2022 +0000
FFT: disable dimension wrapping for scalar tensors (#89234)
Fixes #88985
By default, `maybe_wrap_dim` allows through `dim=0` or `dim=-1`
for scalar tensors which leads to an invalid dimension being used to
index into `tensor.sizes()` as in the code sample from the issue.
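A Python mirror of the `maybe_wrap_dim` contract described above (a sketch of the C++ helper's behavior, not the actual code):
```python
def maybe_wrap_dim(dim: int, ndim: int, wrap_scalar: bool = True) -> int:
    # For a 0-d tensor the default behavior treats it as 1-d, so dim in
    # {-1, 0} slips through -- exactly what the FFT code now opts out of.
    if ndim == 0:
        if not wrap_scalar:
            raise IndexError(f"dimension specified as {dim} but tensor has no dimensions")
        ndim = 1
    if dim < -ndim or dim >= ndim:
        raise IndexError(f"dim {dim} out of range for tensor of dimension {ndim}")
    return dim + ndim if dim < 0 else dim
```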
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89234
Approved by: https://github.com/mruberry
commit 50e2e4faf38c6ebafacc43b72c40333f1f7b401e
Author: Pearu Peterson <[email protected]>
Date: Wed Nov 23 12:05:37 2022 +0200
Sparse CSC/BSR/BSC serialization and pickle support (#89553)
Fixes https://github.com/pytorch/pytorch/issues/89497
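An illustrative round trip of the newly supported serialization (not the PR's actual test):
```python
import pickle
import torch

x = torch.tensor([[0.0, 1.0], [2.0, 0.0]]).to_sparse_csc()
y = pickle.loads(pickle.dumps(x))
assert torch.equal(x.to_dense(), y.to_dense())
```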
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89553
Approved by: https://github.com/cpuhrsch
commit a8d6b82167ef417e21c807cb29d7eabea15014da
Author: Elias Ellison <[email protected]>
Date: Wed Nov 23 16:47:43 2022 +0000
Fix norm decomp when dtype is passed in (#89508)
Fix for https://github.com/pytorch/torchdynamo/issues/1889. The wrapper was doing a downcast even when the dtype was explicitly passed in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89508
Approved by: https://github.com/anijain2305
commit 72110d783344c4121730b032ca0d269896604dcf
Author: Elias Ellison <[email protected]>
Date: Wed Nov 23 17:03:09 2022 +0000
Fix Upsample Decomp Striding For Small Channels (#89528)
Fix for https://github.com/pytorch/torchdynamo/issues/623.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89528
Approved by: https://github.com/ngimel, https://github.com/anijain2305
commit b7483be06afe8d4242adeb559cfbe6e0e89419d0
Author: Jerry Zhang <[email protected]>
Date: Wed Nov 23 11:03:45 2022 -0800
[quant][docs] Add docstrings for operators defined in torch.ops.quantized_decomposed namespace (#89547)
Summary:
no functionality changes
Test Plan:
NA
Reviewers:
Subscribers:
Tasks:
Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89547
Approved by: https://github.com/vkuzo
commit a188f05e8c1788d393c072868421991dfcb55b02
Author: Natalia Gimelshein <[email protected]>
Date: Wed Nov 23 20:18:54 2022 +0000
Reland #89031 Added conv constraint that infers layouts (#89530)
Relands #89031
Per title. We now set strides from the fx graph only for convolutions and mm, which is a hack, but bmm in some cases caused an extra copy and there is no obvious way to fix that; we should rethink the strides anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89530
Approved by: https://github.com/Chillee
commit e800d27b10137727c68cb71bccabe3a93cf38e9e
Author: William Wen <[email protected]>
Date: Wed Nov 23 20:11:39 2022 +0000
[dashboard] Add graphs for all summary metrics, add additional testing flags (#89580)
Title. Test post: https://github.com/pytorch/torchdynamo/issues/1831#issuecomment-1325572179
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89580
Approved by: https://github.com/davidberard98
commit 953f39578a7019c4c34bc1dbd6cb0facb554af79
Author: Charlie West-Taylor <[email protected]>
Date: Wed Nov 23 19:51:50 2022 +0000
Mark IPU device as not supports_as_strided (#89130)
Currently causes issues in calls to `.to`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89130
Approved by: https://github.com/albanD
commit 37e46a503502cdeda791cf684522ef83b5655328
Author: Yanbo Liang <[email protected]>
Date: Wed Nov 23 19:44:46 2022 +0000
[Dynamo] Fix several bugs & code refactor in RangeVariable (#89322)
Fix bug in [7k github models](https://github.com/pytorch/torchdynamo/issues/1884): https://github.com/jansel/pytorch-jit-paritybench/blob/master/generated/test_clovaai_stargan_v2.py
```
E TypeError: 'list' object cannot be interpreted as an integer
E
E from user code:
E File "/scratch/ybliang/work/repos/pytorch-jit-paritybench/generated/test_clovaai_stargan_v2.py", line 335, in forward
E idx = torch.LongTensor(range(y.size(0)))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89322
Approved by: https://github.com/jansel
commit 91dcef41ae96ede3f07375c2d38cb28d534e97f8
Author: Xilun Wu <[email protected]>
Date: Wed Nov 23 19:43:28 2022 +0000
Thread PG: add allreduce to threaded pg (#89043)
Summary:
Goal
Add `all_reduce` collective to multi-threaded ProcessGroup added in D40236769 (https://github.com/pytorch/pytorch/commit/6663ae5537f3c61030ba4d425bd57a097c51430a).
Code Motion
Added `allreduce` collective to ProcessLocalGroup (a subclass of c10d ProcessGroup).
What's Next
Add a DDP test utilizing the new allreduce op.
Generalize `allreduce` to allow other `ReduceOp`s besides `SUM`.
Test Plan:
cd fbcode/caffe2
buck2 test mode/dev //caffe2/test/distributed:multi_threaded
Differential Revision: D41046606
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89043
Approved by: https://github.com/wanchaol
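A hedged sketch of the collective being added, using a single-process gloo group as a stand-in for the threaded ProcessGroup exercised by the test; the address and port values are placeholders.
```python
import os
import torch
import torch.distributed as dist

# Single-process group standing in for the threaded PG used in the test.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

t = torch.ones(4)
# The newly added collective: in-place elementwise SUM across all ranks.
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(t)  # unchanged with a single rank

dist.destroy_process_group()
```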
commit 27db806888c36b029f51197a40e5196cc10792db
Author: Charlie West-Taylor <[email protected]>
Date: Wed Nov 23 19:41:07 2022 +0000
Handle Tensor.__deepcopy__ via clone(), on IPU (#89129)
Currently it falls through to a call to `storage()`, which the IPU doesn't support.
I've made the minimal change here for ease of merging (it would help us if this made it into 1.13.1); however...
**QUESTION**: Is there any reason why `not torch._C._has_storage(self)` needs to *also* be guarded on `self.device.type == privateuseone`? In other words, could the condition for using `clone` simply be this?
```python
self.is_sparse
or self.device.type
in ["lazy", "xla", "mps", "ort", "meta", "hpu", "ipu"]
or not torch._C._has_storage(self)
or (type(self) is not Tensor and self.data_ptr() == 0)
```
If the condition fails, the very next thing is a call to `self._typed_storage()` which will fail, so it feels to me like *any* case without storage shouldn't fall through to the `storage()` call.
The original PR for adding the 'no storage and device is `PrivateUse1`' condition ([86557](https://github.com/pytorch/pytorch/pull/86557)) doesn't discuss whether this could be broadened.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89129
Approved by: https://github.com/albanD
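To make the code path concrete, a small eager example (on CPU, since no IPU is assumed here) of the entry point under discussion: `copy.deepcopy` on a tensor dispatches to `Tensor.__deepcopy__`, which either goes through `storage()` or, for storage-less backends, should take the `clone()` branch.
```python
import copy
import torch

t = torch.ones(2, 2)        # CPU stand-in; the report concerns IPU tensors
t2 = copy.deepcopy(t)       # dispatches to Tensor.__deepcopy__

print(t2 is t, torch.equal(t, t2))  # False True
```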
commit fa7a963f6536dd05c381fbf23270f4f009f9f113
Author: Sergii Dymchenko <[email protected]>
Date: Wed Nov 23 19:39:47 2022 +0000
Remove BaseException TODO (#89540)
After discussion in https://github.com/pytorch/pytorch/pull/88461#issuecomment-1318965664
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89540
Approved by: https://github.com/H-Huang
commit 9eed6b7f9aa4f5fc65075de3189acc9add221660
Author: Yanbo Liang <[email protected]>
Date: Wed Nov 23 19:39:43 2022 +0000
[Dynamo] Several fixes on TensorVariable & TorchVariable (#89486)
This is a group of bug fixes for [7k github models](https://github.com/pytorch/torchdynamo/issues/1884); it fixes 30+ model tests.
* Support ```tensor.type()```.
* Support ```tensor.get_device()```.
* Support ```torch.nn.functional._Reduction.get_enum```.
* Support ```torch._utils._get_device_index()```.
* Fallback ```tensor.data_ptr()```.
* ```tensor.data_ptr()``` on a ```FakeTensor``` always returns 0.
* Without fake tensor propagation we ```clone``` the input tensor, so tracking the original ```data_ptr``` makes no sense; it is also not a very widely used API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89486
Approved by: https://github.com/jansel
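A hedged sketch of one of the newly supported calls under tracing, using the `torch._dynamo.optimize` entry point of that era with the `"eager"` backend; the function is invented for illustration.
```python
import torch
import torch._dynamo as dynamo

def fn(x):
    # Previously a source of graph breaks/errors; now traced by dynamo.
    return x.type()

compiled = dynamo.optimize("eager")(fn)
print(compiled(torch.randn(2, 2)))  # 'torch.FloatTensor'
```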
commit f03e6672fb6a694d6f03980e3f34d8181c7cc663
Author: Iris <[email protected]>
Date: Wed Nov 23 19:39:01 2022 +0000
[Checkpoint][2D] Minor update for dedup_tensors.py (#89542)
Rename variables for better readability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89542
Approved by: https://github.com/H-Huang
commit 74703eb50299b26082bc2a357770739a68460199
Author: Iris <[email protected]>
Date: Wed Nov 23 19:36:01 2022 +0000
[Checkpoint] Add a logger to dedup_tensors (#89503)
Add a logger to dedup_tensors to log the duplicate keys to remove in global plan (List of SavePlan).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89503
Approved by: https://github.com/fduwjj
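The change amounts to the standard module-level logging pattern; the sketch below is a generic illustration rather than the actual dedup_tensors code, and the function name is made up.
```python
import logging

logger = logging.getLogger(__name__)

def report_duplicates(duplicate_keys):
    # Log which duplicate keys will be removed from the global plan.
    if duplicate_keys:
        logger.info("Duplicate keys to remove: %s", duplicate_keys)
```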
commit 57353c9608263df98156a73aaa6ed35a2a2306ad
Author: Brian Hirsh <[email protected]>
Date: Wed Nov 23 08:29:08 2022 -0800
first draft of input mutation handling for aot autograd (#88817)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88817
Approved by: https://github.com/ezyang, https://github.com/wconstab
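The commit message carries no description, so the snippet below is only a hedged example of the class of program the PR is about: a function that mutates one of its inputs in place, which AOTAutograd has to account for when it traces a functionalized graph and then reapplies the mutation to the real input.
```python
import torch

def f(x, y):
    x.mul_(2)        # in-place input mutation
    return x + y

a, b = torch.randn(3), torch.randn(3)
out = f(a, b)        # eager semantics the compiled version must preserve:
print(out, a)        # 'a' is visibly doubled after the call
```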
commit 902e4e3926a9333178510f032580e4acd56c40da
Author: PyTorch MergeBot <[email protected]>
Date: Wed Nov 23 19:05:13 2022 +0000
Revert "Fix the kineto daemon build condition (#89174)"
This reverts commit 9fd00f194ae4e28948a9a03a6382c20dde04e4fd.
Reverted https://github.com/pytorch/pytorch/pull/89174 on behalf of https://github.com/robieta because, for some reason, this is interacting badly with NVFuser. I think it is instability in kineto, but until we figure out what's going on, reverting is a necessary evil.
commit 049a0f2cd5916c8392c6bd1adc41c709de892f3a
Author: Bin Bao <[email protected]>
Date: Wed Nov 23 02:00:44 2022 +0000
[inductor] Update CI model tests (#89499)
Summary:
1) Add model inference test
2) Switch model training test to use AMP
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89499
Approved by: https://github.com/bertmaher
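For context, "switch model training test to use AMP" refers to the standard automatic-mixed-precision training step; a hedged, CUDA-only sketch of that recipe (not the CI harness itself) is below.
```python
import torch

model = torch.nn.Linear(16, 4).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 16, device="cuda")
with torch.cuda.amp.autocast():          # run forward in mixed precision
    loss = model(x).sum()

scaler.scale(loss).backward()            # scale to avoid fp16 underflow
scaler.step(opt)
scaler.update()
```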
commit 95474e00a9477b1333e13fa95887a2ce05c4a6a6
Author: Jerry Zhang <[email protected]>
Date: Tue Nov 22 20:29:26 2022 -0800
[quant][be] Remove unused util code (#89272)
Summary: as titled.
Test Plan: python test/test_quantization.py TestQuantizeFx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89272
Approved by: https://github.com/andrewor14
commit 128faf2b69f62b55d3ae1b4cb3e24ec594af0009
Author: Jerry Zhang <[email protected]>
Date: Tue Nov 22 20:29:26 2022 -0800
[quant][be] Refactor the error checking code for quantize_per_channel op (#89271)
Summary: as titled.
Test Plan: make sure it compiles.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89271
Approved by: …
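As background for what the checks guard, a hedged example with the long-standing eager op `torch.quantize_per_channel` (standing in for the decomposed op): scales and zero_points must provide one entry per slice along the chosen axis. The values below are arbitrary.
```python
import torch

x = torch.randn(2, 3)
scales = torch.tensor([0.1, 0.2, 0.3])              # one scale per channel (axis=1)
zero_points = torch.zeros(3, dtype=torch.int64)     # one zero_point per channel

q = torch.quantize_per_channel(x, scales, zero_points, axis=1, dtype=torch.qint8)
print(q.int_repr())
```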
File tree: 215 files changed (+15932, -6695 lines)
- .circleci
- docker
- scripts
- .github
- actions/filter-test-configs
- ci_commit_pins
- scripts
- workflows
- .jenkins/pytorch
- aten/src/ATen
- cpu/vec
- vec256
- vec512
- functorch
- mps
- native
- cpu
- cuda
- mkldnn
- mps/operations
- quantized
- benchmarks/dynamo
- c10
- core
- cuda
- test/util
- util
- caffe2/operators
- docs/source
- notes
- functorch/_src
- test
- cpp/jit
- distributed
- _tensor/parallel
- dynamo
- functorch
- inductor
- lazy
- nn
- profiler
- quantization/fx
- tools
- autograd
- stats
- torch
- _C
- _decomp
- _dynamo
- optimizations
- variables
- _inductor
- codegen
- _prims
- _prims_common
- _refs
- _subclasses
- ao/quantization
- fx
- autograd
- csrc
- autograd
- cuda
- jit
- passes
- quantization
- tensorexpr
- lazy/core
- utils
- cuda
- distributed
- _tensor/parallel
- checkpoint
- rpc
- fx
- jit
- linalg
- nn
- modules
- optim
- profiler
- signal/windows
- sparse
- testing/_internal
- distributed
- opinfo/definitions
- utils/hipify
Changed files shown below (large commit: per-file diff contents hidden by default):
- (first file, name not captured): 2 additions & 4 deletions
- .circleci/scripts/build_android_gradle.sh: 1 addition & 1 deletion
- .github/actions/filter-test-configs/action.yml: 2 additions & 1 deletion
- .github/ci_commit_pins/vision.txt: 1 addition & 1 deletion
- .github/ci_commit_pins/xla.txt: 1 addition & 1 deletion
- .github/scripts/filter_test_configs.py: 4 additions & 1 deletion
- .github/scripts/generate_binary_build_matrix.py: 1 addition & 1 deletion
- .github/workflows/auto_request_review.yml: 1 addition & 1 deletion