Commit 886f57f
V1.3.0 (#1)
* Named tensor support for logsumexp, mode, kthvalue, median, min, max (#26563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26563
This adds name inference rules for pre-existing logsumexp, mode,
kthvalue, and median ops. Also adds overloads so that they can take
`Dimname` dimensions.
There are a lot of min/max overloads. This PR adds name inference to
the following overloads for (both) min and max:
- min(Tensor, int dim)
- min(Tensor, Dimname dim)
- min(Tensor) (full reduction)
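A minimal sketch of the new `Dimname` overloads (tensor shapes and dimension names below are illustrative only; named tensors are experimental):
```python
import torch

t = torch.randn(3, 4, names=('N', 'C'))
values, indices = t.min(dim='C')     # reduce over the dimension named 'C'
lse = torch.logsumexp(t, dim='N')    # Dimname overload for logsumexp
print(values.names, lse.names)       # remaining names are propagated
```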
Test Plan: - new tests and [namedtensor ci]
Differential Revision: D17557050
Pulled By: zou3519
fbshipit-source-id: a099a0ef04ad90d021a38a0668fc44902e1c7171
* Delete backwards compatibility Backend overload for registerOp (#25914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25914
Signed-off-by: Edward Z. Yang <[email protected]>
Test Plan: Imported from OSS
Differential Revision: D17284083
Pulled By: ezyang
fbshipit-source-id: 430ac7ea2bd042b1f4bb874e53679d0fde326dec
* Implement multiple dispatch in boxed c10 dispatcher (#26118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26118
Signed-off-by: Edward Z. Yang <[email protected]>
Test Plan: Imported from OSS
Differential Revision: D17404367
Pulled By: ezyang
fbshipit-source-id: 14a16baa4b59f97182725092531a54603f3d92b8
* Remove unnecessary include from TensorBody (#26360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26360
This is not just for aesthetics: this include blocks the inclusion
of headers like ivalue.h from ATenDispatch.h (as it causes an
include cycle.)
Signed-off-by: Edward Z. Yang <[email protected]>
Test Plan: Imported from OSS
Differential Revision: D17429163
Pulled By: ezyang
fbshipit-source-id: 03feb210c12bc891d95bbb5a11ffd694ec05005c
* Add some missing constructors to IValue. (#26718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26718
Signed-off-by: Edward Z. Yang <[email protected]>
Test Plan: Imported from OSS
Differential Revision: D17549623
Pulled By: ezyang
fbshipit-source-id: 8880c09d85a15b2a63dcf0c242ba6a2dd941decb
* Updating submodules
Summary:
GitHub commits:
https://github.com/facebook/litho/commit/6668c21398a9b71f12cff9574bb8c7d8ebf93463
https://github.com/pytorch/fbgemm/commit/189aebb34442a6e96bf88734a047eaae7b258195
Test Plan: n/a
Reviewed By: yns88
fbshipit-source-id: f2037290b58ac295eeb94626e172491a8526875d
* Revert D17549623: Add some missing constructors to IValue.
Test Plan: revert-hammer
Differential Revision:
D17549623
Original commit changeset: 8880c09d85a1
fbshipit-source-id: 002bb1173dbcf6a1d18e1c4b84b4365f145c38dd
* Hub improvements (#26723)
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/25980.
Our old serialization format was tar (e.g. `resnet18-5c106cde.pth` was in this format), so let's only support automatic unzipping if checkpoints are zipfiles.
We could still make it work with tarfile, but let's delay that until there's an ask.
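As a rough illustration of the user-facing behavior (a sketch only; the URL and file name are placeholders), zipfile-serialized checkpoints fetched through the hub helper are unpacked automatically:
```python
import torch

# Hypothetical URL for illustration; checkpoints serialized as zipfiles are
# unzipped automatically, while legacy tar checkpoints are loaded as-is.
state_dict = torch.hub.load_state_dict_from_url(
    'https://example.com/checkpoints/model-abc123.pth', progress=True)
```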
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26723
Differential Revision: D17551795
Pulled By: ailzhang
fbshipit-source-id: 00b4e7621f1e753ca9aa07b1fe356278c6693a1e
* Upgrade sleef to v3.4.0. (#26749)
Summary:
This resets the sleef submodule to upstream, since everything except
a small build sanity fix
<https://github.com/zdevito/sleef/commit/191f655caa25526ae226cf88dd2529265176014a>
has been merged to upstream. The new release includes an important fix
for trigonometric functions on MacOS, which would unblock https://github.com/pytorch/pytorch/issues/26431.
This should supersede https://github.com/pytorch/pytorch/issues/20536.
Close https://github.com/pytorch/pytorch/issues/20536.
cc colesbury resistor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26749
Differential Revision: D17572783
Pulled By: ezyang
fbshipit-source-id: dd7827e8c8500a0050e3e318d184134c792d3ecc
* Updating submodules
Summary:
GitHub commits:
https://github.com/facebook/litho/commit/5096b0ae1f5ef28bc0b948e260eb512626c6fea9
https://github.com/facebook/proxygen/commit/ecd6c10ea3df82cb0d221798150a0cf1f07315c3
https://github.com/facebookincubator/mvfst/commit/67abe5d0aaf42659358fa1d96a4159e5832f9c70
https://github.com/facebookincubator/profilo/commit/90580f7e064c25bac9c0a1f59afb4da55f46d3cd
https://github.com/facebookresearch/pytorch-biggraph/commit/7f98961c7b70bda098c371a8b1395f0d6ff5434c
https://github.com/pytorch/fbgemm/commit/f8da6e6e36b5970e95bf150521a1b3af844638be
Test Plan: n/a
Reviewed By: yns88
fbshipit-source-id: 60ce61531cf6d4ac8616b3986b40b423abc7de15
* move more functions to InsertObserversHelper (#26773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26773
att
Test Plan:
ci
Imported from OSS
Differential Revision: D17563673
fbshipit-source-id: 5a6fb4238b6886695c2d25db11fec22ebe5d0c08
* autodiff changes to enable profiling
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25397
Differential Revision: D17565747
Pulled By: Krovatkin
fbshipit-source-id: b772437d9e02df99db6e662cb7d1227359959bed
* Lets generic tests use multiple devices (#26594)
Summary:
- Separates device type from default (test) device
- Adds multidevice decorator
- Updates generic tests to use multidevice decorator where applicable
TorchXLA wants to change the default test device based on the test environment. Separating the device type and the default (test) device enables that functionality.
Additionally, many existing tests only run on multiple devices and are required, as a consequence, to make CUDA-specific API calls. The multidevice decorator simplifies the existing code and limits the CUDA dependency. Eventually this should let us run multidevice tests on multiple device types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26594
Test Plan: tests were manually run with the CUDA test device set to 'cuda:1'.
Differential Revision: D17568910
Pulled By: mruberry
fbshipit-source-id: c442f748a31a970be8c21deb12a67c3b315c1128
* quantized_tensor tests (#26784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26784
Previously we were using `empty` to generate test tensors; this PR changes the test tensors to use
`randint` so that we can test things properly.
Also added a `set_sizes_and_strides` call and removed `.contiguous()` in the `int_repr` function to preserve the
original size and strides.
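For reference, a minimal sketch of the API being exercised (values are illustrative; `int_repr()` returns the raw integer representation of the quantized tensor):
```python
import torch

# Generate a random float tensor (rather than torch.empty) and quantize it,
# then inspect the underlying integer representation.
x = torch.randint(0, 100, (2, 3), dtype=torch.float)
q = torch.quantize_per_tensor(x, scale=0.5, zero_point=1, dtype=torch.quint8)
print(q.int_repr().size(), q.int_repr().stride())
```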
Test Plan:
python test/test_quantized_tensor.py
Imported from OSS
Differential Revision: D17566575
fbshipit-source-id: 89379fb09b500dd156118e6ee0709df59f169990
* Refactor checked_tensor_unwrap to take DeviceType instead of Backend (#26290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26290
Fixes #26206
Happily, I also can delete the dead Dense***Tensor cases, since they
are for the defunct THS backend.
Signed-off-by: Edward Z. Yang <[email protected]>
Test Plan: Imported from OSS
Differential Revision: D17404368
Pulled By: ezyang
fbshipit-source-id: 79d71ad40c4325c9f52d2825aceb65074d2e20e8
* Use Caffe2's implementation of grouped depthwise 3x3 convolutions (#26556)
Summary:
Use Caffe2's implementation of grouped depthwise 3x3 convolutions instead of NNPACK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26556
Test Plan:
_Correctness_ - Manually check the results using the --print-output flag on speed_benchmark_torch.
_Performance_ - All measurements below on Pixel 2
**Before**:
Multi-threaded:
> adb shell "./speed_benchmark_torch \
> --model=./xraymobilev3.pt \
> --input_dims="1,3,224,224" \
> --input_type=float --warmup=5 \
> --iter=25"
>
> Main run finished. Milliseconds per iter: **876.002**. Iters per second: 1.14155
Single-threaded:
> adb shell "./speed_benchmark_torch \
> --model=./xraymobilev3.pt \
> --input_dims="1,3,224,224" \
> --input_type=float --warmup=5 \
> --iter=25 \
> --caffe2_threadpool_force_inline=true"
>
> Main run finished. Milliseconds per iter: **459.409**. Iters per second: 2.17671
**After**:
Multi-threaded:
> adb shell "./speed_benchmark_torch \
> --model=./xraymobilev3.pt \
> --input_dims="1,3,224,224" \
> --input_type=float --warmup=5 \
> --iter=25"
>
> Main run finished. Milliseconds per iter: **285.68**. Iters per second: 3.50042
Single-threaded:
> adb shell "./speed_benchmark_torch \
> --model=./xraymobilev3.pt \
> --input_dims="1,3,224,224" \
> --input_type=float --warmup=5 \
> --iter=25 \
> --caffe2_threadpool_force_inline=true"
> Main run finished. Milliseconds per iter: **278.999**. Iters per second: 3.58425
>
Differential Revision: D17533311
Pulled By: AshkanAliabadi
fbshipit-source-id: 9ee8acf02b8e3e8da1922b188ed0a6459a90b67d
* Port CUDA implementation of expm1 to ATen (#26598)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24562
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26598
Differential Revision: D17531503
Pulled By: VitalyFedyunin
fbshipit-source-id: 8119c796e142f073ad4e274dda1ad99344215c48
* add function to get NCCL version for logging (#26583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26583
Adds a function that uses the NCCL API to get the version code and converts it to a readable version. It will be
used for logging the NCCL version in exception messages.
Test Plan: See above
Differential Revision: D17473200
fbshipit-source-id: 4881ed5221b397f2f967262668c2b376b6bf3c64
* Remove one unnecessary copy of the output during the type promotion. (#26816)
Summary:
Output tensors don't need to be copied during type promotion, as we are not using any data from them. Simple allocation gives a steady 10% performance gain.
BEFORE
```
In [1]: x = torch.randn(64, 2048, 7,7)
In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64)
In [3]: timeit x.add_(y)
77.3 ms ± 257 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
AFTER
```
In [1]: x = torch.randn(64, 2048, 7,7)
In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64)
In [3]: timeit x.add_(y)
68.2 ms ± 713 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26816
Differential Revision: D17573455
Pulled By: VitalyFedyunin
fbshipit-source-id: 47286abce5e7e665eb61e46ae358c896e945bef2
* Prepare for Cocoapods 1.3 Release (#26751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26751
### Summary
We're going to use the AWS s3 bucket - `s3://ossci-ios` to store the release binary. To release the cocoapods, we can follow the steps below:
1. Open a fake PR to trigger the CI job that pulls the code from the 1.3.0 tag branch and does the building and uploading.
2. Verify the binary locally - Run tests on both arm64 and simulator
3. Publish the cocoapods officially
### Test plan
- podspec lint command succeeds
- `pod spec lint --verbose --allow-warnings --no-clean --use-libraries --skip-import-validation`
Test Plan: Imported from OSS
Differential Revision: D17577131
Pulled By: xta0
fbshipit-source-id: 55fee918ecc5c4e0b6d714488a12351b4370afac
* Validate Docker version in CI. (#26496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26496
It is a BAD BAD idea to use Docker versions which are not deployed
(per ossci-job-dsl) because those versions will get GC'ed after two
weeks. At the moment, there is no verification that your Docker version
is deployed. This adds an Azure job to check this.
Signed-off-by: Edward Z. Yang <[email protected]>
Test Plan: Imported from OSS
Differential Revision: D17575100
Pulled By: ezyang
fbshipit-source-id: 8df2331c6e6899c585bc2917b55e8955908b0e4a
* Fix CI docker builds (#26704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26704
nccl 2.1.15 isn't available for CUDA 10.1 and 2.4.8 isn't available for cuda 9.1 :(
ghstack-source-id: 90714191
Test Plan: build docker images on Jenkins
Differential Revision: D17543120
fbshipit-source-id: 882c5a005a9a3ef78f9209dea9dcec1782060b25
* Export baddbmm (#25738)
Summary:
Added ONNX export for baddbmm in opset9
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25738
Reviewed By: hl475
Differential Revision: D17565828
Pulled By: houseroad
fbshipit-source-id: 85f605a7b3fa4783ef4f6ced86223133c85062d5
* Fix Future default constructor missing for ParallelNative
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26739
Test Plan: Imported from OSS
Differential Revision: D17577908
Pulled By: bwasti
fbshipit-source-id: a09cdbd8619a926e93418a692ce859d4157f2da8
* Quantized Interpolate Kernel(upsample_bilinear2d) (#26631)
Summary:
We implement the quantized upsample_bilinear2d case for interpolate kernel in this PR.
For nhwc performance improvement:
```python
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)
    q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])
    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        float_out = torch.nn.functional.interpolate(x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.nn.quantized.functional.interpolate(q_x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_per_tensor(float_out, 0.5, 1, dtype)
    # torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()
    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9
    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```
```
=========== without nhwc handling ===========
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
1.999044418334961       2.5860953330993652      1.2936657681940702
GB/s float              GB/s quant
1.6192056416115257      0.3129103516188541
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.02730655670166        2.6061582565307617      1.2855274639721328
GB/s float              GB/s quant
1.596632728927902       0.3105014816242217
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0180463790893555      2.4047350883483887      1.1916153728010588
GB/s float              GB/s quant
1.603959172365819       1.3460376636426636

=========== with nhwc handling ===========
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0913314819335938      0.09696483612060547     0.04636512047863123
GB/s float              GB/s quant
1.5477527249803915      8.345458337015
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.1065664291381836      0.09959936141967773     0.04728042754408879
GB/s float              GB/s quant
1.5365591871338384      8.124710725706763
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.044203281402588       0.6003522872924805      0.29368521846837126
GB/s float              GB/s quant
1.5834354779917448      5.391607675216635
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26631
Differential Revision: D17521498
Pulled By: llyfacebook
fbshipit-source-id: 385ae0f77777cd8bee385cafb80e492127b7d103
* Typevar matching fix + implicit conversions from Scalar to int/float (#26453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26453
Previously, schema matching would incorrectly widen typevar bindings
when later occurrences were supertypes of earlier ones. This allowed
callsites like `floatlist.append(tensor.item())` to pass the typechecker,
causing a runtime assert (issue #24856).
An earlier, reverted fix (#25136) insisted on strict equality across all
occurrences of a typevar, necessitating explicit casts around Scalar-typed
arguments to int- or float-typed parameters, like `tensor.item()` above.
This was per the original type system design, but turned out to break
existing user code that relied on the de facto dynamic downcast. (The
error required a specialized list representation.)
The current fix includes the prevention of typevar widening, but
adds logic to insert implicit conversions from Scalar to float or int
as needed to satisfy a matched schema.
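A small sketch of the callsite pattern mentioned above (the function name is made up): `tensor.item()` returns a `Scalar`, which is now implicitly converted to `float` to match the element type of the list.
```python
import torch
from typing import List

@torch.jit.script
def collect(t: torch.Tensor) -> List[float]:
    xs = torch.jit.annotate(List[float], [])
    xs.append(t.item())  # Scalar implicitly converted to float
    return xs

print(collect(torch.tensor(1.5)))
```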
Test Plan: Imported from OSS
Differential Revision: D17470598
Pulled By: bhosmer
fbshipit-source-id: d260dbf3cd78b9c2f2229bc61afc84e1910b5659
* Improve C++ maxpool and avgpool (#26521)
Summary:
This PR makes the following improvements:
1. Add `forward_with_indices` method to all C++ MaxPool modules, to return the max indices along with the outputs. (We can't make two `forward` methods that return different types based on input, because that will break the type deduction of `torch::detail::return_type_of_forward_t`)
2. Add `max_poolNd_with_indices` to `torch::nn::functional`, to be used when indices of the max values are needed. (We can't merge this with `torch::nn::functional::max_poolNd` because the return type of `max_poolNd` has to be defined statically).
3. Improve `pretty_print` of C++ MaxPoolNd and AvgPoolNd modules to match the Python `extra_repr`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26521
Differential Revision: D17507358
Pulled By: yf225
fbshipit-source-id: b6c0e2b27b38378cdc0c75f4bfc797b3c6b17cd9
* Revert D17565828: [pytorch][PR] [ONNX] Export baddbmm
Test Plan: revert-hammer
Differential Revision:
D17565828
Original commit changeset: 85f605a7b3fa
fbshipit-source-id: 7705325087d83362f71a717be880a13e9f575b37
* Cuda101 upgrade (#26823)
Summary:
test run: https://github.com/pytorch/pytorch/issues/26732
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26823
Reviewed By: soumith
Differential Revision: D17576095
Pulled By: mingbowan
fbshipit-source-id: 269cf443aea18b47bbee63996d035bc5bcd2726b
* Convert TensorIterator to use function_ref, a lightweight alternative to std::function. (#26592)
Summary:
function_ref is pulled over from LLVM. It is to callables what StringRef is to strings.
This allows it to be substantially lighter weight, particularly in code size. That comes
at the cost of not being usable in situations where the callable's lifetime is shorter
than the function_ref. This means it is suitable for callback-like scenarios, but not
for situations where the callable needs to be stored. In converting TensorIterator,
I only encountered one situation that required refactoring to comply with function_ref's
constraints.
In my local Release build, this reduces the size of libtorch by 4MB, from 70MB->66MB.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26592
Differential Revision: D17516202
fbshipit-source-id: 267476891f767f4827a4d38149f70e5035c56c48
* Revert D17473200: [pytorch][distributed] add function to get NCCL version for logging
Test Plan: revert-hammer
Differential Revision:
D17473200
Original commit changeset: 4881ed5221b3
fbshipit-source-id: c5635ce89de1644d2135b657427cbd0c3af83576
* Named tensor support for: all, any, bitwise_not, cumprod, cumsum, and more (#26815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26815
This PR adds named tensor support for:
- any, all, `bitwise_not(_)`, cumprod, cumsum, `logical_not`
In addition, it adds smoke tests for a variety of tensor attributes and
fns:
- is_shared, is_signed
- retain_grad, register_hook
Test Plan: - [namedtensor ci]
Differential Revision: D17575905
Pulled By: zou3519
fbshipit-source-id: 37bfa327e68112c5bf0f6bf1f467a527f50fa1c4
* torch.load default encoding change to 'utf-8' (#26421)
Summary:
Change the default encoding when using torch.load to 'utf-8'.
This commit provides changes for cases where the user tries to torch.load
a pickled module with non-ASCII characters in the docstring, as
discussed in https://github.com/pytorch/pytorch/issues/21743. The default encoding was changed from 'ascii'
to 'utf-8'. Documentation for `torch.load` was updated and two tests
(loading a py2 unicode module with unicode in it; an error thrown when the
user explicitly sets the wrong encoding) were written.
~~This commit provides changes for better error handling in cases
where user tries to `torch.load` a pickled module with non-ASCII
characters in the docstring as discussed in https://github.com/pytorch/pytorch/issues/21743.~~
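In user terms (a minimal sketch; the file name is a placeholder), the new default is equivalent to passing `encoding='utf-8'`, and the old behavior can still be requested explicitly:
```python
import torch

m = torch.load('legacy_py2_module.pt')                    # utf-8 by default now
m = torch.load('legacy_py2_module.pt', encoding='ascii')  # previous default
```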
Ping ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26421
Differential Revision: D17581633
Pulled By: yf225
fbshipit-source-id: f8e77dcf7907092771149aad8ede6cfb73c21620
* fix to operate on cuda kernel with clang and libc++ (#25553)
Summary:
We found a bug involving `std::tuple` with nvcc.
In C++11, the `std::tuple` constructor is constexpr in libstdc++, but is not constexpr in libc++.
https://github.com/pytorch/pytorch/blob/c36b77fcdad3d54227cf0fd51693eb57035002c0/aten/src/ATen/native/cuda/Loops.cuh#L109-L111
These lines caused crashes in CUDA with the message `scan failed with synchronize`, which is an error message from CUDA initialization.
The purpose of this PR is to fix the for loop for nvcc and libc++ by not using `std::tuple`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25553
Differential Revision: D17582118
Pulled By: yf225
fbshipit-source-id: d6f62ed46c2415b48eb49f8a051cf3c0e7cb23ce
* Do not call cpuinfo_initialize() on other than x86 arch. (#26265)
Summary:
cpuinfo_initialize() is not implemented for the s390 arch.
The cpuinfo calls are x86-specific and determine vector extensions such as AVX and AVX512.
Without this patch an unnecessary error log is printed on s390:
Error in cpuinfo: processor architecture is not supported in cpuinfo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26265
Differential Revision: D17452301
Pulled By: izdeby
fbshipit-source-id: 9ca485550385c26dec18aac5953c887f1ffbfb7a
* support iterables, rangevalue in list comprehensions (#26768)
Summary:
Support IterableValue expressions and RangeValue in list comprehensions. Just as with list comprehensions where the expression changes the input list type, we need to correctly type the list we create, and with that in place it works.
Fixes https://github.com/pytorch/pytorch/issues/26693
Fixes https://github.com/pytorch/pytorch/issues/22483
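A tiny example of what now compiles (illustrative only):
```python
import torch

@torch.jit.script
def squares(n: int):
    # range() inside a list comprehension is now supported in TorchScript
    return [i * i for i in range(n)]

print(squares(5))  # [0, 1, 4, 9, 16]
```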
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26768
Differential Revision: D17562762
Pulled By: eellison
fbshipit-source-id: 7ce8bf8605758dfd99057bc0376b4b724c4f9251
* Fix CUDA named tensor `copy_` (#26829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26829
The TensorIterator loop for `copy_` uses operations that are currently
unsupported by named tensors. The solution is to wrap `copy_` in a
function that does the name propagation and ignore names when running
the implementation of `copy_`. There is no test case because I'm not
sure how to trigger the incorrect behavior, but there is definitely code
in CUDA copy that doesn't support named tensors (expand_as isn't
supported):
https://github.com/pytorch/pytorch/blob/aaf30cdf36839bc3f21b1622fb91ff3e2983e8ea/aten/src/ATen/native/cuda/Copy.cu#L141-L148
Test Plan: - [namedtensor ci]
Differential Revision: D17577310
Pulled By: zou3519
fbshipit-source-id: e11c52243800e1331fad738084304badcfd51ae2
* Highlighting in the doc that square root comes before adding epsilon
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26735
Test Plan: Imported from OSS
Differential Revision: D17558505
Pulled By: vincentqb
fbshipit-source-id: 36449c501f3ab3bc7cadd1f580258904b39369d4
* Bytecode export flow (#25187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25187
The bytecode export flow: dump the bytecode format for the lightweight interpreter.
* The bytecode is generated without input spec optimization. It would be more generic (input independent) with no obvious performance degradation (to be tested).
* Main API: torch::jit::script::Module::save(filename, extra_files, bool *bytecode_format* = false).
* Both bytecode and module object are exported in pickle format.
* The module object (in data.pkl) is the same as the original JIT model.
* The serializer is dependent on pickle only (no protobuf or Json).
* The major functionality is forked in ScriptModuleSerializer2::serialize().
* The test loader is test_bc_export.cpp.
* Simple APIs are added in Code and its implementation to get necessary information (instructions, operators and constants).
* Since there's no dependency on graph/node, GetAttr is promoted from an operator to first-class instruction (https://github.com/pytorch/pytorch/pull/25151) .
* Some definitions (instructions, writeArchive, etc) that are shared by full JIT and bytecode are pulled out of the local namespace (https://github.com/pytorch/pytorch/pull/25148).
The output layout looks like:
* folders of methods.
* In each method folder (for example, forward/):
* bytecode.pkl: instructions and operators
* constants{.pkl,/}: constant list in constants.pkl. If there are tensors in constants, the binary tensor files in constants/ folder.
* data{.pkl,/}: the module object, with binary tensor files in data/ folder. The same as in torchscript.
Test Plan: Imported from OSS
Differential Revision: D17076411
fbshipit-source-id: 46eb298e7320d1e585b0101effc0fcfd09219046
* Move the CUDA implementation of log to ATen. (#26494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26494
Close #24586
Test Plan: Imported from OSS
Differential Revision: D17572497
Pulled By: VitalyFedyunin
fbshipit-source-id: e1bcd33021464eaa4affd4c6d3283c8403069945
* enable double backward for non-cudnn LSTM and GRU (#26660)
Summary:
An attempt to enable double backward for non-cudnn LSTM and GRU (see https://github.com/pytorch/pytorch/issues/25315, https://github.com/pytorch/pytorch/issues/20449). RNN works already because it does not rely on fused kernels.
This does not implement the double backward function itself, because that is pretty hard to spell out. Instead, it implements backward using differentiable operations, so that double backward can be done automatically.
The good: it seems to work, with no effect on performance in the usual case without double backward, because the fused lstm backward is used.
The bad: performance of backward and, especially, double backward, is pretty bad. Scripting would still be a preferred way if we want a performant solution. Performance and/or memory use can be slightly improved if in-place variants can be used for sigmoid_backward and tanh_backward to avoid the cat at the end, but I'm not yet sure that's possible, and in any case it is only a slight improvement.
The ugly: I could not figure out a way to reuse the workspace that contains the sum of the gates with the applied sigmoid and tanh operations, so that's probably another perf and memory hit.
cc soumith, albanD. If you think this approach is viable, I can extend to GRU and RNN.
Thanks to mcarilli whose approach to double backward in weight norm I copied.
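A minimal sketch of the newly supported pattern (shapes are illustrative; the cudnn path is disabled to exercise the non-fused implementation):
```python
import torch

with torch.backends.cudnn.flags(enabled=False):
    lstm = torch.nn.LSTM(3, 4)
    x = torch.randn(2, 1, 3, requires_grad=True)
    out, _ = lstm(x)
    grad_x, = torch.autograd.grad(out.sum(), x, create_graph=True)
    grad_x.sum().backward()  # second-order backward through the LSTM
```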
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26660
Test Plan: added tests to check gradgrad for GRU and LSTM with cudnn disabled.
Differential Revision: D17581489
Pulled By: ngimel
fbshipit-source-id: efd204289e9a0e94d94896a0b3bff5cf6246cafa
* Migrate multinomial from the TH to Aten (CUDA) (#26481)
Summary:
https://github.com/pytorch/pytorch/issues/24604
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26481
Differential Revision: D17489859
Pulled By: ifedan
fbshipit-source-id: 0702044c7c0f78e5e30826e8a5a83da27156bdb3
* QEngine::QNNPACK enabled, module.eval()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26855
Test Plan: Imported from OSS
Differential Revision: D17589837
Pulled By: IvanKobzarev
fbshipit-source-id: 0084538e9b9d760a8728cdcd5723fc7fae5838c7
* Use optimized_graph in graph_executor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26705
Test Plan: Imported from OSS
Differential Revision: D17543281
Pulled By: ZolotukhinM
fbshipit-source-id: 91c40559aac6f2a1f77060fa28c33725a2b8e5f9
* Remove convert_to_ssa argument from runCleanupPasses - it is only used in one place.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26703
Test Plan: Imported from OSS
Differential Revision: D17543131
Pulled By: ZolotukhinM
fbshipit-source-id: c4a209c55ac76d8472e64af79f76e9a61fd2a941
* Throw if someone tries to torch.save() quantized modules (#26828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26828
Pickle serialization for quantized modules is currently broken by https://github.com/pytorch/pytorch/issues/24045, so let's be loud and fail if the user tries to do it
Test Plan: Imported from OSS
Differential Revision: D17579127
Pulled By: jamesr66a
fbshipit-source-id: 3deccac7e4590c6f648f22bb79c57badf3bf0487
* Fix broken failure messages for OverloadedMethodValue
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26846
Test Plan: Imported from OSS
Differential Revision: D17587050
Pulled By: jamesr66a
fbshipit-source-id: e5f3ea05b496afae15994b539f018ed0499ca62b
* Re-write of tensor-scalar quantized add
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26766
Test Plan: Imported from OSS
Differential Revision: D17587105
Pulled By: jamesr66a
fbshipit-source-id: 4da6ea98a4c5cc36fd191d9845c1ef409efce464
* Try to disable annoying hypothesis warnings again (#26853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26853
This is the same as https://github.com/pytorch/pytorch/pull/25188 but we add a version check for if the hypothesis version is too old
Test Plan: Imported from OSS
Differential Revision: D17589086
Pulled By: jamesr66a
fbshipit-source-id: b968965719593ff989d612384e00dfb823cf0a73
* Remove three unused declaration. (#26699)
Summary:
`frac()` in `Vec256<int{16,32,64}_t>` is not overridden.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26699
Differential Revision: D17549502
Pulled By: soumith
fbshipit-source-id: 87c65286032bfc88c447ec4eef1e3ebc73da5d27
* Fix building with PARALLEL_BACKEND=NATIVE_TBB (#26742)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/26721
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26742
Test Plan:
```
export USE_OPENMP=0
export USE_TBB=1
export BLAS=MKL
export MKL_THREADING=TBB
export MKLDNN_THREADING=TBB
export PARALLEL_BACKEND=NATIVE_TBB
export USE_CUDA=0
python setup.py build
```
Reviewed By: dskhudia
Differential Revision: D17586233
Pulled By: ilia-cher
fbshipit-source-id: 8e8befa6aa776b8c2b27bb4b79a3bff33dbcba7e
* Remove unnecessary functions and cleanup code in quantization.cpp.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26852
Test Plan: Imported from OSS
Differential Revision: D17587742
Pulled By: ZolotukhinM
fbshipit-source-id: f345ea4d524fde9741d6629dec1ea8ab870e49a5
* Updating submodules
Summary:
GitHub commits:
https://github.com/pytorch/fbgemm/commit/f767351c4b85cb29f6ea07d1a3bc27d62cca5150
Test Plan: n/a
Reviewed By: yns88
fbshipit-source-id: d0bfc9e5e62669ada8d56b853490a373eb8ba2f7
* Improvements to GuardElimination and InsertBailouts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25430
Differential Revision: D17584722
Pulled By: Krovatkin
fbshipit-source-id: 9db099b904d71572c1bf3aef5419d38435cecbb5
* add mobile friendly at:parallel_for backend
Summary:
This diff implements at::parallel_for()/parallel_reduce() and other
ATen/Parallel.h APIs for mobile using caffe2::ThreadPool.
caffe2::ThreadPool doesn't support submitting individual tasks
separately and running them in parallel - all tasks need to be submitted in
one batch, which locks the thread pool until all of them finish - as a
result we didn't wrap caffe2::ThreadPool with the TaskThreadPoolBase interface
and reuse the at::parallel_for() implementation in ParallelNative.h. Because
of this constraint, intraop_launch() / intraop_launch_future() are not
supported yet.
This diff doesn't touch the inter-op pool - it's still the default native c10
thread pool. Will work on it when it's widely used.
Test Plan: - This is early draft to receive feedback. Will do more thorough tests.
Differential Revision: D17543412
Pulled By: ljk53
fbshipit-source-id: 53a3259409c7207d837b9135d87d8daa6ad15e30
* remove backward functions from jit-op-registry for mobile build (#26851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26851
Add a codegen option to remove backward ops from the jit-op-registry, as they are not
likely to be used in an inference-only mobile build.
Measured ARM-v7 AAR build size change: 5,804,182 -> 5,331,219.
Test Plan: - build and integrate with demo app;
Differential Revision: D17587422
Pulled By: ljk53
fbshipit-source-id: 08c0fc7a710698a0d4baaf16bbb73cb812b1126a
* Enable batch_size = 0 support in DNNLOWP Concat operator (#26849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26849
We were having division-by-zero errors when one of the input tensor dimensions is 0. Examples: P111481720 and P111481374
This diff adds unit tests for empty input tensors and fixes division-by-zero errors in the partition function.
Test Plan: buck test caffe2/caffe2/quantization/server:concat_dnnlowp_op_test -- --stress-runs=100
Reviewed By: jianyuh
Differential Revision: D17574566
fbshipit-source-id: 1d2c21308bde99b3c4f2da82f53201eec42b5d8b
* Add more inplace arguments to quantization top level API (#26782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26782
At least we should be consistent across the top-level APIs and prepare/convert/etc.
The logic is inplace=False by default, but the top-level APIs take care of doing fewer copies.
Also renames always-inplace methods like add_observer to have an underscore at the end.
One fix for MinMaxObserver was triggered by deepcopy surfacing that we were accidentally keeping autograd around.
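A minimal sketch of the API shape described above (the toy model and calibration input are made up; `inplace` defaults to False):
```python
import torch

model = torch.nn.Sequential(torch.nn.Conv2d(1, 1, 1), torch.nn.ReLU())
model.qconfig = torch.quantization.default_qconfig

prepared = torch.quantization.prepare(model, inplace=False)   # copies by default
prepared(torch.randn(1, 1, 4, 4))                             # calibration pass
quantized = torch.quantization.convert(prepared, inplace=False)
```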
Test Plan: Imported from OSS
Differential Revision: D17595956
Pulled By: dzhulgakov
fbshipit-source-id: 801f9f5536b553f24c7a660064dd6fce685edd65
* batch size 0 support in ChannelShuffle DNNLOWP op (#26858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26858
Handle batch size = 0 in ChannelShuffle operator
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D17591041
fbshipit-source-id: 63373aa752406c1f38401c3e93d8e1954ce7281e
* Make resize_as_ generic, so XLA works. (#26809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26809
resize_as_ shouldn't do multiple dispatch on its second argument. However, because it
currently has per CPU/CUDA dispatch, it will do proper dispatch on all
arguments. Bad!
There is only a very minor downside to this patch which is we have an extra
dynamic dispatch now.
Thank you Ailing for reporting this problem.
Signed-off-by: Edward Z. Yang <[email protected]>
Test Plan: Imported from OSS
Differential Revision: D17581324
Pulled By: ezyang
fbshipit-source-id: e62cbb6cf497a7d6e53c4a24b905fef7a29b0826
* Add some missing constructors to IValue.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26806
Test Plan: Imported from OSS
Differential Revision: D17581325
Pulled By: ezyang
fbshipit-source-id: 1340ed949a649d11cc821775a33f84513e9a5944
* Add bitwise distributed reduction ops (#26824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26824
These ops are named after the bitwise reduction ops in MPI.
This is based on the work done by knottb in #22449.
Closes #22449.
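A minimal sketch of the new ops (this assumes a process group has already been initialized, e.g. via `dist.init_process_group` with the gloo backend):
```python
import torch
import torch.distributed as dist

t = torch.tensor([0b1010, 0b0110])
dist.all_reduce(t, op=dist.ReduceOp.BAND)  # bitwise AND across ranks
# ReduceOp.BOR and ReduceOp.BXOR work the same way
```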
Test Plan: Imported from OSS
Differential Revision: D17600210
Pulled By: pietern
fbshipit-source-id: 44c7041ce01bc5de170a4591c5a696e4f24431ef
* batch size 0 support in Conv DNNLOWP ops (#26871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26871
Add batch_size == 0 handlings in int8 Conv operators. Added associated test cases.
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D17594809
fbshipit-source-id: 54506afc7ef4bfbfed0272c52d2842f6e144f725
* batch size 0 tests for element-wise DNNLOWP ops (#26870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26870
Add batch_size == 0 tests for element-wise DNNLOWP operators.
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D17595162
fbshipit-source-id: f358748b56b236cce8736bac16054ea84541bf7f
* batch size 0 support in FC DNNLOWP operators (#26872)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26872
Add batch_size == 0 handlings in int8 FC operators. Added associated test cases.
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D17595385
fbshipit-source-id: d271b7bdbaf723fd6dee6f194da8c7fdfeef5fa2
* batch size 0 tests for Quantize/Dequantize DNNLOWP ops (#26873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26873
Add batch_size == 0 tests for the Quantize and Dequantize DNNLOWP operators.
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D17595077
fbshipit-source-id: 4a4f60d471a1b1b5746131b08623aa8b1d0059f5
* Updating submodules
Summary:
GitHub commits:
https://github.com/facebookincubator/katran/commit/cfdf778eaf3c362150d8dd8fe3cd43653cc4a3e1
https://github.com/pytorch/fbgemm/commit/7f55d6c14fb8ff2b0b03ddf9c4166bd052460fec
Test Plan: n/a
Reviewed By: yns88
fbshipit-source-id: 2523bce9933cb27b7a02da1650d7ad6f05b0ff30
* Change calling convention of ATenDispatch from getOp to callUnboxed. (#26857)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26857
Previously, ATenDispatch took TensorTypeId and returned a function pointer, to
avoid requiring a direct dependence on Tensor (which would have caused a header
cycle). Thanks to the work of Sebastian, it is now possible to include
TensorBody.h without inducing a cycle; so we can now replace this indirect
implementation with a more direct implementation of unboxedCall and move most of
the implementation details into ATenDispatch (simplifying generated code). This
is a necessary prerequisite for boxed fallback work I want to do, as I want to
handle generation of boxing from inside ATenDispatch, not generated code.
Unfortunately, we still need to generate the multidispatch list in
function_wrapper.py to accommodate c10 dispatcher.
Signed-off-by: Edward Z. Yang <[email protected]>
Test Plan: Imported from OSS
Differential Revision: D17602540
Pulled By: ezyang
fbshipit-source-id: 6927e66924405f5bf5cb67f1b57e49bc9a0f58ec
* Refactor dispatch structure so fallback code lives inline. (#26367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26367
This is necessary for boxed fallback, as boxed fallback must
live inside the templated code. Error reporting code never
has to be in templated code, so that stays in the C++ file.
Signed-off-by: Edward Z. Yang <[email protected]>
Test Plan: Imported from OSS
Differential Revision: D17448556
Pulled By: ezyang
fbshipit-source-id: 8244589251e359886dbfcd1c306ae6c033c7a222
* Fix circular deps in loading (#26758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26758
This PR changes the order in which we import classes and functions so
that it is no longer necessary for them to be defined in order in a file,
or for there to be proper import statements in the exported file.
Actually importing a function/class now is driven by the need to resolve
the entity during unpickling, type resolution, or value resolution.
While this should allow significant simplification to the code that
serializes classes, this work has not been done yet in order to avoid
inevitable forward compat issues in the transition period.
Notes:
* Individual functions have been replaced with a SourceImporter object
that exposes a resolveType method. This method loads the type if
it has not been loaded yet, potentially parsing (but not loading)
the file it exists in if that file hasn't been parsed yet.
* Some legacy functionality needed to be added as a method to this object
since the old format still used some of this logic for class resolution.
Test Plan: Imported from OSS
Differential Revision: D17558989
Pulled By: zdevito
fbshipit-source-id: 7eae3470bcbd388c4de463e3462d527776ed46c6
* Fix nuclear norm with requires_grad=True (#26303)
Summary:
Changelog:
- Selectively assign compute_uv in the at::svd used internally in the implementation of at::nuclear_norm
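The pattern this fixes (a minimal sketch; shapes are illustrative):
```python
import torch

a = torch.randn(4, 4, requires_grad=True)
torch.norm(a, p='nuc').backward()  # previously failed with requires_grad=True
print(a.grad.shape)                # torch.Size([4, 4])
```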
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26303
Test Plan:
- Add tests in common_method_invocations.py
Refixes: https://github.com/pytorch/pytorch/issues/18275
Differential Revision: D17605357
Pulled By: ezyang
fbshipit-source-id: d87d60afe678e2546dca6992ea66f2daeb6b0346
* fix typo in job name: nigthly->nightly
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26881
Differential Revision: D17607874
Pulled By: kostmo
fbshipit-source-id: 758a7c5135eb04ffca8231b5d907ababbe55e74b
* Get rid of -u (expansion of undefined variable) setting (#26907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26907
Somehow CircleCI broke this on update to their OS X workers;
the error looks like
/bin/bash: line 1: PROMPT_COMMAND: unbound variable
I'm not sure if I've killed all the occurrences that are necessary,
let's see!
Signed-off-by: Edward Z. Yang <[email protected]>
Test Plan: Imported from OSS
Differential Revision: D17607486
Pulled By: ezyang
fbshipit-source-id: 5e9a7ff69d4b18e759965bf97c67d38404841187
* Choose num_threads in parallel_for based on GRAIN_SIZE (#26886)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24080
The OpenMP implementation of `parallel_for` now chooses the number of cores to use on a sliding scale between 1 and `OMP_NUM_THREADS`. This prevents wasteful core usage on many-core systems such as in https://github.com/pytorch/pytorch/issues/24080.
This is also consistent with the comment on GRAIN_SIZE:
https://github.com/pytorch/pytorch/blob/e327df396564f937d17b5f28e2529229260c65bf/aten/src/ATen/Parallel.h#L10-L11
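A rough sketch of the heuristic in Python (names and structure are illustrative, not the actual ATen code; GRAIN_SIZE is 32768 in Parallel.h):
```python
import math

GRAIN_SIZE = 32768  # from ATen's Parallel.h

def num_threads_for(work_size: int, max_threads: int) -> int:
    # Sliding scale between 1 and max_threads: roughly one thread per
    # GRAIN_SIZE elements of work, capped at the configured maximum.
    if work_size <= GRAIN_SIZE:
        return 1
    return min(max_threads, math.ceil(work_size / GRAIN_SIZE))

print(num_threads_for(10_000, 8))     # 1 -> small ops stay single-threaded
print(num_threads_for(1_000_000, 8))  # 8
```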
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26886
Differential Revision: D17610292
Pulled By: ezyang
fbshipit-source-id: 60b9fe4b0eecb41a28c1488e3a575674c8f7000c
* Fix the Bernoulli distribution sampler (#26864)
Summary:
The current Bernoulli distribution sampler is slightly off in that it returns true slightly too often. This is most obvious at very low p values, like p = 0, although it theoretically occurs at every probability. See https://github.com/pytorch/pytorch/issues/26807.
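A quick way to observe the edge case from the issue (p = 0 should never produce a 1):
```python
import torch

samples = torch.bernoulli(torch.zeros(1_000_000))
print(samples.sum().item())  # expected: 0.0
```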
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26864
Differential Revision: D17610459
Pulled By: ezyang
fbshipit-source-id: 28215ff820a6046822513f284793e7b850d38438
* Switch internal CUDA build to C++14 (#26757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26757
This doesn't switch any open source builds or CI.
The internal fbcode build has been C++17 for quite some time already, but in CUDA code, we had it restricted to C++11.
This diff changes that to C++14.
Because this doesn't change anything open source, the risk of this is low.
ghstack-source-id: 90728524
Test Plan: waitforsandcastle
Differential Revision: D17558142
fbshipit-source-id: 9cfd47e38e71d5a2fdae2f535c01f281bf007d9a
* Use intrinsics for trigonometric functions on CPU (#26431)
Summary:
A little benchmarking shows real improvements.
Benchmarking script:
```python
import timeit
for n, t in [(10_000, 8000),
(100_000, 800)]:
for dtype in ('torch.float', 'torch.double'):
print(f'================ dtype {dtype}, {t} times ================================')
for op in ('sin', 'sinh', 'cos', 'cosh', 'tan'):
print(f'a.{op}() (a.numel() == {n}) for {t} times')
print(timeit.timeit(f'a.{op}()',
setup=f'import torch; a = torch.arange({n}, device="cpu", dtype={dtype})',
number=t))
```
RHEL 7.7, Debug build, gcc 8.3, turbo off:
Before this commit:
```
================ dtype torch.float, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
2.690067914001702
a.sinh() (a.numel() == 10000) for 8000 times
7.025003784001456
a.cos() (a.numel() == 10000) for 8000 times
2.691191975001857
a.cosh() (a.numel() == 10000) for 8000 times
6.7473940790005145
a.tan() (a.numel() == 10000) for 8000 times
39.14060311800131
================ dtype torch.double, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
5.442704386001424
a.sinh() (a.numel() == 10000) for 8000 times
6.778444146999391
a.cos() (a.numel() == 10000) for 8000 times
5.429267812000035
a.cosh() (a.numel() == 10000) for 8000 times
6.625128638002934
a.tan() (a.numel() == 10000) for 8000 times
6.888564799002779
================ dtype torch.float, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
2.343601189000765
a.sinh() (a.numel() == 100000) for 800 times
6.4455943499997375
a.cos() (a.numel() == 100000) for 800 times
2.3377084899984766
a.cosh() (a.numel() == 100000) for 800 times
6.357531049001409
a.tan() (a.numel() == 100000) for 800 times
46.93665131099988
================ dtype torch.double, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
5.122997600999952
a.sinh() (a.numel() == 100000) for 800 times
6.233409892000054
a.cos() (a.numel() == 100000) for 800 times
5.071856587001093
a.cosh() (a.numel() == 100000) for 800 times
6.0974346790026175
a.tan() (a.numel() == 100000) for 800 times
6.5203832980005245
```
After this commit:
```
================ dtype torch.float, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
1.5905082239987678
a.sinh() (a.numel() == 10000) for 8000 times
6.8216283560032025
a.cos() (a.numel() == 10000) for 8000 times
1.630263119997835
a.cosh() (a.numel() == 10000) for 8000 times
6.738510535000387
a.tan() (a.numel() == 10000) for 8000 times
1.7482984089983802
================ dtype torch.double, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
2.0000513029990543
a.sinh() (a.numel() == 10000) for 8000 times
6.876631892999285
a.cos() (a.numel() == 10000) for 8000 times
2.0672772910002095
a.cosh() (a.numel() == 10000) for 8000 times
6.678993797999283
a.tan() (a.numel() == 10000) for 8000 times
2.3625312719996145
================ dtype torch.float, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
1.2381345620015054
a.sinh() (a.numel() == 100000) for 800 times
6.400261008999223
a.cos() (a.numel() == 100000) for 800 times
1.284327255001699
a.cosh() (a.numel() == 100000) for 800 times
6.332740200999979
a.tan() (a.numel() == 100000) for 800 times
1.392364119998092
================ dtype torch.double, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
1.6348750549987017
a.sinh() (a.numel() == 100000) for 800 times
6.312609101998532
a.cos() (a.numel() == 100000) for 800 times
1.700102185997821
a.cosh() (a.numel() == 100000) for 800 times
6.141731683001126
a.tan() (a.numel() == 100000) for 800 times
1.9891383869980928
```
RHEL 7.7, Release build, gcc 8.3, turbo off:
Before this commit:
```
================ dtype torch.float, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
1.0220722929989279
a.sinh() (a.numel() == 10000) for 8000 times
0.9413958889999776
a.cos() (a.numel() == 10000) for 8000 times
1.013564700999268
a.cosh() (a.numel() == 10000) for 8000 times
0.9127178879971325
a.tan() (a.numel() == 10000) for 8000 times
25.249723791999713
================ dtype torch.double, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
3.3466339340011473
a.sinh() (a.numel() == 10000) for 8000 times
0.909793314000126
a.cos() (a.numel() == 10000) for 8000 times
3.4019737700000405
a.cosh() (a.numel() == 10000) for 8000 times
0.918371007002861
a.tan() (a.numel() == 10000) for 8000 times
4.902741645997594
================ dtype torch.float, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
0.9870414770011848
a.sinh() (a.numel() == 100000) for 800 times
0.9038734009991458
a.cos() (a.numel() == 100000) for 800 times
0.9786967349973565
a.cosh() (a.numel() == 100000) for 800 times
0.8774048919985944
a.tan() (a.numel() == 100000) for 800 times
30.299459709000075
================ dtype torch.double, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
3.3855797659998643
a.sinh() (a.numel() == 100000) for 800 times
0.8303290260009817
a.cos() (a.numel() == 100000) for 800 times
3.3702223940017575
a.cosh() (a.numel() == 100000) for 800 times
0.822016927999357
a.tan() (a.numel() == 100000) for 800 times
4.889868417001708
```
After this commit:
```
================ dtype torch.float, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
0.542676458000642
a.sinh() (a.numel() == 10000) for 8000 times
0.90598970100109
a.cos() (a.numel() == 10000) for 8000 times
0.6119738140005211
a.cosh() (a.numel() == 10000) for 8000 times
0.902145998999913
a.tan() (a.numel() == 10000) for 8000 times
0.7713400800021191
================ dtype torch.double, 8000 times ================================
a.sin() (a.numel() == 10000) for 8000 times
0.609621113002504
a.sinh() (a.numel() == 10000) for 8000 times
0.8993683010012319
a.cos() (a.numel() == 10000) for 8000 times
0.6876834479990066
a.cosh() (a.numel() == 10000) for 8000 times
0.8859291590015346
a.tan() (a.numel() == 10000) for 8000 times
0.9243346840012236
================ dtype torch.float, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
0.5219837559998268
a.sinh() (a.numel() == 100000) for 800 times
0.8755807839988847
a.cos() (a.numel() == 100000) for 800 times
0.5899826130007568
a.cosh() (a.numel() == 100000) for 800 times
0.8757360769996012
a.tan() (a.numel() == 100000) for 800 times
0.7496912290007458
================ dtype torch.double, 800 times ================================
a.sin() (a.numel() == 100000) for 800 times
0.578619064999657
a.sinh() (a.numel() == 100000) for 800 times
0.7951330530013365
a.cos() (a.numel() == 100000) for 800 times
0.6442456569966453
a.cosh() (a.numel() == 100000) for 800 times
0.7975544330001867
a.tan() (a.numel() == 100000) for 800 times
0.875703464000253
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26431
Differential Revision: D17470502
fbshipit-source-id: 82e930993c7b2827b04cbe5f9a962913a6069b62
* No sccache (#26059)
Summary:
Proposed change:
Check whether sccache is available before running it to show statistics.
(If not available, simply skip it. Showing these stats isn't mandatory to build.)
https://github.com/pytorch/pytorch/issues/26058
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26059
Differential Revision: D17364967
Pulled By: vincentqb
fbshipit-source-id: 0250c6ba5573bc0b292ae8e2188b3e1fa700409e
* Remove an unused function propagate_names_if_namedtensor_enabled
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26176
Differential Revision: D17452289
Pulled By: yf225
fbshipit-source-id: 46926e6774a37e40141763c598b6fe84118ba5be
* Fix Vec256<T>::abs() for floating point when applied on -0.0 (#26422)
Summary:
Currently, when a Vec256<T> (base) object contains -0.0, Vec256<T>::abs()
does not produce 0.0 but -0.0 instead. This commit fixes this issue.
This bug will mostly affect CPUs without AVX support, such as ARM,
PowerPC, and older Intel models.
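A quick check of the sign of the result on an affected CPU (dividing 1 by a signed zero exposes the sign bit as +/- inf):
```python
import torch

x = torch.tensor([-0.0])
print(1.0 / x.abs())  # expected tensor([inf]); the bug produced tensor([-inf])
```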
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26422
Differential Revision: D17607346
fbshipit-source-id: e8d4595f0e88ad93018a61f89b9e3dcada485358
* Migrate lt and lt_ from the TH to Aten (#25998)
Summary:
https://github.com/pytorch/pytorch/issues/24593
https://github.com/pytorch/pytorch/issues/24727
**torch.lt(Tensor a, Tensor b)**
will compute the common dtype (highest) based on the inputs and then compare values. The result will be a Bool tensor
```
>>> x = torch.tensor([0], dtype=torch.int)
>>> y = torch.tensor([0.5], dtype=torch.double)
>>> x < y
tensor([True])
```
Previously it was impossible to compare two tensors with different dtypes.
**torch.lt(Tensor a, Tensor b, out=c)**
will compute the common dtype (highest) based on the inputs and then compare values. The result can be populated only into a Bool tensor
```
>>> x = torch.tensor([0], dtype=torch.int)
>>> y = torch.tensor([0.5], dtype=torch.double)
>>> z = torch.empty([1], dtype=torch.bool)
>>> torch.lt(x, y, out=z)
tensor([True])
```
Previously it was impossible to compare two tensors with different dtypes. Also, previously the result dtype could be Bool or Byte (deprecated). Currently it will accept only a Bool result.
**a.lt_(Tensor b)**
Expects that a and b have the same dtype, otherwise it's possible to get an overflow (example: 'a' is uint8, 'b' is float32; 'a' will be promoted to float32 and the result will also be float32, then cast back to uint8, so there is potential for overflow). Will not compute a common dtype. The result will have the type of a.
```
>>> x = torch.tensor([0], dtype=torch.double)
>>> y = torch.tensor([0.5], dtype=torch.double)
>>> x < y
tensor([True])
```
Works similarly to the previous implementation.
**torch.lt(Tensor a, Scalar b)**
will check that there is no overflow when converting b to the same type as a. Then it will compute the common dtype and compare.
```
>>> x = torch.tensor([0], dtype=torch.double)
>>> x < 0.5
tensor([True])
>>> x = torch.tensor([0], dtype=torch.int)
>>> x < 0.5
tensor([True])
```
Fix https://github.com/pytorch/pytorch/issues/22301.
**torch.lt(Tensor a, Scalar b, out=c)**
will check that there is no overflow when converting b to the same type as a. Then it will compute the common dtype and compare. The result can be populated only into a Bool tensor.
```
>>> x = torch.tensor([0], dtype=torch.double)
>>> torch.lt(x, 0.5, out=z)
tensor([True])
```
Previously the result dtype could be Bool or Byte (deprecated). Currently it will accept only a Bool result. The rest works similarly to the previous implementation.
**torch.lt_(Tensor a, Scalar b)**
will check that there is no overflow when converting b to the same type as a. Then it will compute the common dtype and compare. The result will have the type of a.
```
>>> x = torch.tensor([0], dtype=torch.int)
>>> x.lt_(1)
tensor([1], dtype=torch.int32)
>>> x = torch.tensor([0], dtype=torch.int)
>>> x.lt_(1.0)
tensor([1], dtype=torch.int32)
```
Works similarly to the previous implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25998
Differential Revision: D17431853
Pulled By: ifedan
fbshipit-source-id: b5effc6a5d9b32da379395b32abc628b604faaf7
* batch size 0 support in norm operators (#26894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26894
Add batch_size == 0 tests for norm DNNLOWP operators.
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D17595416
fbshipit-source-id: 23086ecf8818be30da031eb4fc2922daea79ea7c
* batch size 0 tests in BatchMatMul ops (#26874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26874
Add batch_size == 0 tests for the BatchMatMul DNNLOWP operator.
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D17596117
fbshipit-source-id: 029e29e6c2bd7894d83dac46e8ce8484cc92b1c0
* Export index_fill and index_copy, fix caffe2 scatter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23052
Reviewed By: hl475
Differential Revision: D16428486
Pulled By: houseroad
fbshipit-source-id: 8c5905052763fd70197c67aba5f28eeff0790721
* Set quantized engine backend for mobile in speed_benchmark_torch (#26911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26911
Check if QNNPACK is present as a backend (it should always be present on mobile).
If it is present, then set the backend to QNNPACK.
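The equivalent check in Python looks roughly like the sketch below (the actual change is in the C++ speed_benchmark_torch binary):
```python
import torch

# Pick QNNPACK as the quantized engine when the build supports it.
if 'qnnpack' in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = 'qnnpack'
print(torch.backends.quantized.engine)
```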
Test Plan:
Test on mobile
./speed_benchmark_torch --model mobilenet_quantized_scripted.pt --input_dims="1,3,224,224" --input_type=float --warmup=5 --iter 20 --print_output True
Imported from OSS
Differential Revision: D17613908
fbshipit-source-id: af96722570a0111f13d69c38ccca52416ea5e460
* Check if QNNPACK is supported before set (#26935)
Summary:
ghstack-source-id: 0e873a56a879cab30b7fa1778e65d9cb89474f05
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26935
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26936
Differential Revision: D17617452
Pulled By: IvanKobzarev
fbshipit-source-id: 4dbcdc55044dd2050b28062baa8b58c8387a1e4e
* Support ceil_mode in quantized maxpool
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26916
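A hedged usage sketch of what this enables, assuming quantized tensors dispatch through torch.nn.functional.max_pool2d and that torch.quantize_per_tensor is available in this build:
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 5, 5)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

# ceil_mode=True rounds the output size up: a 5x5 input with a 2x2 window
# and stride 2 gives a 3x3 output instead of 2x2.
qy = F.max_pool2d(qx, kernel_size=2, stride=2, ceil_mode=True)
print(qy.shape)  # torch.Size([1, 2, 3, 3])
```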
Test Plan: Imported from OSS
Differential Revision: D17609625
Pulled By: jamesr66a
fbshipit-source-id: a9e1878e7946ee71b6888a91f0dcb2e889939376
* Make quantized max_pool2d error message more specific and less silly
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26918
Test Plan: Imported from OSS
Differential Revision: D17609624
Pulled By: jamesr66a
fbshipit-source-id: 3bc900d5035e9311ab95e3d4a945e95062396afa
* C++ API parity: TensorTest.Data fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26920
Test Plan: Imported from OSS
Differential Revision: D17614135
Pulled By: pbelevich
fbshipit-source-id: 96d70a5e7724338d2829bf006696c2d0ac1025a6
* use parallel_for in DepthwiseConvKernel (#26879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26879
Integrate with the at::parallel_for API for mobile.
Test Plan:
- Verified numerical results are the same as before.
- Benchmarked depthwise3x3_winograd layers in MobileNetV2 on two devices:
```
+-------------------+----------------+--------+-----------+----------+------------+-----------+
| Input | Kernel | Groups | S9 Single | S9 Multi | OP5 Single | OP5 Multi |
+-------------------+----------------+--------+-----------+----------+------------+-----------+
| [1, 32, 112, 112] | [32, 1, 3, 3] | 32 | 6796 | 1676 | 8520 | 5361 |
| [1, 144, 56, 56] | [144, 1, 3, 3] | 144 | 8004 | 5523 | 9591 | 4157 |
| [1, 192, 28, 28] | [192, 1, 3, 3] | 192 | 2771 | 730 | 3345 | 1436 |
| [1, 192, 28, 28] | [192, 1, 3, 3] | 192 | 2688 | 730 | 3358 | 1979 |
| [1, 384, 14, 14] | [384, 1, 3, 3] | 384 | 1641 | 461 | 1895 | 874 |
| [1, 384, 14, 14] | [384, 1, 3, 3] | 384 | 1765 | 444 | 1914 | 870 |
| [1, 384, 14, 14] | [384, 1, 3, 3] | 384 | 1636 | 448 | 1896 | 852 |
| [1, 384, 14, 14] | [384, 1, 3, 3] | 384 | 1639 | 452 | 1964 | 1010 |
| [1, 576, 14, 14] | [576, 1, 3, 3] | 576 | 2575 | 677 | 2854 | 1274 |
| [1, 576, 14, 14] | [576, 1, 3, 3] | 576 | 2595 | 749 | 2836 | 1291 |
| [1, 960, 7, 7] | [960, 1, 3, 3] | 960 | 1586 | 432 | 1714 | 675 |
| [1, 960, 7, 7] | [960, 1, 3, 3] | 960 | 1552 | 421 | 1690 | 1770 |
| [1, 960, 7, 7] | [960, 1, 3, 3] | 960 | 1680 | 424 | 1690 | 837 |
+-------------------+----------------+--------+-----------+----------+------------+-----------+
| TOTAL | 36928 | 13167 | 43267 | 22386 |
+-------------------+----------------+--------+-----------+----------+------------+-----------+
```
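For context, the layers in the table are ordinary depthwise 3x3 convolutions (groups == channels). A hedged sketch of how one such layer could be timed in eager PyTorch (illustrative only; the numbers above come from the mobile benchmark harness, not from this snippet):
```
import time
import torch
import torch.nn as nn

# One of the MobileNetV2 depthwise layers from the table: 32 channels, 112x112 input.
x = torch.randn(1, 32, 112, 112)
conv = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32, bias=False).eval()

with torch.no_grad():
    for _ in range(5):            # warm-up
        conv(x)
    start = time.time()
    for _ in range(20):
        conv(x)
    print((time.time() - start) / 20 * 1e6, "us per iteration")
```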
Differential Revision: D17598249
Pulled By: ljk53
fbshipit-source-id: aaeea221494f11b153a35af2b818a603f1f32ddf
* Fix c10 registration binary size (#26827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26827
The templates there had a binary size impact of ~20MB. This PR fixes that.
ghstack-source-id: 90842814
Test Plan: build it and see binary size of libtorch.so go down from 95MB to 70MB.
Differential Revision: D17566642
fbshipit-source-id: 57bebffce8e036675a452434bc1a9733f5f2cf6d
* Improve binary size of function schema inference (#26860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26860
This reduces libtorch.so size by 100-200kb
ghstack-source-id: 90842815
Test Plan: measure libtorch.so size
Differential Revision: D17593224
fbshipit-source-id: effbb5f3b7690b67edaabacf2ff9292a73c991a4
* Fix shared_ptr binary size in op registration (#26869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26869
Having a lot of shared_ptr<Functor> cost us ~1.1MB of binary size in libtorch.so.
This PR fixes that.
ghstack-source-id: 90842812
Test Plan: measure libtorch.so size
Differential Revision: D17595674
fbshipit-source-id: 05151047ee8e85c05205b7510a33915ba98bab58
* Fix binary size in schema inference (#26878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26878
Before, for each function signature used in one or more ops, there was a template instantiation that created the FunctionSchema object for it. As we've seen in the past, all these vector<> constructors in the FunctionSchema object take up quite a bit of binary size.
With this PR, we now create an intermediate constexpr std::array that has minimal binary size and can be embedded into the executable; at runtime, a small piece of code constructs the vector<>s from it.
This reduces libtorch.so binary size by 800kb
ghstack-source-id: 90842811
Test Plan: measure libtorch.so size
Differential Revision: D17597752
fbshipit-source-id: 53442b565a7747c0d0384b2e3b845729c3daddfd
* Make TypeDefault, TypeDerived and VariableType anonymous namespaces (#26882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26882
Reduce binary size by 500kb by making TypeDerived and VariableType anonymous namespaces instead of classes. TypeDefault is also a namespace now, but it can't be anonymous because VariableType calls into it. This also has the nice side effect that VariableType.h and ${TypeDerived.h} are much smaller because they no longer have to list the operator declarations.
ghstack-source-id: 90865080
Test Plan: Measure libtorch.so size
Differential Revision: D17599686
fbshipit-source-id: da3c6641060b7410a7808f36a0a18ee3246ce2d2
* Revert D17610292: [pytorch][PR] Choose num_threads in parallel_for based on GRAIN_SIZE
Test Plan: revert-hammer
Differential Revision:
D17610292
Original commit changeset: 60b9fe4b0eec
fb…
3,319 files changed: +343935 −128524 lines.