
Commit 685d432

Update on "port spmm_sum to pytorch and optimize it on CPU"
### Motivation of this PR

This patch migrates `spmm_reduce` from `torch-sparse` (a third-party dependency of PyG) to `torch`. It responds to the initial proposal for fusing **Gather, Apply, Scatter** in the message passing step of GNN inference/training (pytorch#71300).

**GAS** is the major step in message passing. Its behavior falls into two kinds, depending on the storage type of `EdgeIndex`, which records the connections between nodes:

* COO: the hotspot is `scatter_reduce`
* CSR: the hotspot is `spmm_reduce`

The reduce type can be chosen from "sum", "mean", "max", and "min".

`spmm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_spmm_reduce`, which has dual outputs:

* `out`: the actual output
* `arg_out`: records the output indices in the non-zero elements when the reduce type is "max" or "min". It is only useful for training, so it is not calculated for inference.

### Performance

Benchmarked on GCN for ogbn-products on a single-socket Xeon, the workload is improved by `4.3x` with this patch. The performance benefit for training will be larger: the original backward implementation for `sum|mean` is sequential, and the original backward implementation for `max|min` is not fused.
#### before

```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
       torch_sparse::spmm_sum        97.09%       56.086s        97.09%       56.088s        6.232s             9
                 aten::linear         0.00%      85.000us         1.38%     795.485ms      88.387ms             9
                 aten::matmul         0.00%      57.000us         1.38%     795.260ms      88.362ms             9
                     aten::mm         1.38%     795.201ms         1.38%     795.203ms      88.356ms             9
                   aten::relu         0.00%      50.000us         0.76%     440.434ms      73.406ms             6
              aten::clamp_min         0.76%     440.384ms         0.76%     440.384ms      73.397ms             6
                   aten::add_         0.57%     327.801ms         0.57%     327.801ms      36.422ms             9
            aten::log_softmax         0.00%      23.000us         0.10%      55.503ms      18.501ms             3
```

#### after

```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
               aten::spmm_sum        87.35%       11.826s        87.36%       11.827s        1.314s             9
                 aten::linear         0.00%      92.000us         5.87%     794.451ms      88.272ms             9
                 aten::matmul         0.00%      62.000us         5.87%     794.208ms      88.245ms             9
                     aten::mm         5.87%     794.143ms         5.87%     794.146ms      88.238ms             9
                   aten::relu         0.00%      53.000us         3.35%     452.977ms      75.496ms             6
              aten::clamp_min         3.35%     452.924ms         3.35%     452.924ms      75.487ms             6
                   aten::add_         2.58%     348.663ms         2.58%     348.663ms      38.740ms             9
                 aten::argmax         0.42%      57.473ms         0.42%      57.475ms      14.369ms             4
            aten::log_softmax         0.00%      22.000us         0.39%      52.605ms      17.535ms             3
```

[ghstack-poisoned]
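To make the `out` / `arg_out` semantics described in the motivation concrete, here is a naive pure-Python reference sketch. It is not the optimized kernel from this patch and not a PyTorch API: `spmm_reduce_csr` and `gas_coo_sum` are hypothetical names, a non-zero at `(row, col)` is assumed to send the message `val * dense[col]` to `row`, empty rows are assumed to produce zeros, and `arg_out` uses `nnz` as an "empty row" sentinel (all assumptions of this sketch). The COO helper shows that the scatter-based path yields the same "sum" result as the CSR path.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def spmm_reduce_csr(
    crow: List[int],
    col: List[int],
    val: List[float],
    dense: Matrix,
    reduce: str = "sum",
) -> Tuple[Matrix, Optional[List[List[int]]]]:
    """Naive CSR @ dense with a row-wise reduction (hypothetical helper).

    Returns (out, arg_out). arg_out is only built for "max"/"min" and holds,
    per output element, the index into col/val of the winning non-zero.
    """
    if reduce not in ("sum", "mean", "max", "min"):
        raise ValueError(f"unsupported reduce: {reduce!r}")
    n_rows = len(crow) - 1
    n_cols = len(dense[0]) if dense else 0
    nnz = len(col)
    out = [[0.0] * n_cols for _ in range(n_rows)]
    # Sentinel nnz marks "no non-zero in this row" (assumption of this sketch).
    arg_out = [[nnz] * n_cols for _ in range(n_rows)] if reduce in ("max", "min") else None
    for i in range(n_rows):
        start, end = crow[i], crow[i + 1]
        if start == end:
            continue  # empty row: output stays 0
        for j in range(n_cols):
            # Gather dense[col[e]] and apply the edge value for each non-zero of row i.
            msgs = [(val[e] * dense[col[e]][j], e) for e in range(start, end)]
            if reduce == "sum":
                out[i][j] = sum(m for m, _ in msgs)
            elif reduce == "mean":
                out[i][j] = sum(m for m, _ in msgs) / (end - start)
            else:
                # Reduce and remember which non-zero won (first one on ties).
                pick = max if reduce == "max" else min
                best, best_e = pick(msgs, key=lambda t: t[0])
                out[i][j] = best
                arg_out[i][j] = best_e
    return out, arg_out


def gas_coo_sum(row: List[int], col: List[int], val: List[float], dense: Matrix, n_rows: int) -> Matrix:
    """The same "sum" result via the COO path: scatter-add one message per edge."""
    n_cols = len(dense[0]) if dense else 0
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r, c, v in zip(row, col, val):
        for j in range(n_cols):
            out[r][j] += v * dense[c][j]
    return out


if __name__ == "__main__":
    # A = [[1, 0, 2], [0, 0, 0], [0, 3, 0]] stored as CSR.
    crow, col, val = [0, 2, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
    x = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
    out, arg = spmm_reduce_csr(crow, col, val, x, "max")
    print(out)  # [[6.0, 12.0], [0.0, 0.0], [6.0, 15.0]]
    print(arg)  # [[1, 1], [3, 3], [2, 2]]
```

The sketch always builds `arg_out` for "max"/"min"; per the description above, the real operator only needs it when a backward pass will consume it.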
2 parents 2c3d187 + 89de8ac commit 685d432


174 files changed, +5100 -1539 lines changed


.circleci/docker/build.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -81,7 +81,7 @@ fi
 
 TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/14.04/x86_64"
 _UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab
-_UCC_COMMIT=12944da33f911daf505d9bbc51411233d0ed85e1
+_UCC_COMMIT=1c7a7127186e7836f73aafbd7697bbc274a77eee
 
 # It's annoying to rename jobs every time you want to rewrite a
 # configuration, so we hardcode everything here rather than do it
```

.github/ci_commit_pins/vision.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-5b4f79d9ba8cbeeb8d6f0fbba3ba5757b718888b
+4a310f26049371959617921d0eb9b001f4d262c6
```

.github/ci_commit_pins/xla.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-dd9b67ff0d6ba4da6a46ca1b22e35c98dbed0d77
+8c2a3c41592aee25dffcf48933e7cbdc5c3fc91c
```

.github/requirements/README.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -17,6 +17,8 @@ The list of support files are as follows:
 test jobs to setup the conda environment
 * conda-env-macOS-X64. This is use by MacOS (x86-64) build and test
 jobs to setup the conda environment
+* conda-env-Linux-X64. This is used by Linux buck build and test jobs
+to setup the conda environment
 * Pip:
 * pip-requirements-macOS.txt. This is used by MacOS build and test jobs to
 setup the pip environment
```
Lines changed: 10 additions & 0 deletions
```diff
@@ -0,0 +1,10 @@
+cffi=1.15.1
+cmake=3.22.1
+mkl=2022.1.0
+mkl-include=2022.1.0
+ninja=1.10.2
+numpy=1.23.3
+pyyaml=6.0
+requests=2.28.1
+setuptools=65.5.0
+typing_extensions=4.3.0
```

.github/workflows/_buck-build-test.yml

Lines changed: 2 additions & 21 deletions
```diff
@@ -21,29 +21,10 @@ jobs:
 distribution: 'temurin'
 
 - name: Setup miniconda
-uses: conda-incubator/setup-miniconda@v2
+uses: pytorch/test-infra/.github/actions/setup-miniconda@main
 with:
-auto-update-conda: true
 python-version: 3.8
-activate-environment: build
-
-- name: Install dependencies
-uses: nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482
-with:
-timeout_minutes: 10
-max_attempts: 5
-command: |
-conda install -y \
-cffi=1.15.1 \
-cmake=3.22.1 \
-mkl=2022.1.0 \
-mkl-include=2022.1.0 \
-ninja=1.10.2 \
-numpy=1.23.3 \
-pyyaml=6.0 \
-requests=2.28.1 \
-setuptools=65.5.0 \
-typing_extensions=4.3.0
+environment-file: .github/requirements/conda-env-${{ runner.os }}-${{ runner.arch }}
 
 - name: Install Buck
 uses: nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482
```

.github/workflows/_docs.yml

Lines changed: 8 additions & 0 deletions
```diff
@@ -67,6 +67,11 @@ jobs:
 uses: pytorch/test-infra/.github/actions/setup-ssh@main
 with:
 github-secret: ${{ secrets.GITHUB_TOKEN }}
+instructions: |
+All builds are done inside the container, to start an interactive session run:
+docker exec -it $(docker container ps --format '{{.ID}}') bash
+To start Python docs build type:
+cd docs && make html && make coverage
 
 # [see note: pytorch repo ref]
 - name: Checkout PyTorch
@@ -170,3 +175,6 @@ jobs:
 if-no-files-found: error
 path: functorch_ghpages/nightly/
 s3-prefix: pytorch/${{ github.event.pull_request.number }}/functorchdocs
+- name: Teardown Linux
+uses: pytorch/test-infra/.github/actions/teardown-linux@main
+if: always()
```

.github/workflows/_linux-test.yml

Lines changed: 3 additions & 0 deletions
```diff
@@ -68,6 +68,9 @@ jobs:
 uses: pytorch/test-infra/.github/actions/setup-ssh@main
 with:
 github-secret: ${{ secrets.GITHUB_TOKEN }}
+instructions: |
+All testing is done inside the container, to start an interactive session run:
+docker exec -it $(docker container ps --format '{{.ID}}') bash
 
 - name: Checkout PyTorch
 uses: pytorch/pytorch/.github/actions/checkout-pytorch@master
```

.github/workflows/_win-build.yml

Lines changed: 9 additions & 0 deletions
```diff
@@ -50,6 +50,15 @@ jobs:
 uses: pytorch/test-infra/.github/actions/setup-ssh@main
 with:
 github-secret: ${{ secrets.GITHUB_TOKEN }}
+instructions: |
+To forward remote desktop on your local machine ssh as follows:
+ssh -L 3389:localhost:3389 %%username%%@%%hostname%%
+And then change password using `passwd` command.
+
+To start build locally, change working folder to \actions-runner\_work\pytorch\pytorch,
+Activate miniconda and Visual Studio environment, but running:
+call C:\Jenkins\Miniconda3\Scripts\activate.bat C:\Jenkins\Miniconda3
+call "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x64
 
 # [see note: pytorch repo ref]
 - name: Checkout PyTorch
```

.github/workflows/periodic.yml

Lines changed: 3 additions & 2 deletions
```diff
@@ -167,8 +167,9 @@ jobs:
 cuda-version: "11.7"
 test-matrix: |
 { include: [
-{ config: "default", shard: 1, num_shards: 2, runner: "windows.8xlarge.nvidia.gpu" },
-{ config: "default", shard: 2, num_shards: 2, runner: "windows.8xlarge.nvidia.gpu" },
+{ config: "default", shard: 1, num_shards: 3, runner: "windows.8xlarge.nvidia.gpu" },
+{ config: "default", shard: 2, num_shards: 3, runner: "windows.8xlarge.nvidia.gpu" },
+{ config: "default", shard: 3, num_shards: 3, runner: "windows.8xlarge.nvidia.gpu" },
 { config: "force_on_cpu", shard: 1, num_shards: 1, runner: "windows.4xlarge" },
 ]}
 
```
