Add NCCL PreMul Sum to c10d reduce ops #84243


Closed
wants to merge 5 commits

Conversation

crcrpar
Collaborator

@crcrpar crcrpar commented Aug 30, 2022

This is based on #81272, but this version conforms to the TorchScript compiler.

cc @ptrblck @kwen2501 @aazzolini
cc @zasdfgbnm for visibility to the TODO above
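
A minimal usage sketch of what PreMul Sum enables on the Python side — assuming a multi-GPU NCCL job launched with torchrun, and the private helper `_make_nccl_premul_sum` in `torch.distributed.distributed_c10d` (treat the helper's exact name and location as an assumption based on later PyTorch releases):

```python
# Minimal sketch: PreMul Sum scales every rank's tensor by a factor
# before the NCCL summation, i.e. it computes sum_i(factor * t_i).
# Assumes the NCCL backend and an initialized process group;
# _make_nccl_premul_sum is a private helper, name per later releases.
import torch
import torch.distributed as dist
from torch.distributed.distributed_c10d import _make_nccl_premul_sum

dist.init_process_group(backend="nccl")
rank = dist.get_rank()
t = torch.ones(4, device=f"cuda:{rank}")

premul_sum = _make_nccl_premul_sum(2.0)  # factor applied pre-reduction
dist.all_reduce(t, op=premul_sum)        # t == 2.0 * world_size * ones
```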

@facebook-github-bot
Contributor

facebook-github-bot commented Aug 30, 2022

✅ No Failures (0 Pending)

As of commit 7ebdd05 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI.

@facebook-github-bot facebook-github-bot added the oncall: distributed label (add this issue/PR to distributed oncall triage queue) Aug 30, 2022
@crcrpar
Collaborator Author

crcrpar commented Aug 30, 2022

If this works at least on the public CI, I'll close #84059.

@crcrpar
Collaborator Author

crcrpar commented Aug 30, 2022

Regarding the failure:

2022-08-30T02:10:49.9879118Z The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not. 
2022-08-30T02:10:49.9879135Z 
2022-08-30T02:10:49.9879245Z Broken ops: [
2022-08-30T02:10:49.9879772Z 	c10d::reduce_(Tensor[] _0, __torch__.torch.classes.c10d.ProcessGroup _1, int _2, int _3, int _4, int _5) -> __torch__.torch.classes.c10d.Work _0
2022-08-30T02:10:49.9880297Z 	c10d::reduce_scatter_(Tensor[] _0, Tensor[][] _1, __torch__.torch.classes.c10d.ProcessGroup _2, int _3, int _4) -> __torch__.torch.classes.c10d.Work _0
2022-08-30T02:10:49.9880791Z 	c10d::allreduce_(Tensor[] _0, __torch__.torch.classes.c10d.ProcessGroup _1, int _2, int _3) -> __torch__.torch.classes.c10d.Work _0
2022-08-30T02:10:49.9880884Z ]

Should I add c10d reduce ops to ALLOW_LIST like https://github.com/crcrpar/pytorch/blob/a0c6e7499ea81fb0da4858a7ebf27a88c0612493/test/forward_backward_compatibility/check_forward_backward_compatibility.py#L123?
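
For reference, entries in that file are (operator-name pattern, expiry date) tuples. A sketch of what such additions might look like — the expiry dates here are placeholders, not values actually merged:

```python
# Hypothetical ALLOW_LIST additions for the BC check in
# test/forward_backward_compatibility/check_forward_backward_compatibility.py;
# the expiry dates below are placeholders.
import datetime

ALLOW_LIST = [
    # ... existing entries ...
    ("c10d::allreduce_", datetime.date(2022, 10, 31)),
    ("c10d::reduce_", datetime.date(2022, 10, 31)),
    ("c10d::reduce_scatter_", datetime.date(2022, 10, 31)),
]
```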

@crcrpar crcrpar changed the title Resubmit #81272 Add NCCL PreMul Sum to c10d reduce ops Aug 30, 2022
@kwen2501
Contributor

> Regarding the failure … Should I add c10d reduce ops to ALLOW_LIST like https://github.com/crcrpar/pytorch/blob/a0c6e7499ea81fb0da4858a7ebf27a88c0612493/test/forward_backward_compatibility/check_forward_backward_compatibility.py#L123?

@H-Huang seems to have encountered a similar warning. Maybe he knows how to respond to it.

@crcrpar

This comment was marked as outdated.

@H-Huang
Member

H-Huang commented Sep 1, 2022

> Regarding the failure … Should I add c10d reduce ops to ALLOW_LIST like https://github.com/crcrpar/pytorch/blob/a0c6e7499ea81fb0da4858a7ebf27a88c0612493/test/forward_backward_compatibility/check_forward_backward_compatibility.py#L123?

@crcrpar @kwen2501 FYI: I am going to update the allow list in this PR to allow all changes to all ops for the dispatchable collectives feature: https://github.com/pytorch/pytorch/pull/83735/files#diff-236fbde71e59cb1597cac177a83e49fb62b30770eec55c4e7a0f2650b9eb6203R274-R275. The PR will be merged in the next day or two. Feel free to also include this change; a blanket entry of that kind is sketched below.
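
A sketch of what such a blanket entry might look like (hypothetical; the actual entry is in the linked diff):

```python
import datetime

# Hypothetical blanket ALLOW_LIST entry permitting BC changes to every
# c10d op while the dispatchable-collectives work is in flight;
# the expiry date is a placeholder.
ALLOW_LIST = [
    ("c10d::.*", datetime.date(2022, 10, 31)),
]
```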

@soulitzer soulitzer added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Sep 1, 2022
crcrpar and others added 5 commits September 2, 2022 08:12
- have `_SupplementBase` and `ReduceOp` inherit `torch::CustomClassHolder`
- `def` only `c10d::ReduceOp` in `Ops.cpp`
- use `c10::intrusive_ptr<ReduceOp>` rather than `int64_t` in dispatch

Signed-off-by: Masaki Kozuki <[email protected]>
Contributor

@kwen2501 kwen2501 left a comment

LGTM! Thanks for the contribution!

@crcrpar
Collaborator Author

crcrpar commented Sep 2, 2022

@pytorchmergebot merge

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered without a flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@github-actions
Contributor

github-actions bot commented Sep 2, 2022

Hey @crcrpar.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request Sep 7, 2022
Summary:
This is based on #81272, but this version conforms to the TorchScript compiler.

## TODO
- [ ] Update https://github.com/pytorch/pytorch/blob/abaf8112e6d6bed2a5d33dcbc1d46ed20b8e80de/torch/csrc/distributed/c10d/ProcessGroupUCC.cpp#L64-L73 to use `ReduceOp::RedOpType`. In my first try with `USE_SYSTEM_UCC=1`, this change wasn't necessary (I think) because of the `ReduceOp::RedOpType` operator. That being said, I want to make it more explicit.

cc ptrblck kwen2501 aazzolini
cc zasdfgbnm for visibility to the TODO above

Pull Request resolved: #84243
Approved by: https://github.com/kwen2501

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/ab6c57217a97438c8e13952a407e42873e2259f3

Reviewed By: mehtanirav, izaitsevfb

Differential Revision: D39277627

fbshipit-source-id: 039c6eef8c4d1c42a18273edb43b40888176d867
@crcrpar crcrpar deleted the ncclpremulsum branch September 30, 2022 23:20
pytorchmergebot pushed a commit that referenced this pull request Nov 15, 2022
Summary:
- Customize the metaclass of `torch.distributed.distributed_c10d.ReduceOp` for the sake of custom `__instancecheck__`
- Add `copy.copy`, `copy.deepcopy`, and `pickle` support with tests

Rel:
- #81272
- #84243
- #87191
- #87303
- #87555

Ref:
- pybind/pybind11#2696

Pull Request resolved: #88275
Approved by: https://github.com/wanchaol
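
A sketch of the behavior described in that summary — the custom `__instancecheck__` plus copy/pickle support (exact equality and printing semantics may vary across PyTorch versions):

```python
import copy
import pickle

import torch.distributed as dist

# The custom metaclass __instancecheck__ lets enum-style members pass
# as ReduceOp instances -- the headline behavior of #88275.
assert isinstance(dist.ReduceOp.SUM, dist.ReduceOp)

# copy.copy, copy.deepcopy, and pickle round-trips were added in the
# same change; printed rather than asserted, since exact equality
# semantics may differ across versions.
op = dist.ReduceOp.SUM
print(copy.copy(op), copy.deepcopy(op), pickle.loads(pickle.dumps(op)))
```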
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
(Same change as #88275 above.)
Labels
cla signed · Merged · oncall: distributed · open source · release notes: distributed (c10d) · triaged