From eecf826f3ce60f27be0db1567205c7d3daeede02 Mon Sep 17 00:00:00 2001
From: Rita <59368559+ritaiglesias-96@users.noreply.github.com>
Date: Wed, 7 Dec 2022 22:48:57 +0000
Subject: [PATCH 1/8] Post launch changes
---
_get_started/pytorch.md | 107 ++++++++++++++++++++------------
_includes/pytorch-side-nav.html | 45 ++++++--------
_sass/get-started.scss | 2 +-
3 files changed, 90 insertions(+), 64 deletions(-)
diff --git a/_get_started/pytorch.md b/_get_started/pytorch.md
index ed1a0ec74b44..05dea0ba0b63 100644
--- a/_get_started/pytorch.md
+++ b/_get_started/pytorch.md
@@ -78,7 +78,7 @@ _"With just one line of code to add, PyTorch 2.0 gives a speedup between 1.5x an
_“It just works out of the box with majority of TIMM models for inference and train workloads with no code changes”_
-**Luca Antiga** the **CTO of grid.ai** and one of the **primary maintainers of PyTorch Lightning**
+**Luca Antiga** the **CTO of [Lightning AI](http://grid.ai/)** and one of the **primary maintainers of PyTorch Lightning**
_“PyTorch 2.0 embodies the future of deep learning frameworks. The possibility to capture a PyTorch program with effectively no user intervention and get massive on-device speedups and program manipulation out of the box unlocks a whole new dimension for AI developers.”_
@@ -533,11 +533,11 @@ def infer(model, input):
DDP and FSDP in Compiled mode can run up to 15% faster than Eager-Mode in FP32 and up to 80% faster in AMP precision. PT2.0 does some extra optimization to ensure DDP’s communication-computation overlap works well with Dynamo’s partial graph creation. Ensure you run DDP with static_graph=False. More details here.
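To make the recommendation above concrete, here is a minimal sketch of compiling a DDP-wrapped module — the model, shapes, and process-group setup are illustrative assumptions, not part of this patch:

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torch.distributed.init_process_group(...) has already been called
# and one GPU is available per process.
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 10)).cuda()

# Per the FAQ answer above: run DDP with static_graph=False so Dynamo's
# partial graphs can still overlap communication with computation.
ddp_model = DDP(model, static_graph=False)
compiled_model = torch.compile(ddp_model)

x = torch.randn(32, 128, device="cuda")
compiled_model(x).sum().backward()  # forward/backward through the compiled module
```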
-
- TOPIC | HOST | DATE | TIME |
+ TOPIC | HOST |
The new developer experience of using 2.0 (install, setup, clone an example, run with 2.0) |
- Suraj Subramanian | 12/13/22 | 10 AM PST |
+ Suraj Subramanian
+ LinkedIn | Twitter |
PT2 Profiling and Debugging |
- Bert Maher | 12/15/11 | 10 AM PST |
+ Bert Maher
+ LinkedIn | Twitter |
A deep dive on TorchInductor and PT 2.0 Backend Integration |
- Natalia Gimelshein, Bin Bao and Sherlock Huang | 1/10/23 | 10 AM PST |
+ Natalia Gimelshein, Bin Bao and Sherlock Huang
+ LinkedIn (Natalia Gimelshein) | LinkedIn (Sherlock Huang) |
Extend PyTorch without C++ and functorch: JAX-like composable function transforms for PyTorch |
- Anjali Chourdia and Samantha Andow | 1/5/23 | 9 AM PST |
+ Anjali Chourdia and Samantha Andow
+ LinkedIn (Anjali Chourdia) | Twitter (Anjali Chourdia) | LinkedIn (Samantha Andow) | Twitter (Samantha Andow) |
A deep dive on TorchDynamo |
- Michael Voznesensky | 12/19/22 | 1 PM PST |
+ Michael Voznesensky
+ LinkedIn |
Rethinking data loading with TorchData:Datapipes and Dataloader2 |
- Kevin Tse | 2/1/23 | 11 AM PST |
+ Kevin Tse
+ LinkedIn |
Composable training (+ torcheval, torchsnapshot) |
Ananth Subramaniam |
- TBD |
How and why contribute code and tutorials to PyTorch |
- Zain Rizvi, Svetlana Karslioglu and Carl Parker | 12/15/22 | 2 PM PST |
+ Zain Rizvi, Svetlana Karslioglu and Carl Parker
+ LinkedIn (Zain Rizvi) | Twitter (Zain Rizvi) | LinkedIn (Svetlana Karslioglu) | Twitter (Svetlana Karslioglu) |
Dynamic Shapes and Calculating Maximum Batch Size |
- Edward Yang and Elias Ellison | 2/7/23 | 1 PM PST |
+ Edward Yang and Elias Ellison
+ Twitter (Edward Yang) |
PyTorch 2.0 Export: Sound Whole Graph Capture for PyTorch |
- Michael Suo and Yanan Cao | 12/21/22 | 2 PM PST |
+ Michael Suo and Yanan Cao
+ LinkedIn (Yanan Cao) |
2-D Parallelism using DistributedTensor and PyTorch DistributedTensor |
- Wanchao Liang and Alisson Gusatti Azzolini | 2/15/23 | 10 AM PST |
+ Wanchao Liang and Alisson Gusatti Azzolini
+ LinkedIn (Wanchao Liang) | Twitter (Wanchao Liang) | Alisson Gusatti Azzolini |
TorchRec and FSDP in Production |
- Dennis van der Staay, Andrew Gu and Rohan Varma | 12/21/22 | 10 AM PST |
+ Dennis van der Staay, Andrew Gu and Rohan Varma
+ LinkedIn (Dennis van der Staay) | LinkedIn (Rohan Varma) | Twitter (Rohan Varma) |
The Future of PyTorch On-Device |
- Raziel Alvarez Guevara | 2/8/23 | 10 AM PST |
+ Raziel Alvarez Guevara
+ LinkedIn | Twitter |
TorchMultiModal
Intro Blog
Scaling Blog |
- Kartikay Khandelwal | 2/23/23 | 10 AM PST |
+ Kartikay Khandelwal
+ LinkedIn | Twitter |
BetterTransformers (+ integration with Hugging Face), Model Serving and Optimizations
Blog 1
Github |
- Hamid Shojanazeri and Mark Saroufim | 1/10/23 | 1 PM PST |
+ Hamid Shojanazeri and Mark Saroufim
+ LinkedIn (Mark Saroufim) | Twitter (Mark Saroufim) |
PT2 and Distributed |
- Will Constable | 1/24/23 | 2 PM PST |
+ Will Constable
+ LinkedIn |
diff --git a/_includes/pytorch-side-nav.html b/_includes/pytorch-side-nav.html
index f4a3b1c6663a..5d4cbf1c2417 100644
--- a/_includes/pytorch-side-nav.html
+++ b/_includes/pytorch-side-nav.html
@@ -1,63 +1,58 @@
diff --git a/_sass/get-started.scss b/_sass/get-started.scss
index 1e2981e13e48..c040a7b342b6 100644
--- a/_sass/get-started.scss
+++ b/_sass/get-started.scss
@@ -284,7 +284,7 @@
padding: 10px;
}
- b, em, h3, h2, p, a, strong {
+ b, em, h3, h2, p, a, strong, td, tr {
font-family: Verdana;
}
From ba5d14205759264dd53bc1ceef9087bece14ac13 Mon Sep 17 00:00:00 2001
From: Rita <59368559+ritaiglesias-96@users.noreply.github.com>
Date: Wed, 7 Dec 2022 23:38:00 +0000
Subject: [PATCH 2/8] more changes
---
_get_started/pytorch.md | 63 +++++++++++++++++++++++++----------------
1 file changed, 38 insertions(+), 25 deletions(-)
diff --git a/_get_started/pytorch.md b/_get_started/pytorch.md
index 05dea0ba0b63..195f5aed4060 100644
--- a/_get_started/pytorch.md
+++ b/_get_started/pytorch.md
@@ -78,7 +78,7 @@ _"With just one line of code to add, PyTorch 2.0 gives a speedup between 1.5x an
_“It just works out of the box with majority of TIMM models for inference and train workloads with no code changes”_
-**Luca Antiga** the **CTO of [Lightning AI](http://grid.ai/)** and one of the **primary maintainers of PyTorch Lightning**
+**Luca Antiga** the **CTO of Lightning AI** and one of the **primary maintainers of PyTorch Lightning**
_“PyTorch 2.0 embodies the future of deep learning frameworks. The possibility to capture a PyTorch program with effectively no user intervention and get massive on-device speedups and program manipulation out of the box unlocks a whole new dimension for AI developers.”_
@@ -524,7 +524,7 @@ def infer(model, input):
- DDP and FSDP in Compiled mode can run up to 15% faster than Eager-Mode in FP32 and up to 80% faster in AMP precision. PT2.0 does some extra optimization to ensure DDP’s communication-computation overlap works well with Dynamo’s partial graph creation. Ensure you run DDP with static_graph=False. More details here.
- My previously-running code is crashing with 2.0’s Compiled Mode! How do I debug it?
@@ -547,7 +547,7 @@ def infer(model, input):
We will be hosting a series of live Q&A sessions for the community to have deeper questions and dialogue with the experts. Please check back to see the full calendar of topics throughout the year. If you are unable to attend: 1) They will be recorded for future viewing and 2) You can attend our Dev Infra Office Hours every Friday at 10 AM PST @ [https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours](https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours)
-Please click here to see dates, times, descriptions and links
+Please click [here](https://pytorchconference22.splashthat.com/) to see dates, times, descriptions and links
Disclaimer: Please do not share your personal information, last name, company when joining the live sessions and submitting questions
@@ -573,17 +573,21 @@ Disclaimer: Please do not share your personal information, last name, company wh
A deep dive on TorchInductor and PT 2.0 Backend Integration |
Natalia Gimelshein, Bin Bao and Sherlock Huang
- LinkedIn (Natalia Gimelshein) | LinkedIn (Sherlock Huang) |
+ Natalia Gimelshein: LinkedIn
+ Sherlock Huang: LinkedIn |
Extend PyTorch without C++ and functorch: JAX-like composable function transforms for PyTorch |
Anjali Chourdia and Samantha Andow
- LinkedIn (Anjali Chourdia) | Twitter (Anjali Chourdia) | LinkedIn (Samantha Andow) | Twitter (Samantha Andow) |
+ Anjali Chourdia: LinkedIn | Twitter
+ Samantha Andow: LinkedIn | Twitter |
@@ -605,38 +609,46 @@ Disclaimer: Please do not share your personal information, last name, company wh
How and why contribute code and tutorials to PyTorch |
Zain Rizvi, Svetlana Karslioglu and Carl Parker
- LinkedIn (Zain Rizvi) | Twitter (Zain Rizvi) | LinkedIn (Svetlana Karslioglu) | Twitter (Svetlana Karslioglu) |
+ Zain Rizvi: LinkedIn | Twitter
+ Svetlana Karslioglu: LinkedIn | Twitter |
Dynamic Shapes and Calculating Maximum Batch Size |
Edward Yang and Elias Ellison
- Twitter (Edward Yang) |
+ Edward Yang: Twitter |
PyTorch 2.0 Export: Sound Whole Graph Capture for PyTorch |
Michael Suo and Yanan Cao
- LinkedIn (Yanan Cao) |
+ Yanan Cao: LinkedIn |
2-D Parallelism using DistributedTensor and PyTorch DistributedTensor |
Wanchao Liang and Alisson Gusatti Azzolini
- LinkedIn (Wanchao Liang) | Twitter (Wanchao Liang) | Alisson Gusatti Azzolini |
+ Wanchao Liang: LinkedIn | Twitter
+ Alisson Gusatti Azzolini: LinkedIn |
TorchRec and FSDP in Production |
Dennis van der Staay, Andrew Gu and Rohan Varma
- LinkedIn (Dennis van der Staay) | LinkedIn (Rohan Varma) | Twitter (Rohan Varma) |
+ Dennis van der Staay: LinkedIn
+ Rohan Varma: LinkedIn | Twitter |
@@ -660,8 +672,9 @@ Disclaimer: Please do not share your personal information, last name, company wh
Blog 1
Github
Hamid Shojanazeri and Mark Saroufim
- LinkedIn (Mark Saroufim) | Twitter (Mark Saroufim) |
+ Mark Saroufim: LinkedIn | Twitter |
From af6fd397398461c1bed8bd2ec39bdf4b88bad682 Mon Sep 17 00:00:00 2001
From: Rita <59368559+ritaiglesias-96@users.noreply.github.com>
Date: Thu, 8 Dec 2022 00:03:58 +0000
Subject: [PATCH 3/8] Adding missing FAQ
---
_get_started/pytorch.md | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/_get_started/pytorch.md b/_get_started/pytorch.md
index 195f5aed4060..ee96da6dc3ba 100644
--- a/_get_started/pytorch.md
+++ b/_get_started/pytorch.md
@@ -469,7 +469,7 @@ After all, we can’t claim we’re created a breadth-first unless **YOUR** mode
Is 2.0 enabled by default?
- No, you must explicitly enable 2.0 in your PyTorch code by optimizing your model with a single function call.
+ 2.0 is the name of the release. torch.compile is the feature released in 2.0, and you need to explicitly use torch.compile.
How do I migrate my PT1.X code to PT2.0?
@@ -492,7 +492,7 @@ def infer(model, input):
Are there any applications where I should NOT use PT 2.0?
- The current release of PT 2.0 is still experimental and in the nightlies. Dynamic shapes support in torch.compile is still early, and you should not be using it yet, and wait until the Stable 2.0 release lands in March 2023.
+ The current release of PT 2.0 is still experimental and in the nightlies. Dynamic shapes support in torch.compile is still early, and you should not be using it yet, and wait until the Stable 2.0 release lands in March 2023.
That said, even with static-shaped workloads, we’re still building Compiled mode and there might be bugs. Disable Compiled mode for parts of your code that are crashing, and raise an issue (if it isn’t raised already).
@@ -537,6 +537,11 @@ def infer(model, input):
The PyTorch Developers forum is the best place to learn about 2.0 components directly from the developers who build them.
+ Help, my code is running slower with 2.0’s Compiled Mode!
+ The most likely reason for performance hits is too many graph breaks. For instance, something as innocuous as a print statement in your model’s forward triggers a graph break. We have ways to diagnose these - read more here.
+
+
+
My previously-running code is crashing with 2.0’s Compiled Mode! How do I debug it?
Here are some techniques to triage where your code might be failing, and printing helpful logs: https://pytorch.org/docs/master/dynamo/faq.html#why-is-my-code-crashing
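As a rough sketch of the triage flow the linked FAQ walks through — assuming the `"eager"` and `"aot_eager"` debug backends that ship with the 2.0 stack; the model and input here are placeholders:

```python
import torch

def triage(model, example_input):
    # Step 1: TorchDynamo capture only; captured graphs run via the stock
    # eager interpreter. A crash here points at graph acquisition.
    torch.compile(model, backend="eager")(example_input)

    # Step 2: add AOTAutograd's forward/backward tracing, still without
    # real codegen. A crash here points at autograd capture.
    torch.compile(model, backend="aot_eager")(example_input)

    # Step 3: the full default stack (TorchInductor). A crash only at this
    # stage points at backend compilation rather than capture.
    torch.compile(model)(example_input)
```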
From 15cf4c1e93da4b071ba593c1b2ebbd204f763b30 Mon Sep 17 00:00:00 2001
From: Rita <59368559+ritaiglesias-96@users.noreply.github.com>
Date: Thu, 8 Dec 2022 00:17:46 +0000
Subject: [PATCH 4/8] link update
---
_get_started/pytorch.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/_get_started/pytorch.md b/_get_started/pytorch.md
index ee96da6dc3ba..e4a43f49e8cc 100644
--- a/_get_started/pytorch.md
+++ b/_get_started/pytorch.md
@@ -538,7 +538,7 @@ def infer(model, input):
Help, my code is running slower with 2.0’s Compiled Mode!
- The most likely reason for performance hits is too many graph breaks. For instance, something as innocuous as a print statement in your model’s forward triggers a graph break. We have ways to diagnose these - read more here.
+ The most likely reason for performance hits is too many graph breaks. For instance, something as innocuous as a print statement in your model’s forward triggers a graph break. We have ways to diagnose these - read more [here](https://pytorch.org/docs/master/dynamo/faq.html#why-am-i-not-seeing-speedups).
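For the graph-break diagnosis this FAQ entry points to, a small sketch using the nightly-era `torch._dynamo.explain` helper — the toy function is illustrative, and the exact return shape of `explain` has varied across nightlies:

```python
import torch
import torch._dynamo as dynamo

def toy(x):
    x = x * 2
    print("logging from forward")  # innocuous Python, but it splits the graph
    return x + 1

# explain() reports the captured graphs and the reason for each graph break.
explanation = dynamo.explain(toy, torch.randn(8))
print(explanation)
```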
From 35737d41b9bbdd2b0604f7cf2898d6c6fed529ec Mon Sep 17 00:00:00 2001
From: Rita <59368559+ritaiglesias-96@users.noreply.github.com>
Date: Thu, 8 Dec 2022 19:50:29 +0000
Subject: [PATCH 5/8] Translating html to markdown, minor fixes
---
_get_started/pytorch.md | 186 +++++++++++++++++-----------------------
_sass/get-started.scss | 8 +-
2 files changed, 85 insertions(+), 109 deletions(-)
diff --git a/_get_started/pytorch.md b/_get_started/pytorch.md
index e4a43f49e8cc..b104c6ae95d2 100644
--- a/_get_started/pytorch.md
+++ b/_get_started/pytorch.md
@@ -370,7 +370,7 @@ We are super excited about the direction that we’ve taken for PyTorch 2.0 and
- Getting Started @ [https://pytorch.org/docs/master/dynamo/get-started.html](https://pytorch.org/docs/master/dynamo/get-started.html)
- Tutorials @ [https://pytorch.org/tutorials/](https://pytorch.org/tutorials/)
-- Documentation @ [https://pytorch.org/docs/master](https://pytorch.org/docs/master) and [pytorch.org/docs/master/dynamo](http://pytorch.org/docs/master/dynamo)
+- Documentation @ [https://pytorch.org/docs/master](https://pytorch.org/docs/master) and [http://pytorch.org/docs/master/dynamo](http://pytorch.org/docs/master/dynamo)
- Developer Discussions @ [https://dev-discuss.pytorch.org](https://dev-discuss.pytorch.org)
@@ -442,111 +442,87 @@ The blog tutorial will show you exactly how to replicate those speedups so you c
After all, we can’t claim we’re created a breadth-first unless **YOUR** models actually run faster.
-FAQs
-
-
- - What is PT 2.0?
- 2.0 is the latest PyTorch version. PyTorch 2.0 offers the same eager-mode development experience, while adding a compiled mode via torch.compile. This compiled mode has the potential to speedup your models during training and inference.
-
-
- - Why 2.0 instead of 1.14?
- PyTorch 2.0 is what 1.14 would have been. We were releasing substantial new features that we believe change how you meaningfully use PyTorch, so we are calling it 2.0 instead.
-
-
- - How do I install 2.0? Any additional requirements?
- Install the latest nightlies:
- CUDA 11.7
- pip3 install numpy --pre torch[dynamo] torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117
- CUDA 11.6
- pip3 install numpy --pre torch[dynamo] torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu116
- CPU
- pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cpu
-
-
- - Is 2.0 code backwards-compatible with 1.X?
- Yes, using 2.0 will not require you to modify your PyTorch workflows. A single line of code model = torch.compile(model) can optimize your model to use the 2.0 stack, and smoothly run with the rest of your PyTorch code. This is completely opt-in, and you are not required to use the new compiler.
-
-
- - Is 2.0 enabled by default?
- 2.0 is the name of the release. torch.compile is the feature released in 2.0, and you need to explicitly use torch.compile.
-
-
- - How do I migrate my PT1.X code to PT2.0?
- Your code should be working as-is without the need for any migrations. If you want to use the new Compiled mode feature introduced in 2.0, then you can start by optimizing your model with one line:
- model = torch.compile(model)
- While the speedups are primarily observed during training, you can also use it for inference if your model runs faster than eager mode.
-
- import torch
+## FAQs
+
+1. **What is PT 2.0?**
+2.0 is the latest PyTorch version. PyTorch 2.0 offers the same eager-mode development experience, while adding a compiled mode via torch.compile. This compiled mode has the potential to speedup your models during training and inference.
+
+
+2. **Why 2.0 instead of 1.14?**
+PyTorch 2.0 is what 1.14 would have been. We were releasing substantial new features that we believe change how you meaningfully use PyTorch, so we are calling it 2.0 instead.
+
+3. **How do I install 2.0? Any additional requirements?**
+Install the latest nightlies:
+CUDA 11.7
+```
+pip3 install numpy --pre torch[dynamo] torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117
+```
+CUDA 11.6
+```
+pip3 install numpy --pre torch[dynamo] torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu116
+```
+CPU
+```
+pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cpu
+```
+
+4. **Is 2.0 code backwards-compatible with 1.X?**
+Yes, using 2.0 will not require you to modify your PyTorch workflows. A single line of code `model = torch.compile(model)` can optimize your model to use the 2.0 stack, and smoothly run with the rest of your PyTorch code. This is completely opt-in, and you are not required to use the new compiler.
+
+5. **Is 2.0 enabled by default?**
+2.0 is the name of the release. torch.compile is the feature released in 2.0, and you need to explicitly use torch.compile.
+
+6. **How do I migrate my PT1.X code to PT2.0?**
+Your code should be working as-is without the need for any migrations. If you want to use the new Compiled mode feature introduced in 2.0, then you can start by optimizing your model with one line: `model = torch.compile(model)`.
+While the speedups are primarily observed during training, you can also use it for inference if your model runs faster than eager mode.
+ ```python
+ import torch
+
+ def train(model, dataloader):
+ model = torch.compile(model)
+ for batch in dataloader:
+ run_epoch(model, batch)
+
+ def infer(model, input):
+ model = torch.compile(model)
+ return model(**input)
+ ```
+
+7. **Why should I use PT2.0 instead of PT 1.X?**
+See answer to Question (2)
+
+
+8. **Are there any applications where I should NOT use PT 2.0?**
+The current release of PT 2.0 is still experimental and in the nightlies. Dynamic shapes support in torch.compile is still early, and you should not be using it yet, and wait until the Stable 2.0 release lands in March 2023.
+That said, even with static-shaped workloads, we’re still building Compiled mode and there might be bugs. Disable Compiled mode for parts of your code that are crashing, and raise an [issue](https://github.com/pytorch/pytorch/issues) (if it isn’t raised already).
+
+9. **What is my code doing differently when running PyTorch 2.0?**
+Out of the box, PyTorch 2.0 is the same as PyTorch 1.x, your models run in eager-mode i.e. every line of Python is executed one after the other.
+In 2.0, if you wrap your model in `model = torch.compile(model)`, your model goes through 3 steps before execution:
+ 1. Graph acquisition: first the model is rewritten as blocks of subgraphs. Subgraphs which can be compiled by TorchDynamo are “flattened” and the other subgraphs (which might contain control-flow code or other unsupported Python constructs) will fall back to Eager-Mode.
+ 2. Graph lowering: all the PyTorch operations are decomposed into their constituent kernels specific to the chosen backend.
+ 3. Graph compilation, where the kernels call their corresponding low-level device-specific operations.
+
+10. **What new components does PT2.0 add to PT?**
+ - **TorchDynamo** generates FX Graphs from Python bytecode. It maintains the eager-mode capabilities using [guards](https://pytorch.org/docs/master/dynamo/guards-overview.html#caching-and-guards-overview) to ensure the generated graphs are valid ([read more](https://dev-discuss.pytorch.org/t/torchdynamo-an-experiment-in-dynamic-python-bytecode-transformation/361))
+ - **AOTAutograd** to generate the backward graph corresponding to the forward graph captured by TorchDynamo ([read more](https://dev-discuss.pytorch.org/t/torchdynamo-update-6-training-support-with-aotautograd/570)).
+ - **PrimTorch** to decompose complicated PyTorch operations into simpler and more elementary ops ([read more](https://dev-discuss.pytorch.org/t/tracing-with-primitives-update-2/645)).
+ - **\[Backend]** Backends integrate with TorchDynamo to compile the graph into IR that can run on accelerators. For example, **TorchInductor** compiles the graph to either **Triton** for GPU execution or **OpenMP** for CPU execution ([read more](https://dev-discuss.pytorch.org/t/torchinductor-a-pytorch-native-compiler-with-define-by-run-ir-and-symbolic-shapes/747)).
+
+11. **What compiler backends does 2.0 currently support?**
+The default and the most complete backend is [TorchInductor](https://github.com/pytorch/pytorch/tree/master/torch/_inductor), but TorchDynamo has a growing list of backends that can be found by calling `torchdynamo.list_backends()`.
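A short usage sketch for the answer above. Note the FAQ spells the call `torchdynamo.list_backends()`, while in the nightlies of this era the same helper is importable as `torch._dynamo.list_backends()` — treat the exact module path as an assumption:

```python
import torch
import torch._dynamo as dynamo

print(dynamo.list_backends())  # e.g. includes "inductor", the default

# Selecting a backend explicitly when compiling:
model = torch.nn.Linear(4, 4)
compiled = torch.compile(model, backend="inductor")
print(compiled(torch.randn(2, 4)))
```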
-def train(model, dataloader):
- model = torch.compile(model)
- for batch in dataloader:
- run_epoch(model, batch)
-
-def infer(model, input):
- model = torch.compile(model)
- return model(\*\*input)
-
- - Why should I use PT2.0 instead of PT 1.X?
- See answer to Question (2)
-
-
- - Are there any applications where I should NOT use PT 2.0?
- The current release of PT 2.0 is still experimental and in the nightlies. Dynamic shapes support in torch.compile is still early, and you should not be using it yet, and wait until the Stable 2.0 release lands in March 2023.
-
- That said, even with static-shaped workloads, we’re still building Compiled mode and there might be bugs. Disable Compiled mode for parts of your code that are crashing, and raise an issue (if it isn’t raised already).
-
-
- - What is my code doing differently when running PyTorch 2.0?
- Out of the box, PyTorch 2.0 is the same as PyTorch 1.x, your models run in eager-mode i.e. every line of Python is executed one after the other.
-
- In 2.0, if you wrap your model in model = torch.compile(model), your model goes through 3 steps before execution:
-
- 1. Graph acquisition: first the model is rewritten as blocks of subgraphs. Subgraphs which can be compiled by TorchDynamo are “flattened” and the other subgraphs (which might contain control-flow code or other unsupported Python constructs) will fall back to Eager-Mode.
-
- 2. Graph lowering: all the PyTorch operations are decomposed into their constituent kernels specific to the chosen backend.
-
- 3. Graph compilation, where the kernels call their corresponding low-level device-specific operations
-
-
- - What new components does PT2.0 add to PT?
-
- - TorchDynamo generates FX Graphs from Python bytecode. It maintains the eager-mode capabilities using guards to ensure the generated graphs are valid (read more)
-
- - AOTAutograd to generate the backward graph corresponding to the forward graph captured by TorchDynamo (read more)
-
-
- - PrimTorch to decompose complicated PyTorch operations into simpler and more elementary ops (read more).
-
- - [Backend] Backends integrate with TorchDynamo to compile the graph into IR that can run on accelerators. For example, TorchInductor compiles the graph to either Triton for GPU execution or OpenMP for CPU execution (read more).
-
-
-
-
- - What compiler backends does 2.0 currently support?
-
- The default and the most complete backend is TorchInductor, but TorchDynamo has a growing list of backends that can be found by calling torchdynamo.list_backends().
-
-
-
- - How does distributed training work with 2.0?
-
- DDP and FSDP in Compiled mode can run up to 15% faster than Eager-Mode in FP32 and up to 80% faster in AMP precision. PT2.0 does some extra optimization to ensure DDP’s communication-computation overlap works well with Dynamo’s partial graph creation. Ensure you run DDP with static_graph=False. More details here.
-
-
- - How can I learn more about PT2.0 developments?
-
- The PyTorch Developers forum is the best place to learn about 2.0 components directly from the developers who build them.
-
-
- - Help, my code is running slower with 2.0’s Compiled Mode!
-
- The most likely reason for performance hits is too many graph breaks. For instance, something as innocuous as a print statement in your model’s forward triggers a graph break. We have ways to diagnose these - read more here.
-
-
-
- - My previously-running code is crashing with 2.0’s Compiled Mode! How do I debug it?
-
- Here are some techniques to triage where your code might be failing, and printing helpful logs: https://pytorch.org/docs/master/dynamo/faq.html#why-is-my-code-crashing
-
-
-
+12. **How does distributed training work with 2.0?**
+DDP and FSDP in Compiled mode can run up to 15% faster than Eager-Mode in FP32 and up to 80% faster in AMP precision. PT2.0 does some extra optimization to ensure DDP’s communication-computation overlap works well with Dynamo’s partial graph creation. Ensure you run DDP with static_graph=False. More details [here](https://dev-discuss.pytorch.org/t/torchdynamo-update-9-making-ddp-work-with-torchdynamo/860).
+
+13. **How can I learn more about PT2.0 developments?**
+The [PyTorch Developers forum](http://dev-discuss.pytorch.org/) is the best place to learn about 2.0 components directly from the developers who build them.
+
+14. **Help, my code is running slower with 2.0’s Compiled Mode!**
+The most likely reason for performance hits is too many graph breaks. For instance, something as innocuous as a print statement in your model’s forward triggers a graph break. We have ways to diagnose these - read more [here](https://pytorch.org/docs/master/dynamo/faq.html#why-am-i-not-seeing-speedups).
+
+15. **My previously-running code is crashing with 2.0’s Compiled Mode! How do I debug it?**
+Here are some techniques to triage where your code might be failing, and printing helpful logs: [https://pytorch.org/docs/master/dynamo/faq.html#why-is-my-code-crashing](https://pytorch.org/docs/master/dynamo/faq.html#why-is-my-code-crashing).
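Tying together answers 8 and 15 above, a sketch of opting individual functions out of compilation while the rest of the model stays compiled — assuming the `torch._dynamo.disable` decorator available in the 2.0 nightlies:

```python
import torch
import torch._dynamo as dynamo

@dynamo.disable  # keep this function in eager mode, e.g. while triaging a crash
def fragile_postprocess(y):
    return y.tolist()

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        y = torch.relu(self.linear(x))   # compiled
        return fragile_postprocess(y)    # always runs eagerly

compiled = torch.compile(Model())
compiled(torch.randn(2, 8))
```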
## Ask the Engineers: 2.0 Live Q&A Series
diff --git a/_sass/get-started.scss b/_sass/get-started.scss
index c040a7b342b6..fd638655a43e 100644
--- a/_sass/get-started.scss
+++ b/_sass/get-started.scss
@@ -44,7 +44,7 @@
}
.nav-item {
- padding: 2rem;
+ padding: 1rem;
cursor: pointer;
}
@@ -66,9 +66,7 @@
.nav-link {
font-size: rem(18px);
color: #8c8c8c;
- @include desktop {
- margin-left: rem(30px);
- }
+
&:hover {
color: $orange;
}
@@ -211,6 +209,8 @@
@include desktop {
padding-top: 0;
+ max-height: 100vh;
+ overflow: auto;
}
ul {
From 0a19f676e66df358f91b32f2e4d2d97b3da63683 Mon Sep 17 00:00:00 2001
From: Rita <59368559+ritaiglesias-96@users.noreply.github.com>
Date: Thu, 8 Dec 2022 20:39:04 +0000
Subject: [PATCH 6/8] some small nits
---
_get_started/pytorch.md | 52 ++++++++++++++++++++---------------------
1 file changed, 26 insertions(+), 26 deletions(-)
diff --git a/_get_started/pytorch.md b/_get_started/pytorch.md
index 146f7a800c50..89ccde944fb3 100644
--- a/_get_started/pytorch.md
+++ b/_get_started/pytorch.md
@@ -451,20 +451,22 @@ After all, we can’t claim we’re created a breadth-first unless **YOUR** mode
2. **Why 2.0 instead of 1.14?**
PyTorch 2.0 is what 1.14 would have been. We were releasing substantial new features that we believe change how you meaningfully use PyTorch, so we are calling it 2.0 instead.
-3. **How do I install 2.0? Any additional requirements?**
-Install the latest nightlies:
-CUDA 11.7
-```
-pip3 install numpy --pre torch[dynamo] torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117
-```
-CUDA 11.6
-```
-pip3 install numpy --pre torch[dynamo] torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu116
-```
-CPU
-```
-pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cpu
-```
+3. **How do I install 2.0? Any additional requirements?**
+
+ Install the latest nightlies:
+
+ CUDA 11.7
+ ```
+ pip3 install numpy --pre torch[dynamo] torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117
+ ```
+ CUDA 11.6
+ ```
+ pip3 install numpy --pre torch[dynamo] torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu116
+ ```
+ CPU
+ ```
+ pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cpu
+ ```
4. **Is 2.0 code backwards-compatible with 1.X?**
Yes, using 2.0 will not require you to modify your PyTorch workflows. A single line of code `model = torch.compile(model)` can optimize your model to use the 2.0 stack, and smoothly run with the rest of your PyTorch code. This is completely opt-in, and you are not required to use the new compiler.
@@ -472,9 +474,9 @@ Yes, using 2.0 will not require you to modify your PyTorch workflows. A single l
5. **Is 2.0 enabled by default?**
2.0 is the name of the release. torch.compile is the feature released in 2.0, and you need to explicitly use torch.compile.
-6. **How do I migrate my PT1.X code to PT2.0?**
+6. **How do I migrate my PT1.X code to PT2.0?**
Your code should be working as-is without the need for any migrations. If you want to use the new Compiled mode feature introduced in 2.0, then you can start by optimizing your model with one line: `model = torch.compile(model)`.
-While the speedups are primarily observed during training, you can also use it for inference if your model runs faster than eager mode.
+While the speedups are primarily observed during training, you can also use it for inference if your model runs faster than eager mode.
```python
import torch
@@ -489,7 +491,7 @@ While the speedups are primarily observed during training, you can also use it f
```
7. **Why should I use PT2.0 instead of PT 1.X?**
-See answer to Question (2)
+See answer to Question (2).
8. **Are there any applications where I should NOT use PT 2.0?**
The current release of PT 2.0 is still experimental and in the nightlies. Dynamic shapes support in torch.compile is still early, and you should not be using it yet, and wait until the Stable 2.0 release lands in March 2023.
@@ -525,11 +527,11 @@ Here are some techniques to triage where your code might be failing, and printin
## Ask the Engineers: 2.0 Live Q&A Series
-We will be hosting a series of live Q&A sessions for the community to have deeper questions and dialogue with the experts. Please check back to see the full calendar of topics throughout the year. If you are unable to attend: 1) They will be recorded for future viewing and 2) You can attend our Dev Infra Office Hours every Friday at 10 AM PST @ [https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours](https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours)
+We will be hosting a series of live Q&A sessions for the community to have deeper questions and dialogue with the experts. Please check back to see the full calendar of topics throughout the year. If you are unable to attend: 1) They will be recorded for future viewing and 2) You can attend our Dev Infra Office Hours every Friday at 10 AM PST @ [https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours](https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours).
-Please click [here](https://pytorchconference22.splashthat.com/) to see dates, times, descriptions and links
+Please click [here](https://pytorchconference22.splashthat.com/) to see dates, times, descriptions and links.
-Disclaimer: Please do not share your personal information, last name, company when joining the live sessions and submitting questions
+Disclaimer: Please do not share your personal information, last name, company when joining the live sessions and submitting questions.
@@ -667,11 +669,9 @@ Disclaimer: Please do not share your personal information, last name, company wh
## Watch the Talks from PyTorch Conference
-
+- [TorchDynamo](https://www.youtube.com/watch?v=vbtGZL7IrAw)
+- [TorchInductor](https://www.youtube.com/watch?v=vbtGZL7IrAw)
+- [Dynamic Shapes](https://www.youtube.com/watch?v=vbtGZL7IrAw)
+- [Export Path](https://www.youtube.com/watch?v=vbtGZL7IrAw)
From f975594ff6dfb2180b30a0c1377c9fa31e978deb Mon Sep 17 00:00:00 2001
From: Rita <59368559+ritaiglesias-96@users.noreply.github.com>
Date: Thu, 8 Dec 2022 21:04:00 +0000
Subject: [PATCH 7/8] Correcting copy
---
_get_started/pytorch.md | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/_get_started/pytorch.md b/_get_started/pytorch.md
index 89ccde944fb3..90c0630e5e62 100644
--- a/_get_started/pytorch.md
+++ b/_get_started/pytorch.md
@@ -271,9 +271,9 @@ This is in early stages of development. Catch the talk on Export Path at the PyT
A compiled mode is opaque and hard to debug. You will have questions such as:
-- why is my program crashing in compiled mode?
-- is compiled mode as accurate as eager mode?
-- why am I not seeing speedups?
+- Why is my program crashing in compiled mode?
+- Is compiled mode as accurate as eager mode?
+- Why am I not seeing speedups?
If compiled mode produces an error or a crash or diverging results from eager mode (beyond machine precision limits), it is very unlikely that it is your code’s fault. However, understanding what piece of code is the reason for the bug is useful.
@@ -320,12 +320,6 @@ Right: FSDP in Compiled mode takes substantially lesser memory than in eager mod
-External launcher scripts and wrappers that simply apply DDP under the hood generally should work out of the box. Hugging Face Accelerate, Lightning, torchrun, and Ray Train have all been tested and verified working. DeepSpeed and Horovod have not been tested and we expect to enable them soon.
-
-Manual gradient checkpointing (i.e. `torch.utils.checkpoint*` ) is in the works, and expected to be enabled in the near future. There is ongoing work to enable it, and this is partially mitigated by AOTAutograd’s min-cut partitioner, which recomputes some values in the `backward` call to reduce peak memory usage. This is evident from the memory compression results shown in the graph with FSDP in compiled mode.
-
-Other experimental distributed subsystems, such as DistributedTensor and PiPPy, have not yet been tested with TorchDynamo.
-
### DistributedDataParallel (DDP)
DDP relies on overlapping AllReduce communications with backwards computation, and grouping smaller per-layer AllReduce operations into ‘buckets’ for greater efficiency. AOTAutograd functions compiled by TorchDynamo prevent communication overlap, when combined naively with DDP, but performance is recovered by compiling separate subgraphs for each ‘bucket’ and allowing communication ops to happen outside and in-between the subgraphs. DDP support in compiled mode also currently requires `static_graph=True` and `find_unused_parameters=True`, but these shouldn’t be a long term requirement. See [this post](https://dev-discuss.pytorch.org/t/torchdynamo-update-9-making-ddp-work-with-torchdynamo/860) for more details on the approach and results for DDP + TorchDynamo.
@@ -350,7 +344,8 @@ In graphical form, the PT2 stack looks like:
Starting in the middle of the diagram, AOTAutograd dynamically captures autograd logic in an ahead-of-time fashion, producing a graph of forward and backwards operators in FX graph format.
-We provide a set of hardened decompositions (i.e. operator implementations written in terms of other operators) that can be leveraged to **reduce** the number of operators a backend is required to implement. We also **simplify** the semantics of PyTorch operators by selectively rewriting complicated PyTorch logic including mutations and views via a process called _functionalization_, as well as guaranteeing operator metadata information such as shape propagation formulas. This work is actively in progress; our goal is to provide a _primitive_ and _stable_ set of ~250 operators with simplified semantics, called _PrimTorch,_ that vendors can leverage (i.e. opt-in to) in order to simplify their integrations.
+We provide a set of hardened decompositions (i.e. operator implementations written in terms of other operators) that can be leveraged to **reduce** the number of operators a backend is required to implement. We also **simplify** the semantics of PyTorch operators by selectively rewriting complicated PyTorch logic including mutations and views via a process called _functionalization_, as well as guaranteeing operator metadata information such as shape propagation formulas. This work is actively in progress; our goal is to provide a _primitive_ and _stable_ set of ~250 operators with simplified semantics, called _PrimTorch,_ that vendors can leverage (i.e. opt-in to) in order to simplify their integrations.
+After reducing and simplifying the operator set, backends may choose to integrate at the Dynamo (i.e. the middle layer, immediately after AOTAutograd) or Inductor (the lower layer). We describe some considerations in making this choice below, as well as future work around mixtures of backends.
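As a toy sketch of what integrating at the Dynamo layer looks like: a custom backend is just a callable that receives the captured `torch.fx.GraphModule` plus example inputs and returns a callable. This inspect-only backend is an illustration, not a real vendor integration:

```python
import torch

def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    print(gm.graph)    # the captured FX graph a vendor compiler would consume
    return gm.forward  # return a callable; here we just run the graph as-is

def f(x):
    return torch.sin(x) + torch.cos(x)

compiled = torch.compile(f, backend=inspect_backend)
compiled(torch.randn(8))
```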
**Dynamo Backend**
From 1ab4959ac7ac5cdaf3cf66e33030c6a48ba9ea03 Mon Sep 17 00:00:00 2001
From: Rita <59368559+ritaiglesias-96@users.noreply.github.com>
Date: Fri, 9 Dec 2022 20:14:47 +0000
Subject: [PATCH 8/8] Fixing z-index
---
_sass/get-started.scss | 1 +
1 file changed, 1 insertion(+)
diff --git a/_sass/get-started.scss b/_sass/get-started.scss
index fd638655a43e..f141d35bb92a 100644
--- a/_sass/get-started.scss
+++ b/_sass/get-started.scss
@@ -206,6 +206,7 @@
padding-top: rem(40px);
padding-bottom: rem(40px);
top: 15%;
+ z-index: 385;
@include desktop {
padding-top: 0;