Commit 46d2edc

Merge branch 'main' into gds_tutorial

2 parents: 228614e + 5c60fe0

10 files changed: +313 -39 lines

_templates/layout.html (-9 lines)

@@ -211,14 +211,5 @@

     <img height="1" width="1" style="border-style:none;" alt="" src="https://www.googleadservices.com/pagead/conversion/795629140/?label=txkmCPmdtosBENSssfsC&amp;guid=ON&amp;script=0"/>

-    <script>
-      //temporarily add a link to survey
-      var survey = '<div class="survey-banner"><p><i class="fas fa-poll" aria-hidden="true">&nbsp </i> Take the <a href="https://forms.gle/KZ4xGL65VRMYNbbG6">PyTorch Docs/Tutorials survey</a>.</p></div>'
-      if ($(".pytorch-call-to-action-links").length) {
-        $(".pytorch-call-to-action-links").before(survey);
-      } else {
-        $("#pytorch-article").prepend(survey);
-      }
-    </script>

 {% endblock %}

beginner_source/basics/optimization_tutorial.py (+1 -1)

@@ -76,7 +76,7 @@ def forward(self, x):
 # (`read more <https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html>`__ about hyperparameter tuning)
 #
 # We define the following hyperparameters for training:
-# - **Number of Epochs** - the number times to iterate over the dataset
+# - **Number of Epochs** - the number of times to iterate over the dataset
 # - **Batch Size** - the number of data samples propagated through the network before the parameters are updated
 # - **Learning Rate** - how much to update models parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training.
 #
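For context, a minimal sketch of how these hyperparameters might be set in the tutorial's training script; the values below are illustrative assumptions, not part of this diff:

    # Illustrative values only; the tutorial chooses its own.
    learning_rate = 1e-3  # step size applied at each parameter update
    batch_size = 64       # samples propagated before each update
    epochs = 5            # full passes over the training dataset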

en-wordlist.txt (+11 lines)

@@ -698,3 +698,14 @@ TorchServe
 Inductor’s
 onwards
 recompilations
+BiasCorrection
+ELU
+GELU
+NNCF
+OpenVINO
+OpenVINOQuantizer
+PReLU
+Quantizer
+SmoothQuant
+quantizer
+quantizers

index.rst (+5 -7)

@@ -3,13 +3,11 @@ Welcome to PyTorch Tutorials

 **What's new in PyTorch tutorials?**

-* `Dynamic Compilation Control with torch.compiler.set_stance <https://pytorch.org/tutorials/recipes/torch_compiler_set_stance_tutorial.html>`__
-* `Accelerating PyTorch Transformers by replacing nn.Transformer with Nested Tensors and torch.compile() <https://pytorch.org/tutorials/intermediate/transformer_building_blocks.html>`__
-* `Understanding the torch.export Flow and Solutions to Common Challenges <https://pytorch.org/tutorials/recipes/torch_export_challenges_solutions.html>`__
-* Updated `torch.export Tutorial <https://pytorch.org/tutorials/intermediate/torch_export_tutorial.html#constraints-dynamic-shapes>`__ with automatic dynamic shapes ``Dim.AUTO``
-* Updated `torch.export AOTInductor Tutorial for Python runtime <https://pytorch.org/tutorials/recipes/torch_export_aoti_python.html>`__
-* Updated `Using User-Defined Triton Kernels with torch.compile <https://pytorch.org/tutorials/recipes/torch_compile_user_defined_triton_kernel_tutorial.html#composability>`__ with new ``torch.library.triton_op``
-* Updated `Compile Time Caching in torch.compile <https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html>`__ with new ``Mega-Cache``
+* `Utilizing Torch Function modes with torch.compile <https://pytorch.org/tutorials/recipes/torch_compile_torch_function_modes.html>`__
+* `Context Parallel Tutorial <https://pytorch.org/tutorials/prototype/context_parallel.html>`__
+* `PyTorch 2 Export Quantization with Intel GPU Backend through Inductor <https://pytorch.org/tutorials/prototype/pt2e_quant_xpu_inductor.html>`__
+* `(beta) Explicit horizontal fusion with foreach_map and torch.compile <https://pytorch.org/tutorials/recipes/foreach_map.html>`__
+* Updated `Inductor Windows CPU Tutorial <https://pytorch.org/tutorials/prototype/inductor_windows.html>`__

 .. raw:: html

intermediate_source/torch_compile_tutorial.py (+13 -4)

@@ -101,8 +101,11 @@ def forward(self, x):
         return torch.nn.functional.relu(self.lin(x))

 mod = MyModule()
-opt_mod = torch.compile(mod)
-print(opt_mod(t))
+mod.compile()
+print(mod(t))
+## or:
+# opt_mod = torch.compile(mod)
+# print(opt_mod(t))

 ######################################################################
 # torch.compile and Nested Calls

@@ -135,8 +138,8 @@ def forward(self, x):
         return torch.nn.functional.relu(self.outer_lin(x))

 outer_mod = OuterModule()
-opt_outer_mod = torch.compile(outer_mod)
-print(opt_outer_mod(t))
+outer_mod.compile()
+print(outer_mod(t))

 ######################################################################
 # We can also disable some functions from being compiled by using

@@ -197,6 +200,12 @@ def outer_function():
 # 4. **Compile Leaf Functions First:** In complex models with multiple nested
 #    functions and modules, start by compiling the leaf functions or modules first.
 #    For more information see `TorchDynamo APIs for fine-grained tracing <https://pytorch.org/docs/stable/torch.compiler_fine_grain_apis.html>`__.
+#
+# 5. **Prefer ``mod.compile()`` over ``torch.compile(mod)``:** Avoids ``_orig_`` prefix issues in ``state_dict``.
+#
+# 6. **Use ``fullgraph=True`` to catch graph breaks:** Helps ensure end-to-end compilation, maximizing speedup
+#    and compatibility with ``torch.export``.
+

 ######################################################################
 # Demonstrating Speedups
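A short sketch of why tips 5 and 6 matter (illustrative, not part of this diff; assumes a PyTorch 2.x build where ``nn.Module.compile()`` is available):

    import torch

    lin = torch.nn.Linear(4, 4)
    wrapped = torch.compile(lin)        # returns an OptimizedModule wrapper
    print(list(wrapped.state_dict()))   # keys gain an "_orig_mod." prefix

    lin2 = torch.nn.Linear(4, 4)
    lin2.compile()                      # compiles in place; no wrapper
    print(list(lin2.state_dict()))      # keys stay "weight", "bias"

    # fullgraph=True raises on the first graph break instead of
    # silently splitting the function into multiple compiled regions.
    opt_fn = torch.compile(lambda x: torch.relu(x), fullgraph=True)
    print(opt_fn(torch.randn(3)))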

prototype_source/inductor_windows.rst (+15 -13)

@@ -22,10 +22,9 @@ Install a Compiler

 C++ compiler is required for TorchInductor optimization, let's take Microsoft Visual C++ (MSVC) as an example.

-1. Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_.
+#. Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_.

-1. During Installation, select **Workloads** and then **Desktop & Mobile**.
-1. Select a checkmark on **Desktop Development with C++** and install.
+#. During Installation, select **Workloads** and then **Desktop & Mobile**. Select a checkmark on **Desktop Development with C++** and install.

 .. image:: ../_static/img/install_msvc.png

@@ -44,18 +43,21 @@ Next, let's configure our environment.
       "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"
 #. Create and activate a virtual environment: ::
+
 #. Install `PyTorch 2.5 <https://pytorch.org/get-started/locally/>`_ or later for CPU Usage. Install PyTorch 2.7 or later refer to `Getting Started on Intel GPU <https://pytorch.org/docs/main/notes/get_start_xpu.html>`_ for XPU usage.
+
 #. Here is an example of how to use TorchInductor on Windows:
-   .. code-block:: python
-
-      import torch
-      device="cpu" # or "xpu" for XPU
-      def foo(x, y):
-          a = torch.sin(x)
-          b = torch.cos(x)
-          return a + b
-      opt_foo1 = torch.compile(foo)
-      print(opt_foo1(torch.randn(10, 10).to(device), torch.randn(10, 10).to(device)))
+
+   .. code-block:: python
+
+      import torch
+      device="cpu" # or "xpu" for XPU
+      def foo(x, y):
+          a = torch.sin(x)
+          b = torch.cos(x)
+          return a + b
+      opt_foo1 = torch.compile(foo)
+      print(opt_foo1(torch.randn(10, 10).to(device), torch.randn(10, 10).to(device)))

 #. Below is the output of the above example::

New file (+250 lines)

@@ -0,0 +1,250 @@

PyTorch 2 Export Quantization for OpenVINO torch.compile Backend
================================================================

**Authors**: `Daniil Lyakhov <https://github.com/daniil-lyakhov>`_, `Aamir Nazir <https://github.com/anzr299>`_, `Alexander Suslov <https://github.com/alexsu52>`_, `Yamini Nimmagadda <https://github.com/ynimmaga>`_, `Alexander Kozlov <https://github.com/AlexKoff88>`_

Prerequisites
-------------

- `PyTorch 2 Export Post Training Quantization <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html>`_
- `How to Write a Quantizer for PyTorch 2 Export Quantization <https://pytorch.org/tutorials/prototype/pt2e_quantizer.html>`_

Introduction
------------

.. note::

   This is an experimental feature; the quantization API is subject to change.

This tutorial demonstrates how to use ``OpenVINOQuantizer`` from the `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf/tree/develop>`_ in the PyTorch 2 Export Quantization flow to generate a quantized model customized for the `OpenVINO torch.compile backend <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_, and explains how to lower the quantized model into the `OpenVINO <https://docs.openvino.ai/2024/index.html>`_ representation.
``OpenVINOQuantizer`` unlocks the full potential of low-precision OpenVINO kernels thanks to the placement of quantizers designed specifically for OpenVINO.

The PyTorch 2 export quantization flow uses ``torch.export`` to capture the model into a graph and performs quantization transformations on top of the ATen graph.
This approach is expected to have significantly higher model coverage, improved flexibility, and a simplified UX.
The OpenVINO backend compiles the FX Graph generated by TorchDynamo into an optimized OpenVINO model.

The quantization flow mainly includes four steps:

- Step 1: Capture the FX Graph from the eager model based on the `torch export mechanism <https://pytorch.org/docs/main/export.html>`_.
- Step 2: Apply the PyTorch 2 Export Quantization flow with ``OpenVINOQuantizer`` based on the captured FX Graph.
- Step 3: Lower the quantized model into the OpenVINO representation with the `torch.compile <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ API.
- Optional step 4: Improve the quantized model's metrics via the `quantize_pt2e <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.quantize_pt2e>`_ method.
The high-level architecture of this flow could look like this:

::

    float_model(Python)                          Example Input
         \                                            /
          \                                          /
    ------------------------------------------------------
    |                       export                       |
    ------------------------------------------------------
                              |
                      FX Graph in ATen
                              |
                              |        OpenVINOQuantizer
                              |       /
    ------------------------------------------------------
    |                     prepare_pt2e                    |
    |                          |                          |
    |                      Calibrate                      |
    |                          |                          |
    |                     convert_pt2e                    |
    ------------------------------------------------------
                              |
                       Quantized Model
                              |
    ------------------------------------------------------
    |                 Lower into Inductor                 |
    ------------------------------------------------------
                              |
                       OpenVINO model
Post Training Quantization
--------------------------

Now, we will walk you through a step-by-step tutorial on how to use it with the `torchvision resnet18 model <https://download.pytorch.org/models/resnet18-f37072fd.pth>`_
for post training quantization.

Prerequisite: OpenVINO and NNCF installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

OpenVINO and NNCF can be easily installed via the `pip distribution <https://docs.openvino.ai/2024/get-started/install-openvino.html>`_:

.. code-block:: bash

   pip install -U pip
   pip install openvino nncf
1. Capture FX Graph
^^^^^^^^^^^^^^^^^^^

We will start by performing the necessary imports and capturing the FX Graph from the eager module.

.. code-block:: python

   import copy
   import openvino.torch
   import torch
   import torchvision.models as models
   from torch.ao.quantization.quantize_pt2e import convert_pt2e
   from torch.ao.quantization.quantize_pt2e import prepare_pt2e

   import nncf.torch

   # Create the Eager Model
   model_name = "resnet18"
   model = models.__dict__[model_name](pretrained=True)

   # Set the model to eval mode
   model = model.eval()

   # Create the data, using the dummy data here as an example
   traced_bs = 50
   x = torch.randn(traced_bs, 3, 224, 224)
   example_inputs = (x,)

   # Capture the FX Graph to be quantized
   with torch.no_grad(), nncf.torch.disable_patching():
       exported_model = torch.export.export(model, example_inputs).module()
2. Apply Quantization
^^^^^^^^^^^^^^^^^^^^^

After we capture the FX Module to be quantized, we will import ``OpenVINOQuantizer``.

.. code-block:: python

   from nncf.experimental.torch.fx import OpenVINOQuantizer

   quantizer = OpenVINOQuantizer()

``OpenVINOQuantizer`` has several optional parameters that allow tuning the quantization process to get a more accurate model.
Below is the list of essential parameters and their descriptions:

* ``preset`` - defines the quantization scheme for the model. Two types of presets are available:

  * ``PERFORMANCE`` (default) - defines symmetric quantization of weights and activations

  * ``MIXED`` - weights are quantized with symmetric quantization and the activations are quantized with asymmetric quantization. This preset is recommended for models with non-ReLU and asymmetric activation functions, e.g. ELU, PReLU, GELU, etc.

  .. code-block:: python

     OpenVINOQuantizer(preset=nncf.QuantizationPreset.MIXED)

* ``model_type`` - used to specify the quantization scheme required for a specific type of model. ``Transformer`` is the only supported special quantization scheme, which preserves accuracy after quantization of Transformer models (BERT, Llama, etc.). The default is ``None``, i.e. no specific scheme is defined.

  .. code-block:: python

     OpenVINOQuantizer(model_type=nncf.ModelType.Transformer)

* ``ignored_scope`` - this parameter can be used to exclude some layers from the quantization process to preserve the model accuracy, for example, when you want to exclude the last layer of the model from quantization. Below are some examples of how to use this parameter:

  .. code-block:: python

     # Exclude by layer name:
     names = ['layer_1', 'layer_2', 'layer_3']
     OpenVINOQuantizer(ignored_scope=nncf.IgnoredScope(names=names))

     # Exclude by layer type:
     types = ['Conv2d', 'Linear']
     OpenVINOQuantizer(ignored_scope=nncf.IgnoredScope(types=types))

     # Exclude by regular expression:
     regex = '.*layer_.*'
     OpenVINOQuantizer(ignored_scope=nncf.IgnoredScope(patterns=regex))

     # Exclude by subgraphs:
     # In this case, all nodes along all simple paths in the graph
     # from input to output nodes will be excluded from the quantization process.
     subgraph = nncf.Subgraph(inputs=['layer_1', 'layer_2'], outputs=['layer_3'])
     OpenVINOQuantizer(ignored_scope=nncf.IgnoredScope(subgraphs=[subgraph]))

* ``target_device`` - defines the target device whose specifics will be taken into account during optimization. The following values are supported: ``ANY`` (default), ``CPU``, ``CPU_SPR``, ``GPU``, and ``NPU``.

  .. code-block:: python

     OpenVINOQuantizer(target_device=nncf.TargetDevice.CPU)

For further details on ``OpenVINOQuantizer``, please see the `documentation <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.OpenVINOQuantizer>`_.

After we import the backend-specific Quantizer, we will prepare the model for post-training quantization.
``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators and inserts observers in appropriate places in the model.

.. code-block:: python

   prepared_model = prepare_pt2e(exported_model, quantizer)
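As a hypothetical sanity check (not part of the tutorial), you can print the prepared graph to confirm that observers were inserted and BatchNorm operators were folded away:

.. code-block:: python

   # Observer nodes should now appear around quantizable operators,
   # and no standalone batch_norm nodes should remain after folding.
   print(prepared_model.graph)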
Now, we will calibrate the ``prepared_model`` after the observers are inserted in the model.

.. code-block:: python

   # We use the dummy data as an example here
   prepared_model(*example_inputs)

Finally, we will convert the calibrated model to a quantized model. ``convert_pt2e`` takes a calibrated model and produces a quantized model.

.. code-block:: python

   quantized_model = convert_pt2e(prepared_model, fold_quantize=False)

After these steps, we have finished running the quantization flow, and we have obtained the quantized model.
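As an optional, hypothetical check (not part of the tutorial), you can compare the quantized model's output with the original model's output on the example input to gauge the quantization error:

.. code-block:: python

   # Compare outputs of the original eager model and the quantized model.
   with torch.no_grad():
       reference = model(*example_inputs)
       quantized_out = quantized_model(*example_inputs)
   print(torch.mean(torch.abs(reference - quantized_out)))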
197+
198+
199+
3. Lower into OpenVINO representation
200+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
201+
202+
After that the FX Graph can utilize OpenVINO optimizations using `torch.compile(…, backend=”openvino”) <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ functionality.
203+
204+
.. code-block:: python
205+
206+
with torch.no_grad(), nncf.torch.disable_patching():
207+
optimized_model = torch.compile(quantized_model, backend="openvino")
208+
209+
# Running some benchmark
210+
optimized_model(*example_inputs)
211+
212+
213+
214+
The optimized model is using low-level kernels designed specifically for Intel CPU.
215+
This should significantly speed up inference time in comparison with the eager model.
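A minimal timing sketch (illustrative only, not part of the tutorial) to compare eager and compiled inference; real benchmarks should use more iterations, proper warm-up, and representative inputs:

.. code-block:: python

   import time

   def bench(fn, *args, iters=10):
       fn(*args)  # warm-up; the first call triggers compilation
       start = time.perf_counter()
       for _ in range(iters):
           fn(*args)
       return (time.perf_counter() - start) / iters

   print("eager:    ", bench(model, *example_inputs))
   print("optimized:", bench(optimized_model, *example_inputs))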
4. Optional: Improve quantized model metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

NNCF implements advanced quantization algorithms like `SmoothQuant <https://arxiv.org/abs/2211.10438>`_ and `BiasCorrection <https://arxiv.org/abs/1906.04721>`_, which help
to improve the quantized model metrics while minimizing the output discrepancies between the original and compressed models.
These advanced NNCF algorithms can be accessed via the NNCF ``quantize_pt2e`` API:

.. code-block:: python

   from nncf.experimental.torch.fx import quantize_pt2e

   calibration_loader = torch.utils.data.DataLoader(...)


   def transform_fn(data_item):
       images, _ = data_item
       return images


   calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
   quantized_model = quantize_pt2e(
       exported_model, quantizer, calibration_dataset, smooth_quant=True, fast_bias_correction=False
   )

For further details, please see the `documentation <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.quantize_pt2e>`_
and a complete `example on Resnet18 quantization <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/torch_fx/resnet18/README.md>`_.

Conclusion
----------

This tutorial introduced how to use ``torch.compile`` with the OpenVINO backend and the OpenVINO quantizer.
For more details on NNCF and the NNCF Quantization Flow for PyTorch models, refer to the `NNCF Quantization Guide <https://docs.openvino.ai/2025/openvino-workflow/model-optimization-guide/quantizing-models-post-training/basic-quantization-flow.html>`_.
For additional information, check out the `OpenVINO Deployment via torch.compile Documentation <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_.
