``docsrc/tutorials/getting_started_with_fx_path.rst``
Converting a PyTorch Model to TensorRT Engine
---------------------------------------------
In general, users are welcome to use ``compile()`` to finish the conversion from a model to a TensorRT engine. It is a
wrapper API that consists of the major steps needed to finish this conversion. Please refer to the example usage in the ``lower_example.py`` file under ``examples/fx``.
.. code-block:: python

    def compile(
        module: nn.Module,
        input,
        max_batch_size: int = 2048,
        max_workspace_size=1 << 25,
        explicit_batch_dimension=False,
        lower_precision=LowerPrecision.FP16,
        verbose_log=False,
        timing_cache_prefix="",
        save_timing_cache=False,
        cuda_graph_batch_size=-1,
        dynamic_batch=True,
    ) -> nn.Module:
        """
        Takes in an original module, inputs, and a lowering setting; runs the lowering
        workflow to turn the module into a lowered module, also called a TRTModule.

        Args:
            module: Original module for lowering.
            input: Input for module.
            max_batch_size: Maximum batch size (must be >= 1 to be set, 0 means not set).
            max_workspace_size: Maximum size of workspace given to TensorRT.
            explicit_batch_dimension: Use explicit batch dimension in TensorRT if set to True, otherwise use implicit batch dimension.
            lower_precision: lower_precision config given to TRTModule.
            verbose_log: Enable verbose log for TensorRT if set to True.
            timing_cache_prefix: Timing cache file name for the timing cache used by fx2trt.
            save_timing_cache: Update timing cache with current timing cache data if set to True.
            cuda_graph_batch_size: CUDA graph batch size, defaults to -1.
            dynamic_batch: Batch dimension (dim=0) is dynamic.
        Returns:
            A torch.nn.Module lowered by TensorRT.
        """

In this section, we will go through an example to illustrate the major steps that fx path uses.
Users can refer to the ``fx2trt_example.py`` file in ``examples/fx``.
Explicit batch is the default mode and it must be set for dynamic shape.

As an example of the last case, if we have a 3D tensor ``t`` shaped as (batch, sequence, dimension), operations such as ``torch.transpose(0, 2)`` move the batch dimension away from dim 0. If any of these three conditions is not satisfied, we need to specify ``InputTensorSpec`` as inputs with dynamic range.
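The effect of such a transpose on the shape can be sketched without any framework dependency; the helper below is hypothetical (not part of fx2trt) and only illustrates how swapping dims 0 and 2 relocates the batch axis:

.. code-block:: python

    # Hypothetical helper: shows how transposing dims 0 and 2 of a
    # (batch, sequence, dimension) shape moves the batch axis off dim 0,
    # which is why such models need InputTensorSpec with a dynamic range.
    def transpose_0_2_shape(shape):
        s = list(shape)
        s[0], s[2] = s[2], s[0]
        return tuple(s)

    print(transpose_0_2_shape((8, 128, 512)))  # (512, 128, 8): batch is now dim 2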