@@ -39,13 +39,16 @@ to standard TorchScript. Load with ``torch.jit.load()`` and run like you would r
                                           GPU if they are not supported on DLA
         --require-full-compilation        Require that the model should be fully
                                           compiled to TensorRT or throw an error
+        --check-method-support=[method_name]
+                                          Check the support for end to end
+                                          compilation of a specified method in the
+                                          TorchScript module
         --disable-tf32                    Prevent Float32 layers from using the
                                           TF32 data format
         --sparse-weights                  Enable sparsity for weights of conv and
                                           FC layers
         -p[precision...],
-        --enabled-precision=[precision...]
-                                          (Repeatable) Enabling an operating
+        --enable-precision=[precision...] (Repeatable) Enabling an operating
                                           precision for kernels to use when
                                           building the engine (Int8 requires a
                                           calibration-cache argument) [ float |
@@ -64,20 +67,18 @@ to standard TorchScript. Load with ``torch.jit.load()`` and run like you would r
         --calibration-cache-file=[file_path]
                                           Path to calibration cache file to use
                                           for post training quantization
-        --teo=[torch-executed-ops...],
-        --torch-executed-ops=[torch-executed-ops...]
-                                          (Repeatable) Operator in the graph that
+        --teo=[op_name...],
+        --torch-executed-op=[op_name...]  (Repeatable) Operator in the graph that
                                           should always be run in PyTorch for
                                           execution (partial compilation must be
                                           enabled)
-        --tem=[torch-executed-mods ...],
-        --torch-executed-mods=[torch-executed-mods ...]
+        --tem=[module_name ...],
+        --torch-executed-mod=[module_name ...]
                                           (Repeatable) Module that should always
                                           be run in Pytorch for execution (partial
                                           compilation must be enabled)
-        --mbs=[torch-executed-mods...],
-        --min-block-size=[torch-executed-mods...]
-                                          Minimum number of contiguous TensorRT
+        --mbs=[num_ops],
+        --min-block-size=[num_ops]        Minimum number of contiguous TensorRT
                                           supported ops to compile a subgraph to
                                           TensorRT
         --embed-engine                    Whether to treat input file as a
@@ -119,114 +120,6 @@ to standard TorchScript. Load with ``torch.jit.load()`` and run like you would r
                                           32)@f16%NHWC"
     "--" can be used to terminate flag options and force all following
     arguments to be treated as positional options
-      [input_specs...] {OPTIONS}
-
-    torchtrtc is a compiler for TorchScript, it will compile and optimize
-    TorchScript programs to run on NVIDIA GPUs using TensorRT
-
-    OPTIONS:
-
-        -h, --help                        Display this help menu
-      Verbiosity of the compiler
-        -v, --verbose                     Dumps debugging information about the
-                                          compilation process onto the console
-        -w, --warnings                    Disables warnings generated during
-                                          compilation onto the console (warnings
-                                          are on by default)
-        --i, --info                       Dumps info messages generated during
-                                          compilation onto the console
-        --build-debuggable-engine         Creates a debuggable engine
-        --use-strict-types                Restrict operating type to only use set
-                                          operation precision
-        --allow-gpu-fallback              (Only used when targeting DLA
-                                          (device-type)) Lets engine run layers on
-                                          GPU if they are not supported on DLA
-        --require-full-compilation        Require that the model should be fully
-                                          compiled to TensorRT or throw an error
-        --is-supported=[method_name],
-        --supported=[method_name],
-        --check-support=[method_name],
-        --check-method-op-support=[method_name]
-                                          Check the support for end to end
-                                          compilation of a specified method in the
-                                          TorchScript module
-        --disable-tf32                    Prevent Float32 layers from using the
-                                          TF32 data format
-        --sparse-weights                  Enable sparsity for weights of conv and
-                                          FC layers
-        -p[precision...],
-        --enable-precision=[precision...] (Repeatable) Enabling an operating
-                                          precision for kernels to use when
-                                          building the engine (Int8 requires a
-                                          calibration-cache argument) [ float |
-                                          float32 | f32 | fp32 | half | float16 |
-                                          f16 | fp16 | int8 | i8 | char ]
-                                          (default: float)
-        -d[type], --device-type=[type]    The type of device the engine should be
-                                          built for [ gpu | dla ] (default: gpu)
-        --gpu-id=[gpu_id]                 GPU id if running on multi-GPU platform
-                                          (defaults to 0)
-        --dla-core=[dla_core]             DLACore id if running on available DLA
-                                          (defaults to 0)
-        --engine-capability=[capability]  The type of device the engine should be
-                                          built for [ standard | safety |
-                                          dla_standalone ]
-        --calibration-cache-file=[file_path]
-                                          Path to calibration cache file to use
-                                          for post training quantization
-        --teo=[op_name...],
-        --torch-executed-op=[op_name...]  (Repeatable) Operator in the graph that
-                                          should always be run in PyTorch for
-                                          execution (partial compilation must be
-                                          enabled)
-        --tem=[module_name...],
-        --torch-executed-mod=[module_name...]
-                                          (Repeatable) Module that should always
-                                          be run in Pytorch for execution (partial
-                                          compilation must be enabled)
-        --mbs=[min-block-size],
-        --min-block-size=[min-block-size] Minimum number of contiguous TensorRT
-                                          supported ops to compile a subgraph to
-                                          TensorRT
-        --embed-engine                    Whether to treat input file as a
-                                          serialized TensorRT engine and embed it
-                                          into a TorchScript module (device spec
-                                          must be provided)
-        --num-min-timing-iter=[num_iters] Number of minimization timing iterations
-                                          used to select kernels
-        --num-avg-timing-iters=[num_iters]
-                                          Number of averaging timing iterations
-                                          used to select kernels
-        --workspace-size=[workspace_size] Maximum size of workspace given to
-                                          TensorRT
-        -t[threshold],
-        --threshold=[threshold]           Maximum acceptable numerical deviation
-                                          from standard torchscript output
-                                          (default 2e-5)
-        --no-threshold-check              Skip checking threshold compliance
-        --truncate-long-double,
-        --truncate, --truncate-64bit      Truncate weights that are provided in
-                                          64bit to 32bit (Long, Double to Int,
-                                          Float)
-        --save-engine                     Instead of compiling a full a
-                                          TorchScript program, save the created
-                                          engine to the path specified as the
-                                          output path
-        input_file_path                   Path to input TorchScript file
-        output_file_path                  Path for compiled TorchScript (or
-                                          TensorRT engine) file
-        input_specs...                    Specs for inputs to engine, can either
-                                          be a single size or a range defined by
-                                          Min, Optimal, Max sizes, e.g.
-                                          "(N,..,C,H,W)"
-                                          "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]".
-                                          Data Type and format can be specified by
-                                          adding an "@" followed by dtype and "%"
-                                          followed by format to the end of the
-                                          shape spec. e.g. "(3, 3, 32,
-                                          32)@f16%NHWC"
-    "--" can be used to terminate flag options and force all following
-    arguments to be treated as positional options
 
     e.g.
 
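The renamed flags in this diff can be exercised from the command line. The invocations below are a hypothetical sketch following the help text above: the model paths, the `preprocess` module name, and the choice of `aten::size` are illustrative placeholders, not taken from the source.

```shell
# Check whether the "forward" method can be compiled end to end
# (uses the --check-method-support flag introduced by this change):
torchtrtc model.ts out.ts --check-method-support=forward

# Compile with FP16 enabled, keeping one op and one (hypothetical) submodule
# in PyTorch via partial compilation; the positional input spec pins
# shape (1,3,32,32) with dtype f16 and format NHWC:
torchtrtc model.ts trt_model.ts "(1,3,32,32)@f16%NHWC" \
    --enable-precision=f16 \
    --torch-executed-op=aten::size \
    --torch-executed-mod=preprocess \
    --min-block-size=3
```

Note that `--torch-executed-op` and `--torch-executed-mod` only take effect when partial compilation is enabled, as the help text states.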