
Commit b663154

refactor(//cpp/bin/torchtrtc): Updating help docs for torchtrtc

Signed-off-by: Naren Dasan <[email protected]>
Signed-off-by: Naren Dasan <[email protected]>

Parent: 5adad06

3 files changed: +214 -100 lines

cpp/bin/torchtrtc/README.md

Lines changed: 101 additions & 95 deletions
@@ -21,101 +21,107 @@ torchtrtc [input_file_path] [output_file_path]
 
 OPTIONS:
 
-      -h, --help                        Display this help menu
-      Verbosity of the compiler
-      -v, --verbose                     Dumps debugging information about the
-                                        compilation process onto the console
-      -w, --warnings                    Disables warnings generated during
-                                        compilation onto the console (warnings
-                                        are on by default)
-      --i, --info                       Dumps info messages generated during
-                                        compilation onto the console
-      --build-debuggable-engine         Creates a debuggable engine
-      --allow-gpu-fallback              (Only used when targeting DLA
-                                        (device-type)) Lets engine run layers on
-                                        GPU if they are not supported on DLA
-      --require-full-compilation        Require that the model should be fully
-                                        compiled to TensorRT or throw an error
-      --disable-tf32                    Prevent Float32 layers from using the
-                                        TF32 data format
-      --sparse-weights                  Enable sparsity for weights of conv and
-                                        FC layers
-      -p[precision...],
-      --enabled-precision=[precision...]
-                                        (Repeatable) Enabling an operating
-                                        precision for kernels to use when
-                                        building the engine (Int8 requires a
-                                        calibration-cache argument) [ float |
-                                        float32 | f32 | fp32 | half | float16 |
-                                        f16 | fp16 | int8 | i8 | char ]
-                                        (default: float)
-      -d[type], --device-type=[type]    The type of device the engine should be
-                                        built for [ gpu | dla ] (default: gpu)
-      --gpu-id=[gpu_id]                 GPU id if running on multi-GPU platform
-                                        (defaults to 0)
-      --dla-core=[dla_core]             DLACore id if running on available DLA
-                                        (defaults to 0)
-      --engine-capability=[capability]  The type of device the engine should be
-                                        built for [ standard | safety |
-                                        dla_standalone ]
-      --calibration-cache-file=[file_path]
-                                        Path to calibration cache file to use
-                                        for post training quantization
-      --teo=[torch-executed-ops...],
-      --torch-executed-ops=[torch-executed-ops...]
-                                        (Repeatable) Operator in the graph that
-                                        should always be run in PyTorch for
-                                        execution (partial compilation must be
-                                        enabled)
-      --tem=[torch-executed-mods...],
-      --torch-executed-mods=[torch-executed-mods...]
-                                        (Repeatable) Module that should always
-                                        be run in PyTorch for execution (partial
-                                        compilation must be enabled)
-      --mbs=[torch-executed-mods...],
-      --min-block-size=[torch-executed-mods...]
-                                        Minimum number of contiguous TensorRT
-                                        supported ops to compile a subgraph to
-                                        TensorRT
-      --embed-engine                    Whether to treat input file as a
-                                        serialized TensorRT engine and embed it
-                                        into a TorchScript module (device spec
-                                        must be provided)
-      --num-min-timing-iter=[num_iters] Number of minimization timing iterations
-                                        used to select kernels
-      --num-avg-timing-iters=[num_iters]
-                                        Number of averaging timing iterations
-                                        used to select kernels
-      --workspace-size=[workspace_size] Maximum size of workspace given to
-                                        TensorRT
-      -t[threshold],
-      --threshold=[threshold]           Maximum acceptable numerical deviation
-                                        from standard torchscript output
-                                        (default 2e-5)
-      --no-threshold-check              Skip checking threshold compliance
-      --truncate-long-double,
-      --truncate, --truncate-64bit      Truncate weights that are provided in
-                                        64bit to 32bit (Long, Double to Int,
-                                        Float)
-      --save-engine                     Instead of compiling a full
-                                        TorchScript program, save the created
-                                        engine to the path specified as the
-                                        output path
-      input_file_path                   Path to input TorchScript file
-      output_file_path                  Path for compiled TorchScript (or
-                                        TensorRT engine) file
-      input_specs...                    Specs for inputs to engine, can either
-                                        be a single size or a range defined by
-                                        Min, Optimal, Max sizes, e.g.
-                                        "(N,..,C,H,W)"
-                                        "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]".
-                                        Data Type and format can be specified by
-                                        adding an "@" followed by dtype and "%"
-                                        followed by format to the end of the
-                                        shape spec. e.g. "(3, 3, 32,
-                                        32)@f16%NHWC"
-      "--" can be used to terminate flag options and force all following
-      arguments to be treated as positional options
+      -h, --help                        Display this help menu
+      Verbosity of the compiler
+      -v, --verbose                     Dumps debugging information about the
+                                        compilation process onto the console
+      -w, --warnings                    Disables warnings generated during
+                                        compilation onto the console (warnings
+                                        are on by default)
+      --i, --info                       Dumps info messages generated during
+                                        compilation onto the console
+      --build-debuggable-engine         Creates a debuggable engine
+      --use-strict-types                Restrict operating type to only use set
+                                        operation precision
+      --allow-gpu-fallback              (Only used when targeting DLA
+                                        (device-type)) Lets engine run layers on
+                                        GPU if they are not supported on DLA
+      --require-full-compilation        Require that the model should be fully
+                                        compiled to TensorRT or throw an error
+      --is-supported=[method_name],
+      --supported=[method_name],
+      --check-support=[method_name],
+      --check-method-op-support=[method_name]
+                                        Check the support for end to end
+                                        compilation of a specified method in the
+                                        TorchScript module
+      --disable-tf32                    Prevent Float32 layers from using the
+                                        TF32 data format
+      --sparse-weights                  Enable sparsity for weights of conv and
+                                        FC layers
+      -p[precision...],
+      --enable-precision=[precision...] (Repeatable) Enabling an operating
+                                        precision for kernels to use when
+                                        building the engine (Int8 requires a
+                                        calibration-cache argument) [ float |
+                                        float32 | f32 | fp32 | half | float16 |
+                                        f16 | fp16 | int8 | i8 | char ]
+                                        (default: float)
+      -d[type], --device-type=[type]    The type of device the engine should be
+                                        built for [ gpu | dla ] (default: gpu)
+      --gpu-id=[gpu_id]                 GPU id if running on multi-GPU platform
+                                        (defaults to 0)
+      --dla-core=[dla_core]             DLACore id if running on available DLA
+                                        (defaults to 0)
+      --engine-capability=[capability]  The type of device the engine should be
+                                        built for [ standard | safety |
+                                        dla_standalone ]
+      --calibration-cache-file=[file_path]
+                                        Path to calibration cache file to use
+                                        for post training quantization
+      --teo=[op_name...],
+      --torch-executed-op=[op_name...]  (Repeatable) Operator in the graph that
+                                        should always be run in PyTorch for
+                                        execution (partial compilation must be
+                                        enabled)
+      --tem=[module_name...],
+      --torch-executed-mod=[module_name...]
+                                        (Repeatable) Module that should always
+                                        be run in PyTorch for execution (partial
+                                        compilation must be enabled)
+      --mbs=[min-block-size],
+      --min-block-size=[min-block-size] Minimum number of contiguous TensorRT
+                                        supported ops to compile a subgraph to
+                                        TensorRT
+      --embed-engine                    Whether to treat input file as a
+                                        serialized TensorRT engine and embed it
+                                        into a TorchScript module (device spec
+                                        must be provided)
+      --num-min-timing-iter=[num_iters] Number of minimization timing iterations
+                                        used to select kernels
+      --num-avg-timing-iters=[num_iters]
+                                        Number of averaging timing iterations
+                                        used to select kernels
+      --workspace-size=[workspace_size] Maximum size of workspace given to
+                                        TensorRT
+      -t[threshold],
+      --threshold=[threshold]           Maximum acceptable numerical deviation
+                                        from standard torchscript output
+                                        (default 2e-5)
+      --no-threshold-check              Skip checking threshold compliance
+      --truncate-long-double,
+      --truncate, --truncate-64bit      Truncate weights that are provided in
+                                        64bit to 32bit (Long, Double to Int,
+                                        Float)
+      --save-engine                     Instead of compiling a full
+                                        TorchScript program, save the created
+                                        engine to the path specified as the
+                                        output path
+      input_file_path                   Path to input TorchScript file
+      output_file_path                  Path for compiled TorchScript (or
+                                        TensorRT engine) file
+      input_specs...                    Specs for inputs to engine, can either
+                                        be a single size or a range defined by
+                                        Min, Optimal, Max sizes, e.g.
+                                        "(N,..,C,H,W)"
+                                        "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]".
+                                        Data Type and format can be specified by
+                                        adding an "@" followed by dtype and "%"
+                                        followed by format to the end of the
+                                        shape spec. e.g. "(3, 3, 32,
+                                        32)@f16%NHWC"
+      "--" can be used to terminate flag options and force all following
+      arguments to be treated as positional options
 ```
 
 e.g.
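For reference, an invocation that exercises the renamed flags and the input-spec grammar above could look like the following. The file paths and the `aten::flatten` operator are hypothetical; the flag spellings and the `@dtype%format` shape syntax come from the help text:

```
# Compile for FP16 (FP32 kernels also allowed) with a dynamic input range,
# keeping one operator in PyTorch via the renamed --torch-executed-op flag:
torchtrtc model.ts model_trt.ts \
    "[(1,3,224,224);(8,3,224,224);(32,3,224,224)]@f16%contiguous" \
    -p f16 -p f32 \
    --torch-executed-op "aten::flatten"
```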

cpp/bin/torchtrtc/main.cpp

Lines changed: 5 additions & 5 deletions
@@ -54,7 +54,7 @@ int main(int argc, char** argv) {
 
   args::ValueFlag<std::string> check_method_op_support(
       parser,
-      "check-method-op-support",
+      "method_name",
       "Check the support for end to end compilation of a specified method in the TorchScript module",
       {"supported", "is-supported", "check-support", "check-method-op-support"});
 
@@ -93,15 +93,15 @@ int main(int argc, char** argv) {
 
   args::ValueFlagList<std::string> torch_executed_ops(
       parser,
-      "torch-executed-ops",
+      "op_name",
       "(Repeatable) Operator in the graph that should always be run in PyTorch for execution (partial compilation must be enabled)",
-      {"teo", "torch-executed-ops"});
+      {"teo", "torch-executed-op"});
 
   args::ValueFlagList<std::string> torch_executed_mods(
       parser,
-      "torch-executed-mods",
+      "module_name",
       "(Repeatable) Module that should always be run in PyTorch for execution (partial compilation must be enabled)",
-      {"tem", "torch-executed-mods"});
+      {"tem", "torch-executed-mod"});
 
   args::ValueFlag<uint64_t> min_block_size(
       parser,
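The main.cpp hunks change two pieces of metadata on each flag: the value name (the second constructor argument, rendered as the placeholder in `--help`) and the long-option spellings in the matcher list. Below is a minimal, self-contained sketch of this Taywee/args pattern, assuming `args.hxx` is on the include path; it is illustrative, not the real torchtrtc source:

```
// flag_demo.cpp -- sketch of the args::ValueFlagList pattern from main.cpp
#include <args.hxx>   // Taywee/args, the CLI library torchtrtc uses
#include <iostream>
#include <string>

int main(int argc, char** argv) {
  args::ArgumentParser parser("Demo of the torchtrtc flag declaration pattern");
  args::HelpFlag help(parser, "help", "Display this help menu", {'h', "help"});

  // The second argument ("op_name") is the value placeholder rendered in the
  // help output (e.g. "--teo=[op_name...]"); the brace-enclosed list holds
  // the accepted long-flag spellings.
  args::ValueFlagList<std::string> torch_executed_ops(
      parser,
      "op_name",
      "(Repeatable) Operator that should always be run in PyTorch",
      {"teo", "torch-executed-op"});

  try {
    parser.ParseCLI(argc, argv);
  } catch (const args::Help&) {
    std::cout << parser;  // help text is generated from the metadata above
    return 0;
  } catch (const args::ParseError& e) {
    std::cerr << e.what() << std::endl << parser;
    return 1;
  }

  for (const auto& op : args::get(torch_executed_ops)) {
    std::cout << "torch-executed op: " << op << "\n";
  }
  return 0;
}
```

Invoked as `./flag_demo --teo=aten::add --torch-executed-op=aten::mul`, both spellings feed the same list, mirroring how the renamed torchtrtc flags behave.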
