diff --git a/cpp/bin/torchtrtc/BUILD b/cpp/bin/torchtrtc/BUILD
index bdc8e71aac..9265948b97 100644
--- a/cpp/bin/torchtrtc/BUILD
+++ b/cpp/bin/torchtrtc/BUILD
@@ -10,7 +10,14 @@ config_setting(
 cc_binary(
     name = "torchtrtc",
     srcs = [
+        "accuracy.h",
+        "accuracy.cpp",
+        "fileio.h",
+        "fileio.cpp",
+        "luts.h",
         "main.cpp",
+        "parser_util.h",
+        "parser_util.cpp"
     ],
     deps = [
         "//third_party/args",
diff --git a/cpp/bin/torchtrtc/README.md b/cpp/bin/torchtrtc/README.md
index 033b36052c..242c0eebad 100644
--- a/cpp/bin/torchtrtc/README.md
+++ b/cpp/bin/torchtrtc/README.md
@@ -14,108 +14,110 @@ to standard TorchScript. Load with `torch.jit.load()` and run like you would run
 ```
   torchtrtc [input_file_path] [output_file_path]
-            [input_specs...] {OPTIONS}
+              [input_specs...] {OPTIONS}
 
-  Torch-TensorRT is a compiler for TorchScript, it will compile and optimize
-  TorchScript programs to run on NVIDIA GPUs using TensorRT
+      torchtrtc is a compiler for TorchScript; it will compile and optimize
+      TorchScript programs to run on NVIDIA GPUs using TensorRT
 
-OPTIONS:
+  OPTIONS:
+
+      -h, --help                        Display this help menu
+      Verbosity of the compiler
+      -v, --verbose                     Dumps debugging information about the
+                                        compilation process onto the console
+      -w, --warnings                    Disables warnings generated during
+                                        compilation onto the console (warnings
+                                        are on by default)
+      --i, --info                       Dumps info messages generated during
+                                        compilation onto the console
+      --build-debuggable-engine         Creates a debuggable engine
+      --allow-gpu-fallback              (Only used when targeting DLA
+                                        (device-type)) Lets engine run layers on
+                                        GPU if they are not supported on DLA
+      --require-full-compilation        Require that the model should be fully
+                                        compiled to TensorRT or throw an error
+      --check-method-support=[method_name]
+                                        Check the support for end to end
+                                        compilation of a specified method in the
+                                        TorchScript module
+      --disable-tf32                    Prevent Float32 layers from using the
+                                        TF32 data format
+      --sparse-weights                  Enable sparsity for weights of conv and
+ FC layers + -p[precision...], + --enable-precision=[precision...] (Repeatable) Enabling an operating + precision for kernels to use when + building the engine (Int8 requires a + calibration-cache argument) [ float | + float32 | f32 | fp32 | half | float16 | + f16 | fp16 | int8 | i8 | char ] + (default: float) + -d[type], --device-type=[type] The type of device the engine should be + built for [ gpu | dla ] (default: gpu) + --gpu-id=[gpu_id] GPU id if running on multi-GPU platform + (defaults to 0) + --dla-core=[dla_core] DLACore id if running on available DLA + (defaults to 0) + --engine-capability=[capability] The type of device the engine should be + built for [ standard | safety | + dla_standalone ] + --calibration-cache-file=[file_path] + Path to calibration cache file to use + for post training quantization + --teo=[op_name...], + --torch-executed-op=[op_name...] (Repeatable) Operator in the graph that + should always be run in PyTorch for + execution (partial compilation must be + enabled) + --tem=[module_name...], + --torch-executed-mod=[module_name...] 
+                                        (Repeatable) Module that should always
+                                        be run in Pytorch for execution (partial
+                                        compilation must be enabled)
+      --mbs=[num_ops],
+      --min-block-size=[num_ops]        Minimum number of contiguous TensorRT
+                                        supported ops to compile a subgraph to
+                                        TensorRT
+      --embed-engine                    Whether to treat input file as a
+                                        serialized TensorRT engine and embed it
+                                        into a TorchScript module (device spec
+                                        must be provided)
+      --num-min-timing-iter=[num_iters] Number of minimization timing iterations
+                                        used to select kernels
+      --num-avg-timing-iters=[num_iters]
+                                        Number of averaging timing iterations
+                                        used to select kernels
+      --workspace-size=[workspace_size] Maximum size of workspace given to
+                                        TensorRT
+      -t[threshold],
+      --threshold=[threshold]           Maximum acceptable numerical deviation
+                                        from standard torchscript output
+                                        (default 2e-5)
+      --no-threshold-check              Skip checking threshold compliance
+      --truncate-long-double,
+      --truncate, --truncate-64bit      Truncate weights that are provided in
+                                        64bit to 32bit (Long, Double to Int,
+                                        Float)
+      --save-engine                     Instead of compiling a full
+                                        TorchScript program, save the created
+                                        engine to the path specified as the
+                                        output path
+      input_file_path                   Path to input TorchScript file
+      output_file_path                  Path for compiled TorchScript (or
+                                        TensorRT engine) file
+      input_specs...                    Specs for inputs to engine, can either
+                                        be a single size or a range defined by
+                                        Min, Optimal, Max sizes, e.g.
+                                        "(N,..,C,H,W)"
+                                        "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]".
+                                        Data Type and format can be specified by
+                                        adding an "@" followed by dtype and "%"
+                                        followed by format to the end of the
+                                        shape spec. e.g.
"(3, 3, 32, + 32)@f16%NHWC" + "--" can be used to terminate flag options and force all following + arguments to be treated as positional options - -h, --help Display this help menu - Verbiosity of the compiler - -v, --verbose Dumps debugging information about the - compilation process onto the console - -w, --warnings Disables warnings generated during - compilation onto the console (warnings - are on by default) - --i, --info Dumps info messages generated during - compilation onto the console - --build-debuggable-engine Creates a debuggable engine - --allow-gpu-fallback (Only used when targeting DLA - (device-type)) Lets engine run layers on - GPU if they are not supported on DLA - --require-full-compilation Require that the model should be fully - compiled to TensorRT or throw an error - --disable-tf32 Prevent Float32 layers from using the - TF32 data format - --sparse-weights Enable sparsity for weights of conv and - FC layers - -p[precision...], - --enabled-precision=[precision...] - (Repeatable) Enabling an operating - precision for kernels to use when - building the engine (Int8 requires a - calibration-cache argument) [ float | - float32 | f32 | fp32 | half | float16 | - f16 | fp16 | int8 | i8 | char ] - (default: float) - -d[type], --device-type=[type] The type of device the engine should be - built for [ gpu | dla ] (default: gpu) - --gpu-id=[gpu_id] GPU id if running on multi-GPU platform - (defaults to 0) - --dla-core=[dla_core] DLACore id if running on available DLA - (defaults to 0) - --engine-capability=[capability] The type of device the engine should be - built for [ standard | safety | - dla_standalone ] - --calibration-cache-file=[file_path] - Path to calibration cache file to use - for post training quantization - --teo=[torch-executed-ops...], - --torch-executed-ops=[torch-executed-ops...] 
- (Repeatable) Operator in the graph that - should always be run in PyTorch for - execution (partial compilation must be - enabled) - --tem=[torch-executed-mods...], - --torch-executed-mods=[torch-executed-mods...] - (Repeatable) Module that should always - be run in Pytorch for execution (partial - compilation must be enabled) - --mbs=[torch-executed-mods...], - --min-block-size=[torch-executed-mods...] - Minimum number of contiguous TensorRT - supported ops to compile a subgraph to - TensorRT - --embed-engine Whether to treat input file as a - serialized TensorRT engine and embed it - into a TorchScript module (device spec - must be provided) - --num-min-timing-iter=[num_iters] Number of minimization timing iterations - used to select kernels - --num-avg-timing-iters=[num_iters] - Number of averaging timing iterations - used to select kernels - --workspace-size=[workspace_size] Maximum size of workspace given to - TensorRT - -t[threshold], - --threshold=[threshold] Maximum acceptable numerical deviation - from standard torchscript output - (default 2e-5) - --no-threshold-check Skip checking threshold compliance - --truncate-long-double, - --truncate, --truncate-64bit Truncate weights that are provided in - 64bit to 32bit (Long, Double to Int, - Float) - --save-engine Instead of compiling a full a - TorchScript program, save the created - engine to the path specified as the - output path - input_file_path Path to input TorchScript file - output_file_path Path for compiled TorchScript (or - TensorRT engine) file - input_specs... Specs for inputs to engine, can either - be a single size or a range defined by - Min, Optimal, Max sizes, e.g. - "(N,..,C,H,W)" - "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]". - Data Type and format can be specified by - adding an "@" followed by dtype and "%" - followed by format to the end of the - shape spec. e.g. 
"(3, 3, 32, - 32)@f16%NHWC" - "--" can be used to terminate flag options and force all following - arguments to be treated as positional options ``` e.g. diff --git a/cpp/bin/torchtrtc/accuracy.cpp b/cpp/bin/torchtrtc/accuracy.cpp new file mode 100644 index 0000000000..255bfdb1fa --- /dev/null +++ b/cpp/bin/torchtrtc/accuracy.cpp @@ -0,0 +1,27 @@ +#include "accuracy.h" + +#include "torch_tensorrt/logging.h" +#include "torch_tensorrt/torch_tensorrt.h" + +namespace torchtrtc { +namespace accuracy { + +bool check_rtol(const at::Tensor& diff, const std::vector inputs, float threshold) { + double maxValue = 0.0; + for (auto& tensor : inputs) { + maxValue = fmax(tensor.abs().max().item(), maxValue); + } + torchtrt::logging::log( + torchtrt::logging::Level::kDEBUG, + std::string("Max Difference: ") + std::to_string(diff.abs().max().item())); + torchtrt::logging::log( + torchtrt::logging::Level::kDEBUG, std::string("Acceptable Threshold: ") + std::to_string(threshold)); + return diff.abs().max().item() <= threshold * maxValue; +} + +bool almost_equal(const at::Tensor& a, const at::Tensor& b, float threshold) { + return check_rtol(a - b, {a, b}, threshold); +} + +} // namespace accuracy +} // namespace torchtrtc \ No newline at end of file diff --git a/cpp/bin/torchtrtc/accuracy.h b/cpp/bin/torchtrtc/accuracy.h new file mode 100644 index 0000000000..ee54cc2eef --- /dev/null +++ b/cpp/bin/torchtrtc/accuracy.h @@ -0,0 +1,18 @@ +#pragma once + +#include +#include +#include +#include + +#include "torch/script.h" +#include "torch/torch.h" + +namespace torchtrtc { +namespace accuracy { + +bool check_rtol(const at::Tensor& diff, const std::vector inputs, float threshold); +bool almost_equal(const at::Tensor& a, const at::Tensor& b, float threshold); + +} // namespace accuracy +} // namespace torchtrtc \ No newline at end of file diff --git a/cpp/bin/torchtrtc/fileio.cpp b/cpp/bin/torchtrtc/fileio.cpp new file mode 100644 index 0000000000..c7fc0a9b72 --- /dev/null +++ 
b/cpp/bin/torchtrtc/fileio.cpp
@@ -0,0 +1,50 @@
+#include "fileio.h"
+
+namespace torchtrtc {
+namespace fileio {
+
+std::string read_buf(std::string const& path) {
+  std::string buf;
+  std::ifstream stream(path.c_str(), std::ios::binary);
+
+  if (stream) {
+    stream >> std::noskipws;
+    std::copy(std::istream_iterator<char>(stream), std::istream_iterator<char>(), std::back_inserter(buf));
+  }
+
+  return buf;
+}
+
+std::string get_cwd() {
+  char buff[FILENAME_MAX]; // create string buffer to hold path
+  if (getcwd(buff, FILENAME_MAX)) {
+    std::string current_working_dir(buff);
+    return current_working_dir;
+  } else {
+    torchtrt::logging::log(torchtrt::logging::Level::kERROR, "Unable to get current directory");
+    exit(1);
+  }
+}
+
+std::string real_path(std::string path) {
+  auto abs_path = path;
+  char real_path_c[PATH_MAX];
+  char* res = realpath(abs_path.c_str(), real_path_c);
+  if (res) {
+    return std::string(real_path_c);
+  } else {
+    torchtrt::logging::log(torchtrt::logging::Level::kERROR, std::string("Unable to find file ") + abs_path);
+    exit(1);
+  }
+}
+
+std::string resolve_path(std::string path) {
+  auto rpath = path;
+  if (!(rpath.rfind("/", 0) == 0)) {
+    rpath = get_cwd() + '/' + rpath;
+  }
+  return rpath;
+}
+
+} // namespace fileio
+} // namespace torchtrtc
\ No newline at end of file
diff --git a/cpp/bin/torchtrtc/fileio.h b/cpp/bin/torchtrtc/fileio.h
new file mode 100644
index 0000000000..05d4ede583
--- /dev/null
+++ b/cpp/bin/torchtrtc/fileio.h
@@ -0,0 +1,38 @@
+#pragma once
+#include <stdlib.h>
+#include <iostream>
+#include <sstream>
+
+#ifdef __linux__
+#include <linux/limits.h>
+#else
+#define PATH_MAX 260
+#endif
+
+#if defined(_WIN32)
+#include <direct.h>
+#define getcwd _getcwd
+#define realpath(N, R) _fullpath((R), (N), PATH_MAX)
+#else
+#include <unistd.h>
+#endif
+
+#include "NvInfer.h"
+#include "third_party/args/args.hpp"
+#include "torch/script.h"
+#include "torch/torch.h"
+
+#include "torch_tensorrt/logging.h"
+#include "torch_tensorrt/ptq.h"
+#include "torch_tensorrt/torch_tensorrt.h"
+
+namespace torchtrtc {
+namespace fileio {
+
+std::string read_buf(std::string const& path);
+std::string get_cwd();
+std::string real_path(std::string path);
+std::string resolve_path(std::string path);
+
+} // namespace fileio
+} // namespace torchtrtc
\ No newline at end of file
diff --git a/cpp/bin/torchtrtc/luts.h b/cpp/bin/torchtrtc/luts.h
new file mode 100644
index 0000000000..ee75f711f8
--- /dev/null
+++ b/cpp/bin/torchtrtc/luts.h
@@ -0,0 +1,39 @@
+#pragma once
+
+#include "NvInfer.h"
+#include "third_party/args/args.hpp"
+#include "torch/script.h"
+#include "torch/torch.h"
+
+namespace torchtrtc {
+namespace luts {
+
+inline at::ScalarType to_torch_dtype(torchtrt::DataType dtype) {
+  switch (dtype) {
+    case torchtrt::DataType::kHalf:
+      return at::kHalf;
+    case torchtrt::DataType::kChar:
+      return at::kChar;
+    case torchtrt::DataType::kInt:
+      return at::kInt;
+    case torchtrt::DataType::kBool:
+      return at::kBool;
+    case torchtrt::DataType::kFloat:
+    default:
+      return at::kFloat;
+  }
+}
+
+const std::unordered_map<nvinfer1::DataType, at::ScalarType>& get_trt_at_type_map() {
+  static const std::unordered_map<nvinfer1::DataType, at::ScalarType> trt_at_type_map = {
+      {nvinfer1::DataType::kFLOAT, at::kFloat},
+      {nvinfer1::DataType::kHALF, at::kHalf},
+      {nvinfer1::DataType::kINT32, at::kInt},
+      {nvinfer1::DataType::kINT8, at::kChar},
+      {nvinfer1::DataType::kBOOL, at::kBool},
+  };
+  return trt_at_type_map;
+}
+
+} // namespace luts
+} // namespace torchtrtc
\ No newline at end of file
diff --git a/cpp/bin/torchtrtc/main.cpp b/cpp/bin/torchtrtc/main.cpp
index a437a5e133..4d733f274d 100644
--- a/cpp/bin/torchtrtc/main.cpp
+++ b/cpp/bin/torchtrtc/main.cpp
@@ -2,210 +2,18 @@
 #include <iostream>
 #include <sstream>
 
-#ifdef linux
-#include <linux/limits.h>
-#else
-#define PATH_MAX 260
-#endif
-
-#if defined(_WIN32)
-#include <direct.h>
-#define getcwd _getcwd
-#define realpath(N, R) _fullpath((R), (N), PATH_MAX)
-#else
-#include <unistd.h>
-#endif
-
 #include "NvInfer.h"
 #include "third_party/args/args.hpp"
 #include "torch/script.h"
-#include "torch/torch.h"
 
 #include "torch_tensorrt/logging.h"
 #include "torch_tensorrt/ptq.h"
#include "torch_tensorrt/torch_tensorrt.h" -at::ScalarType to_torch_dtype(torchtrt::DataType dtype) { - switch (dtype) { - case torchtrt::DataType::kHalf: - return at::kHalf; - case torchtrt::DataType::kChar: - return at::kChar; - case torchtrt::DataType::kInt: - return at::kInt; - case torchtrt::DataType::kBool: - return at::kBool; - case torchtrt::DataType::kFloat: - default: - return at::kFloat; - } -} - -const std::unordered_map& get_trt_at_type_map() { - static const std::unordered_map trt_at_type_map = { - {nvinfer1::DataType::kFLOAT, at::kFloat}, - {nvinfer1::DataType::kHALF, at::kHalf}, - {nvinfer1::DataType::kINT32, at::kInt}, - {nvinfer1::DataType::kINT8, at::kChar}, - {nvinfer1::DataType::kBOOL, at::kBool}, - }; - return trt_at_type_map; -} - -bool checkRtol(const at::Tensor& diff, const std::vector inputs, float threshold) { - double maxValue = 0.0; - for (auto& tensor : inputs) { - maxValue = fmax(tensor.abs().max().item(), maxValue); - } - torchtrt::logging::log( - torchtrt::logging::Level::kDEBUG, - std::string("Max Difference: ") + std::to_string(diff.abs().max().item())); - torchtrt::logging::log( - torchtrt::logging::Level::kDEBUG, std::string("Acceptable Threshold: ") + std::to_string(threshold)); - return diff.abs().max().item() <= threshold * maxValue; -} - -bool almostEqual(const at::Tensor& a, const at::Tensor& b, float threshold) { - return checkRtol(a - b, {a, b}, threshold); -} - -torchtrt::TensorFormat parseTensorFormat(std::string str) { - std::transform(str.begin(), str.end(), str.begin(), [](unsigned char c) { return std::tolower(c); }); - - if (str == "linear" || str == "nchw" || str == "chw" || str == "contiguous") { - return torchtrt::TensorFormat::kContiguous; - } else if (str == "nhwc" || str == "hwc" || str == "channels_last") { - return torchtrt::TensorFormat::kChannelsLast; - } else { - torchtrt::logging::log( - torchtrt::logging::Level::kERROR, - "Invalid tensor format, options are [ linear | nchw | chw | contiguous | nhwc | 
hwc | channels_last ], found: " + - str); - return torchtrt::TensorFormat::kUnknown; - } -} - -torchtrt::DataType parseDataType(std::string dtype_str) { - std::transform( - dtype_str.begin(), dtype_str.end(), dtype_str.begin(), [](unsigned char c) { return std::tolower(c); }); - if (dtype_str == "float" || dtype_str == "float32" || dtype_str == "f32" || dtype_str == "fp32") { - return torchtrt::DataType::kFloat; - } else if (dtype_str == "half" || dtype_str == "float16" || dtype_str == "f16" || dtype_str == "fp16") { - return torchtrt::DataType::kHalf; - } else if (dtype_str == "char" || dtype_str == "int8" || dtype_str == "i8") { - return torchtrt::DataType::kChar; - } else if (dtype_str == "int" || dtype_str == "int32" || dtype_str == "i32") { - return torchtrt::DataType::kInt; - } else if (dtype_str == "bool" || dtype_str == "b") { - return torchtrt::DataType::kBool; - } else { - torchtrt::logging::log( - torchtrt::logging::Level::kERROR, - "Invalid precision, options are [ float | float32 | fp32 | f32 | half | float16 | fp16 | f16 | char | int8 | i8 | int | int32 | i32 | bool | b], found: " + - dtype_str); - return torchtrt::DataType::kUnknown; - } -} - -std::vector parseSingleDim(std::string shape_str) { - std::vector shape; - std::stringstream ss; - for (auto c : shape_str) { - if (c == '(' || c == ' ') { - continue; - } else if (c == ',') { - int64_t dim; - ss >> dim; - shape.push_back(dim); - ss.clear(); - } else if (c == ')') { - int64_t dim; - ss >> dim; - shape.push_back(dim); - ss.clear(); - return shape; - } else { - ss << c; - } - } - - torchtrt::logging::log( - torchtrt::logging::Level::kERROR, - "Shapes need dimensions delimited by comma in parentheses, \"(N,..,C,H,W)\"\n e.g \"(3,3,200,200)\""); - exit(1); - return {}; -} - -std::vector> parseDynamicDim(std::string shape_str) { - shape_str = shape_str.substr(1, shape_str.size() - 2); - std::vector> shape; - std::stringstream ss; - - std::string delimiter = ";"; - - size_t pos = 0; - while ((pos = 
shape_str.find(delimiter)) != std::string::npos) { - auto token = shape_str.substr(0, pos); - auto range = parseSingleDim(token); - shape_str.erase(0, pos + delimiter.length()); - shape.push_back(range); - } - - auto range = parseSingleDim(shape_str); - shape.push_back(range); - - if (shape.size() != 3) { - torchtrt::logging::log( - torchtrt::logging::Level::kERROR, - "Dynamic shapes need three sets of dimensions delimited by semi-colons, \"[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]\"\n e.g \"[(3,3,100,100);(3,3,200,200);(3,3,300,300)]\""); - exit(1); - } - - return shape; -} - -std::string read_buf(std::string const& path) { - std::string buf; - std::ifstream stream(path.c_str(), std::ios::binary); - - if (stream) { - stream >> std::noskipws; - std::copy(std::istream_iterator(stream), std::istream_iterator(), std::back_inserter(buf)); - } - - return buf; -} - -std::string get_cwd() { - char buff[FILENAME_MAX]; // create string buffer to hold path - if (getcwd(buff, FILENAME_MAX)) { - std::string current_working_dir(buff); - return current_working_dir; - } else { - torchtrt::logging::log(torchtrt::logging::Level::kERROR, "Unable to get current directory"); - exit(1); - } -} - -std::string real_path(std::string path) { - auto abs_path = path; - char real_path_c[PATH_MAX]; - char* res = realpath(abs_path.c_str(), real_path_c); - if (res) { - return std::string(real_path_c); - } else { - torchtrt::logging::log(torchtrt::logging::Level::kERROR, std::string("Unable to find file ") + abs_path); - exit(1); - } -} - -std::string resolve_path(std::string path) { - auto rpath = path; - if (!(rpath.rfind("/", 0) == 0)) { - rpath = get_cwd() + '/' + rpath; - } - return rpath; -} +#include "accuracy.h" +#include "fileio.h" +#include "luts.h" +#include "parser_util.h" int main(int argc, char** argv) { torchtrt::logging::set_is_colored_output_on(true); @@ -229,8 +37,7 @@ int main(int argc, char** argv) { args::Flag 
build_debuggable_engine(
       parser, "build-debuggable-engine", "Creates a debuggable engine", {"build-debuggable-engine"});
-  args::Flag use_strict_types(
-      parser, "use-strict-types", "Restrict operating type to only use set operation precision", {"use-strict-types"});
+
   args::Flag allow_gpu_fallback(
       parser,
       "allow-gpu-fallback",
@@ -243,17 +50,23 @@ int main(int argc, char** argv) {
       "Require that the model should be fully compiled to TensorRT or throw an error",
       {"require-full-compilation"});
 
+  args::ValueFlag<std::string> check_method_op_support(
+      parser,
+      "method_name",
+      "Check the support for end to end compilation of a specified method in the TorchScript module",
+      {"check-method-support"});
+
   args::Flag disable_tf32(
       parser, "disable-tf32", "Prevent Float32 layers from using the TF32 data format", {"disable-tf32"});
   args::Flag sparse_weights(
       parser, "sparse-weights", "Enable sparsity for weights of conv and FC layers", {"sparse-weights"});
-  args::ValueFlagList<std::string> enabled_precision(
+  args::ValueFlagList<std::string> enabled_precisions(
       parser,
       "precision",
       "(Repeatable) Enabling an operating precision for kernels to use when building the engine (Int8 requires a calibration-cache argument) [ float | float32 | f32 | fp32 | half | float16 | f16 | fp16 | int8 | i8 | char ] (default: float)",
-      {'p', "enabled-precision"});
+      {'p', "enable-precision"});
   args::ValueFlag<std::string> device_type(
       parser,
       "type",
@@ -278,19 +91,19 @@ int main(int argc, char** argv) {
   args::ValueFlagList<std::string> torch_executed_ops(
       parser,
-      "torch-executed-ops",
+      "op_name",
       "(Repeatable) Operator in the graph that should always be run in PyTorch for execution (partial compilation must be enabled)",
-      {"teo", "torch-executed-ops"});
+      {"teo", "torch-executed-op"});
 
   args::ValueFlagList<std::string> torch_executed_mods(
       parser,
-      "torch-executed-mods",
+      "module_name",
       "(Repeatable) Module that should always be run in Pytorch for execution (partial compilation must be enabled)",
-      {"tem", "torch-executed-mods"});
+      {"tem",
"torch-executed-mod"}); args::ValueFlag min_block_size( parser, - "min-block-size", + "num_ops", "Minimum number of contiguous TensorRT supported ops to compile a subgraph to TensorRT", {"mbs", "min-block-size"}); @@ -306,8 +119,6 @@ int main(int argc, char** argv) { parser, "num_iters", "Number of averaging timing iterations used to select kernels", {"num-avg-timing-iters"}); args::ValueFlag workspace_size( parser, "workspace_size", "Maximum size of workspace given to TensorRT", {"workspace-size"}); - args::ValueFlag max_batch_size( - parser, "max_batch_size", "Maximum batch size (must be >= 1 to be set, 0 means not set)", {"max-batch-size"}); args::ValueFlag threshold( parser, "threshold", @@ -354,101 +165,67 @@ int main(int argc, char** argv) { torchtrt::logging::set_reportable_log_level(torchtrt::logging::Level::kERROR); } - std::vector ranges; - const std::string spec_err_str = - "Dimensions should be specified in one of these types \"(N,..,C,H,W)\" \"[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]\"\n e.g \"(3,3,300,300)\" \"[(3,3,100,100);(3,3,200,200);(3,3,300,300)]\"\nTo specify input type append an @ followed by the precision\n e.g. \"(3,3,300,300)@f32\"\nTo specify input format append an \% followed by the format [contiguous | channel_last]\n e.g. 
\"(3,3,300,300)@f32\%channel_last\""; - for (const auto spec : args::get(input_shapes)) { - std::string shapes; - std::string dtype; - std::string format; - // THERE IS A SPEC FOR DTYPE - if (spec.find('@') != std::string::npos) { - // THERE IS ALSO A SPEC FOR FORMAT - if (spec.find('%') != std::string::npos) { - auto dtype_delim = spec.find('@'); - auto format_delim = spec.find('%'); - std::string shapes = spec.substr(0, dtype_delim); - std::string dtype = spec.substr(dtype_delim + 1, format_delim - (dtype_delim + 1)); - std::string format = spec.substr(format_delim + 1, spec.size()); - - auto parsed_dtype = parseDataType(dtype); - if (parsed_dtype == torchtrt::DataType::kUnknown) { - torchtrt::logging::log(torchtrt::logging::Level::kERROR, "Invalid datatype for input specification " + spec); - std::cerr << std::endl << parser; - exit(1); - } - auto parsed_format = parseTensorFormat(format); - if (parsed_format == torchtrt::TensorFormat::kUnknown) { - torchtrt::logging::log(torchtrt::logging::Level::kERROR, "Invalid format for input specification " + spec); - std::cerr << std::endl << parser; - exit(1); - } - if (shapes.rfind("(", 0) == 0) { - ranges.push_back(torchtrt::Input(parseSingleDim(shapes), parsed_dtype, parsed_format)); - } else if (shapes.rfind("[", 0) == 0) { - auto dyn_shapes = parseDynamicDim(shapes); - ranges.push_back(torchtrt::Input(dyn_shapes[0], dyn_shapes[1], dyn_shapes[2], parsed_dtype, parsed_format)); - } else { - torchtrt::logging::log(torchtrt::logging::Level::kERROR, spec_err_str); - std::cerr << std::endl << parser; - exit(1); - } - // THERE IS NO SPEC FOR FORMAT - } else { - std::string shapes = spec.substr(0, spec.find('@')); - std::string dtype = spec.substr(spec.find('@') + 1, spec.size()); - - auto parsed_dtype = parseDataType(dtype); - if (parsed_dtype == torchtrt::DataType::kUnknown) { - torchtrt::logging::log(torchtrt::logging::Level::kERROR, "Invalid datatype for input specification " + spec); - std::cerr << std::endl << parser; 
- exit(1); - } - if (shapes.rfind("(", 0) == 0) { - ranges.push_back(torchtrt::Input(parseSingleDim(shapes), parsed_dtype)); - } else if (shapes.rfind("[", 0) == 0) { - auto dyn_shapes = parseDynamicDim(shapes); - ranges.push_back(torchtrt::Input(dyn_shapes[0], dyn_shapes[1], dyn_shapes[2], parsed_dtype)); - } else { - torchtrt::logging::log(torchtrt::logging::Level::kERROR, spec_err_str); - std::cerr << std::endl << parser; - exit(1); - } - } - // THERE IS A SPEC FOR FORMAT BUT NOT DTYPE - } else if (spec.find('%') != std::string::npos) { - std::string shapes = spec.substr(0, spec.find('%')); - std::string format = spec.substr(spec.find('%') + 1, spec.size()); - - auto parsed_format = parseTensorFormat(format); - if (parsed_format == torchtrt::TensorFormat::kUnknown) { - torchtrt::logging::log(torchtrt::logging::Level::kERROR, "Invalid format for input specification " + spec); - std::cerr << std::endl << parser; - exit(1); - } - if (shapes.rfind("(", 0) == 0) { - ranges.push_back(torchtrt::Input(parseSingleDim(shapes), parsed_format)); - } else if (shapes.rfind("[", 0) == 0) { - auto dyn_shapes = parseDynamicDim(shapes); - ranges.push_back(torchtrt::Input(dyn_shapes[0], dyn_shapes[1], dyn_shapes[2], parsed_format)); - } else { - torchtrt::logging::log(torchtrt::logging::Level::kERROR, spec_err_str); - std::cerr << std::endl << parser; - exit(1); - } - // JUST SHAPE USE DEFAULT DTYPE + auto real_input_path = torchtrtc::fileio::resolve_path(args::get(input_path)); + + if (check_method_op_support) { + torch::jit::Module mod; + try { + // Deserialize the ScriptModule from a file using torch::jit::load(). 
+ mod = torch::jit::load(real_input_path); + } catch (const c10::Error& e) { + torchtrt::logging::log(torchtrt::logging::Level::kERROR, "Error loading the model (path may be incorrect)"); + return 1; + } + + auto method = args::get(check_method_op_support); + auto result = torchtrt::ts::check_method_operator_support(mod, method); + if (result) { + std::cout << "The method is supported end to end by Torch-TensorRT" << std::endl; + return 0; } else { - if (spec.rfind("(", 0) == 0) { - ranges.push_back(torchtrt::Input(parseSingleDim(spec))); - } else if (spec.rfind("[", 0) == 0) { - auto dyn_shapes = parseDynamicDim(spec); - ranges.push_back(torchtrt::Input(dyn_shapes[0], dyn_shapes[1], dyn_shapes[2])); - } else { - torchtrt::logging::log(torchtrt::logging::Level::kERROR, spec_err_str); - std::cerr << std::endl << parser; - exit(1); + torchtrt::logging::log(torchtrt::logging::Level::kERROR, "Method is not currently supported by Torch-TensorRT"); + return 1; + } + } + + auto real_output_path = torchtrtc::fileio::resolve_path(args::get(output_path)); + + // Instead of compiling, just embed engine in a PyTorch module + if (embed_engine) { + auto device_str = args::get(device_type); + std::transform( + device_str.begin(), device_str.end(), device_str.begin(), [](unsigned char c) { return std::tolower(c); }); + + torchtrt::Device device; + + if (gpu_id) { + device.gpu_id = args::get(gpu_id); + torchtrt::set_device(device.gpu_id); + } + + if (device_str == "gpu") { + device.device_type = torchtrt::Device::DeviceType::kGPU; + } else if (device_str == "dla") { + device.device_type = torchtrt::Device::DeviceType::kDLA; + if (dla_core) { + device.dla_core = args::get(dla_core); } + } else { + torchtrt::logging::log( + torchtrt::logging::Level::kERROR, "Invalid device type, options are [ gpu | dla ] found: " + device_type); + std::cerr << std::endl << parser; + return 1; } + + std::string serialized_engine = torchtrtc::fileio::read_buf(real_input_path); + auto trt_mod = 
torchtrt::ts::embed_engine_in_new_module(serialized_engine, device);
+    trt_mod.save(real_output_path);
+    return 0;
+  }
+
+  std::vector<torchtrt::Input> ranges;
+  for (const auto spec : args::get(input_shapes)) {
+    ranges.push_back(torchtrtc::parserutil::parse_input(spec));
     std::stringstream ss;
     ss << "Parsed Input: " << ranges.back();
     torchtrt::logging::log(torchtrt::logging::Level::kDEBUG, ss.str());
@@ -460,7 +237,6 @@ int main(int argc, char** argv) {
     compile_settings.debug = true;
   }
 
-
   if (allow_gpu_fallback) {
     compile_settings.device.allow_gpu_fallback = true;
   }
@@ -475,7 +251,7 @@ int main(int argc, char** argv) {
 
     std::string calibration_cache_file_path = "";
     if (calibration_cache_file) {
-      calibration_cache_file_path = resolve_path(args::get(calibration_cache_file));
+      calibration_cache_file_path = torchtrtc::fileio::resolve_path(args::get(calibration_cache_file));
     }
 
     auto calibrator = torchtrt::ptq::make_int8_cache_calibrator(calibration_cache_file_path);
@@ -501,9 +277,9 @@ int main(int argc, char** argv) {
     }
   }
 
-  if (enabled_precision) {
-    for (const auto precision : args::get(enabled_precision)) {
-      auto dtype = parseDataType(precision);
+  if (enabled_precisions) {
+    for (const auto precision : args::get(enabled_precisions)) {
+      auto dtype = torchtrtc::parserutil::parse_dtype(precision);
       if (dtype == torchtrt::DataType::kFloat) {
         compile_settings.enabled_precisions.insert(torch::kF32);
       } else if (dtype == torchtrt::DataType::kHalf) {
@@ -583,22 +359,10 @@ int main(int argc, char** argv) {
     compile_settings.workspace_size = args::get(workspace_size);
   }
 
-
   if (truncate_long_and_double) {
     compile_settings.truncate_long_and_double = true;
   }
 
-  auto real_input_path = resolve_path(args::get(input_path));
-  auto real_output_path = resolve_path(args::get(output_path));
-
-  // Instead of compiling, just embed engine in a PyTorch module
-  if (embed_engine) {
-    std::string serialized_engine = read_buf(real_input_path);
-    auto trt_mod = 
torchtrt::ts::embed_engine_in_new_module(serialized_engine, compile_settings.device); - trt_mod.save(real_output_path); - return 0; - } - torch::jit::Module mod; try { // Deserialize the ScriptModule from a file using torch::jit::load(). @@ -638,7 +402,7 @@ int main(int argc, char** argv) { for (auto i : ranges) { auto in = at::randn(i.opt_shape, {at::kCUDA}); - in = in.to(to_torch_dtype(i.dtype)); + in = in.to(torchtrtc::luts::to_torch_dtype(i.dtype)); jit_inputs_ivalues.push_back(in.clone()); trt_inputs_ivalues.push_back(in.clone()); } @@ -667,7 +431,8 @@ int main(int argc, char** argv) { } for (size_t i = 0; i < trt_results.size(); i++) { - if (!almostEqual(jit_results[i], trt_results[i].reshape_as(jit_results[i]), threshold_val)) { + if (!torchtrtc::accuracy::almost_equal( + jit_results[i], trt_results[i].reshape_as(jit_results[i]), threshold_val)) { std::ostringstream threshold_ss; threshold_ss << threshold_val; torchtrt::logging::log( diff --git a/cpp/bin/torchtrtc/parser_util.cpp b/cpp/bin/torchtrtc/parser_util.cpp new file mode 100644 index 0000000000..86d7b32242 --- /dev/null +++ b/cpp/bin/torchtrtc/parser_util.cpp @@ -0,0 +1,190 @@ +#include "parser_util.h" + +namespace torchtrtc { +namespace parserutil { + +torchtrt::TensorFormat parse_tensor_format(std::string str) { + std::transform(str.begin(), str.end(), str.begin(), [](unsigned char c) { return std::tolower(c); }); + + if (str == "linear" || str == "nchw" || str == "chw" || str == "contiguous") { + return torchtrt::TensorFormat::kContiguous; + } else if (str == "nhwc" || str == "hwc" || str == "channels_last") { + return torchtrt::TensorFormat::kChannelsLast; + } else { + torchtrt::logging::log( + torchtrt::logging::Level::kERROR, + "Invalid tensor format, options are [ linear | nchw | chw | contiguous | nhwc | hwc | channels_last ], found: " + + str); + return torchtrt::TensorFormat::kUnknown; + } +} + +torchtrt::DataType parse_dtype(std::string dtype_str) { + std::transform( + dtype_str.begin(), 
dtype_str.end(), dtype_str.begin(), [](unsigned char c) { return std::tolower(c); }); + if (dtype_str == "float" || dtype_str == "float32" || dtype_str == "f32" || dtype_str == "fp32") { + return torchtrt::DataType::kFloat; + } else if (dtype_str == "half" || dtype_str == "float16" || dtype_str == "f16" || dtype_str == "fp16") { + return torchtrt::DataType::kHalf; + } else if (dtype_str == "char" || dtype_str == "int8" || dtype_str == "i8") { + return torchtrt::DataType::kChar; + } else if (dtype_str == "int" || dtype_str == "int32" || dtype_str == "i32") { + return torchtrt::DataType::kInt; + } else if (dtype_str == "bool" || dtype_str == "b") { + return torchtrt::DataType::kBool; + } else { + torchtrt::logging::log( + torchtrt::logging::Level::kERROR, + "Invalid precision, options are [ float | float32 | fp32 | f32 | half | float16 | fp16 | f16 | char | int8 | i8 | int | int32 | i32 | bool | b], found: " + + dtype_str); + return torchtrt::DataType::kUnknown; + } +} + +std::vector parse_single_dim(std::string shape_str) { + std::vector shape; + std::stringstream ss; + for (auto c : shape_str) { + if (c == '(' || c == ' ') { + continue; + } else if (c == ',') { + int64_t dim; + ss >> dim; + shape.push_back(dim); + ss.clear(); + } else if (c == ')') { + int64_t dim; + ss >> dim; + shape.push_back(dim); + ss.clear(); + return shape; + } else { + ss << c; + } + } + + torchtrt::logging::log( + torchtrt::logging::Level::kERROR, + "Shapes need dimensions delimited by comma in parentheses, \"(N,..,C,H,W)\"\n e.g \"(3,3,200,200)\""); + exit(1); + return {}; +} + +std::vector> parse_dynamic_dim(std::string shape_str) { + shape_str = shape_str.substr(1, shape_str.size() - 2); + std::vector> shape; + std::stringstream ss; + + std::string delimiter = ";"; + + size_t pos = 0; + while ((pos = shape_str.find(delimiter)) != std::string::npos) { + auto token = shape_str.substr(0, pos); + auto range = parse_single_dim(token); + shape_str.erase(0, pos + delimiter.length()); + 
shape.push_back(range); + } + + auto range = parse_single_dim(shape_str); + shape.push_back(range); + + if (shape.size() != 3) { + torchtrt::logging::log( + torchtrt::logging::Level::kERROR, + "Dynamic shapes need three sets of dimensions delimited by semi-colons, \"[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]\"\n e.g \"[(3,3,100,100);(3,3,200,200);(3,3,300,300)]\""); + exit(1); + } + + return shape; +} + +torchtrt::Input parse_input(std::string spec) { + const std::string spec_err_str = + "Dimensions should be specified in one of these types \"(N,..,C,H,W)\" \"[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]\"\n e.g \"(3,3,300,300)\" \"[(3,3,100,100);(3,3,200,200);(3,3,300,300)]\"\nTo specify input type append an @ followed by the precision\n e.g. \"(3,3,300,300)@f32\"\nTo specify input format append an \% followed by the format [contiguous | channel_last]\n e.g. \"(3,3,300,300)@f32\%channel_last\""; + std::string shapes; + std::string dtype; + std::string format; + // THERE IS A SPEC FOR DTYPE + if (spec.find('@') != std::string::npos) { + // THERE IS ALSO A SPEC FOR FORMAT + if (spec.find('%') != std::string::npos) { + auto dtype_delim = spec.find('@'); + auto format_delim = spec.find('%'); + std::string shapes = spec.substr(0, dtype_delim); + std::string dtype = spec.substr(dtype_delim + 1, format_delim - (dtype_delim + 1)); + std::string format = spec.substr(format_delim + 1, spec.size()); + + auto parsed_dtype = parse_dtype(dtype); + if (parsed_dtype == torchtrt::DataType::kUnknown) { + torchtrt::logging::log(torchtrt::logging::Level::kERROR, "Invalid datatype for input specification " + spec); + exit(1); + } + auto parsed_format = parse_tensor_format(format); + if (parsed_format == torchtrt::TensorFormat::kUnknown) { + torchtrt::logging::log(torchtrt::logging::Level::kERROR, "Invalid format for input specification " + spec); + exit(1); + } + if (shapes.rfind("(", 0) == 0) { + 
return torchtrt::Input(parse_single_dim(shapes), parsed_dtype, parsed_format); + } else if (shapes.rfind("[", 0) == 0) { + auto dyn_shapes = parse_dynamic_dim(shapes); + return torchtrt::Input(dyn_shapes[0], dyn_shapes[1], dyn_shapes[2], parsed_dtype, parsed_format); + } else { + torchtrt::logging::log(torchtrt::logging::Level::kERROR, spec_err_str); + exit(1); + } + // THERE IS NO SPEC FOR FORMAT + } else { + std::string shapes = spec.substr(0, spec.find('@')); + std::string dtype = spec.substr(spec.find('@') + 1, spec.size()); + + auto parsed_dtype = parse_dtype(dtype); + if (parsed_dtype == torchtrt::DataType::kUnknown) { + torchtrt::logging::log(torchtrt::logging::Level::kERROR, "Invalid datatype for input specification " + spec); + exit(1); + } + if (shapes.rfind("(", 0) == 0) { + return torchtrt::Input(parse_single_dim(shapes), parsed_dtype); + } else if (shapes.rfind("[", 0) == 0) { + auto dyn_shapes = parse_dynamic_dim(shapes); + return torchtrt::Input(dyn_shapes[0], dyn_shapes[1], dyn_shapes[2], parsed_dtype); + } else { + torchtrt::logging::log(torchtrt::logging::Level::kERROR, spec_err_str); + exit(1); + } + } + // THERE IS A SPEC FOR FORMAT BUT NOT DTYPE + } else if (spec.find('%') != std::string::npos) { + std::string shapes = spec.substr(0, spec.find('%')); + std::string format = spec.substr(spec.find('%') + 1, spec.size()); + + auto parsed_format = parse_tensor_format(format); + if (parsed_format == torchtrt::TensorFormat::kUnknown) { + torchtrt::logging::log(torchtrt::logging::Level::kERROR, "Invalid format for input specification " + spec); + exit(1); + } + if (shapes.rfind("(", 0) == 0) { + return torchtrt::Input(parse_single_dim(shapes), parsed_format); + } else if (shapes.rfind("[", 0) == 0) { + auto dyn_shapes = parse_dynamic_dim(shapes); + return torchtrt::Input(dyn_shapes[0], dyn_shapes[1], dyn_shapes[2], parsed_format); + } else { + torchtrt::logging::log(torchtrt::logging::Level::kERROR, spec_err_str); + exit(1); + } + // JUST SHAPE USE 
DEFAULT DTYPE + } else { + if (spec.rfind("(", 0) == 0) { + return torchtrt::Input(parse_single_dim(spec)); + } else if (spec.rfind("[", 0) == 0) { + auto dyn_shapes = parse_dynamic_dim(spec); + return torchtrt::Input(dyn_shapes[0], dyn_shapes[1], dyn_shapes[2]); + } else { + torchtrt::logging::log(torchtrt::logging::Level::kERROR, spec_err_str); + exit(1); + } + } +} + +} // namespace parserutil +} // namespace torchtrtc \ No newline at end of file diff --git a/cpp/bin/torchtrtc/parser_util.h b/cpp/bin/torchtrtc/parser_util.h new file mode 100644 index 0000000000..7246929130 --- /dev/null +++ b/cpp/bin/torchtrtc/parser_util.h @@ -0,0 +1,34 @@ +#pragma once +#include +#include +#include + +#include "NvInfer.h" +#include "third_party/args/args.hpp" +#include "torch/script.h" +#include "torch/torch.h" + +#include "torch_tensorrt/logging.h" +#include "torch_tensorrt/ptq.h" +#include "torch_tensorrt/torch_tensorrt.h" + +namespace torchtrtc { +namespace parserutil { + +// String to TensorFormat Enum +torchtrt::TensorFormat parse_tensor_format(std::string str); + +// String to data type +torchtrt::DataType parse_dtype(std::string dtype_str); + +// String to a vector of ints which represents a dimension spec +std::vector parse_single_dim(std::string shape_str); + +// String to a vector of 3 dimension specs specs (each a vector of ints) +std::vector> parse_dynamic_dim(std::string shape_str); + +// String to a torchtrt::Input +torchtrt::Input parse_input(std::string input_specs); + +} // namespace parserutil +} // namespace torchtrtc \ No newline at end of file diff --git a/docsrc/tutorials/torchtrtc.rst b/docsrc/tutorials/torchtrtc.rst index f1741f373a..b841c891e5 100644 --- a/docsrc/tutorials/torchtrtc.rst +++ b/docsrc/tutorials/torchtrtc.rst @@ -19,7 +19,7 @@ to standard TorchScript. Load with ``torch.jit.load()`` and run like you would r torchtrtc [input_file_path] [output_file_path] [input_specs...] 
{OPTIONS} - Torch-TensorRT is a compiler for TorchScript, it will compile and optimize + torchtrtc is a compiler for TorchScript, it will compile and optimize TorchScript programs to run on NVIDIA GPUs using TensorRT OPTIONS: @@ -39,13 +39,16 @@ to standard TorchScript. Load with ``torch.jit.load()`` and run like you would r GPU if they are not supported on DLA --require-full-compilation Require that the model should be fully compiled to TensorRT or throw an error + --check-method-support=[method_name] + Check the support for end to end + compilation of a specified method in the + TorchScript module --disable-tf32 Prevent Float32 layers from using the TF32 data format --sparse-weights Enable sparsity for weights of conv and FC layers -p[precision...], - --enabled-precision=[precision...] - (Repeatable) Enabling an operating + --enable-precision=[precision...] (Repeatable) Enabling an operating precision for kernels to use when building the engine (Int8 requires a calibration-cache argument) [ float | @@ -64,20 +67,18 @@ to standard TorchScript. Load with ``torch.jit.load()`` and run like you would r --calibration-cache-file=[file_path] Path to calibration cache file to use for post training quantization - --teo=[torch-executed-ops...], - --torch-executed-ops=[torch-executed-ops...] - (Repeatable) Operator in the graph that + --teo=[op_name...], + --torch-executed-op=[op_name...] (Repeatable) Operator in the graph that should always be run in PyTorch for execution (partial compilation must be enabled) - --tem=[torch-executed-mods...], - --torch-executed-mods=[torch-executed-mods...] + --tem=[module_name...], + --torch-executed-mod=[module_name...] (Repeatable) Module that should always be run in Pytorch for execution (partial compilation must be enabled) - --mbs=[torch-executed-mods...], - --min-block-size=[torch-executed-mods...] 
- Minimum number of contiguous TensorRT + --mbs=[num_ops], + --min-block-size=[num_ops] Minimum number of contiguous TensorRT supported ops to compile a subgraph to TensorRT --embed-engine Whether to treat input file as a
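As an aside on the spec grammar that `parse_input` in `parser_util.cpp` accepts: a static shape is written `"(N,..,C,H,W)"` and a dynamic range as three such groups, `"[(min);(opt);(max)]"`, delimited by semicolons. The core of that parsing can be sketched standalone; this is a minimal reimplementation for illustration only (the hypothetical names `parse_dims` and `parse_dynamic` are not the tool's API, and it throws instead of logging through `torchtrt::logging` and calling `exit(1)` as the real code does):

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parse "(3,3,200,200)" into {3, 3, 200, 200}, mirroring parse_single_dim:
// accumulate digit characters in a stringstream and extract an int64_t at
// each ',' or the closing ')'.
std::vector<int64_t> parse_dims(const std::string& shape_str) {
  std::vector<int64_t> shape;
  std::stringstream ss;
  for (char c : shape_str) {
    if (c == '(' || c == ' ') {
      continue;
    } else if (c == ',' || c == ')') {
      int64_t dim;
      ss >> dim;
      shape.push_back(dim);
      ss.clear(); // reset eof/fail flags so the stream is reusable
      ss.str(""); // drop consumed characters
      if (c == ')') {
        return shape;
      }
    } else {
      ss << c;
    }
  }
  throw std::invalid_argument("expected \"(N,..,C,H,W)\", e.g. \"(3,3,200,200)\"");
}

// Parse "[(min);(opt);(max)]" into three dim vectors, mirroring
// parse_dynamic_dim: strip the brackets, split on ';', parse each group.
std::vector<std::vector<int64_t>> parse_dynamic(std::string s) {
  s = s.substr(1, s.size() - 2); // strip '[' and ']'
  std::vector<std::vector<int64_t>> out;
  size_t pos = 0;
  while ((pos = s.find(';')) != std::string::npos) {
    out.push_back(parse_dims(s.substr(0, pos)));
    s.erase(0, pos + 1);
  }
  out.push_back(parse_dims(s));
  if (out.size() != 3) {
    throw std::invalid_argument("dynamic shapes need exactly three \"(..)\" groups");
  }
  return out;
}
```

A full spec such as `"(3,3,300,300)@f32%channel_last"` is then just this shape grammar with the dtype and format split off at `@` and `%` first, which is what the branches of `parse_input` above do.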