
Commit 248d8aa

Merge branch 'master' into fix_renaming_itensor
2 parents: 74d4df1 + 84bad88

File tree

262 files changed: 7137 additions & 1736 deletions


.circleci/config.yml

Lines changed: 639 additions & 53 deletions
Large diffs are not rendered by default.

.github/code-owners.yml

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@
   - "peri044"
   - "bowang007"

-"component: docker":
+"channel: docker":
   - "andi4191"
   - "narendasan"


.gitignore

Lines changed: 3 additions & 0 deletions
@@ -62,3 +62,6 @@ bazel-Torch-TensorRT-Preview
 docsrc/src/
 bazel-TensorRT
 bazel-tensorrt
+.pytest_cache
+*.cache
+*cifar-10-batches-py*

README.md

Lines changed: 8 additions & 7 deletions
@@ -2,13 +2,14 @@

 [![Documentation](https://img.shields.io/badge/docs-master-brightgreen)](https://nvidia.github.io/Torch-TensorRT/)

-> Ahead of Time (AOT) compiling for PyTorch JIT
+> Ahead of Time (AOT) compiling for PyTorch JIT and FX

-Torch-TensorRT is a compiler for PyTorch/TorchScript, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript program into an module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extention and compiles modules that integrate into the JIT runtime seamlessly. After compilation using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/FP16/INT8) and other settings for your module.
+Torch-TensorRT is a compiler for PyTorch/TorchScript/FX, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript or FX program into an module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extention and compiles modules that integrate into the JIT runtime seamlessly. After compilation using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/FP16/INT8) and other settings for your module.

 Resources:
 - [Documentation](https://nvidia.github.io/Torch-TensorRT/)
-- [Torch-TensorRT Explained in 2 minutes!](https://www.youtube.com/watch?v=TU5BMU6iYZ0&ab_channel=NVIDIADeveloper)
+- [FX path Documentation](https://github.com/pytorch/TensorRT/blob/master/docsrc/tutorials/getting_started_with_fx_path.rst)
+- [Torch-TensorRT Explained in 2 minutes!](https://www.youtube.com/watch?v=TU5BMU6iYZ0&ab_channel=NVIDIADeveloper)
 - [Comprehensive Discusion (GTC Event)](https://www.nvidia.com/en-us/on-demand/session/gtcfall21-a31107/)
 - [Pre-built Docker Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). To use this container, make an NGC account and sign in to NVIDIA's registry with an API key. Refer to [this guide](https://docs.nvidia.com/ngc/ngc-catalog-user-guide/index.html#registering-activating-ngc-account) for the same.

@@ -111,10 +112,10 @@ torch.jit.save(trt_ts_module, "trt_torchscript_module.ts") # save the TRT embedd
 These are the following dependencies used to verify the testcases. Torch-TensorRT can work with other versions, but the tests are not guaranteed to pass.

 - Bazel 5.1.1
-- Libtorch 1.11.0 (built with CUDA 11.3)
+- Libtorch 1.12.0 (built with CUDA 11.3)
 - CUDA 11.3
-- cuDNN 8.2.1
-- TensorRT 8.2.4.2
+- cuDNN 8.4.1
+- TensorRT 8.4.1.5

 ## Prebuilt Binaries and Wheel files

@@ -213,7 +214,7 @@ bazel build //:libtorchtrt --compilation_mode opt
 ```

 ### FX path (Python only) installation
-If the user plan to try FX path (Python only) and would like to avoid bazel build. Please follow the steps below.
+If the user plans to try FX path (Python only) and would like to avoid bazel build. Please follow the steps below.
 ``` shell
 cd py && python3 setup.py install --fx-only
 ```

WORKSPACE

Lines changed: 10 additions & 10 deletions
@@ -56,17 +56,17 @@ new_local_repository(
 http_archive(
     name = "libtorch",
     build_file = "@//third_party/libtorch:BUILD",
-    sha256 = "8d9e829ce9478db4f35bdb7943308cf02e8a2f58cf9bb10f742462c1d57bf287",
+    sha256 = "80f089939de20e68e3fcad4dfa72a26c8bf91b5e77b11042f671f39ebac35865",
     strip_prefix = "libtorch",
-    urls = ["https://download.pytorch.org/libtorch/cu113/libtorch-cxx11-abi-shared-with-deps-1.11.0%2Bcu113.zip"],
+    urls = ["https://download.pytorch.org/libtorch/cu113/libtorch-cxx11-abi-shared-with-deps-1.12.0%2Bcu113.zip"],
 )

 http_archive(
     name = "libtorch_pre_cxx11_abi",
     build_file = "@//third_party/libtorch:BUILD",
-    sha256 = "90159ecce3ff451f3ef3f657493b6c7c96759c3b74bbd70c1695f2ea2f81e1ad",
+    sha256 = "8e35371403f7052d9e9b43bcff383980dbde4df028986dc1dab539953481d55f",
     strip_prefix = "libtorch",
-    urls = ["https://download.pytorch.org/libtorch/cu113/libtorch-shared-with-deps-1.11.0%2Bcu113.zip"],
+    urls = ["https://download.pytorch.org/libtorch/cu113/libtorch-shared-with-deps-1.12.0%2Bcu113.zip"],
 )

 # Download these tarballs manually from the NVIDIA website
@@ -76,20 +76,20 @@ http_archive(
 http_archive(
     name = "cudnn",
     build_file = "@//third_party/cudnn/archive:BUILD",
-    sha256 = "0e5d2df890b9967efa6619da421310d97323565a79f05a1a8cb9b7165baad0d7",
-    strip_prefix = "cuda",
+    sha256 = "ec96d2376d81fca42bdd3d4c3d705a99b29a065bab57f920561c763e29c67d01",
+    strip_prefix = "cudnn-linux-x86_64-8.4.1.50_cuda11.6-archive",
     urls = [
-        "https://developer.nvidia.com/compute/machine-learning/cudnn/secure/8.2.4/11.4_20210831/cudnn-11.4-linux-x64-v8.2.4.15.tgz",
+        "https://developer.nvidia.com/compute/cudnn/secure/8.4.1/local_installers/11.6/cudnn-linux-x86_64-8.4.1.50_cuda11.6-archive.tar.xz",
     ],
 )

 http_archive(
     name = "tensorrt",
     build_file = "@//third_party/tensorrt/archive:BUILD",
-    sha256 = "826180eaaecdf9a7e76116855b9f1f3400ea9b06e66b06a3f6a0747ba6f863ad",
-    strip_prefix = "TensorRT-8.2.4.2",
+    sha256 = "8107861af218694130f170e071f49814fa3e27f1386ce7cb6d807ac05a7fcf0e",
+    strip_prefix = "TensorRT-8.4.1.5",
     urls = [
-        "https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/8.2.4/tars/tensorrt-8.2.4.2.linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz",
+        "https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/8.4.1/tars/tensorrt-8.4.1.5.linux.x86_64-gnu.cuda-11.6.cudnn8.4.tar.gz",
     ],
 )

core/compiler.cpp

Lines changed: 14 additions & 22 deletions
@@ -359,14 +359,6 @@ void MapInputsAndDetermineDTypes(
   }
 }

-uint64_t GetRecommendedWorkspaceSize(const runtime::CudaDevice& device) {
-  if (device.major < 6) {
-    return 256 * (1 << 20);
-  } else {
-    return 1 << 30;
-  }
-}
-
 std::string ConvertGraphToTRTEngine(const torch::jit::script::Module& mod, std::string method_name, CompileSpec cfg) {
   // Go through Lowering to simplify graph and extract weight parameters
   auto graph_and_parameters = lowering::Lower(mod, method_name, cfg.lower_info);
@@ -380,14 +372,14 @@ std::string ConvertGraphToTRTEngine(const torch::jit::script::Module& mod, std::
   // Infer the type of an input from the weights of the calculation
   auto first_use_types = ir::get_block_first_calc_dtypes_opt(g->block());

-  // GPU default WS size : 1 GB
-  // Set WS = 256 Mb for Jetson nano/TX1 like platforms whose compute capability is 5.X.
-  auto workspace_size = cfg.convert_info.engine_settings.workspace_size;
-  auto device_spec = cfg.convert_info.engine_settings.device;
-  auto cuda_device = runtime::CudaDevice(device_spec.gpu_id, device_spec.device_type);
-  if (workspace_size == 0) {
-    cfg.convert_info.engine_settings.workspace_size = GetRecommendedWorkspaceSize(cuda_device);
-  }
+  // // GPU default WS size : 1 GB
+  // // Set WS = 256 Mb for Jetson nano/TX1 like platforms whose compute capability is 5.X.
+  // auto workspace_size = cfg.convert_info.engine_settings.workspace_size;
+  // auto device_spec = cfg.convert_info.engine_settings.device;
+  // auto cuda_device = runtime::CudaDevice(device_spec.gpu_id, device_spec.device_type);
+  // if (workspace_size == 0) {
+  //   cfg.convert_info.engine_settings.workspace_size = GetRecommendedWorkspaceSize(cuda_device);
+  // }

   MapInputsAndDetermineDTypes(cfg, g, static_params, first_use_types);

@@ -399,14 +391,14 @@ std::string ConvertGraphToTRTEngine(const torch::jit::script::Module& mod, std::
 torch::jit::Module CompileGraph(const torch::jit::Module& mod, CompileSpec cfg) {
   torch::jit::Module new_mod(mod._ivalue()->name() + "_trt");

-  // GPU default WS size : 1 GB
-  // Set WS = 256 Mb for Jetson nano/TX1 like platforms whose compute capability is 5.X.
-  auto workspace_size = cfg.convert_info.engine_settings.workspace_size;
+  // // GPU default WS size : 1 GB
+  // // Set WS = 256 Mb for Jetson nano/TX1 like platforms whose compute capability is 5.X.
+  // auto workspace_size = cfg.convert_info.engine_settings.workspace_size;
   auto device_spec = cfg.convert_info.engine_settings.device;
   auto cuda_device = runtime::CudaDevice(device_spec.gpu_id, device_spec.device_type);
-  if (workspace_size == 0) {
-    cfg.convert_info.engine_settings.workspace_size = GetRecommendedWorkspaceSize(cuda_device);
-  }
+  // if (workspace_size == 0) {
+  //   cfg.convert_info.engine_settings.workspace_size = GetRecommendedWorkspaceSize(cuda_device);
+  // }

   for (const torch::jit::Method& method : mod.get_methods()) {
     if (method.name().compare("forward") == 0) {
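
With the workspace-size heuristic commented out, a `workspace_size` of 0 in the compile spec is no longer rewritten to a device-dependent default (1 GiB, or 256 MiB on compute capability 5.x devices); it now means "leave TensorRT's own default limit in place", since the builder config is only touched when the value is non-zero (see the ConversionCtx change below). A minimal caller-side sketch of setting an explicit limit; the `torch_tensorrt::ts` namespace and `workspace_size` field are assumed from the Torch-TensorRT 1.x public C++ API and are not part of this commit:

```cpp
// Sketch only: explicit workspace limit via the public TorchScript frontend.
// torch_tensorrt::ts::CompileSpec and its workspace_size field are assumed
// from the Torch-TensorRT 1.x API; adjust to your installed version.
#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"

torch::jit::Module compile_with_workspace(torch::jit::Module& mod) {
  auto spec = torch_tensorrt::ts::CompileSpec({torch_tensorrt::Input({1, 3, 224, 224})});
  spec.workspace_size = 1ULL << 30;  // 1 GiB; leaving it at 0 now defers to TensorRT's default
  return torch_tensorrt::ts::compile(mod, spec);
}
```
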

core/conversion/conversionctx/ConversionCtx.cpp

Lines changed: 17 additions & 4 deletions
@@ -20,9 +20,11 @@ std::ostream& operator<<(std::ostream& os, const BuilderSettings& s) {
        << "\n    Debuggable Engine: " << s.debug \
        << "\n    GPU ID: " << s.device.gpu_id \
        << "\n    Allow GPU Fallback (if running on DLA): " << s.device.allow_gpu_fallback \
-       << "\n    Min Timing Iterations: " << s.num_min_timing_iters \
        << "\n    Avg Timing Iterations: " << s.num_avg_timing_iters \
-       << "\n    Max Workspace Size: " << s.workspace_size;
+       << "\n    Max Workspace Size: " << s.workspace_size \
+       << "\n    DLA SRAM Size: " << s.dla_sram_size \
+       << "\n    DLA Local DRAM Size: " << s.dla_local_dram_size \
+       << "\n    DLA Global DRAM Size: " << s.dla_global_dram_size;

   os << "\n    Device Type: " << s.device.device_type \
      << "\n    GPU ID: " << s.device.gpu_id;
@@ -104,9 +106,11 @@ ConversionCtx::ConversionCtx(BuilderSettings build_settings)
     cfg->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);
   }

-  cfg->setMinTimingIterations(settings.num_min_timing_iters);
   cfg->setAvgTimingIterations(settings.num_avg_timing_iters);
-  cfg->setMaxWorkspaceSize(settings.workspace_size);
+  if (settings.workspace_size != 0) {
+    cfg->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, settings.workspace_size);
+  }
+
   cfg->setDefaultDeviceType(settings.device.device_type);
   cfg->setEngineCapability(settings.capability);

@@ -120,6 +124,15 @@ ConversionCtx::ConversionCtx(BuilderSettings build_settings)
         settings.enabled_precisions.find(nvinfer1::DataType::kFLOAT) == settings.enabled_precisions.end(),
         "DLA supports only fp16 or int8 precision");
     cfg->setDLACore(settings.device.dla_core);
+    if (settings.dla_sram_size != 1048576) {
+      cfg->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kDLA_MANAGED_SRAM, settings.dla_sram_size);
+    }
+    if (settings.dla_local_dram_size != 1073741824) {
+      cfg->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kDLA_LOCAL_DRAM, settings.dla_local_dram_size);
+    }
+    if (settings.dla_global_dram_size != 536870912) {
+      cfg->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kDLA_GLOBAL_DRAM, settings.dla_global_dram_size);
+    }
   }
 }

core/conversion/conversionctx/ConversionCtx.h

Lines changed: 3 additions & 1 deletion
@@ -33,9 +33,11 @@ struct BuilderSettings {
   Device device;
   nvinfer1::EngineCapability capability = TRT_ENGINE_CAPABILITY_STANDARD;
   nvinfer1::IInt8Calibrator* calibrator = nullptr;
-  uint64_t num_min_timing_iters = 2;
   uint64_t num_avg_timing_iters = 1;
   uint64_t workspace_size = 0;
+  uint64_t dla_sram_size = 1048576;
+  uint64_t dla_local_dram_size = 1073741824;
+  uint64_t dla_global_dram_size = 536870912;

   BuilderSettings() = default;
   BuilderSettings(const BuilderSettings& other) = default;
core/conversion/converters/converter_util.cpp

Lines changed: 128 additions & 2 deletions
@@ -135,9 +135,10 @@ nvinfer1::ITensor* castITensor(ConversionCtx* ctx, nvinfer1::ITensor* tensor, nv

   auto id_layer = ctx->net->addIdentity(*tensor);
   TORCHTRT_CHECK(id_layer, "Unable to create identity layer for ITensor: " << tensor_id.str());
-  auto casted_tensor = id_layer->getOutput(0);
-  casted_tensor->setType(dtype);
+  // layer->setOutputType should be used for casting and not manually setting output_tensor->setType()
+  id_layer->setOutputType(0, dtype);

+  auto casted_tensor = id_layer->getOutput(0);
   LOG_DEBUG(ctx->logger, "Casting ITensor " << tensor_id.str() << " from " << tensor->getType() << " to " << dtype);

   std::stringstream ss;
@@ -199,6 +200,131 @@ nvinfer1::ITensor* tensor_to_const(ConversionCtx* ctx, at::Tensor t, const std::
   return out;
 }

+// clamp x to [lower_bound, upper_bound]
+nvinfer1::ITensor* clamp(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* x,
+    nvinfer1::ITensor* lower_bound,
+    nvinfer1::ITensor* upper_bound,
+    std::string const& name) {
+
+  auto max_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kMAX, x, lower_bound, "max layer for " + name);
+  TORCHTRT_CHECK(max_layer, "Unable to create max layer for clamp");
+  LOG_DEBUG(ctx->logger, "Create " << max_layer->getName() << " for clamp");
+  auto max_itensor = max_layer->getOutput(0);
+
+  auto min_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kMIN, max_itensor, upper_bound, "min layer for " + name);
+  TORCHTRT_CHECK(min_layer, "Unable to create min layer for clamp");
+  LOG_DEBUG(ctx->logger, "Create " << min_layer->getName() << " for clamp");
+  auto min_itensor = min_layer->getOutput(0);
+  return min_itensor;
+}
+
+// clamp x to [0, input_dim]
+nvinfer1::ITensor* clamp_to_input_dim(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* x,
+    nvinfer1::ITensor* input_dim,
+    int nbdims,
+    std::string const& name) {
+
+  auto zero = torch::zeros({nbdims}).to(torch::kI32);
+  auto zero_itensor = tensor_to_const(ctx, zero);
+  auto one = torch::ones({nbdims}).to(torch::kI32);
+  auto one_itensor = tensor_to_const(ctx, one);
+
+  auto upper_bound_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kSUB, input_dim, one_itensor, "sub layer for " + name);
+  TORCHTRT_CHECK(upper_bound_layer, "Unable to create sub layer for clamp to inputDim");
+  LOG_DEBUG(ctx->logger, "Create " << upper_bound_layer->getName() << " for clamp to inputDim");
+  auto upper_bound = upper_bound_layer->getOutput(0);
+
+  auto max_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kMAX, x, zero_itensor, "max layer for " + name);
+  TORCHTRT_CHECK(max_layer, "Unable to create max_layer for clamp to inputDim");
+  LOG_DEBUG(ctx->logger, "Create " << max_layer->getName() << " for clamp to inputDim");
+  auto max_itensor = max_layer->getOutput(0);
+
+  auto min_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kMIN, max_itensor, upper_bound, "min layer for " + name);
+  TORCHTRT_CHECK(min_layer, "Unable to create min_layer for clamp to inputDim");
+  LOG_DEBUG(ctx->logger, "Create " << min_layer->getName() << " for clamp to inputDim");
+  auto min_itensor = min_layer->getOutput(0);
+  return min_itensor;
+}
+
+// return indices < 0 ? inputDims + indices : indices
+nvinfer1::ITensor* normalize_indices(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* input_dim,
+    nvinfer1::ITensor* indices,
+    int nbdims,
+    std::string const& name) {
+
+  auto zero = torch::zeros({nbdims}).to(torch::kI32);
+  auto neg = -torch::ones({nbdims}).to(torch::kI32);
+  auto zero_itensor = tensor_to_const(ctx, zero);
+  auto neg_itensor = tensor_to_const(ctx, neg);
+  // find the indices that = -1
+  auto signs = clamp(ctx, indices, neg_itensor, zero_itensor, "clamp layer for " + name);
+
+  // get the inputDim value where indices == -1, else 0
+  auto mul = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kPROD, signs, input_dim, "prod layer for " + name);
+  TORCHTRT_CHECK(mul, "Unable to create mul layer in normalize_indices");
+  LOG_DEBUG(ctx->logger, "Create " << mul->getName() << " for normalize_indices");
+  auto mul_itensor = mul->getOutput(0);
+
+  // add the inputDim value to indices where indices == -1
+  auto sub = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kSUB, indices, mul_itensor, "sub layer for " + name);
+  TORCHTRT_CHECK(sub, "Unable to create sub layer in normalize_indices");
+  LOG_DEBUG(ctx->logger, "Create " << sub->getName() << " for normalize_indices");
+  auto sub_itensor = sub->getOutput(0);
+  return sub_itensor;
+}
+
+std::vector<nvinfer1::ITensor*> normalize_start_and_end(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* in_shape,
+    nvinfer1::ITensor* in_start,
+    nvinfer1::ITensor* in_end,
+    int nbdims,
+    std::string const& name) {
+  auto start = normalize_indices(ctx, in_shape, in_start, nbdims, "normalize start of " + name);
+  auto out_start = clamp_to_input_dim(ctx, start, in_shape, nbdims, "clamp start to inputDim for " + name);
+  auto end = normalize_indices(ctx, in_shape, in_end, nbdims, "normalize end of " + name);
+  auto out_end = clamp_to_input_dim(ctx, end, in_shape, nbdims, "clamp end to inputDim for " + name);
+  std::vector<nvinfer1::ITensor*> outputs;
+  outputs.push_back(out_start);
+  outputs.push_back(out_end);
+  return outputs;
+}
+
+// size = (end - start) / stride + 1, where range is [start, end], end is included
+nvinfer1::ITensor* get_slice_size(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* start,
+    nvinfer1::ITensor* end,
+    nvinfer1::ITensor* stride,
+    int nbdims,
+    std::string const& name) {
+  at::Tensor one_tensor = torch::ones({nbdims}).to(torch::kI32);
+  auto one_itensor = tensor_to_const(ctx, one_tensor);
+
+  auto sub_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kSUB, end, start, "get_slice_size sub layer for " + name);
+  TORCHTRT_CHECK(sub_layer, "Unable to create sub layer in calculate_output_size");
+  LOG_DEBUG(ctx->logger, "Create " << sub_layer->getName() << " for calculate_output_size");
+  auto sub_itensor = sub_layer->getOutput(0);
+
+  auto div_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kDIV, sub_itensor, stride, "get_slice_size div layer for " + name);
+  TORCHTRT_CHECK(div_layer, "Unable to create div layer in calculate_output_size");
+  LOG_DEBUG(ctx->logger, "Create " << div_layer->getName() << " for calculate_output_size");
+  auto div_itensor = div_layer->getOutput(0);
+
+  auto add_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kSUM, div_itensor, one_itensor, "get_slice_size sum layer for " + name);
+  TORCHTRT_CHECK(add_layer, "Unable to create add layer in calculate_output_size");
+  LOG_DEBUG(ctx->logger, "Create " << add_layer->getName() << " for calculate_output_size");
+  auto size_itensor = add_layer->getOutput(0);
+
+  return size_itensor;
+}
+
 } // namespace converters
 } // namespace conversion
 } // namespace core
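
These helpers give converters shape-tensor arithmetic for dynamic slicing: `normalize_start_and_end` wraps negative indices and clamps them into range, and `get_slice_size` computes the per-dimension output size. Separately, `castITensor` now requests the cast through `IIdentityLayer::setOutputType` instead of mutating the output tensor's type, which is the mechanism TensorRT expects for precision conversion. Below is a hypothetical sketch of how a converter might wire the new helpers into an `ISliceLayer` with runtime start/size/stride inputs; the surrounding function and variable names are illustrative, and only the helper signatures come from the code above:

```cpp
// Illustrative only: a dynamic slice built from the helpers added above.
// Assumes it lives in the same namespace, so clamp/normalize_start_and_end/
// get_slice_size and tensor_to_const/add_elementwise are visible.
nvinfer1::ITensor* dynamic_slice_example(
    ConversionCtx* ctx,
    nvinfer1::ITensor* in,         // tensor with (possibly) dynamic dims
    nvinfer1::ITensor* start_raw,  // per-dim start indices, may be negative
    nvinfer1::ITensor* end_raw,    // per-dim end indices, may be negative
    nvinfer1::ITensor* stride,     // per-dim strides
    int nbdims,
    const std::string& name) {
  // Runtime shape of the input as an Int32 tensor
  auto shape = ctx->net->addShape(*in)->getOutput(0);

  // Wrap negatives (e.g. -1 -> dim - 1) and clamp into [0, dim - 1]
  auto bounds = normalize_start_and_end(ctx, shape, start_raw, end_raw, nbdims, name);
  auto start = bounds[0];
  auto end = bounds[1];

  // size = (end - start) / stride + 1
  auto size = get_slice_size(ctx, start, end, stride, nbdims, name);

  // Static dims are placeholders; the real values come from the tensor inputs.
  nvinfer1::Dims dummy;
  dummy.nbDims = nbdims;
  for (int i = 0; i < nbdims; i++) {
    dummy.d[i] = 1;
  }
  auto slice = ctx->net->addSlice(*in, dummy, dummy, dummy);
  slice->setInput(1, *start);   // dynamic start
  slice->setInput(2, *size);    // dynamic size
  slice->setInput(3, *stride);  // dynamic stride
  return slice->getOutput(0);
}
```
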
