Commit efadc57

borisfom, pre-commit-ci[bot], yiheng-wang-nv, and binliunls authored
Implementation of TRT wrapping via inference.json (#620)
Sample implementation of using the new `trt_wrap()` from MONAI. Depends on Project-MONAI/MONAI#7990.

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Yiheng Wang <[email protected]>
Signed-off-by: binliu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yiheng Wang <[email protected]>
Co-authored-by: binliu <[email protected]>
1 parent ae0beb1 commit efadc57

File tree: 30 files changed, +282 −73 lines


ci/run_premerge_gpu.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -117,7 +117,7 @@ verify_bundle() {
     fi
     test_cmd="python $(pwd)/ci/unit_tests/runner.py --b \"$bundle\""
     if [ "$dist_flag" = "True" ]; then
-        test_cmd="$test_cmd --dist True"
+        test_cmd="torchrun $(pwd)/ci/unit_tests/runner.py --b \"$bundle\" --dist True"
     fi
     eval $test_cmd
     # if not maisi_ct_generative, remove venv
```
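The change above launches the distributed test path with `torchrun` instead of plain `python`. The practical difference is that `torchrun` spawns one process per GPU and injects rendezvous information (`RANK`, `WORLD_SIZE`, `LOCAL_RANK`) into each worker's environment, which the runner can pick up. A minimal sketch of that pattern (`dist_info` is a hypothetical helper, not part of the repo's runner):

```python
import os

def dist_info(env=None):
    """Read the rendezvous variables torchrun injects into each worker.
    Falls back to single-process defaults when launched with plain `python`."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
    }

# Launched as `python runner.py`: single-process defaults.
print(dist_info({}))
# Launched as `torchrun --nproc_per_node=2 runner.py`: per-worker values.
print(dist_info({"RANK": "1", "WORLD_SIZE": "2", "LOCAL_RANK": "1"}))
```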
models/brats_mri_axial_slices_generative_diffusion/configs/inference_trt.json

Lines changed: 7 additions & 0 deletions

```diff
@@ -0,0 +1,7 @@
+{
+    "+imports": [
+        "$from monai.networks import trt_compile"
+    ],
+    "diffusion": "$trt_compile(@network_def.to(@device), @load_diffusion_path)",
+    "autoencoder": "$trt_compile(@autoencoder_def.to(@device), @load_autoencoder_path)"
+}
```
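This override file is meant to be layered on top of the bundle's `inference.json`: when both configs are passed to the CLI, a plain key in the later file replaces the base value, while a `+`-prefixed key such as `"+imports"` appends to the base list. A minimal Python sketch of that merge semantics (simplified and hypothetical — the real logic lives in MONAI's bundle config handling, and `merge_bundle_configs` is not a MONAI function):

```python
import json

def merge_bundle_configs(base: dict, override: dict) -> dict:
    """Merge an override config into a base config: a '+'-prefixed key
    extends the base list, any other key replaces the base value."""
    merged = dict(base)
    for key, value in override.items():
        if key.startswith("+"):
            ref = key[1:]
            merged[ref] = list(base.get(ref, [])) + list(value)
        else:
            merged[key] = value
    return merged

base = {
    "imports": ["$import glob"],
    "diffusion": "$@network_def.to(@device)",
}
trt_override = {
    "+imports": ["$from monai.networks import trt_compile"],
    "diffusion": "$trt_compile(@network_def.to(@device), @load_diffusion_path)",
}
merged = merge_bundle_configs(base, trt_override)
print(json.dumps(merged, indent=2))
```

The net effect is that `inference_trt.json` only has to name the two components it wraps; everything else in `inference.json` is untouched.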

models/brats_mri_axial_slices_generative_diffusion/configs/metadata.json

Lines changed: 3 additions & 2 deletions

```diff
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20240725.json",
-    "version": "1.1.0",
+    "version": "1.1.1",
     "changelog": {
+        "1.1.1": "enable tensorrt",
         "1.1.0": "update to use monai 1.4, model ckpt not changed, rm GenerativeAI repo",
         "1.0.9": "update to use monai 1.3.1",
         "1.0.8": "define arg for output file and put infer logic into a function",
@@ -15,7 +16,7 @@
         "1.0.0": "Initial release"
     },
     "monai_version": "1.4.0",
-    "pytorch_version": "2.2.2",
+    "pytorch_version": "2.4.0",
     "numpy_version": "1.24.4",
     "required_packages_version": {
         "nibabel": "5.2.1",
```

models/brats_mri_axial_slices_generative_diffusion/docs/README.md

Lines changed: 31 additions & 0 deletions

````diff
@@ -85,6 +85,31 @@ If you face memory issues with data loading, you can lower the caching rate `cac
 
 ![A graph showing the latent diffusion training curve](https://developer.download.nvidia.com/assets/Clara/Images/monai_brain_image_gen_ldm2d_train_diffusion_loss_v3.png)
 
+#### TensorRT speedup
+This bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU. Please note that 32-bit precision models are benchmarked with tf32 weight format.
+
+| method | torch_tf32(ms) | torch_amp(ms) | trt_tf32(ms) | trt_fp16(ms) | speedup amp | speedup tf32 | speedup fp16 | amp vs fp16|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation (diffusion) | 32.11 | 32.45 | 2.58 | 2.11 | 0.99 | 12.45 | 15.22 | 15.38 |
+| model computation (autoencoder) | 17.74 | 18.15 | 5.47 | 3.66 | 0.98 | 3.24 | 4.85 | 4.96 |
+| end2end | 1389 | 1973 | 332 | 314 | 0.70 | 4.18 | 4.42 | 6.28 |
+
+Where:
+- `model computation` means the speedup ratio of model's inference with a random input without preprocessing and postprocessing
+- `end2end` means run the bundle end-to-end with the TensorRT based model.
+- `torch_tf32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_tf32` and `trt_fp16` are for the TensorRT based models converted in corresponding precision.
+- `speedup amp`, `speedup tf32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+- TensorRT: 10.3.0+cuda12.6
+- Torch-TensorRT Version: 2.4.0
+- CPU Architecture: x86-64
+- OS: ubuntu 20.04
+- Python version: 3.10.12
+- CUDA version: 12.6
+- GPU models and configuration: A100 80G
 
 ## MONAI Bundle Commands
 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
@@ -143,6 +168,12 @@ The following code generates a synthetic image from a random sampled noise.
 python -m monai.bundle run --config_file configs/inference.json
 ```
 
+#### Execute inference with the TensorRT model:
+
+```
+python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"
+```
+
 # References
 [1] Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf
````
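The ratio columns in the added speedup table follow directly from the latency columns: each `speedup` column divides the PyTorch tf32 latency by the listed backend's latency, and `amp vs fp16` divides the amp latency by the TensorRT fp16 latency. A quick sanity check for the diffusion row of the 2D bundle's table:

```python
# Latencies (ms) for the "model computation (diffusion)" row above.
torch_tf32, torch_amp, trt_tf32, trt_fp16 = 32.11, 32.45, 2.58, 2.11

speedup_amp = torch_tf32 / torch_amp   # PyTorch tf32 vs PyTorch amp
speedup_tf32 = torch_tf32 / trt_tf32   # PyTorch tf32 vs TensorRT tf32
speedup_fp16 = torch_tf32 / trt_fp16   # PyTorch tf32 vs TensorRT fp16
amp_vs_fp16 = torch_amp / trt_fp16     # PyTorch amp vs TensorRT fp16

print(round(speedup_amp, 2), round(speedup_tf32, 2),
      round(speedup_fp16, 2), round(amp_vs_fp16, 2))  # -> 0.99 12.45 15.22 15.38
```

The reproduced values match the table's 0.99 / 12.45 / 15.22 / 15.38 entries.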

models/brats_mri_axial_slices_generative_diffusion/large_files.yml

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,9 +1,9 @@
 large_files:
   - path: "models/model_autoencoder.pt"
-    url: "https://drive.google.com/uc?id=1x4JEfWwCnR0wvS9v5TBWX1n9xl51xZj9"
+    url: "https://developer.download.nvidia.com/assets/Clara/monai/tutorials/model_zoo/model_autoencoder_brats_mri_axial_slices_generative_diffusion_v1.pt"
     hash_val: "847a61ad13a68ebfca9c0a8fa6d0d6bd"
     hash_type: "md5"
   - path: "models/model.pt"
-    url: "https://drive.google.com/uc?id=1CJmlrLY4SYHl4swtnY1EJmuiNt1H7Jzu"
+    url: "https://developer.download.nvidia.com/assets/Clara/monai/tutorials/model_zoo/model_brats_mri_axial_slices_generative_diffusion_v1.pt"
     hash_val: "93a19ea3eaafd9781b4140286b121f37"
     hash_type: "md5"
```
models/brats_mri_generative_diffusion/configs/inference_trt.json

Lines changed: 7 additions & 0 deletions

```diff
@@ -0,0 +1,7 @@
+{
+    "+imports": [
+        "$from monai.networks import trt_compile"
+    ],
+    "diffusion": "$trt_compile(@network_def.to(@device), @load_diffusion_path)",
+    "autoencoder": "$trt_compile(@autoencoder_def.to(@device), @load_autoencoder_path)"
+}
```

models/brats_mri_generative_diffusion/configs/metadata.json

Lines changed: 3 additions & 2 deletions

```diff
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20240725.json",
-    "version": "1.1.0",
+    "version": "1.1.1",
     "changelog": {
+        "1.1.1": "enable tensorrt",
         "1.1.0": "update to use monai 1.4, model ckpt not changed, rm GenerativeAI repo",
         "1.0.9": "update to use monai 1.3.1",
         "1.0.8": "update run section",
@@ -15,7 +16,7 @@
         "1.0.0": "Initial release"
     },
     "monai_version": "1.4.0",
-    "pytorch_version": "2.2.2",
+    "pytorch_version": "2.4.0",
     "numpy_version": "1.24.4",
     "required_packages_version": {
         "nibabel": "5.2.1",
```

models/brats_mri_generative_diffusion/docs/README.md

Lines changed: 33 additions & 0 deletions

````diff
@@ -82,6 +82,32 @@ If you face memory issues with data loading, you can lower the caching rate `cac
 
 ![A graph showing the latent diffusion training curve](https://developer.download.nvidia.com/assets/Clara/Images/monai_brain_image_gen_ldm3d_train_diffusion_loss_v2.png)
 
+#### TensorRT speedup
+This bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU. Please note that 32-bit precision models are benchmarked with tf32 weight format.
+
+| method | torch_tf32(ms) | torch_amp(ms) | trt_tf32(ms) | trt_fp16(ms) | speedup amp | speedup tf32 | speedup fp16 | amp vs fp16|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation (diffusion) | 44.57 | 44.59 | 40.89 | 18.79 | 1.00 | 1.09 | 2.37 | 2.37 |
+| model computation (autoencoder) | 96.29 | 97.01 | 78.51 | 44.03 | 0.99 | 1.23 | 2.19 | 2.20 |
+| end2end | 2826 | 2538 | 2759 | 1472 | 1.11 | 1.02 | 1.92 | 1.72 |
+
+Where:
+- `model computation` means the speedup ratio of model's inference with a random input without preprocessing and postprocessing
+- `end2end` means run the bundle end-to-end with the TensorRT based model.
+- `torch_tf32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_tf32` and `trt_fp16` are for the TensorRT based models converted in corresponding precision.
+- `speedup amp`, `speedup tf32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+- TensorRT: 10.3.0+cuda12.6
+- Torch-TensorRT Version: 2.4.0
+- CPU Architecture: x86-64
+- OS: ubuntu 20.04
+- Python version: 3.10.12
+- CUDA version: 12.6
+- GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands
 
 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
@@ -143,6 +169,13 @@ The following code generates a synthetic image from a random sampled noise.
 python -m monai.bundle run --config_file configs/inference.json
 ```
 
+#### Execute inference with the TensorRT model:
+
+```
+python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"
+```
+
+
 # References
 [1] Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf
````
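The `--config_file "['configs/inference.json', 'configs/inference_trt.json']"` form passes a string-encoded Python list of config paths, which the bundle CLI expands into multiple files applied in order (later files override earlier ones). A simplified, hypothetical sketch of that argument handling, not MONAI's actual parser:

```python
import ast

def parse_config_file_arg(arg: str) -> list:
    """Accept either a single config path or a string-encoded list of
    paths, returning the list of files to load in order."""
    if arg.startswith("[") and arg.endswith("]"):
        return ast.literal_eval(arg)  # safe parse of the list literal
    return [arg]

print(parse_config_file_arg("['configs/inference.json', 'configs/inference_trt.json']"))
print(parse_config_file_arg("configs/inference.json"))
```

Quoting the whole list is required so the shell hands it to Python as one argument.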

models/brats_mri_generative_diffusion/large_files.yml

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,9 +1,9 @@
 large_files:
   - path: "models/model_autoencoder.pt"
-    url: "https://drive.google.com/uc?id=1arp3w8glsQw2h7mQBbk71krqmaG_std6"
+    url: "https://developer.download.nvidia.com/assets/Clara/monai/tutorials/model_zoo/model_autoencoder_brats_mri_generative_diffusion_v1.pt"
     hash_val: "9e6df4cc9a2decf49ab3332606b32c55"
     hash_type: "md5"
   - path: "models/model.pt"
-    url: "https://drive.google.com/uc?id=1m2pcbj8NMoxEIAOmD9dgYBN4gNcrMx6e"
+    url: "https://developer.download.nvidia.com/assets/Clara/monai/tutorials/model_zoo/model_brats_mri_generative_diffusion_v1.pt"
     hash_val: "35258b1112f701f3d485676d33141a55"
     hash_type: "md5"
```

models/pathology_nuclei_classification/configs/inference.json

Lines changed: 2 additions & 1 deletion

```diff
@@ -6,6 +6,7 @@
         "$import os"
     ],
     "bundle_root": ".",
+    "checkpoint": "$@bundle_root + '/models/model.pt'",
     "output_dir": "$@bundle_root + '/eval'",
     "dataset_dir": "/workspace/data/CoNSePNuclei",
     "images": "$list(sorted(glob.glob(@dataset_dir + '/Test/Images/*.png')))[:1]",
@@ -88,7 +89,7 @@
     "handlers": [
         {
             "_target_": "CheckpointLoader",
-            "load_path": "$@bundle_root + '/models/model.pt'",
+            "load_path": "$@checkpoint",
             "load_dict": {
                 "model": "@network"
            }
```
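Hoisting the checkpoint path into a top-level `checkpoint` key means the `CheckpointLoader`'s `load_path` is now a single `@checkpoint` reference that can be overridden once from the CLI. A simplified, hypothetical sketch of the bundle's `@`/`$` reference resolution that makes this work (the real logic lives in MONAI's config parsing; `resolve` below is for illustration only):

```python
def resolve(config: dict, key: str):
    """Resolve a bundle config value: '@name' refers to another entry,
    and a leading '$' marks a Python expression to evaluate after the
    references have been substituted."""
    value = config[key]
    if not isinstance(value, str) or not value.startswith("$"):
        return value
    expr = value[1:]
    # Substitute @references, longest names first to avoid prefix clashes.
    for name in sorted(config, key=len, reverse=True):
        if "@" + name in expr:
            expr = expr.replace("@" + name, repr(resolve(config, name)))
    return eval(expr)  # fine for a sketch; real parsers are more careful

config = {
    "bundle_root": ".",
    "checkpoint": "$@bundle_root + '/models/model.pt'",
    "load_path": "$@checkpoint",
}
print(resolve(config, "load_path"))  # -> ./models/model.pt
```

Overriding `--checkpoint /some/other.pt` at the command line now changes the loaded weights without touching the nested handler definition.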
