Commit efadc57

borisfom, pre-commit-ci[bot], yiheng-wang-nv, and binliunls authored
Implementation of TRT wrapping via inference.json (#620)
Sample implementation of using the new `trt_wrap()` from MONAI. Depends on Project-MONAI/MONAI#7990.

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Yiheng Wang <[email protected]>
Signed-off-by: binliu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yiheng Wang <[email protected]>
Co-authored-by: binliu <[email protected]>
1 parent ae0beb1 commit efadc57

File tree: 30 files changed, +282 −73 lines


ci/run_premerge_gpu.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -117,7 +117,7 @@ verify_bundle() {
     fi
     test_cmd="python $(pwd)/ci/unit_tests/runner.py --b \"$bundle\""
     if [ "$dist_flag" = "True" ]; then
-        test_cmd="$test_cmd --dist True"
+        test_cmd="torchrun $(pwd)/ci/unit_tests/runner.py --b \"$bundle\" --dist True"
     fi
     eval $test_cmd
     # if not maisi_ct_generative, remove venv
```
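The change above launches the distributed test path with `torchrun` instead of plain `python`. The practical difference is that `torchrun` spawns one process per GPU and injects rendezvous information (`RANK`, `WORLD_SIZE`, `LOCAL_RANK`) into each worker's environment, which the runner can pick up. A minimal sketch of that pattern (`dist_info` is a hypothetical helper, not part of the repo's runner):

```python
import os

def dist_info(env=None):
    """Read the rendezvous variables torchrun injects into each worker.
    Falls back to single-process defaults when launched with plain `python`."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
    }

# Launched as `python runner.py`: single-process defaults.
print(dist_info({}))
# Launched as `torchrun --nproc_per_node=2 runner.py`: per-worker values.
print(dist_info({"RANK": "1", "WORLD_SIZE": "2", "LOCAL_RANK": "1"}))
```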
models/brats_mri_axial_slices_generative_diffusion/configs/inference_trt.json

Lines changed: 7 additions & 0 deletions

```diff
@@ -0,0 +1,7 @@
+{
+    "+imports": [
+        "$from monai.networks import trt_compile"
+    ],
+    "diffusion": "$trt_compile(@network_def.to(@device), @load_diffusion_path)",
+    "autoencoder": "$trt_compile(@autoencoder_def.to(@device), @load_autoencoder_path)"
+}
```
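This override file is meant to be layered on top of the bundle's `inference.json`: when both configs are passed to the CLI, a plain key in the later file replaces the base value, while a `+`-prefixed key such as `"+imports"` appends to the base list. A minimal Python sketch of that merge semantics (simplified and hypothetical — the real logic lives in MONAI's bundle config handling, and `merge_bundle_configs` is not a MONAI function):

```python
import json

def merge_bundle_configs(base: dict, override: dict) -> dict:
    """Merge an override config into a base config: a '+'-prefixed key
    extends the base list, any other key replaces the base value."""
    merged = dict(base)
    for key, value in override.items():
        if key.startswith("+"):
            ref = key[1:]
            merged[ref] = list(base.get(ref, [])) + list(value)
        else:
            merged[key] = value
    return merged

base = {
    "imports": ["$import glob"],
    "diffusion": "$@network_def.to(@device)",
}
trt_override = {
    "+imports": ["$from monai.networks import trt_compile"],
    "diffusion": "$trt_compile(@network_def.to(@device), @load_diffusion_path)",
}
merged = merge_bundle_configs(base, trt_override)
print(json.dumps(merged, indent=2))
```

The net effect is that `inference_trt.json` only has to name the two components it wraps; everything else in `inference.json` is untouched.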

models/brats_mri_axial_slices_generative_diffusion/configs/metadata.json

Lines changed: 3 additions & 2 deletions

```diff
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20240725.json",
-    "version": "1.1.0",
+    "version": "1.1.1",
     "changelog": {
+        "1.1.1": "enable tensorrt",
         "1.1.0": "update to use monai 1.4, model ckpt not changed, rm GenerativeAI repo",
         "1.0.9": "update to use monai 1.3.1",
         "1.0.8": "define arg for output file and put infer logic into a function",
@@ -15,7 +16,7 @@
         "1.0.0": "Initial release"
     },
     "monai_version": "1.4.0",
-    "pytorch_version": "2.2.2",
+    "pytorch_version": "2.4.0",
     "numpy_version": "1.24.4",
     "required_packages_version": {
         "nibabel": "5.2.1",
```

models/brats_mri_axial_slices_generative_diffusion/docs/README.md

Lines changed: 31 additions & 0 deletions

````diff
@@ -85,6 +85,31 @@ If you face memory issues with data loading, you can lower the caching rate `cac
 
 ![A graph showing the latent diffusion training curve](https://developer.download.nvidia.com/assets/Clara/Images/monai_brain_image_gen_ldm2d_train_diffusion_loss_v3.png)
 
+#### TensorRT speedup
+This bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU. Please note that 32-bit precision models are benchmarked with tf32 weight format.
+
+| method | torch_tf32(ms) | torch_amp(ms) | trt_tf32(ms) | trt_fp16(ms) | speedup amp | speedup tf32 | speedup fp16 | amp vs fp16|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation (diffusion) | 32.11 | 32.45 | 2.58 | 2.11 | 0.99 | 12.45 | 15.22 | 15.38 |
+| model computation (autoencoder) | 17.74 | 18.15 | 5.47 | 3.66 | 0.98 | 3.24 | 4.85 | 4.96 |
+| end2end | 1389 | 1973 | 332 | 314 | 0.70 | 4.18 | 4.42 | 6.28 |
+
+Where:
+- `model computation` means the speedup ratio of model's inference with a random input without preprocessing and postprocessing
+- `end2end` means run the bundle end-to-end with the TensorRT based model.
+- `torch_tf32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_tf32` and `trt_fp16` are for the TensorRT based models converted in corresponding precision.
+- `speedup amp`, `speedup tf32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+- TensorRT: 10.3.0+cuda12.6
+- Torch-TensorRT Version: 2.4.0
+- CPU Architecture: x86-64
+- OS: ubuntu 20.04
+- Python version: 3.10.12
+- CUDA version: 12.6
+- GPU models and configuration: A100 80G
 
 ## MONAI Bundle Commands
 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
@@ -143,6 +168,12 @@ The following code generates a synthetic image from a random sampled noise.
 python -m monai.bundle run --config_file configs/inference.json
 ```
 
+#### Execute inference with the TensorRT model:
+
+```
+python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"
+```
+
 # References
 [1] Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf
````
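The ratio columns in the added speedup table follow directly from the latency columns: each `speedup` column divides the PyTorch tf32 latency by the listed backend's latency, and `amp vs fp16` divides the amp latency by the TensorRT fp16 latency. A quick sanity check for the diffusion row of the 2D bundle's table:

```python
# Latencies (ms) for the "model computation (diffusion)" row above.
torch_tf32, torch_amp, trt_tf32, trt_fp16 = 32.11, 32.45, 2.58, 2.11

speedup_amp = torch_tf32 / torch_amp   # PyTorch tf32 vs PyTorch amp
speedup_tf32 = torch_tf32 / trt_tf32   # PyTorch tf32 vs TensorRT tf32
speedup_fp16 = torch_tf32 / trt_fp16   # PyTorch tf32 vs TensorRT fp16
amp_vs_fp16 = torch_amp / trt_fp16     # PyTorch amp vs TensorRT fp16

print(round(speedup_amp, 2), round(speedup_tf32, 2),
      round(speedup_fp16, 2), round(amp_vs_fp16, 2))  # -> 0.99 12.45 15.22 15.38
```

The reproduced values match the table's 0.99 / 12.45 / 15.22 / 15.38 entries.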

models/brats_mri_axial_slices_generative_diffusion/large_files.yml

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,9 +1,9 @@
 large_files:
   - path: "models/model_autoencoder.pt"
-    url: "https://drive.google.com/uc?id=1x4JEfWwCnR0wvS9v5TBWX1n9xl51xZj9"
+    url: "https://developer.download.nvidia.com/assets/Clara/monai/tutorials/model_zoo/model_autoencoder_brats_mri_axial_slices_generative_diffusion_v1.pt"
     hash_val: "847a61ad13a68ebfca9c0a8fa6d0d6bd"
     hash_type: "md5"
   - path: "models/model.pt"
-    url: "https://drive.google.com/uc?id=1CJmlrLY4SYHl4swtnY1EJmuiNt1H7Jzu"
+    url: "https://developer.download.nvidia.com/assets/Clara/monai/tutorials/model_zoo/model_brats_mri_axial_slices_generative_diffusion_v1.pt"
     hash_val: "93a19ea3eaafd9781b4140286b121f37"
     hash_type: "md5"
```
models/brats_mri_generative_diffusion/configs/inference_trt.json

Lines changed: 7 additions & 0 deletions

```diff
@@ -0,0 +1,7 @@
+{
+    "+imports": [
+        "$from monai.networks import trt_compile"
+    ],
+    "diffusion": "$trt_compile(@network_def.to(@device), @load_diffusion_path)",
+    "autoencoder": "$trt_compile(@autoencoder_def.to(@device), @load_autoencoder_path)"
+}
```

models/brats_mri_generative_diffusion/configs/metadata.json

Lines changed: 3 additions & 2 deletions

```diff
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20240725.json",
-    "version": "1.1.0",
+    "version": "1.1.1",
     "changelog": {
+        "1.1.1": "enable tensorrt",
         "1.1.0": "update to use monai 1.4, model ckpt not changed, rm GenerativeAI repo",
         "1.0.9": "update to use monai 1.3.1",
         "1.0.8": "update run section",
@@ -15,7 +16,7 @@
         "1.0.0": "Initial release"
     },
     "monai_version": "1.4.0",
-    "pytorch_version": "2.2.2",
+    "pytorch_version": "2.4.0",
     "numpy_version": "1.24.4",
     "required_packages_version": {
         "nibabel": "5.2.1",
```

models/brats_mri_generative_diffusion/docs/README.md

Lines changed: 33 additions & 0 deletions

````diff
@@ -82,6 +82,32 @@ If you face memory issues with data loading, you can lower the caching rate `cac
 
 ![A graph showing the latent diffusion training curve](https://developer.download.nvidia.com/assets/Clara/Images/monai_brain_image_gen_ldm3d_train_diffusion_loss_v2.png)
 
+#### TensorRT speedup
+This bundle supports acceleration with TensorRT. The table below displays the speedup ratios observed on an A100 80G GPU. Please note that 32-bit precision models are benchmarked with tf32 weight format.
+
+| method | torch_tf32(ms) | torch_amp(ms) | trt_tf32(ms) | trt_fp16(ms) | speedup amp | speedup tf32 | speedup fp16 | amp vs fp16|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation (diffusion) | 44.57 | 44.59 | 40.89 | 18.79 | 1.00 | 1.09 | 2.37 | 2.37 |
+| model computation (autoencoder) | 96.29 | 97.01 | 78.51 | 44.03 | 0.99 | 1.23 | 2.19 | 2.20 |
+| end2end | 2826 | 2538 | 2759 | 1472 | 1.11 | 1.02 | 1.92 | 1.72 |
+
+Where:
+- `model computation` means the speedup ratio of model's inference with a random input without preprocessing and postprocessing
+- `end2end` means run the bundle end-to-end with the TensorRT based model.
+- `torch_tf32` and `torch_amp` are for the PyTorch models with or without `amp` mode.
+- `trt_tf32` and `trt_fp16` are for the TensorRT based models converted in corresponding precision.
+- `speedup amp`, `speedup tf32` and `speedup fp16` are the speedup ratios of corresponding models versus the PyTorch float32 model
+- `amp vs fp16` is the speedup ratio between the PyTorch amp model and the TensorRT float16 based model.
+
+This result is benchmarked under:
+- TensorRT: 10.3.0+cuda12.6
+- Torch-TensorRT Version: 2.4.0
+- CPU Architecture: x86-64
+- OS: ubuntu 20.04
+- Python version: 3.10.12
+- CUDA version: 12.6
+- GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands
 
 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
@@ -143,6 +169,13 @@ The following code generates a synthetic image from a random sampled noise.
 python -m monai.bundle run --config_file configs/inference.json
 ```
 
+#### Execute inference with the TensorRT model:
+
+```
+python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"
+```
+
+
 # References
 [1] Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. https://openaccess.thecvf.com/content/CVPR2022/papers/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.pdf
````
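The `--config_file "['configs/inference.json', 'configs/inference_trt.json']"` form passes a string-encoded Python list of config paths, which the bundle CLI expands into multiple files applied in order (later files override earlier ones). A simplified, hypothetical sketch of that argument handling, not MONAI's actual parser:

```python
import ast

def parse_config_file_arg(arg: str) -> list:
    """Accept either a single config path or a string-encoded list of
    paths, returning the list of files to load in order."""
    if arg.startswith("[") and arg.endswith("]"):
        return ast.literal_eval(arg)  # safe parse of the list literal
    return [arg]

print(parse_config_file_arg("['configs/inference.json', 'configs/inference_trt.json']"))
print(parse_config_file_arg("configs/inference.json"))
```

Quoting the whole list is required so the shell hands it to Python as one argument.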

models/brats_mri_generative_diffusion/large_files.yml

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,9 +1,9 @@
 large_files:
   - path: "models/model_autoencoder.pt"
-    url: "https://drive.google.com/uc?id=1arp3w8glsQw2h7mQBbk71krqmaG_std6"
+    url: "https://developer.download.nvidia.com/assets/Clara/monai/tutorials/model_zoo/model_autoencoder_brats_mri_generative_diffusion_v1.pt"
     hash_val: "9e6df4cc9a2decf49ab3332606b32c55"
     hash_type: "md5"
   - path: "models/model.pt"
-    url: "https://drive.google.com/uc?id=1m2pcbj8NMoxEIAOmD9dgYBN4gNcrMx6e"
+    url: "https://developer.download.nvidia.com/assets/Clara/monai/tutorials/model_zoo/model_brats_mri_generative_diffusion_v1.pt"
     hash_val: "35258b1112f701f3d485676d33141a55"
     hash_type: "md5"
```

models/pathology_nuclei_classification/configs/inference.json

Lines changed: 2 additions & 1 deletion

```diff
@@ -6,6 +6,7 @@
         "$import os"
     ],
     "bundle_root": ".",
+    "checkpoint": "$@bundle_root + '/models/model.pt'",
     "output_dir": "$@bundle_root + '/eval'",
     "dataset_dir": "/workspace/data/CoNSePNuclei",
     "images": "$list(sorted(glob.glob(@dataset_dir + '/Test/Images/*.png')))[:1]",
@@ -88,7 +89,7 @@
     "handlers": [
         {
             "_target_": "CheckpointLoader",
-            "load_path": "$@bundle_root + '/models/model.pt'",
+            "load_path": "$@checkpoint",
             "load_dict": {
                 "model": "@network"
            }
```
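Hoisting the checkpoint path into a top-level `checkpoint` key means the `CheckpointLoader`'s `load_path` is now a single `@checkpoint` reference that can be overridden once from the CLI. A simplified, hypothetical sketch of the bundle's `@`/`$` reference resolution that makes this work (the real logic lives in MONAI's config parsing; `resolve` below is for illustration only):

```python
def resolve(config: dict, key: str):
    """Resolve a bundle config value: '@name' refers to another entry,
    and a leading '$' marks a Python expression to evaluate after the
    references have been substituted."""
    value = config[key]
    if not isinstance(value, str) or not value.startswith("$"):
        return value
    expr = value[1:]
    # Substitute @references, longest names first to avoid prefix clashes.
    for name in sorted(config, key=len, reverse=True):
        if "@" + name in expr:
            expr = expr.replace("@" + name, repr(resolve(config, name)))
    return eval(expr)  # fine for a sketch; real parsers are more careful

config = {
    "bundle_root": ".",
    "checkpoint": "$@bundle_root + '/models/model.pt'",
    "load_path": "$@checkpoint",
}
print(resolve(config, "load_path"))  # -> ./models/model.pt
```

Overriding `--checkpoint /some/other.pt` at the command line now changes the loaded weights without touching the nested handler definition.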
