Some fixes for the dynamic memory setting #3729

narendasan · 2025-07-29T22:58:37Z

Description

Allows the resource allocation strategy to be set at build time, fixes some of the mode switching, and cleans up some naming.

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

Bug fix (non-breaking change which fixes an issue)

Checklist:

My code follows the style guidelines of this project (You can use the linters)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
I have added the relevant labels to my PR so that relevant reviewers are notified

github-actions

There are some changes that do not conform to C++ style guidelines:

diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/register_jit_hooks.cpp b/tmp/changes.txt
index 6d15bd8..b6f2d5b 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/register_jit_hooks.cpp
+++ b/tmp/changes.txt
@@ -109,7 +109,10 @@ static auto TORCHTRT_UNUSED TRTEngineTSRegistrtion =
            [](const c10::intrusive_ptr<TRTEngine>& self) -> std::vector<std::string> { return self->serialize(); },
            [](std::vector<std::string> serialized_info) -> c10::intrusive_ptr<TRTEngine> {
              serialized_info[ENGINE_IDX] = base64_decode(serialized_info[ENGINE_IDX]);
-              LOG_DEBUG("Deserialized resource allocation strategy: " << (static_cast<bool>(std::stoi(serialized_info[RESOURCE_ALLOCATION_STRATEGY_IDX])) ? "Dynamic" : "Static"));
+              LOG_DEBUG(
+                  "Deserialized resource allocation strategy: "
+                  << (static_cast<bool>(std::stoi(serialized_info[RESOURCE_ALLOCATION_STRATEGY_IDX])) ? "Dynamic"
+                                                                                                      : "Static"));
              TRTEngine::verify_serialization_fmt(serialized_info);
              return c10::make_intrusive<TRTEngine>(serialized_info);
            });
diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp b/tmp/changes.txt
index 253738b..de70331 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp
+++ b/tmp/changes.txt
@@ -86,7 +86,9 @@ TRTEngine::TRTEngine(std::vector<std::string> serialized_info)
          static_cast<bool>(std::stoi(serialized_info[HW_COMPATIBLE_IDX])),
          static_cast<bool>(std::stoi(serialized_info[REQUIRES_OUTPUT_ALLOCATOR_IDX])),
          serialized_info[SERIALIZED_METADATA_IDX],
-          (static_cast<bool>(std::stoi(serialized_info[RESOURCE_ALLOCATION_STRATEGY_IDX])) ? ResourceAllocationStrategy::kDynamic : ResourceAllocationStrategy::kStatic)) {}
+          (static_cast<bool>(std::stoi(serialized_info[RESOURCE_ALLOCATION_STRATEGY_IDX]))
+               ? ResourceAllocationStrategy::kDynamic
+               : ResourceAllocationStrategy::kStatic)) {}

TRTEngine::TRTEngine(
    const std::string& mod_name,
@@ -129,7 +131,9 @@ TRTEngine::TRTEngine(
  }

  this->resource_allocation_strategy = resource_allocation_strategy;
-  LOG_DEBUG("Resource allocation strategy: " << (this->resource_allocation_strategy == ResourceAllocationStrategy::kDynamic ? "Dynamic" : "Static"));
+  LOG_DEBUG(
+      "Resource allocation strategy: "
+      << (this->resource_allocation_strategy == ResourceAllocationStrategy::kDynamic ? "Dynamic" : "Static"));
  if (this->resource_allocation_strategy == ResourceAllocationStrategy::kDynamic) {
    this->exec_ctx =
        make_trt(cuda_engine->createExecutionContext(nvinfer1::ExecutionContextAllocationStrategy::kUSER_MANAGED));
@@ -472,7 +476,8 @@ std::vector<std::string> TRTEngine::serialize() {
  serialized_info[REQUIRES_OUTPUT_ALLOCATOR_IDX] = this->requires_output_allocator ? "1" : "0";
  serialized_info[SERIALIZED_METADATA_IDX] = this->serialized_metadata;
  serialized_info[TARGET_PLATFORM_IDX] = this->target_platform.serialize();
-  serialized_info[RESOURCE_ALLOCATION_STRATEGY_IDX] = this->resource_allocation_strategy == ResourceAllocationStrategy::kDynamic ? "1" : "0";
+  serialized_info[RESOURCE_ALLOCATION_STRATEGY_IDX] =
+      this->resource_allocation_strategy == ResourceAllocationStrategy::kDynamic ? "1" : "0";

  return serialized_info;
}
@@ -486,11 +491,11 @@ void TRTEngine::set_resource_allocation_strategy(TRTEngine::ResourceAllocationSt
    this->resource_allocation_strategy = new_strategy;
    if (this->resource_allocation_strategy == TRTEngine::ResourceAllocationStrategy::kDynamic) {
      LOG_DEBUG("Setting resource allocation strategy to dynamic");
-      this->exec_ctx = make_trt(cuda_engine->createExecutionContext(nvinfer1::ExecutionContextAllocationStrategy::kUSER_MANAGED));
+      this->exec_ctx =
+          make_trt(cuda_engine->createExecutionContext(nvinfer1::ExecutionContextAllocationStrategy::kUSER_MANAGED));
    } else {
      LOG_DEBUG("Setting resource allocation strategy to static");
-      this->exec_ctx = make_trt(
-          cuda_engine->createExecutionContext());
+      this->exec_ctx = make_trt(cuda_engine->createExecutionContext());
    }
  }
}
ERROR: Some files do not conform to style guidelines
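
For readers following the diff above: the allocation strategy appears to round-trip through the serialized engine info as the string "1" (dynamic) or "0" (static). The sketch below mirrors that encoding in Python under stated assumptions. The index value, the enum, and the helper names `encode_strategy` and `decode_strategy` are hypothetical and exist only for illustration; the real value of `RESOURCE_ALLOCATION_STRATEGY_IDX` is not shown in this diff.

```python
from enum import Enum
from typing import List

# Hypothetical index for illustration; the real constant's value is not shown in the diff.
RESOURCE_ALLOCATION_STRATEGY_IDX = 0


class ResourceAllocationStrategy(Enum):
    """Illustrative stand-in for the C++ enum with kStatic and kDynamic."""

    STATIC = 0
    DYNAMIC = 1


def encode_strategy(strategy: ResourceAllocationStrategy) -> str:
    # Same convention as the serialize() hunk: "1" for dynamic, "0" for static.
    return "1" if strategy is ResourceAllocationStrategy.DYNAMIC else "0"


def decode_strategy(serialized_info: List[str]) -> ResourceAllocationStrategy:
    # Same convention as the constructor hunk: parse as int, then treat as bool.
    flag = bool(int(serialized_info[RESOURCE_ALLOCATION_STRATEGY_IDX]))
    if flag:
        return ResourceAllocationStrategy.DYNAMIC
    return ResourceAllocationStrategy.STATIC


# Round trip through a one-slot list standing in for the serialized engine info.
info = [""] * (RESOURCE_ALLOCATION_STRATEGY_IDX + 1)
info[RESOURCE_ALLOCATION_STRATEGY_IDX] = encode_strategy(ResourceAllocationStrategy.DYNAMIC)
roundtrip = decode_strategy(info)
```

Storing the flag as a decimal string keeps it consistent with the other boolean fields in the diff, such as `REQUIRES_OUTPUT_ALLOCATOR_IDX`.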

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/examples/dynamo/dynamic_memory_allocation.py	2025-07-29 23:09:46.508169+00:00
+++ /home/runner/work/TensorRT/TensorRT/examples/dynamo/dynamic_memory_allocation.py	2025-07-29 23:10:09.855773+00:00
@@ -14,21 +14,22 @@
    "ir": "dynamo",
    "use_python_runtime": False,
    "enabled_precisions": {torch.float32},
    "immutable_weights": False,
    "lazy_engine_init": True,
-    "dynamically_allocate_resources": True
-
+    "dynamically_allocate_resources": True,
}

model = models.resnet152(pretrained=True).eval().to("cuda")
compiled_module = torch_trt.compile(model, inputs=inputs, **settings)
print((torch.cuda.mem_get_info()[1] - torch.cuda.mem_get_info()[0]) / 1024**3)
compiled_module(*inputs)

time.sleep(30)
-with torch_trt.dynamo.runtime.ResourceAllocationStrategy(compiled_module, dynamically_allocate_resources=False):
+with torch_trt.dynamo.runtime.ResourceAllocationStrategy(
+    compiled_module, dynamically_allocate_resources=False
+):
    print(
        "Memory used (GB):",
        (torch.cuda.mem_get_info()[1] - torch.cuda.mem_get_info()[0]) / 1024**3,
    )
    compiled_module(*inputs)
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_ResourceAllocator.py	2025-07-29 23:09:46.525169+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_ResourceAllocator.py	2025-07-29 23:10:11.748306+00:00
@@ -12,21 +12,25 @@
    """

    def __init__(
        self,
        compiled_module: torch.nn.Module,
-        dynamically_allocate_resources: bool = True
+        dynamically_allocate_resources: bool = True,
    ) -> None:
        super(ResourceAllocationStrategy, self).__init__()
        self.compiled_module = compiled_module
        self.dynamically_allocate_resources = dynamically_allocate_resources

    def __enter__(self) -> None:
        print("Entering resource allocator context")
        for name, submodule in self.compiled_module.named_modules():
            if "_run_on_acc" in name:
-                submodule.use_dynamically_allocated_resources(dynamically_allocate_resources=self.dynamically_allocate_resources)
+                submodule.use_dynamically_allocated_resources(
+                    dynamically_allocate_resources=self.dynamically_allocate_resources
+                )

    def __exit__(self, exc_type: Any, exc_value: Any, exc_tb: Any) -> None:
        for name, submodule in self.compiled_module.named_modules():
            if "_run_on_acc" in name:
-                submodule.use_dynamically_allocated_resources(dynamically_allocate_resources=self.dynamically_allocate_resources)
+                submodule.use_dynamically_allocated_resources(
+                    dynamically_allocate_resources=self.dynamically_allocate_resources
+                )
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py	2025-07-29 23:09:46.525169+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py	2025-07-29 23:10:12.090030+00:00
@@ -186,11 +186,13 @@
        engine_info[SERIALIZED_METADATA_IDX] = self.encode_metadata(metadata)
        engine_info[TARGET_PLATFORM_IDX] = target_platform._to_serialized_rt_platform()
        engine_info[REQUIRES_OUTPUT_ALLOCATOR_IDX] = str(
            int(self.requires_output_allocator)
        )
-        print(f"PROVIDED RESOURCE ALLOCATION STRATEGY: {self.dynamically_allocate_resources}")
+        print(
+            f"PROVIDED RESOURCE ALLOCATION STRATEGY: {self.dynamically_allocate_resources}"
+        )
        engine_info[RESOURCE_ALLOCATION_STRATEGY_IDX] = str(
            int(self.dynamically_allocate_resources)
        )
        print(engine_info[RESOURCE_ALLOCATION_STRATEGY_IDX])

@@ -219,13 +221,17 @@
        return budget_bytes

    def _reset_captured_graph(self) -> None:
        self.engine.reset_captured_graph()

-    def use_dynamically_allocated_resources(self, dynamically_allocate_resources: bool = False) -> None:
+    def use_dynamically_allocated_resources(
+        self, dynamically_allocate_resources: bool = False
+    ) -> None:
        self.dynamically_allocate_resources = dynamically_allocate_resources
-        self.engine.use_dynamically_allocated_resources(self.dynamically_allocate_resources)
+        self.engine.use_dynamically_allocated_resources(
+            self.dynamically_allocate_resources
+        )

    def setup_engine(self) -> None:
        """
        Setup engine for a module which has deferred engine setup.
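
A note on the `_ResourceAllocator.py` hunk above: as shown, `__enter__` and `__exit__` pass the same value to each `_run_on_acc` submodule. If the intent is for the context manager to switch modes only for the duration of the `with` block, one possible shape is to record each submodule's previous setting on entry and restore it on exit. The sketch below uses stand-in classes so it runs without torch_tensorrt. `FakeTRTModule`, `FakeCompiledModule`, and `ScopedAllocationStrategy` are hypothetical names, not part of the library, and this is not necessarily how the PR resolves the mode switching.

```python
from typing import Any, Dict, Iterator, Tuple


class FakeTRTModule:
    """Stand-in that mimics only the attribute and method visible in the diff."""

    def __init__(self, dynamically_allocate_resources: bool = False) -> None:
        self.dynamically_allocate_resources = dynamically_allocate_resources

    def use_dynamically_allocated_resources(
        self, dynamically_allocate_resources: bool = False
    ) -> None:
        self.dynamically_allocate_resources = dynamically_allocate_resources


class FakeCompiledModule:
    """Stand-in exposing named_modules() the way torch.nn.Module does."""

    def __init__(self, submodules: Dict[str, FakeTRTModule]) -> None:
        self._submodules = submodules

    def named_modules(self) -> Iterator[Tuple[str, FakeTRTModule]]:
        return iter(self._submodules.items())


class ScopedAllocationStrategy:
    """Hypothetical variant that restores the prior setting on exit."""

    def __init__(
        self, compiled_module: Any, dynamically_allocate_resources: bool = True
    ) -> None:
        self.compiled_module = compiled_module
        self.dynamically_allocate_resources = dynamically_allocate_resources
        self._previous: Dict[str, bool] = {}

    def __enter__(self) -> None:
        for name, submodule in self.compiled_module.named_modules():
            if "_run_on_acc" in name:
                # Remember what each engine was using before the switch.
                self._previous[name] = submodule.dynamically_allocate_resources
                submodule.use_dynamically_allocated_resources(
                    dynamically_allocate_resources=self.dynamically_allocate_resources
                )

    def __exit__(self, exc_type: Any, exc_value: Any, exc_tb: Any) -> None:
        for name, submodule in self.compiled_module.named_modules():
            if name in self._previous:
                # Put back the setting recorded in __enter__.
                submodule.use_dynamically_allocated_resources(
                    dynamically_allocate_resources=self._previous[name]
                )
        self._previous.clear()


engine = FakeTRTModule(dynamically_allocate_resources=True)
module = FakeCompiledModule({"_run_on_acc_0": engine})
with ScopedAllocationStrategy(module, dynamically_allocate_resources=False):
    inside = engine.dynamically_allocate_resources
after = engine.dynamically_allocate_resources
```

With this shape, `inside` reflects the temporary static mode and `after` reflects the setting the engine had before the block.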

Allowing the allocation mode to be set at build time, some fixes for …

47e5da2

…the mode switching

meta-cla bot added the cla signed label Jul 29, 2025

github-actions bot added the component: core, component: api [Python], component: runtime, and component: dynamo labels Jul 29, 2025

github-actions bot requested a review from peri044 July 29, 2025 22:58

narendasan changed the base branch from dynamic_allocate to dynamic-allocation July 29, 2025 22:59

Update _settings.py

83dbf3f

github-actions bot requested changes Jul 29, 2025

cehongwang approved these changes Jul 29, 2025

cehongwang merged commit 50678a5 into dynamic-allocation Jul 29, 2025
52 of 55 checks passed
