@zewenli98
Collaborator

@zewenli98 zewenli98 commented Nov 24, 2025

Description

As I requested, TensorRT 10.14 added the flag trt.SerializationFlag.INCLUDE_REFIT, which allows refitted engines to remain refittable, so an engine can be refitted multiple times. Building on this capability, this PR enhances the existing engine caching and refitting features as follows:

  1. To save disk space, engine caching now stores only weight-stripped engines on disk, regardless of compilation_settings.strip_engine_weights. When a cached engine is pulled, it is automatically refitted and remains refittable.
  2. Compiled TRT modules can be refitted multiple times with refit_module_weights(), e.g.:
       for _ in range(3):
           trt_gm = refit_module_weights(trt_gm, exp_program)
  3. Due to some earlier changes, the insertion and pulling of cached engines were located in different places, which caused 🐛 [Bug] Engine cache failed on torch.compile backend=tensorrt #3909. This PR unifies insertion and pulling in _conversion.py.
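
The caching behavior in item 1 can be pictured with a toy model. This is only a sketch under stated assumptions, not the code in this PR: `ToyEngine`, `ToyEngineCache`, `strip`, and `refit` are hypothetical names, and plain Python objects stand in for serialized TensorRT engines and their weights.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyEngine:
    """Hypothetical stand-in for a serialized engine (not a TensorRT type)."""

    layers: List[str]
    weights: Optional[Dict[str, float]] = None  # None means weight-stripped
    refittable: bool = True

    def strip(self) -> "ToyEngine":
        # Keep only the structure; weights are dropped to save space.
        return ToyEngine(layers=list(self.layers), weights=None)

    def refit(self, weights: Dict[str, float]) -> "ToyEngine":
        if not self.refittable:
            raise RuntimeError("engine is not refittable")
        # The refitted engine stays refittable, so refit can be repeated.
        return ToyEngine(layers=list(self.layers), weights=dict(weights))


@dataclass
class ToyEngineCache:
    """Hypothetical cache that only ever stores weight-stripped engines."""

    _store: Dict[str, ToyEngine] = field(default_factory=dict)

    def insert(self, hash_val: str, engine: ToyEngine) -> None:
        # Always strip before storing, whatever the caller passed in.
        self._store[hash_val] = engine.strip()

    def pull(
        self, hash_val: str, weights: Dict[str, float]
    ) -> Optional[ToyEngine]:
        stripped = self._store.get(hash_val)
        if stripped is None:
            return None  # cache miss: the caller builds the engine instead
        # Refit on the way out so the caller gets usable weights.
        return stripped.refit(weights)
```

In this model a pulled engine can be refitted again (for example `cache.pull(h, w).refit(new_w)`), which mirrors the repeated `refit_module_weights()` calls in item 2.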

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@zewenli98 zewenli98 self-assigned this Nov 24, 2025
@meta-cla meta-cla bot added the cla signed label Nov 24, 2025
@github-actions github-actions bot added component: tests Issues re: Tests component: conversion Issues re: Conversion stage component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths component: torch_compile labels Nov 24, 2025
"""

def _insert_engine_to_cache(
hash_val: str, interpreter_result: TRTInterpreterResult
Collaborator

Nice, I like this

Collaborator

Is there a reason that the function needs to be in the interpret function's scope?

Collaborator Author

No specific reason. I just don't know when engine_cache would be used other than in the function interpret_module_to_result(), so I picked the smallest scope to keep it safe and self-contained. Are there any other cases that might use engine_cache?

Collaborator

Well, it will get redefined each time interpret is called if it's in that scope. I'm not sure that's necessary unless there was some reason, such as it using context from the interpret module.
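
A small illustration of the redefinition point, as a sketch only: the function names below are hypothetical stand-ins with simplified signatures, not the code under review. A nested `def` is executed on every call of the enclosing function, so each call produces a new function object, while a module-level helper is created once.

```python
def interpret_with_nested_helper():
    # The helper is defined inside the enclosing function, so the `def`
    # statement runs (and builds a new function object) on every call.
    def _insert_engine_to_cache(hash_val: str) -> str:
        return f"inserted {hash_val}"

    return _insert_engine_to_cache


def _module_level_insert(hash_val: str) -> str:
    return f"inserted {hash_val}"


def interpret_with_module_helper():
    # The helper is defined once at import time and only referenced here.
    return _module_level_insert


nested_a = interpret_with_nested_helper()
nested_b = interpret_with_nested_helper()
shared_a = interpret_with_module_helper()
shared_b = interpret_with_module_helper()

print(nested_a is nested_b)  # False: a fresh function object per call
print(shared_a is shared_b)  # True: one function object, reused
```

The per-call cost is small, so nesting is mostly a readability and scoping choice unless the helper actually closes over local state.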

logger.info(f"Engine was successfully inserted into cache for hash: {hash_val}")

@needs_refit # type: ignore[misc]
def _pull_cached_engine(hash_val: str) -> Optional[SerializedInterpreterResult]:
Collaborator

👍

)
logger.info(f"Engine was successfully inserted into cache for hash: {hash_val}")

@needs_refit # type: ignore[misc]
Collaborator

Should the insert and extract both be needs_refit?

Also, shouldn't this gracefully pass through rather than raise the typical unimplemented error?

Collaborator Author

> Should the insert and extract both be needs refit?

Insert doesn't seem to involve any refitting. Leaving it unguarded supports a scenario where users insert engines on machine A, which doesn't support refit, but pull engines on machine B, which does. Please correct me if I'm wrong.

> Also shouldnt this gracefully pass through vs the typically unimplemented error?

I'm not sure I understand your question correctly. We need refit in pull because this implementation saves weight-stripped engines, which must be refitted with the correct weights before use.

Collaborator

@narendasan narendasan Dec 1, 2025

> Should the insert and extract both be needs refit?

If we support saving and pulling identical immutable-weight engines, then we don't need to guard at all. But if we are just saving immutable-weight engines and never reusing them, I don't see the point of supporting insert alone.

> Also shouldnt this gracefully pass through vs the typically unimplemented error?

The needs_X decorator will throw an unimplemented error if it's hit with the feature not enabled. But in the case of refit, if it's not enabled, we should just bypass the cache.
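
To make the two behaviors concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `REFIT_AVAILABLE`, `needs_refit_strict`, and `bypass_without_refit` are hypothetical names, not the project's actual decorators.

```python
import functools
from typing import Any, Callable, Optional

REFIT_AVAILABLE = False  # hypothetical feature flag for this sketch


def needs_refit_strict(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical guard: fail loudly when the feature is missing."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if not REFIT_AVAILABLE:
            raise NotImplementedError("refit is not enabled in this build")
        return func(*args, **kwargs)

    return wrapper


def bypass_without_refit(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical guard: report a cache miss when the feature is missing."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if not REFIT_AVAILABLE:
            return None  # the caller treats None as a miss and compiles
        return func(*args, **kwargs)

    return wrapper


@needs_refit_strict
def pull_strict(hash_val: str) -> Optional[str]:
    return f"engine-{hash_val}"


@bypass_without_refit
def pull_graceful(hash_val: str) -> Optional[str]:
    return f"engine-{hash_val}"
```

With the flag off, `pull_strict("abc")` raises `NotImplementedError`, while `pull_graceful("abc")` returns `None` and lets compilation proceed without the cache.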

Collaborator Author

Ah, I got you. Fixed the two issues.

@narendasan
Collaborator

@cehongwang please take a pass so we have multiple eyes on this PR

