Description
TL;DR
Improve the performance, robustness, and overall quality of the torch_tensorrt.dynamo.compile path, and use these improvements to also strengthen the torch_tensorrt.dynamo.fx_ts_compat path.
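For reference, a minimal sketch of invoking this path, assuming the torch_tensorrt.compile front end with ir="dynamo" (the exact keyword arguments may differ between releases):

```python
import torch
import torch_tensorrt

# Small stand-in model; any traceable nn.Module works.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).eval().cuda()

inputs = [torch.randn(1, 3, 224, 224, device="cuda")]

# Compile through the dynamo path; unsupported ops fall back to eager PyTorch.
trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",                       # route to the torch_tensorrt.dynamo.compile path
    inputs=inputs,
    enabled_precisions={torch.float},  # precisions TensorRT may use for engines
)

with torch.no_grad():
    out = trt_model(*inputs)
```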
Goal(s)
By adding lowering passes, improving converter coverage, and building features that accelerate our torch.compile backend, we can improve the quality of this path. Additionally, in conjunction with #1940, many of these upgrades can be ported over to our torch._dynamo.export path.
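To make "lowering passes" concrete, below is an illustrative toy pass (not taken from this issue) over an aten-level FX graph that rewrites aten.pow(x, 2) into aten.mul(x, x), the kind of rewrite that lets a simpler converter handle the node. How such passes are registered inside torch_tensorrt is an assumption here; only standard torch.fx APIs are used.

```python
import torch
import torch.fx as fx
from torch.fx.experimental.proxy_tensor import make_fx

def lower_pow2_to_mul(gm: fx.GraphModule) -> fx.GraphModule:
    """Rewrite aten.pow(x, 2) nodes as aten.mul(x, x)."""
    for node in gm.graph.nodes:
        if (
            node.op == "call_function"
            and node.target is torch.ops.aten.pow.Tensor_Scalar
            and node.args[1] == 2
        ):
            # Insert the replacement mul right after the pow node,
            # redirect all users to it, then delete the pow node.
            with gm.graph.inserting_after(node):
                mul = gm.graph.call_function(
                    torch.ops.aten.mul.Tensor, (node.args[0], node.args[0])
                )
            node.replace_all_uses_with(mul)
            gm.graph.erase_node(node)
    gm.graph.lint()
    gm.recompile()
    return gm

# Usage on an aten-level graph captured with make_fx:
def f(x):
    return torch.pow(x, 2) + 1

gm = lower_pow2_to_mul(make_fx(f)(torch.randn(4)))
print(gm.graph)  # the pow node is now a mul
```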
Implementation Phases
- Module-Level Acceleration ([Feature] Upgrade Prototype System for Module-Level Acceleration in Dynamo Path #1894)
- Improved Converter Coverage ([Converter] Add support for assorted operators in the FX aten path #1769, [Converter] Add operations to accelerate Transformer encoder #1754)
- Support for truncate_long_and_double (fix: Add support for truncate_long_and_double in FX #1865); see the sketch after this list
- Engine caching + cross-session state capture
- Dynamic shape support + avoiding unnecessary recompilation (also covered in the sketch below)
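As a sketch of how two of these phases might surface to users, assuming the dynamo path mirrors the existing TorchScript front end's truncate_long_and_double flag and the torch_tensorrt.Input dynamic-shape specification (exact spellings may change as these features land):

```python
import torch
import torch_tensorrt

model = torch.nn.Linear(64, 32).eval().cuda()

trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[
        torch_tensorrt.Input(
            min_shape=(1, 64),   # smallest batch the engine must handle
            opt_shape=(8, 64),   # shape TensorRT optimizes for
            max_shape=(32, 64),  # largest batch before recompilation is needed
            dtype=torch.float,
        )
    ],
    truncate_long_and_double=True,  # demote int64/float64 values to int32/float32
)

# Any batch size within [1, 32] should reuse the same engine without recompiling.
out = trt_model(torch.randn(16, 64, device="cuda"))
```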
Additional context
Related to #1940