Plugin Library
--------------
In the case that you use Torch-TensorRT as a converter to a TensorRT engine and your engine uses plugins provided by Torch-TensorRT, Torch-TensorRT
ships the library ``libtorchtrt_plugins.so``, which contains the implementation of the TensorRT plugins used by Torch-TensorRT during
compilation. This library can be loaded with ``dlopen`` or ``LD_PRELOAD``, like other TensorRT plugin libraries.
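
For example, a standalone TensorRT application can pick up these plugins by preloading the library before launch. The library path and the use of ``trtexec`` below are illustrative, a sketch rather than a prescribed invocation:

.. code-block:: bash

    # Preload the Torch-TensorRT plugin library so TensorRT can resolve
    # the plugin symbols when deserializing the engine
    # (adjust the library path to your installation)
    LD_PRELOAD=/usr/local/lib/libtorchtrt_plugins.so \
        trtexec --loadEngine=model_with_torchtrt_plugins.engine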

Multi Device Safe Mode
----------------------
40+
Multi-device safe mode is a setting in Torch-TensorRT which allows the user to determine whether
the runtime checks for device consistency prior to every inference call.
43+
There is a non-negligible, fixed cost per inference call when multi-device safe mode is enabled, which is why
it is now disabled by default. It can be controlled via the following convenience function, which
doubles as a context manager.
47+
.. code-block:: python

    # Enables Multi Device Safe Mode
    torch_tensorrt.runtime.set_multi_device_safe_mode(True)

    # Disables Multi Device Safe Mode [Default Behavior]
    torch_tensorrt.runtime.set_multi_device_safe_mode(False)

    # Enables Multi Device Safe Mode, then resets the safe mode to its prior setting
    with torch_tensorrt.runtime.set_multi_device_safe_mode(True):
        ...
59+
TensorRT requires that each engine be associated with the CUDA context in the active thread from which it is invoked.
Therefore, if the device changes in the active thread, which may be the case when invoking
engines on multiple GPUs from the same Python process, safe mode will cause Torch-TensorRT to display
an alert and switch GPUs accordingly. If safe mode is not enabled, there could be a mismatch between the engine
device and the CUDA context device, which could cause the program to crash.
65+
One technique for managing multiple TRT engines on different GPUs without paying the performance cost of
multi-device safe mode is to use Python threads. Each thread is responsible for all of the TRT engines
on a single GPU, and the default CUDA device of each thread corresponds to the GPU for which it is
responsible (which can be set via ``torch.cuda.set_device(...)``). In this way, multiple GPUs can be used from the same
Python script without needing to switch CUDA contexts and incur performance overhead.
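
The pattern above can be sketched as follows. This is a minimal illustration of per-GPU thread ownership; ``NUM_DEVICES``, ``device_worker``, and the placeholder work inside it are hypothetical names, not Torch-TensorRT APIs, and the actual engine invocation is only indicated in comments:

.. code-block:: python

    import threading

    NUM_DEVICES = 2  # assumed number of GPUs for this sketch
    results = {}

    def device_worker(device_id, inputs):
        # In a real script, pin this thread to its GPU first, e.g.:
        #   torch.cuda.set_device(device_id)
        # then invoke only the TRT engines compiled for `device_id` here,
        # so this thread's CUDA context never changes and safe mode can
        # remain disabled. The line below is a stand-in for inference.
        results[device_id] = [f"device {device_id}: ran {x}" for x in inputs]

    # One thread per GPU, each owning all engines on its device
    threads = [
        threading.Thread(target=device_worker, args=(d, ["batch0", "batch1"]))
        for d in range(NUM_DEVICES)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()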