rfcs/20200624-pluggable-device-for-tensorflow.md
This section describes the user scenarios that are supported/unsupported for PluggableDevice.
* **Supported scenario**: Single PluggableDevice registered as "GPU" device type

   In the case of installing one plugin that registers its PluggableDevice as the "GPU" device type, the default GPUDevice will be overridden by the PluggableDevice when the plugin is loaded. When a user specifies the "GPU" device for ops under `with tf.device("gpu:0")`, the registered PluggableDevice will be selected to run those ops.
</div>
* **Non-supported scenario**: Multiple PluggableDevices registered as the same device type

   In the case of installing multiple plugins that register their PluggableDevices as the same device type, e.g., more than one plugin registering its PluggableDevice as the "GPU" device type, these plugins' initialization will fail due to a registration conflict. Users then need to manually select which platform they want to use (either unload the conflicting plugin or reconfigure it with the Python API).
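The override and conflict behavior above can be modeled as a small device-type registry. This is a hedged sketch: `DeviceTypeRegistry`, `RegistrationConflictError`, and the method names are illustrative inventions for this document, not part of the proposed C API.

```python
# Illustrative sketch only: models how one plugin may claim a device type
# (overriding the default device), while a second plugin claiming the same
# type fails to initialize with a registration conflict.

class RegistrationConflictError(Exception):
    """Raised when two plugins claim the same device type."""

class DeviceTypeRegistry:
    def __init__(self):
        # Maps a device type string (e.g. "GPU") to the plugin that owns it.
        self._owners = {}

    def register_plugin(self, plugin_name, device_type):
        owner = self._owners.get(device_type)
        if owner is not None and owner != plugin_name:
            # Second plugin registering the same device type: init fails.
            raise RegistrationConflictError(
                f"{device_type!r} is already registered by {owner!r}")
        self._owners[device_type] = plugin_name

registry = DeviceTypeRegistry()
registry.register_plugin("plugin_a", "GPU")      # first plugin claims "GPU"
try:
    registry.register_plugin("plugin_b", "GPU")  # conflict: init fails
except RegistrationConflictError as e:
    print("conflict:", e)
```

In this toy model, resolving the conflict means removing one claimant (unloading the plugin) before re-registering, which mirrors the manual selection the scenario describes.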
### Device mapping mechanism

This section describes the device mapping mechanism for Python users, pointing back at the previous user scenarios.
* **Device type && subdevice type**

   The device type is user visible. Users can specify the device type for ops, e.g., "gpu", "xpu", "cpu". The subdevice type is also user visible, and users can specify which subdevice to use for a given device type (device mapping), e.g., "NVIDIA_GPU", "INTEL_GPU", "AMD_GPU".
```
>> with tf.device("/gpu:0"):
     ...
>> with tf.device("/xpu:0"):
     ...
```
* **Device mapping**

   In the case of two GPUs in the same system, e.g., an NVIDIA GPU + an X GPU, with the X GPU plugin installed:

* **Option 1**: override the CUDA device; only the plugged GPU device is visible

   Only the plugged GPU device is visible: the PluggableDevice (X GPU) overrides the default GPUDevice (CUDA GPU). If users want to use the CUDA GPU, they need to manually uninstall the plugin.
* **Option 2**: both visible, but the plugged GPU is the default; users can set the device mapping

   Both the plugged GPU device and the default GPU device are visible, but only one GPU can work at a time. The plugged GPU device is enabled by default (with higher priority); if users want to use the NVIDIA GPU, they need to call the device mapping API (`set_sub_device_mapping()`) to switch to the CUDA device.
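Option 2 can be sketched as a process-wide mapping with one active subdevice per device type. This is a minimal illustrative model, assuming such state exists; the RFC only names the `set_sub_device_mapping()` API, and the `DeviceMapping` class here is hypothetical.

```python
# Illustrative model of Option 2: both subdevices are visible, but only one
# is active for the "GPU" device type at any time. The plugged GPU is the
# default; set_sub_device_mapping() switches the active subdevice.
class DeviceMapping:
    def __init__(self, subdevices, default):
        self._subdevices = set(subdevices)  # all visible subdevice types
        self._active = default              # plugged GPU enabled by default

    def set_sub_device_mapping(self, subdevice):
        if subdevice not in self._subdevices:
            raise ValueError(f"unknown subdevice {subdevice!r}")
        self._active = subdevice

    @property
    def active(self):
        return self._active

mapping = DeviceMapping({"X_GPU", "NVIDIA_GPU"}, default="X_GPU")
mapping.set_sub_device_mapping("NVIDIA_GPU")  # switch back to the CUDA device
print(mapping.active)  # NVIDIA_GPU
```

The key design point the sketch captures is that switching is explicit and global: ops placed on "gpu" after the call run on the newly selected subdevice.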
The physical device name is user visible. Users can query the physical device name (e.g., "Titan V") for a specified device instance through [tf.config.experimental.get_device_details()](https://www.tensorflow.org/api_docs/python/tf/config/experimental/get_device_details).

```
>> print(details.get('device_name'))
"TITAN_V, XXX"
```
### Device Discovery
Upon initialization of TensorFlow, it uses the platform-independent `LoadLibrary()` to load the dynamic library. The plugin library should be installed to the default plugin directory "…python_dir.../site-packages/tensorflow-plugins". The modular TensorFlow [RFC](https://github.com/tensorflow/community/pull/77) describes the process of loading plugins.
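Discovery of candidate plugin libraries in that directory can be sketched as below. This is an assumption-laden illustration: the helper name is invented, and actual loading goes through the platform's `LoadLibrary()` equivalent (e.g. `dlopen` on Linux), which the sketch deliberately omits.

```python
import os

def discover_plugin_libraries(plugin_dir):
    """Return shared-library paths found in the plugin directory.

    Illustrative only: TensorFlow proper would pass each returned path to a
    platform-independent LoadLibrary() to actually load the plugin.
    """
    if not os.path.isdir(plugin_dir):
        return []
    # Shared-library extensions per platform (Linux, macOS, Windows).
    suffixes = (".so", ".dylib", ".dll")
    return sorted(
        os.path.join(plugin_dir, name)
        for name in os.listdir(plugin_dir)
        if name.endswith(suffixes)
    )
```

A caller would point this at the "site-packages/tensorflow-plugins" directory and load each returned library in turn.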
During plugin library initialization, TensorFlow proper calls the `SE_InitializePlugin` API (part of the StreamExecutor C API) to retrieve the necessary information from the plugin to instantiate a StreamExecutor platform (the [se::Platform](https://github.com/tensorflow/tensorflow/blob/cb32cf0f0160d1f582787119d0480de3ba8b9b53/tensorflow/stream_executor/platform.h#L93) class) and registers the platform with the global object [se::MultiPlatformManager](https://github.com/tensorflow/tensorflow/blob/cb32cf0f0160d1f582787119d0480de3ba8b9b53/tensorflow/stream_executor/multi_platform_manager.h#L82). TensorFlow proper gets a device type and a subdevice type from the plugin through `SE_InitializePlugin` and then registers the `PluggableDeviceFactory` with the registered device type. The device type string will be used to access the PluggableDevice with `tf.device()` in the Python layer. The subdevice type is used for low-level specialization of the GPU device (kernel, StreamExecutor, common runtime, grappler, placer, ...). If users care whether they are running on an NVIDIA GPU or an X GPU (Intel GPU, AMD GPU, ...), they can call a Python API (such as `tf.config.list_physical_devices`) to get the subdevice type for identification. Users can also use `tf.config.get_device_details` to get the real device name (e.g., "TITAN V") for the specified device.
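The initialization handshake above can be modeled roughly as follows. The Python classes here are stand-ins for the C structs and the real `se::MultiPlatformManager`; every name in the sketch is illustrative, not the actual API surface.

```python
# Rough model of the SE_InitializePlugin handshake: the plugin supplies a
# platform/subdevice name and a device type; TensorFlow proper registers a
# platform under the name and a device factory under the device type.
class PluginParams:
    def __init__(self, name, device_type):
        self.name = name                 # e.g. "X_GPU": platform && subdevice type
        self.device_type = device_type   # e.g. "GPU": visible via tf.device()

class MultiPlatformManagerModel:
    def __init__(self):
        self.platforms = {}        # subdevice name -> platform object
        self.device_factories = {} # device type -> owning subdevice name

    def initialize_plugin(self, params):
        # Register the StreamExecutor platform under the subdevice name...
        self.platforms[params.name] = object()
        # ...and a PluggableDeviceFactory under the visible device type.
        self.device_factories[params.device_type] = params.name

manager = MultiPlatformManagerModel()
manager.initialize_plugin(PluginParams("X_GPU", "GPU"))
print(manager.device_factories["GPU"])  # X_GPU
```

The two-level mapping mirrors the text: the device type is what `tf.device()` sees, while the subdevice name selects the low-level specialization.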
Plugin authors need to implement `SE_InitializePlugin` and provide the necessary information:

```
std::string name = "X_GPU"; // StreamExecutor platform name && subdevice type
std::string type = "GPU";   // device type

params.params.id = plugin_id_value;
```
### PluggableDevice kernel registration
This section shows an example of kernel registration for PluggableDevice. The kernel registration and implementation API is addressed in a separate [RFC](https://github.com/tensorflow/community/blob/master/rfcs/20190814-kernel-and-op-registration.md).
To avoid kernel registration conflicts with existing GPU (CUDA) kernels, plugin authors need to provide a device type (such as "GPU") as well as a subdevice type (such as "X_GPU") to TensorFlow proper for kernel registration and dispatch. The device type indicates the device the kernel runs on; the subdevice type is for low-level specialization of the device.
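Keying kernels on the (device type, subdevice type) pair can be sketched as a small registry. The names below are sketch-only; the real mechanism goes through the kernel C API described in the linked RFC.

```python
# Illustrative two-key kernel registry: kernels register under both the device
# type and the subdevice type, so a "GPU" CUDA kernel and a "GPU" X_GPU kernel
# for the same op never collide. Names are hypothetical, not the real C API.
_kernels = {}

def register_kernel(op, device_type, subdevice_type, fn):
    key = (op, device_type, subdevice_type)
    if key in _kernels:
        raise ValueError(f"duplicate kernel registration for {key}")
    _kernels[key] = fn

def dispatch(op, device_type, subdevice_type, *args):
    return _kernels[(op, device_type, subdevice_type)](*args)

# Kernels for the same op and device type coexist under distinct subdevices.
register_kernel("AddV2", "GPU", "CUDA_GPU", lambda a, b: a + b)
register_kernel("AddV2", "GPU", "X_GPU", lambda a, b: a + b)
print(dispatch("AddV2", "GPU", "X_GPU", 2, 3))  # 5
```

The conflict check in `register_kernel` is the point of the design: collisions are only possible when both the device type and the subdevice type match.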