[docs] add notes for stateful model changes (#3252)
* [docs] add notes for stateful model changes
* Update docs/source/en/optimization/fp16.mdx
Co-authored-by: Pedro Cuenca <[email protected]>
* link to accelerate docs for discarding hooks
---------
Co-authored-by: Pedro Cuenca <[email protected]>
**Note**: When using `enable_sequential_cpu_offload()`, it is important **not** to move the pipeline to CUDA beforehand, otherwise the reduction in memory consumption will only be minimal. See [this issue](https://github.com/huggingface/diffusers/issues/1934) for more information.
**Note**: `enable_sequential_cpu_offload()` is a stateful operation that installs hooks on the models.
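As an illustration, here is a minimal sketch of how these two notes fit together in practice (the model id and prompt are placeholders, not part of the original docs):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder model id
    torch_dtype=torch.float16,
)

# Do NOT call `pipe.to("cuda")` first: enable_sequential_cpu_offload() manages
# device placement through the hooks it installs, and moving the pipeline to
# CUDA beforehand largely negates the memory savings.
pipe.enable_sequential_cpu_offload()

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```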
<a name="model_offloading"></a>
## Model offloading for fast inference and memory savings
This feature requires `accelerate` version 0.17.0 or higher.
</Tip>

**Note**: `enable_model_cpu_offload()` is a stateful operation that installs hooks on the models and state on the pipeline. To offload the models properly after they are called, the entire pipeline must be run and the models must be called in the order the pipeline expects them. Exercise caution if models are reused outside the pipeline context after hooks have been installed. See the [accelerate documentation](https://huggingface.co/docs/accelerate/v0.18.0/en/package_reference/big_modeling#accelerate.hooks.remove_hook_from_module) for more details on removing hooks.
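As an illustration, here is a minimal sketch of model offloading and of removing the installed hooks afterwards via `accelerate` (the model id and prompt are placeholders; `remove_hook_from_module` is the `accelerate` helper linked above):

```python
import torch
from diffusers import StableDiffusionPipeline
from accelerate.hooks import remove_hook_from_module

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder model id
    torch_dtype=torch.float16,
)

# Installs hooks that move each model to the GPU right before it is used and
# back to the CPU afterwards, plus offloading state on the pipeline itself.
pipe.enable_model_cpu_offload()

# Run the whole pipeline so the models are called in the order it expects.
image = pipe("a photo of an astronaut riding a horse on mars").images[0]

# If a model is to be reused outside the pipeline, remove its offload hooks
# first (see the accelerate docs linked above).
remove_hook_from_module(pipe.unet, recurse=True)
```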
## Using Channels Last memory format
Channels last memory format is an alternative way of ordering NCHW tensors in memory that preserves the dimension ordering. Channels last tensors are ordered in such a way that channels become the densest dimension (i.e. images are stored pixel-per-pixel). Since not all operators currently support the channels last format, it may result in worse performance, so it's best to try it and see if it works for your model.
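For example, here is a minimal sketch of switching the UNet to channels last and checking the weight strides to confirm the layout change (the model id is a placeholder):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")  # placeholder model id

# Convert the UNet's tensors to the channels-last memory format (in-place).
pipe.unet.to(memory_format=torch.channels_last)

# A stride of 1 in the channel dimension indicates the channels-last layout.
print(pipe.unet.conv_out.state_dict()["weight"].stride())
```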