
Commit 256e696

[docs] add notes for stateful model changes (#3252)
* [docs] add notes for stateful model changes

* Update docs/source/en/optimization/fp16.mdx

Co-authored-by: Pedro Cuenca <[email protected]>

* link to accelerate docs for discarding hooks

---------

Co-authored-by: Pedro Cuenca <[email protected]>
1 parent 329d1df commit 256e696

1 file changed: +7 -0 lines changed


docs/source/en/optimization/fp16.mdx

Lines changed: 7 additions & 0 deletions
@@ -202,6 +202,8 @@ image = pipe(prompt).images[0]

**Note**: When using `enable_sequential_cpu_offload()`, it is important to **not** move the pipeline to CUDA beforehand or else the gain in memory consumption will only be minimal. See [this issue](https://github.com/huggingface/diffusers/issues/1934) for more information.

+**Note**: `enable_sequential_cpu_offload()` is a stateful operation that installs hooks on the models.
+

<a name="model_offloading"></a>
## Model offloading for fast inference and memory savings
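
For context (this is not part of the commit's diff), a minimal sketch of the usage pattern the new note refers to; the checkpoint name and prompt are illustrative placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# Stateful: this installs offload hooks on the pipeline's models. Do not call
# pipe.to("cuda") beforehand, otherwise the memory savings are mostly lost.
pipe.enable_sequential_cpu_offload()

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```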
@@ -251,6 +253,11 @@ image = pipe(prompt).images[0]
This feature requires `accelerate` version 0.17.0 or larger.
</Tip>

+**Note**: `enable_model_cpu_offload()` is a stateful operation that installs hooks on the models and state on the pipeline. In order to properly offload
+models after they are called, it is required that the entire pipeline is run and models are called in the order the pipeline expects them to be. Exercise caution
+if models are re-used outside the context of the pipeline after hooks have been installed. See [accelerate](https://huggingface.co/docs/accelerate/v0.18.0/en/package_reference/big_modeling#accelerate.hooks.remove_hook_from_module)
+for further docs on removing hooks.
+
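
Again for context rather than as part of the diff, a hedged sketch of what discarding the installed hooks could look like with `accelerate`'s `remove_hook_from_module`, which the note links to; the checkpoint and the choice of the UNet component are assumptions for illustration:

```python
import torch
from accelerate.hooks import remove_hook_from_module
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # stateful: installs offload hooks on the models

# Run the whole pipeline so each model is called (and offloaded) in the order it expects.
image = pipe("a photo of an astronaut riding a horse on mars").images[0]

# To re-use a component outside the pipeline, discard its offload hooks first;
# recurse=True also removes hooks from all submodules.
remove_hook_from_module(pipe.unet, recurse=True)
unet = pipe.unet.to("cuda")
```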
## Using Channels Last memory format

Channels last memory format is an alternative way of ordering NCHW tensors in memory that preserves dimension ordering. Channels last tensors are ordered in such a way that channels become the densest dimension (aka storing images pixel-per-pixel). Since not all operators currently support the channels last format, it may result in worse performance, so it's better to try it and see if it works for your model.
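
As a side note (not from the commit), a small sketch of switching a pipeline component to channels last and checking the result; `pipe` is assumed to be a previously loaded diffusers pipeline, and `conv_out` is just one convenient 4D weight to inspect:

```python
import torch

# Convert the UNet's parameters to channels last (in-place for an nn.Module).
pipe.unet.to(memory_format=torch.channels_last)

# A stride of 1 in the channels dimension of a 4D weight indicates the conversion took effect.
print(pipe.unet.conv_out.state_dict()["weight"].stride())
```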
