Skip to content

[docs] add notes for stateful model changes #3252

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions docs/source/en/optimization/fp16.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -202,6 +202,8 @@ image = pipe(prompt).images[0]

**Note**: When using `enable_sequential_cpu_offload()`, it is important to **not** move the pipeline to CUDA beforehand or else the gain in memory consumption will only be minimal. See [this issue](https://github.com/huggingface/diffusers/issues/1934) for more information.

**Note**: `enable_sequential_cpu_offload()` is a stateful operation that installs hooks on the models.


<a name="model_offloading"></a>
## Model offloading for fast inference and memory savings
Expand Down Expand Up @@ -251,6 +253,11 @@ image = pipe(prompt).images[0]
This feature requires `accelerate` version 0.17.0 or larger.
</Tip>

**Note**: `enable_model_cpu_offload()` is a stateful operation that installs hooks on the models and state on the pipeline. In order to properly offload
models after they are called, it is required that the entire pipeline is run and models are called in the order the pipeline expects them to be. Exercise caution
if models are re-used outside the context of the pipeline after hooks have been installed. See [accelerate](https://huggingface.co/docs/accelerate/v0.18.0/en/package_reference/big_modeling#accelerate.hooks.remove_hook_from_module)
for further docs on removing hooks.

## Using Channels Last memory format

Channels last memory format is an alternative way of ordering NCHW tensors in memory preserving dimensions ordering. Channels last tensors ordered in such a way that channels become the densest dimension (aka storing images pixel-per-pixel). Since not all operators currently support channels last format it may result in a worst performance, so it's better to try it and see if it works for your model.
Expand Down