
feat: allow disk offload for diffuser models #3285


Merged
merged 85 commits into main from hkr/disk-offload on May 22, 2023

Changes from all commits (85 commits)
8511176
allow disk offload for diffuser models
hari10599 Apr 29, 2023
26e322d
sort import
hari10599 Apr 29, 2023
6653047
add max_memory argument
hari10599 May 20, 2023
87c3968
Changed sample[0] to images[0] (#3304)
IliaLarchenko May 1, 2023
4a16aab
Typo in tutorial (#3295)
IliaLarchenko May 1, 2023
73e9c24
Torch compile graph fix (#3286)
patrickvonplaten May 1, 2023
1786d33
Postprocessing refactor img2img (#3268)
yiyixuxu May 1, 2023
05c6060
[Torch 2.0 compile] Fix more torch compile breaks (#3313)
patrickvonplaten May 2, 2023
7a5811b
fix: scale_lr and sync example readme and docs. (#3299)
sayakpaul May 3, 2023
f7b042c
Update stable_diffusion.mdx (#3310)
mu94-csl May 3, 2023
2d3067e
Fix missing variable assign in DeepFloyd-IF-II (#3315)
gitmylo May 3, 2023
add3f97
Correct doc build for patch releases (#3316)
patrickvonplaten May 3, 2023
7be5b0a
Add Stable Diffusion RePaint to community pipelines (#3320)
Markus-Pobitzer May 3, 2023
22bfb08
Fix multistep dpmsolver for cosine schedule (suitable for deepfloyd-i…
LuChengTHU May 3, 2023
ab0f138
[docs] Improve LoRA docs (#3311)
stevhliu May 4, 2023
4f32c18
Added input pretubation (#3292)
isamu-isozaki May 4, 2023
5faa00f
Update write_own_pipeline.mdx (#3323)
csaybar May 4, 2023
6316e88
update controlling generation doc with latest goodies. (#3321)
sayakpaul May 5, 2023
ffe3738
[Quality] Make style (#3341)
patrickvonplaten May 5, 2023
880a83b
Fix config dpm (#3343)
patrickvonplaten May 5, 2023
ae6444d
Add the SDE variant of DPM-Solver and DPM-Solver++ (#3344)
LuChengTHU May 5, 2023
9532719
Add upsample_size to AttnUpBlock2D, AttnDownBlock2D (#3275)
will-rice May 5, 2023
7315d6d
Rename --only_save_embeds to --save_as_full_pipeline (#3206)
arrufat May 6, 2023
de96990
[AudioLDM] Generalise conversion script (#3328)
sanchit-gandhi May 6, 2023
7161fbd
Fix TypeError when using prompt_embeds and negative_prompt (#2982)
At-sushi May 6, 2023
75ed789
Fix pipeline class on README (#3345)
themrzmaster May 6, 2023
3572fd8
Inpainting: typo in docs (#3331)
LysandreJik May 6, 2023
5c51eab
Add `use_Karras_sigmas` to LMSDiscreteScheduler (#3351)
Isotr0py May 6, 2023
9c6ae6c
Batched load of textual inversions (#3277)
pdoane May 8, 2023
bd9cf76
make fix-copies
patrickvonplaten May 8, 2023
361b62f
[docs] Fix docstring (#3334)
stevhliu May 8, 2023
92c2b0c
if dreambooth lora (#3360)
williamberman May 9, 2023
39e6998
Postprocessing refactor all others (#3337)
yiyixuxu May 9, 2023
fc4e4f4
[docs] Improve safetensors docstring (#3368)
stevhliu May 9, 2023
af6e35a
add: a warning message when using xformers in a PT 2.0 env. (#3365)
sayakpaul May 10, 2023
7fa90fb
StableDiffusionInpaintingPipeline - resize image w.r.t height and wid…
rupertmenneer May 10, 2023
45b86c9
make style
patrickvonplaten May 10, 2023
dbe3316
[docs] Adapt a model (#3326)
stevhliu May 10, 2023
6174325
[docs] Load safetensors (#3333)
stevhliu May 11, 2023
9c99e8f
make style
patrickvonplaten May 11, 2023
1e49073
[Docs] Fix stable_diffusion.mdx typo (#3398)
sudowind May 11, 2023
7275de1
Support ControlNet v1.1 shuffle properly (#3340)
takuma104 May 11, 2023
6e4b195
[Tests] better determinism (#3374)
sayakpaul May 11, 2023
5b8ba3a
[docs] Add transformers to install (#3388)
stevhliu May 11, 2023
3bab713
[deepspeed] partial ZeRO-3 support (#3076)
stas00 May 11, 2023
8b66534
Add omegaconf for tests (#3400)
patrickvonplaten May 11, 2023
574b6c8
Fix various bugs with LoRA Dreambooth and Dreambooth script (#3353)
patrickvonplaten May 11, 2023
137a7a3
Fix docker file (#3402)
patrickvonplaten May 11, 2023
2458119
fix: deepseepd_plugin retrieval from accelerate state (#3410)
sayakpaul May 12, 2023
ce28477
[Docs] Add `sigmoid` beta_scheduler to docstrings of relevant Schedul…
Laurent2916 May 12, 2023
c54bea1
Don't install accelerate and transformers from source (#3415)
patrickvonplaten May 12, 2023
70090a1
Don't install transformers and accelerate from source (#3414)
patrickvonplaten May 12, 2023
0cb751c
Improve fast tests (#3416)
patrickvonplaten May 12, 2023
0f3ceda
attention refactor: the trilogy (#3387)
williamberman May 12, 2023
42c4bfe
[Docs] update the PT 2.0 optimization doc with latest findings (#3370)
sayakpaul May 13, 2023
341f907
Fix style rendering (#3433)
pcuenca May 15, 2023
a1ce3e7
unCLIP scheduler do not use note (#3417)
williamberman May 15, 2023
fff7660
Replace deprecated command with environment file (#3409)
jongwooo May 16, 2023
2592d97
fix warning message pipeline loading (#3446)
patrickvonplaten May 16, 2023
728db02
add stable diffusion tensorrt img2img pipeline (#3419)
asfiyab-nvidia May 16, 2023
a2b6478
Refactor controlnet and add img2img and inpaint (#3386)
patrickvonplaten May 16, 2023
34e8868
[Scheduler] DPM-Solver (++) Inverse Scheduler (#3335)
clarencechen May 16, 2023
83280ac
[Docs] Fix incomplete docstring for resnet.py (#3438)
Laurent2916 May 16, 2023
f84485b
fix tiled vae blend extent range (#3384)
superlabs-dev May 16, 2023
bf0b0e3
Small update to "Next steps" section (#3443)
pcuenca May 16, 2023
fa9a44a
Allow arbitrary aspect ratio in IFSuperResolutionPipeline (#3298)
devxpy May 17, 2023
8804fee
Adding 'strength' parameter to StableDiffusionInpaintingPipeline (#3…
rupertmenneer May 17, 2023
2306dc4
[WIP] Bugfix - Pipeline.from_pretrained is broken when the pipeline i…
vimarshc May 17, 2023
3271531
Fix gradient checkpointing bugs in freezing part of models (requires_…
IrisRainbowNeko May 17, 2023
0bde5e4
Make dreambooth lora more robust to orig unet (#3462)
patrickvonplaten May 17, 2023
1e67788
Reduce peak VRAM by releasing large attention tensors (as soon as the…
cmdr2 May 17, 2023
e3cfd35
Add min snr to text2img lora training script (#3459)
wfng92 May 17, 2023
78771bf
Add inpaint lora scale support (#3460)
Glaceon-Hyy May 17, 2023
4fd6e7b
[From ckpt] Fix from_ckpt (#3466)
patrickvonplaten May 17, 2023
327b94b
Update full dreambooth script to work with IF (#3425)
williamberman May 17, 2023
13b2226
Add IF dreambooth docs (#3470)
williamberman May 17, 2023
74e6eb6
parameterize pass single args through tuple (#3477)
williamberman May 18, 2023
0691dec
attend and excite tests disable determinism on the class level (#3478)
williamberman May 18, 2023
b05d5b4
dreambooth docs torch.compile note (#3471)
williamberman May 19, 2023
7683b56
add: if entry in the dreambooth training docs. (#3472)
sayakpaul May 19, 2023
0562e74
[docs] Textual inversion inference (#3473)
stevhliu May 19, 2023
0e1f339
[docs] Distributed inference (#3376)
stevhliu May 19, 2023
573b5d4
[{Up,Down}sample1d] explicit view kernel size as number elements in f…
williamberman May 19, 2023
dfc2549
mps & onnx tests rework (#3449)
pcuenca May 20, 2023
49a56d0
Merge branch 'huggingface:main' into hkr/disk-offload
hari10599 May 20, 2023
25 changes: 24 additions & 1 deletion src/diffusers/models/modeling_utils.py
@@ -398,6 +398,15 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P
To have Accelerate compute the most optimized `device_map` automatically, set `device_map="auto"`. For
more information about each option see [designing a device
map](https://hf.co/docs/accelerate/main/en/usage_guides/big_modeling#designing-a-device-map).
max_memory (`Dict`, *optional*):
A dictionary that maps device identifiers to the maximum memory each device may use. Defaults to the
maximum memory available on each GPU and the available CPU RAM if unset.
offload_folder (`str` or `os.PathLike`, *optional*):
If the `device_map` contains the value `"disk"`, the folder where the weights will be offloaded.
offload_state_dict (`bool`, *optional*):
If `True`, temporarily offloads the CPU state dict to the hard drive to avoid running out of CPU RAM
when the CPU state dict plus the biggest shard of the checkpoint does not fit in memory. Defaults to
`True` when there is some disk offload.
low_cpu_mem_usage (`bool`, *optional*, defaults to `True` if torch version >= 1.9.0 else `False`):
Speed up model loading by not initializing the weights and only loading the pre-trained weights. This
also tries to not use more than 1x model size in CPU memory (including peak memory) while loading the
@@ -439,6 +448,9 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P
torch_dtype = kwargs.pop("torch_dtype", None)
subfolder = kwargs.pop("subfolder", None)
device_map = kwargs.pop("device_map", None)
max_memory = kwargs.pop("max_memory", None)
offload_folder = kwargs.pop("offload_folder", None)
offload_state_dict = kwargs.pop("offload_state_dict", False)
low_cpu_mem_usage = kwargs.pop("low_cpu_mem_usage", _LOW_CPU_MEM_USAGE_DEFAULT)
variant = kwargs.pop("variant", None)
use_safetensors = kwargs.pop("use_safetensors", None)
@@ -510,6 +522,9 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P
revision=revision,
subfolder=subfolder,
device_map=device_map,
max_memory=max_memory,
offload_folder=offload_folder,
offload_state_dict=offload_state_dict,
user_agent=user_agent,
**kwargs,
)
@@ -614,7 +629,15 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P
else: # else let accelerate handle loading and dispatching.
# Load weights and dispatch according to the device_map
# by default the device_map is None and the weights are loaded on the CPU
-            accelerate.load_checkpoint_and_dispatch(model, model_file, device_map, dtype=torch_dtype)
+            accelerate.load_checkpoint_and_dispatch(
+                model,
+                model_file,
+                device_map,
+                max_memory=max_memory,
+                offload_folder=offload_folder,
+                offload_state_dict=offload_state_dict,
+                dtype=torch_dtype,
+            )

loading_info = {
"missing_keys": [],
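With these changes, `ModelMixin.from_pretrained` forwards `max_memory`, `offload_folder`, and `offload_state_dict` straight to `accelerate.load_checkpoint_and_dispatch`, so weights that fit on neither the GPUs nor in CPU RAM can spill to disk. A minimal sketch of how the new arguments might be used — the checkpoint name, memory limits, and folder path below are illustrative assumptions, not values taken from this PR:

```python
import torch
from diffusers import UNet2DConditionModel

# Cap GPU 0 at 2GiB and CPU at 4GiB (assumed limits); anything that does
# not fit under these caps is assigned to "disk" by Accelerate's device
# map and written to the offload folder.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint
    subfolder="unet",
    device_map="auto",  # let Accelerate compute the placement
    max_memory={0: "2GiB", "cpu": "4GiB"},
    offload_folder="offload",  # folder receiving the disk-offloaded weights
    offload_state_dict=True,
    torch_dtype=torch.float16,
)
```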
21 changes: 21 additions & 0 deletions src/diffusers/pipelines/pipeline_utils.py
@@ -354,6 +354,9 @@ def load_sub_model(
provider: Any,
sess_options: Any,
device_map: Optional[Union[Dict[str, torch.device], str]],
max_memory: Optional[Dict[Union[int, str], Union[int, str]]],
offload_folder: Optional[Union[str, os.PathLike]],
offload_state_dict: bool,
model_variants: Dict[str, str],
name: str,
from_flax: bool,
@@ -416,6 +419,9 @@
# This makes sure that the weights won't be initialized which significantly speeds up loading.
if is_diffusers_model or is_transformers_model:
loading_kwargs["device_map"] = device_map
loading_kwargs["max_memory"] = max_memory
loading_kwargs["offload_folder"] = offload_folder
loading_kwargs["offload_state_dict"] = offload_state_dict
loading_kwargs["variant"] = model_variants.pop(name, None)
if from_flax:
loading_kwargs["from_flax"] = True
@@ -808,6 +814,15 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P
To have Accelerate compute the most optimized `device_map` automatically, set `device_map="auto"`. For
more information about each option see [designing a device
map](https://hf.co/docs/accelerate/main/en/usage_guides/big_modeling#designing-a-device-map).
max_memory (`Dict`, *optional*):
A dictionary that maps device identifiers to the maximum memory each device may use. Defaults to the
maximum memory available on each GPU and the available CPU RAM if unset.
offload_folder (`str` or `os.PathLike`, *optional*):
If the `device_map` contains the value `"disk"`, the folder where the weights will be offloaded.
offload_state_dict (`bool`, *optional*):
If `True`, temporarily offloads the CPU state dict to the hard drive to avoid running out of CPU RAM
when the CPU state dict plus the biggest shard of the checkpoint does not fit in memory. Defaults to
`True` when there is some disk offload.
low_cpu_mem_usage (`bool`, *optional*, defaults to `True` if torch version >= 1.9.0 else `False`):
Speed up model loading by not initializing the weights and only loading the pre-trained weights. This
also tries to not use more than 1x model size in CPU memory (including peak memory) while loading the
@@ -873,6 +888,9 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P
provider = kwargs.pop("provider", None)
sess_options = kwargs.pop("sess_options", None)
device_map = kwargs.pop("device_map", None)
max_memory = kwargs.pop("max_memory", None)
offload_folder = kwargs.pop("offload_folder", None)
offload_state_dict = kwargs.pop("offload_state_dict", False)
low_cpu_mem_usage = kwargs.pop("low_cpu_mem_usage", _LOW_CPU_MEM_USAGE_DEFAULT)
variant = kwargs.pop("variant", None)
use_safetensors = kwargs.pop("use_safetensors", None if is_safetensors_available() else False)
@@ -1046,6 +1064,9 @@ def load_module(name, value):
provider=provider,
sess_options=sess_options,
device_map=device_map,
max_memory=max_memory,
offload_folder=offload_folder,
offload_state_dict=offload_state_dict,
model_variants=model_variants,
name=name,
from_flax=from_flax,
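At the pipeline level, `DiffusionPipeline.from_pretrained` pops the same three kwargs and `load_sub_model` forwards them to every diffusers and transformers sub-model, so a single call can enable disk offload for the whole pipeline. Again a hedged sketch, under the same assumed checkpoint and memory limits as above:

```python
import torch
from diffusers import StableDiffusionPipeline

# Each sub-model (UNet, text encoder, VAE, ...) receives the same
# max_memory / offload_folder / offload_state_dict via load_sub_model.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint
    device_map="auto",
    max_memory={0: "2GiB", "cpu": "4GiB"},  # assumed limits
    offload_folder="offload",
    offload_state_dict=True,
    torch_dtype=torch.float16,
)
image = pipe("an astronaut riding a horse on the moon").images[0]
```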