
Conversation

@gumgood
Contributor

@gumgood gumgood commented Aug 8, 2024

What does this PR do?

Fixed issues that prevented loading and inference with the StableDiffusionXLInpaintPipeline class when the UNet has in_channels=9.

1. When loading StableDiffusionXLInpaintPipeline-class models such as diffusers/stable-diffusion-xl-1.0-inpainting-0.1, loading fails if enable_pag=True.

error:

File /hdd/repo/diffusers/src/diffusers/pipelines/auto_pipeline.py:958, in AutoPipelineForInpainting.from_pretrained(cls, pretrained_model_or_path, **kwargs)
    954     if enable_pag:
    955         orig_class_name = config["_class_name"].replace("InpaintPipeline",
    956                                                         "PAG3InpaintPipeline")
--> 958 inpainting_cls = _get_task_class(AUTO_INPAINT_PIPELINES_MAPPING, orig_class_name)
    960 kwargs = {**load_config_kwargs, **kwargs}
    961 return inpainting_cls.from_pretrained(pretrained_model_or_path, **kwargs)
...
    198 if throw_error_if_not_exist:
--> 199     raise ValueError(
    200         f"AutoPipeline can't find a pipeline linked to {pipeline_class_name} for {model_name}")

ValueError: AutoPipeline can't find a pipeline linked to StableDiffusionXLPAG3InpaintPipeline for None

Fixed so that the correct class, StableDiffusionXLPAGInpaintPipeline, is found.
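
As a minimal sketch, the failing load can be reproduced and the fix checked roughly like this (torch_dtype is just an illustrative choice, not part of the fix):

import torch
from diffusers import AutoPipelineForInpainting

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    enable_pag=True,
    torch_dtype=torch.float16,
)
# expected after the fix: StableDiffusionXLPAGInpaintPipeline
print(type(pipe).__name__)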

2. When calling __call__ with a model whose UNet has in_channels=9, latent_model_input is not constructed correctly.

error:

File /hdd/repo/diffusers/src/diffusers/pipelines/pag/pipeline_pag_sd_xl_inpaint.py:1623, in StableDiffusionXLPAGInpaintPipeline.__call__(self, prompt, prompt_2, image, mask_image, masked_image_latents, height, width, padding_mask_crop, strength, num_inference_steps, timesteps, sigmas, denoising_start, denoising_end, guidance_scale, negative_prompt, negative_prompt_2, num_images_per_prompt, eta, generator, latents, prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds, ip_adapter_image, ip_adapter_image_embeds, output_type, return_dict, cross_attention_kwargs, guidance_rescale, original_size, crops_coords_top_left, target_size, negative_original_size, negative_crops_coords_top_left, negative_target_size, aesthetic_score, negative_aesthetic_score, clip_skip, callback_on_step_end, callback_on_step_end_tensor_inputs, pag_scale, pag_adaptive_scale)
   1620 latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
   1622 if num_channels_unet == 9:
-> 1623     latent_model_input = torch.cat([latent_model_input, mask, masked_image_latents], dim=1)
   1625 # predict the noise residual
   1626 added_cond_kwargs = {"text_embeds": add_text_embeds, "time_ids": add_time_ids}

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 3 but got size 2 for tensor number 1 in the list.

The batch size of mask and masked_image_latents differs from that of latent_model_input. This has been fixed.
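
For context, a toy reproduction of the shape mismatch (tensor sizes are made up for illustration; with CFG plus PAG the latents are batched 3x, while mask and masked_image_latents were only prepared for the 2x CFG case):

import torch

latent_model_input = torch.randn(3, 4, 128, 128)   # uncond + cond + perturbed
mask = torch.randn(2, 1, 128, 128)                  # only prepared for CFG (2x)
masked_image_latents = torch.randn(2, 4, 128, 128)

try:
    torch.cat([latent_model_input, mask, masked_image_latents], dim=1)
except RuntimeError as e:
    # Sizes of tensors must match except in dimension 1. Expected size 3 but got size 2 ...
    print(e)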

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul @yiyixuxu @DN6

@gumgood gumgood force-pushed the fix-pag-inpaint-pipeline branch from 2d7ff3b to 12bf313 on August 11, 2024 17:42
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@DN6
Collaborator

DN6 commented Aug 12, 2024

Hi @gumgood PR looks good. Could you please run make style && make quality so the QC checks pass on the CI.

@gumgood gumgood force-pushed the fix-pag-inpaint-pipeline branch from 9d7ef91 to ff5fd9d on August 13, 2024 00:16
@gumgood
Contributor Author

gumgood commented Aug 13, 2024

@DN6 Thank you for the review. I've applied the style changes. Before applying the style, I ran the tests and fixed a few additional issues:

  1. Fixed *Pipeline models to load as *PAGInpaintPipeline.
  2. Fixed the batch size of init_mask for the 4-channel UNet case.

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks for the PR! I left a comment

return image_latents

# Copied from diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl_inpaint.StableDiffusionXLInpaintPipeline.prepare_mask_latents
# Modified from diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl_inpaint.StableDiffusionXLInpaintPipeline.prepare_mask_latents
Collaborator


thanks! is it possible to keep this method Copied from but modify the output instead? e.g. this is how we modify ip_adapter_image_embeds for PAG https://github.com/huggingface/diffusers/blob/f848febacdc54c351ed0ed23fcc4c9349828021e/src/diffusers/pipelines/pag/pipeline_pag_sd_xl_inpaint.py#L1558C1-L1559C1

it's a little bit weird and requires more code but this way it will be easier for us to maintain all the PAG pipelines

in the future, we should make sure these methods that prepare conditions for the denoiser model always return negative condition and condition instead of a combined output

enable_pag = kwargs.pop("enable_pag")
if enable_pag:
    orig_class_name = config["_class_name"].replace("Pipeline", "PAGPipeline")
    orig_class_name = (
Collaborator


if we do this, the orig_class_name for StableDiffusionXLPipeline would be StableDiffusionXLPAGInpaintPipeline too - in this case we will get the correct result regardless, but the logic is incorrect

the orig_class_name for StableDiffusionXLPipeline should be StableDiffusionXLPAGPipeline, and then we map it to the corresponding inpainting class through _get_task_class(...) to get StableDiffusionXLPAGInpaintPipeline

we can maybe do something like

to_replace = "InpaintPipeline" if "Inpaint` in config["_class_name"] else "Pipeline"
orig_class_name = config["_class_name"].replace(to_replace, "PAG" + to_replace)
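
For illustration, this is roughly how the suggested mapping behaves (a quick standalone check; the class names are just hardcoded examples):

for class_name in ["StableDiffusionXLPipeline", "StableDiffusionXLInpaintPipeline"]:
    to_replace = "InpaintPipeline" if "Inpaint" in class_name else "Pipeline"
    print(class_name.replace(to_replace, "PAG" + to_replace))
# StableDiffusionXLPAGPipeline
# StableDiffusionXLPAGInpaintPipeline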

Contributor Author

@gumgood gumgood left a comment


@yiyixuxu thank you! I have made updates according to the suggestion.

.replace("Pipeline", "PAGInpaintPipeline")
)
to_replace = "InpaintPipeline" if "Inpaint" in config["_class_name"] else "Pipeline"
orig_class_name = config["_class_name"].replace(to_replace, "PAG" + to_replace)
Contributor Author


if we do this, the orig_class_name for StableDiffusionXLPipeline would be StableDiffusionXLPAGInpaintPipeline too - in this case we will get the correct result regardless, but the logic is incorrect

the orig_class_name for StableDiffusionXLPipeline should be StableDiffusionXLPAGPipeline, and then we map it to the corresponding inpainting class through _get_task_class(...) to get StableDiffusionXLPAGInpaintPipeline

we can maybe do something like

to_replace = "InpaintPipeline" if "Inpaint` in config["_class_name"] else "Pipeline"
orig_class_name = config["_class_name"].replace(to_replace, "PAG" + to_replace)

Nice code. I have applied it and verified that it works well.

mask = self._prepare_perturbed_attention_guidance(mask, mask, self.do_classifier_free_guidance)
masked_image_latents = self._prepare_perturbed_attention_guidance(
    masked_image_latents, masked_image_latents, self.do_classifier_free_guidance
)
Contributor Author


Updated to handle batch size without modifying the prepare_mask_latents function.
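
Roughly, the idea is to keep the copied prepare_mask_latents as-is and only expand the batch dimension of its output afterwards. A hypothetical standalone sketch of the expected batching (not the actual helper implementation):

import torch

def prepare_pag_input(cond: torch.Tensor, uncond: torch.Tensor, do_cfg: bool) -> torch.Tensor:
    # duplicate the conditional batch for the perturbed-attention branch,
    # then prepend the unconditional batch when classifier-free guidance is on
    cond = torch.cat([cond] * 2, dim=0)
    if do_cfg:
        cond = torch.cat([uncond, cond], dim=0)
    return cond

mask = torch.randn(1, 1, 128, 128)
print(prepare_pag_input(mask, mask, do_cfg=True).shape)  # torch.Size([3, 1, 128, 128])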

Collaborator

@yiyixuxu yiyixuxu left a comment


thanks!

@yiyixuxu yiyixuxu merged commit 16a3dad into huggingface:main Aug 20, 2024
@gumgood gumgood deleted the fix-pag-inpaint-pipeline branch August 22, 2024 13:20
@gumgood gumgood restored the fix-pag-inpaint-pipeline branch August 22, 2024 13:20
@gumgood gumgood deleted the fix-pag-inpaint-pipeline branch August 22, 2024 13:20