[WIP]Vae preprocessor refactor (PR1) #3557

Merged — 56 commits merged into main on Jun 5, 2023

Conversation

yiyixuxu
Collaborator

@yiyixuxu yiyixuxu commented May 25, 2023

VaeImageProcessor.preprocess refactor

  • refactored VaeImageProcessor a bit:
    • allow passing optional height and width arguments to resize()
    • add convert_to_rgb
  • refactored the prepare_latents method of the img2img pipelines so that if latents are passed directly as the image input, they are not encoded again
  • added a test in test_pipelines_common.py for latents as image inputs
  • refactored the img2img pipelines that accept latents as image: controlnet img2img, stable diffusion img2img, instruct_pix2pix
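The prepare_latents change above can be sketched roughly as follows; this is an illustrative stand-in for the refactored logic, not the actual diffusers implementation, and `latent_channels` / `vae_encode` are hypothetical names:

```python
import numpy as np

def prepare_latents(image, vae_encode, latent_channels=4):
    # If the "image" already has the latent channel count, treat it as
    # pre-computed latents and skip the VAE encode step entirely.
    if image.ndim == 4 and image.shape[1] == latent_channels:
        return image
    # Otherwise encode the pixel-space image into latents as before.
    return vae_encode(image)

# Stand-in for vae.encode(...).latent_dist.sample() * scaling_factor
fake_encode = lambda img: np.zeros((img.shape[0], 4, img.shape[2] // 8, img.shape[3] // 8))

latents = np.zeros((1, 4, 64, 64))
assert prepare_latents(latents, fake_encode) is latents  # not re-encoded
```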

@yiyixuxu yiyixuxu changed the title Vae preprocessor Vae preprocessor refactor (PR1) May 25, 2023
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented May 25, 2023

The documentation is not available anymore as the PR was closed or merged.

Contributor

@patrickvonplaten patrickvonplaten left a comment

Nice start! Can we maybe first merge the PR that adds VAE preprocess and then merge this one? Otherwise people will see lots of deprecation warnings 😅

@yiyixuxu yiyixuxu changed the title Vae preprocessor refactor (PR1) [WIP]Vae preprocessor refactor (PR1) May 26, 2023
else:
    do_denormalize = [not has_nsfw for has_nsfw in has_nsfw_concept]

image = self.image_processor.postprocess(image, output_type=output_type, do_denormalize=do_denormalize)
Collaborator Author

@patrickvonplaten I refactored the 4x upscaler here (just preprocess and postprocess; it does not accept latents).
However, I think I changed the logic of postprocess here: if output_type == "pt", it currently returns an unnormalized PyTorch tensor, which is inconsistent with image_processor.postprocess. Let me know whether we actually intend to return a PyTorch tensor in [-1, 1] for this pipeline.

Member

If the latent upscaler was previously returning unnormalized tensors, I would prefer to keep it that way to avoid any unforeseen consequences.

Maybe we could add a flag to image_processor.postprocess to check whether normalization is needed? To me, that is a cleaner and more idiomatic approach.
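A minimal sketch of what such a per-image flag could look like, assuming NumPy arrays in [-1, 1] with shape (batch, channels, height, width); the function and parameter names are illustrative, not the actual diffusers API:

```python
import numpy as np

def postprocess(image, output_type="pil", do_denormalize=None):
    # Denormalize each image in the batch only where its flag is True,
    # so individual images (e.g. NSFW-filtered ones) can opt out.
    if do_denormalize is None:
        do_denormalize = [True] * image.shape[0]
    image = np.stack(
        [(img / 2 + 0.5).clip(0, 1) if flag else img for img, flag in zip(image, do_denormalize)]
    )
    if output_type == "pt":
        return image  # tensor-like output: normalization controlled by the flag
    return (image * 255).round().astype("uint8")
```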

Contributor

@yiyixuxu could we for now make sure that the output stays exactly the same? I.e., we should not change the behavior of the pipelines in any way, IMO.

"""
Convert a PIL image or a list of PIL images to a NumPy array.
"""
if not isinstance(images, list):
Member

Is there a need to also check if image is of type PIL.Image.Image?
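One way to add that check, sketched here with hypothetical helper code rather than the actual VaeImageProcessor implementation:

```python
import numpy as np
import PIL.Image

def pil_to_numpy(images):
    """Convert a PIL image or a list of PIL images to a NumPy array."""
    if not isinstance(images, list):
        images = [images]
    # The extra validation suggested in the review: fail loudly on wrong types.
    if not all(isinstance(img, PIL.Image.Image) for img in images):
        raise TypeError("`images` must be a PIL.Image.Image or a list of them")
    return np.stack([np.array(img).astype(np.float32) / 255.0 for img in images])
```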

Comment on lines +142 to +145
width, height = (
    x - x % self.config.vae_scale_factor for x in (width, height)
)  # resize to integer multiple of vae_scale_factor
image = image.resize((width, height), resample=PIL_INTERPOLATION[self.config.resample])
Member

👌
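The rounding in the snippet above just floors each dimension to a multiple of the VAE scale factor; a tiny standalone sketch (the helper name is made up):

```python
def floor_to_multiple(width, height, vae_scale_factor=8):
    # Floor each dimension to the nearest integer multiple of the scale
    # factor, as resize() does before calling PIL's resize.
    return tuple(x - x % vae_scale_factor for x in (width, height))

print(floor_to_multiple(513, 769))  # (512, 768)
```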

Comment on lines +174 to +177
self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor, do_convert_rgb=True)
self.control_image_processor = VaeImageProcessor(
    vae_scale_factor=self.vae_scale_factor, do_convert_rgb=True, do_normalize=False
)
Member

Nice to see this coming to fruition.

Comment on lines +597 to +604
if (
    not image_is_pil
    and not image_is_tensor
    and not image_is_np
    and not image_is_pil_list
    and not image_is_tensor_list
    and not image_is_np_list
):
Member

That's a lot of conditions hahaha.
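The six booleans could be collapsed into one helper; a sketch under the assumption that only PIL images and NumPy arrays need covering here (the real check also includes torch.Tensor, and the helper name is invented):

```python
import numpy as np
import PIL.Image

def is_valid_image_input(image):
    # Supported single-image types; the real pipeline would add torch.Tensor.
    supported = (PIL.Image.Image, np.ndarray)
    if isinstance(image, supported):
        return True
    # ...or a non-empty list whose elements are all supported.
    return isinstance(image, list) and len(image) > 0 and all(
        isinstance(i, supported) for i in image
    )
```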

Comment on lines -658 to +638

image = self.control_image_processor.preprocess(image, height=height, width=width).to(dtype=torch.float32)
Member

So, all of this logic is handled by preprocess() now? That's amazing!
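Roughly, the single preprocess() call now covers type coercion, resizing, batching, and normalization. A condensed illustrative sketch (not the actual VaeImageProcessor code; parameter names are assumptions):

```python
import numpy as np
import PIL.Image

def preprocess(image, height=None, width=None, do_normalize=True, vae_scale_factor=8):
    if isinstance(image, PIL.Image.Image):
        image = [image]
    if isinstance(image, list) and isinstance(image[0], PIL.Image.Image):
        w = width or image[0].width
        h = height or image[0].height
        w, h = (x - x % vae_scale_factor for x in (w, h))  # multiple of scale factor
        image = np.stack([np.array(i.resize((w, h))) for i in image])
        image = image.astype(np.float32) / 255.0
        image = image.transpose(0, 3, 1, 2)  # NHWC -> NCHW
    if do_normalize:
        image = 2.0 * image - 1.0  # [0, 1] -> [-1, 1]
    return image
```

For a ControlNet conditioning image, do_normalize=False would keep values in [0, 1], matching the control_image_processor configuration shown earlier.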

Comment on lines +38 to +42
warnings.warn(
    "The preprocess method is deprecated and will be removed in a future version. Please"
    " use VaeImageProcessor.preprocess instead",
    FutureWarning,
)
Member

So, our plan is to refactor this in a future PR, yes?

Contributor

No, this function should be fully deprecated and removed in the future.

Member

@sayakpaul sayakpaul left a comment

Awesome!

@@ -199,6 +204,28 @@ def test_stable_diffusion_pix2pix_euler(self):
def test_inference_batch_single_identical(self):
super().test_inference_batch_single_identical(expected_max_diff=3e-3)

# Overwrite the default test_latents_inputs because pix2pix encodes the image differently
Contributor

Nice!

Contributor

@patrickvonplaten patrickvonplaten left a comment

I think we can merge this more or less. The final missing piece seems to be this: https://github.com/huggingface/diffusers/pull/3557/files#r1214586673

Can we make sure that we don't change the output behavior in any way?

This reverts commit 0ca3473.
@yiyixuxu
Collaborator Author

yiyixuxu commented Jun 2, 2023

@patrickvonplaten
reverted the changes I made to the x4 upscaler and created a separate issue here: #3654

@patrickvonplaten
Contributor

Having changed this: #3557 (comment) I think we can merge this PR 🥳

…sion_latent_upscale.py

Co-authored-by: Patrick von Platen <[email protected]>
@yiyixuxu yiyixuxu merged commit 5990014 into main Jun 5, 2023
@patrickvonplaten patrickvonplaten deleted the vae-preprocessor branch June 6, 2023 09:20
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
VaeImageProcessor.preprocess refactor

* refactored VaeImageProcessor 
   -  allow passing optional height and width argument to resize()
   - add convert_to_rgb
* refactored prepare_latents method for img2img pipelines so that if we pass latents directly as image input, it will not encode it again
* added a test in test_pipelines_common.py to test latents as image inputs
* refactored img2img pipelines that accept latents as image: 
   - controlnet img2img, stable diffusion img2img , instruct_pix2pix

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024