[WIP]Vae preprocessor refactor (PR1) #3557
Conversation
Nice start! Can we maybe first merge the PR that adds VAE preprocess and then merge this one? Otherwise people will see lots of deprecation warnings 😅
Co-authored-by: Pedro Cuenca <[email protected]>
```python
else:
    do_denormalize = [not has_nsfw for has_nsfw in has_nsfw_concept]

image = self.image_processor.postprocess(image, output_type=output_type, do_denormalize=do_denormalize)
```
@patrickvonplaten I refactored the 4x upscaler here (just `preprocess` and `postprocess`, not accepting latents). However, I think I changed the logic of `postprocess` here: if `output_type == "pt"`, it currently returns a PyTorch tensor that is unnormalized, which is inconsistent with `image_processor.postprocess`. Let me know if we actually intend to return a PyTorch tensor in `[-1, 1]` for this pipeline.
If the latent upscaler was previously returning unnormalized tensors, I would prefer keeping it that way to avoid any unforeseen consequences. Maybe we could add a flag to `image_processor.postprocess` to check whether normalization is needed? To me, that is a cleaner and more idiomatic approach.
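A minimal sketch of that flag idea, assuming a per-image `do_denormalize` list (the names and behavior here are an assumption for illustration, not the actual diffusers signature):

```python
import numpy as np

def postprocess(image: np.ndarray, do_denormalize=None) -> np.ndarray:
    # image: batch of arrays normalized to [-1, 1]
    # do_denormalize (assumed flag): one boolean per image; None means "denormalize all"
    if do_denormalize is None:
        do_denormalize = [True] * image.shape[0]
    return np.stack(
        [np.clip(img / 2 + 0.5, 0.0, 1.0) if flag else img
         for img, flag in zip(image, do_denormalize)]
    )
```

With a flag like this, the latent upscaler could keep returning raw `[-1, 1]` outputs (flag `False`) while other pipelines denormalize to `[0, 1]`.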
@yiyixuxu could we for now make sure that the output stays exactly the same? I.e., we should not change the behavior of the pipelines in any way, IMO.
```python
"""
Convert a PIL image or a list of PIL images to NumPy arrays.
"""
if not isinstance(images, list):
```
Is there a need to also check whether `image` is of type `PIL.Image.Image`?
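A sketch of the suggested guard (the function name and error message are illustrative, not the actual diffusers code):

```python
import numpy as np
import PIL.Image

def pil_to_numpy(images):
    # accept a single PIL image or a list of them
    if not isinstance(images, list):
        images = [images]
    # the guard suggested in the review: reject non-PIL inputs early
    if not all(isinstance(img, PIL.Image.Image) for img in images):
        raise TypeError("Expected PIL.Image.Image inputs")
    return np.stack([np.asarray(img, dtype=np.float32) / 255.0 for img in images])
```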
```python
width, height = (
    x - x % self.config.vae_scale_factor for x in (width, height)
)  # resize to integer multiple of vae_scale_factor
image = image.resize((width, height), resample=PIL_INTERPOLATION[self.config.resample])
```
👌
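For reference, the `x - x % vae_scale_factor` trick floors each dimension to the nearest multiple of the scale factor; a standalone illustration (a scale factor of 8 is assumed below, whereas the real value comes from `self.config.vae_scale_factor`):

```python
def snap_down(x: int, scale_factor: int = 8) -> int:
    # x % scale_factor is the remainder past the last multiple; subtracting it floors x
    return x - x % scale_factor
```

So a 513-pixel side snaps down to 512, while 512 is left unchanged.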
```python
self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor, do_convert_rgb=True)
self.control_image_processor = VaeImageProcessor(
    vae_scale_factor=self.vae_scale_factor, do_convert_rgb=True, do_normalize=False
)
```
Nice to see this coming to fruition.
```python
if (
    not image_is_pil
    and not image_is_tensor
    and not image_is_np
    and not image_is_pil_list
    and not image_is_tensor_list
    and not image_is_np_list
):
```
That's a lot of conditions hahaha.
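One way the branch above could be condensed, sketched with only NumPy for brevity (the real check would also need to allow `PIL.Image.Image` and `torch.Tensor`, which are omitted here):

```python
import numpy as np

# assumption: the real tuple would also include PIL.Image.Image and torch.Tensor
SUPPORTED_TYPES = (np.ndarray,)

def is_supported_image(image) -> bool:
    # a single supported object, or a non-empty list of supported objects
    if isinstance(image, SUPPORTED_TYPES):
        return True
    return (
        isinstance(image, list)
        and len(image) > 0
        and all(isinstance(i, SUPPORTED_TYPES) for i in image)
    )
```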
```python
image = self.control_image_processor.preprocess(image, height=height, width=width).to(dtype=torch.float32)
```
So, all of this logic is handled by `preprocess()` now? That's amazing!
```python
warnings.warn(
    "The preprocess method is deprecated and will be removed in a future version. Please"
    " use VaeImageProcessor.preprocess instead",
    FutureWarning,
)
```
So, our plan is to refactor this in a future PR, yes?
No, this function should be fully deprecated and removed in the future.
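The usual shape of such a deprecation shim, sketched with a stand-in class (`VaeImageProcessorStub` is hypothetical; the real code delegates to `VaeImageProcessor.preprocess`):

```python
import warnings

class VaeImageProcessorStub:
    # stand-in for the real VaeImageProcessor; here preprocess just echoes its input
    def preprocess(self, image):
        return image

def preprocess(image):
    # deprecated module-level function: warn at the caller's frame, then delegate
    warnings.warn(
        "The preprocess method is deprecated and will be removed in a future version."
        " Please use VaeImageProcessor.preprocess instead",
        FutureWarning,
        stacklevel=2,
    )
    return VaeImageProcessorStub().preprocess(image)
```

Keeping the shim for a release cycle lets downstream callers migrate before the function is removed.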
Awesome!
Co-authored-by: Sayak Paul <[email protected]>
src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
```
@@ -199,6 +204,28 @@ def test_stable_diffusion_pix2pix_euler(self):
    def test_inference_batch_single_identical(self):
        super().test_inference_batch_single_identical(expected_max_diff=3e-3)

    # Overwrite the default test_latents_inputs because pix2pix encodes the image differently
```
Nice!
I think we can merge this more or less. The final missing piece seems to be this: https://github.com/huggingface/diffusers/pull/3557/files#r1214586673
Can we make sure that we don't change the output behavior in any way?
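A small sketch of the kind of output-equivalence check being asked for (helper name and tolerance are assumptions, not part of the PR):

```python
import numpy as np

def assert_outputs_unchanged(old_fn, new_fn, inputs, atol=1e-6):
    # run identical inputs through the pre- and post-refactor code paths and compare
    old = np.asarray(old_fn(inputs))
    new = np.asarray(new_fn(inputs))
    assert old.shape == new.shape, f"shape changed: {old.shape} -> {new.shape}"
    assert np.allclose(old, new, atol=atol), "outputs diverged after refactor"
```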
This reverts commit 0ca3473.
@patrickvonplaten Having changed this: #3557 (comment), I think we can merge this PR 🥳
…sion_latent_upscale.py
Co-authored-by: Patrick von Platen <[email protected]>
VaeImageProcessor.preprocess refactor

* refactored VaeImageProcessor
  - allow passing optional height and width arguments to resize()
  - add convert_to_rgb
* refactored the prepare_latents method for img2img pipelines so that if we pass latents directly as the image input, they will not be encoded again
* added a test in test_pipelines_common.py to test latents as image inputs
* refactored img2img pipelines that accept latents as image: controlnet img2img, stable diffusion img2img, instruct_pix2pix

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>