Add image_processor #2617

yiyixuxu · 2023-03-09T07:36:11Z

Added a VaeImageProcessor class that provides a unified API for preprocessing and postprocessing image inputs for pipelines.

Original PR:
#2304

to-do:

refactor depth_to_image, ControlNet and pix2pix
improve tests (think we should move the relevant test to PipelineTesterMixin)

patrickvonplaten · 2023-03-09T12:42:33Z

src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py

@@ -674,7 +658,7 @@ def __call__(
        )

        # 4. Preprocess image
-        image = preprocess(image)
+        image = self.vae_feature_extractor.encode(image)


tests/test_image_processor.py

src/diffusers/image_processor.py

patrickvonplaten

Very cool! Think the design is great! Just left a couple of comments to make the image processor class even a bit more robust.

Think once all the comments are addressed we can also apply the image processor to the depth_to_image and pix2pix pipelines and then add new tests to all of these pipelines (img2img, depth_to_image, StableDiffusionControlNetPipeline, and pix2pix) to check that all the different input & output combinations work.

patrickvonplaten · 2023-03-09T13:07:58Z

@pcuenca and @williamberman can you also take a look here?

Co-authored-by: Patrick von Platen <[email protected]>

pcuenca

Great job! Left a few comments / questions.

pcuenca · 2023-03-09T23:54:40Z

src/diffusers/image_processor.py

+        """
+        if images.ndim == 3:
+            images = images[..., None]
+        elif images.ndim == 5:


When does this happen?

From what I understand, we accept tensors in 3 forms:

with the batch dimension ([B, C, H, W])

without the batch dimension ([C, H, W])

as a list of tensors, each with shape [C, H, W]

The same goes for numpy arrays.

The way the code is written, we get ndim=5 for tensors with a batch dimension, because we put them into a list and do torch.cat().
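A small NumPy sketch of the shape arithmetic behind this reply. It is not the PR's code: the array sizes are made up, and it assumes the list is combined by stacking. A fifth dimension only appears when a new axis is added (np.stack here, torch.stack for tensors); concatenating along the existing batch axis (np.concatenate, torch.cat) keeps four dimensions.

```python
import numpy as np

# One already-batched input with shape [B, C, H, W] (sizes are illustrative).
batched = np.zeros((2, 3, 64, 64), dtype=np.float32)

# Wrapping it in a list and stacking adds a new leading axis: ndim becomes 5.
stacked = np.stack([batched], axis=0)
print(stacked.shape, stacked.ndim)

# Concatenating along the existing batch axis keeps four dimensions.
concatenated = np.concatenate([batched], axis=0)
print(concatenated.shape, concatenated.ndim)

# A list of unbatched [C, H, W] inputs stacks into a regular 4-D batch.
unbatched = [np.zeros((3, 64, 64), dtype=np.float32) for _ in range(2)]
batch_from_list = np.stack(unbatched, axis=0)
print(batch_from_list.shape, batch_from_list.ndim)
```

So the ndim == 5 branch would only be reached when an already-batched input is wrapped in a list and then stacked.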

src/diffusers/image_processor.py

pcuenca · 2023-03-10T00:05:48Z

src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py

-        image = image.cpu().permute(0, 2, 3, 1).float().numpy()
+        # image = image.cpu().permute(0, 2, 3, 1).float().numpy()


Is the return type different now?

Yeah, it now returns a torch tensor here, so if the output_type is pt it can stay on the device.

pcuenca · 2023-03-10T00:15:23Z

src/diffusers/image_processor.py

+
+        return image
+
+    def decode(


Shouldn't we denormalize here (if appropriate), to return the range to [0, 1]?

Currently, it's denormalized inside decode_latents - I think we can move it to the image processor, but I'm not sure how the decoding part of the image processor fits into the pipeline. I tried to refactor the img2img pipeline with it, but it seems that we can't abstract the post-processing away from the pipeline unless we also move the safety_checker into the image processor.
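For reference, the denormalization being discussed maps values from [-1, 1] back to [0, 1]. A minimal NumPy sketch follows; the helper name denormalize is hypothetical, and where this step should live (pipeline or image processor) is exactly the open question in this thread.

```python
import numpy as np


def denormalize(images: np.ndarray) -> np.ndarray:
    """Map values from [-1, 1] to [0, 1], clipping anything outside that range."""
    return np.clip(images / 2 + 0.5, 0.0, 1.0)


# Values slightly outside [-1, 1] are clipped after rescaling.
decoded = np.array([-1.2, -1.0, 0.0, 1.0, 1.5], dtype=np.float32)
result = denormalize(decoded)
print(result.tolist())  # [0.0, 0.0, 0.5, 1.0, 1.0]
```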

pcuenca · 2023-03-10T00:16:41Z

src/diffusers/image_processor.py

+        images = images.resize((w, h), resample=PIL_INTERPOLATION[self.resample])
+        return images
+
+    def encode(


I find it a bit confusing that these methods are called encode / decode, same as those of the autoencoder. Is this standard nomenclature we use in transformers, or elsewhere?

Agreed - maybe preprocess is better.

True, preprocess and postprocess might be better here.

src/diffusers/image_processor.py

src/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion_img2img.py

patrickvonplaten

Very cool! Almost done - think we just have two final TODOs:

1.) Improve the error message in the processor slightly
2.) Remove the deprecation warning for now to not copy it in all other processors (sorry this was a bad call from my side)

Co-authored-by: Pedro Cuenca <[email protected]>

yiyixuxu · 2023-03-14T20:34:44Z

@patrickvonplaten let me know if the resize error message is ok now - I refactored the preprocess method a little bit, so different input formats are now processed in separate elif blocks with no shared processing, and I throw a separate error message for numpy and pytorch at the point where we would apply resize if we were to support it.

Happy to change it if you think it is better to throw one error message. I think I've addressed all other comments and that's the last thing left.
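A rough sketch of the branching described above, to make the trade-off concrete. It is not the PR's implementation: preprocess_sketch and _to_model_layout are hypothetical names, only NumPy inputs are covered (the PIL and PyTorch branches are left out to keep it dependency-free), and it assumes channels-last float input in [0, 1] that is rescaled to [-1, 1].

```python
import numpy as np


def _to_model_layout(batch: np.ndarray) -> np.ndarray:
    # [B, H, W, C] in [0, 1] -> [B, C, H, W] in [-1, 1]
    return (2.0 * batch.transpose(0, 3, 1, 2) - 1.0).astype(np.float32)


def preprocess_sketch(image, height=None, width=None):
    """Dispatch on the input format; each branch raises its own resize error."""
    resize_requested = height is not None or width is not None

    if isinstance(image, np.ndarray):
        if resize_requested:
            raise ValueError(
                "Resizing a NumPy array is not supported in this sketch; "
                "leave height and width unset."
            )
        # Add a batch axis to a single [H, W, C] image.
        batch = image[None, ...] if image.ndim == 3 else image
        return _to_model_layout(batch)
    elif isinstance(image, list) and image and all(isinstance(i, np.ndarray) for i in image):
        if resize_requested:
            raise ValueError(
                "Resizing a list of NumPy arrays is not supported in this sketch; "
                "leave height and width unset."
            )
        # Stack unbatched [H, W, C] images into one batch.
        return _to_model_layout(np.stack(image, axis=0))
    else:
        raise TypeError(f"Unsupported input type: {type(image).__name__}")


single = preprocess_sketch(np.ones((8, 8, 3)))
pair = preprocess_sketch([np.zeros((8, 8, 3)), np.ones((8, 8, 3))])
print(single.shape, pair.shape)
```

Per-format errors can name the exact input type that cannot be resized, at the cost of some repetition; a single shared error would be shorter but less specific.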

patrickvonplaten · 2023-03-15T16:36:14Z

src/diffusers/__init__.py

@@ -32,6 +32,7 @@
 except OptionalDependencyNotAvailable:
    from .utils.dummy_pt_objects import *  # noqa F403
 else:
+    from .image_processor import VaeImageProcessor


Let's not make it public for now.

tests/pipelines/stable_diffusion/test_stable_diffusion_img2img.py

tests/pipelines/altdiffusion/test_alt_diffusion_img2img.py

patrickvonplaten · 2023-03-15T16:39:52Z

Great, I think we can merge this and then go fix the other pipelines in follow-up PRs. Think we just need to run a quick make style and this should be good to go.

patrickvonplaten

Great!

* add image_processor --------- Co-authored-by: yiyixuxu <yixu310@gmail,com> Co-authored-by: Patrick von Platen <[email protected]> Co-authored-by: Pedro Cuenca <[email protected]>

add image_processor

50615d3

yiyixuxu changed the title ~~add image_processor~~ [WIP] add image_processor Mar 9, 2023