Add image_processor #2617
Conversation
src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py
Outdated
@@ -674,7 +658,7 @@ def __call__(
         )

         # 4. Preprocess image
-        image = preprocess(image)
+        image = self.vae_feature_extractor.encode(image)
great!
Very cool! I think the design is great! Just left a couple of comments to make the image processor class even a bit more robust.
I think once all the comments are addressed we can apply the image processor to the depth_to_image and pix2pix pipelines as well, and then add new tests to all the pipelines (img2img, depth_to_image, StableDiffusionControlNetPipeline, and pix2pix) to check that all the different input & output combinations work.
@pcuenca and @williamberman can you also take a look here?
Co-authored-by: Patrick von Platen <[email protected]>
Great job! Left a few comments / questions.
src/diffusers/image_processor.py
Outdated
"""
if images.ndim == 3:
    images = images[..., None]
elif images.ndim == 5:
When does this happen?
From what I understand, we accept tensors in 3 forms:
- with a batch dimension ([B, C, H, W])
- without the batch dimension ([C, H, W])
- a list of tensors with shape [C, H, W]
and the same goes for numpy arrays.
The way the code is written, we will get ndim == 5 for tensors with a batch dimension, because we put it into a list and do torch.cat().
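A minimal sketch of the shape handling described above, using NumPy in place of torch so it is self-contained. The helper name `to_batched_array` is illustrative, not the PR's actual function; it shows how all three accepted input forms can be normalized to a single batched array without ever producing the ndim == 5 case.

```python
import numpy as np

def to_batched_array(images):
    # Hypothetical helper: normalize the three accepted input forms
    # to one batched [B, C, H, W] array.
    if isinstance(images, list):
        # a list of [C, H, W] arrays -> stack into [B, C, H, W]
        images = np.stack(images, axis=0)
    if images.ndim == 3:
        # a single [C, H, W] image -> add the batch dimension
        images = images[None, ...]
    return images  # always [B, C, H, W]
```

Stacking single images along a new axis (rather than wrapping already-batched tensors in a list and concatenating) is one way to avoid ever reaching five dimensions.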
-image = image.cpu().permute(0, 2, 3, 1).float().numpy()
+# image = image.cpu().permute(0, 2, 3, 1).float().numpy()
Is the return type different now?
Yeah, now it returns a torch tensor here, so if the output_type is pt it can stay on the device.
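A sketch of the branch being discussed, with NumPy standing in for torch so it runs standalone. The function name `postprocess_output` is an illustrative assumption, not the PR's actual method: the point is only that for output_type="pt" the tensor is returned unchanged (so it can stay on the GPU), while array/PIL outputs are moved to channels-last.

```python
import numpy as np

def postprocess_output(image, output_type="np"):
    # Hypothetical sketch: "pt" output skips the device transfer and
    # layout change entirely.
    if output_type == "pt":
        return image  # unchanged; in the real pipeline this stays on device
    # equivalent of image.cpu().permute(0, 2, 3, 1): NCHW -> NHWC
    return np.transpose(image, (0, 2, 3, 1))
```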
src/diffusers/image_processor.py
Outdated
return image

def decode(
Shouldn't we denormalize here (if appropriate), to return the range to [0, 1]?
Currently it's denormalized inside decode_latent. I think we can move it to the image processor, but I'm not sure how the decoding part of the image processor fits into the pipeline. I tried to refactor the img2img pipeline with it, but it seems we can't abstract the post-processing away from the pipeline unless we also move the safety_checker to the image processor.
src/diffusers/image_processor.py
Outdated
images = images.resize((w, h), resample=PIL_INTERPOLATION[self.resample])
return images

def encode(
I find it a bit confusing that these methods are called encode / decode, the same as those of the autoencoder. Is this standard nomenclature we use in transformers, or elsewhere?
Agreed - maybe preprocess is better.
True, preprocess and postprocess might be better here.
Very cool! Almost done - I think we just have two final TODOs:
- 1.) Improve the error message in the processor slightly
- 2.) Remove the deprecation warning for now, to not copy it into all the other processors (sorry, this was a bad call from my side)
Co-authored-by: Pedro Cuenca <[email protected]>
@patrickvonplaten let me know if the resize error message is ok now. I refactored the preprocess method a little bit, so now different input formats are processed separately; happy to change it if you think it is better to throw one error message. I think I've addressed all the other comments, and that's the last thing left.
src/diffusers/__init__.py
Outdated
@@ -32,6 +32,7 @@
 except OptionalDependencyNotAvailable:
     from .utils.dummy_pt_objects import *  # noqa F403
 else:
+    from .image_processor import VaeImageProcessor
Let's not make it public for now.
tests/pipelines/stable_diffusion/test_stable_diffusion_img2img.py
Outdated
Great, I think we can merge this and then go fix the other pipelines in follow-up PRs. I think we just need to run a quick make style and this should be good to go.
Great!
* add image_processor
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
Added a VaeImageProcessor class that provides a unified API for preprocessing and postprocessing of image inputs for pipelines.
Original PR: #2304
to-do:
- PipelineTesterMixin
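The intended shape of the unified API can be sketched end to end: one call on the way into a pipeline, one on the way out. This is a self-contained NumPy illustration under stated assumptions - the function names and the exact [0, 1] <-> [-1, 1] normalization mirror the discussion in this PR but are not the final VaeImageProcessor implementation.

```python
import numpy as np

def preprocess(image):
    # Hypothetical sketch: accept [C, H, W] or [B, C, H, W] in [0, 1]
    if image.ndim == 3:
        image = image[None, ...]      # [C, H, W] -> [1, C, H, W]
    return 2.0 * image - 1.0          # [0, 1] -> [-1, 1]

def postprocess(image):
    # [-1, 1] -> [0, 1], clipping out-of-range values
    return np.clip(image / 2 + 0.5, 0.0, 1.0)

x = np.random.rand(3, 8, 8).astype(np.float32)
roundtrip = postprocess(preprocess(x))   # shape (1, 3, 8, 8), ~= x
```

With this split, a pipeline's __call__ only needs the two calls at its boundaries, which is what lets the same processor be reused across img2img, depth_to_image, and pix2pix.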