Create VAE feature extractor class

We currently do not have a unified API that guarantees a consistent image input and output format across pipelines that accept image inputs. This PR shows the problem nicely: https://github.com/huggingface/diffusers/issues/1882#issuecomment-1416117217

We should make sure that:

- a) All pipelines that accept images can handle images of type PIL, numpy and torch
- b) If images are passed in numpy or torch, the input format (image scale) should match the output format 1-to-1
- c) Pipelines should be able to return images in PT (torch) format besides PIL and numpy, so that one can run multiple image-to-image generations on GPU
- d) There is **a lot** of boilerplate code around "preparing images" and "preparing masks" => we should unify this code in a feature extractor, as it's nearly always the same
- e) Test that pipelines give the same results for all image inputs
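
To make the points above concrete, here is a rough sketch of what such a feature extractor could look like. The class name `VaeImageProcessorSketch`, its method names, and the `output_type` values are hypothetical and only for illustration, not an existing diffusers API. It also assumes that numpy inputs are `(H, W, C)` or `(B, H, W, C)` in `[0, 1]`, that torch inputs are `(C, H, W)` or `(B, C, H, W)` in `[0, 1]`, and that the VAE expects `[-1, 1]`.

```python
import numpy as np
import PIL.Image
import torch


class VaeImageProcessorSketch:
    """Hypothetical helper (not an existing diffusers class).

    Assumed conventions: numpy is NHWC in [0, 1], torch is NCHW in [0, 1],
    and the VAE consumes NCHW float tensors in [-1, 1].
    """

    def preprocess(self, image):
        """Convert PIL / numpy / torch input to a float tensor (B, C, H, W) in [-1, 1]."""
        if isinstance(image, PIL.Image.Image):
            image = [image]
        if isinstance(image, list) and image and all(
            isinstance(i, PIL.Image.Image) for i in image
        ):
            # PIL -> numpy (B, H, W, C), scaled from [0, 255] to [0, 1]
            image = np.stack(
                [np.asarray(i.convert("RGB"), dtype=np.float32) / 255.0 for i in image]
            )
        if isinstance(image, np.ndarray):
            if image.ndim == 3:  # single image: add a batch dimension
                image = image[None]
            # NHWC -> NCHW
            image = torch.from_numpy(image.astype(np.float32)).permute(0, 3, 1, 2)
        elif isinstance(image, torch.Tensor):
            if image.ndim == 3:  # single image: add a batch dimension
                image = image.unsqueeze(0)
            image = image.float()
        else:
            raise TypeError(f"Unsupported image type: {type(image)}")
        # [0, 1] -> [-1, 1]
        return image * 2.0 - 1.0

    def postprocess(self, image, output_type="pil"):
        """Convert a (B, C, H, W) tensor in [-1, 1] back to "pt", "np" or "pil"."""
        # [-1, 1] -> [0, 1], same scale as the numpy / torch inputs
        image = (image / 2.0 + 0.5).clamp(0.0, 1.0)
        if output_type == "pt":
            return image  # stays on its current device, e.g. GPU
        # NCHW -> NHWC numpy array
        array = image.detach().cpu().permute(0, 2, 3, 1).numpy()
        if output_type == "np":
            return array
        if output_type == "pil":
            return [
                PIL.Image.fromarray((a * 255).round().astype(np.uint8)) for a in array
            ]
        raise ValueError(f"Unsupported output_type: {output_type}")
```

With something like this, a pipeline would call `preprocess` once on whatever the user passes in and `postprocess` once on the decoded output, and a test for e) could simply compare the tensors produced from the PIL, numpy and torch versions of the same image.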

This change will require a more involved PR, but it's time to tackle this! It would greatly help users who use img-2-img to make movies, etc.

Create VAE feature extractor class #2304
