Create VAE feature extractor class #2304

@patrickvonplaten

Description

We currently have no unified API that guarantees that, for pipelines that accept image inputs, the input and output image formats stay the same. This PR shows the problem nicely: #1882 (comment)

We should make sure that:

  • a) All pipelines that accept images can handle images of type PIL, numpy and torch
  • b) If images are passed as numpy or torch, the output format (image scale) should match the input format 1-to-1
  • c) Pipelines should be able to return images in PT (torch) format in addition to PIL and numpy, so that one can run multiple image-to-image generations on GPU
  • d) There is a lot of boilerplate code around "preparing images" and "preparing masks" => we should unify this code in a feature extractor, since it is almost always the same
  • e) Pipelines are tested to give the same results for all image input types
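
To make points a) to c) concrete, here is a minimal sketch of what such a shared pre/post-processing helper could look like. The function names `preprocess` and `postprocess`, the `output_type` values, and the value-range conventions (floats in [0, 1] on the user side, [-1, 1] on the model side) are assumptions made for illustration, not an existing API. torch and Pillow are imported optionally so the sketch also runs with numpy alone.

```python
import numpy as np

try:
    import torch
except ImportError:  # torch is optional in this sketch
    torch = None

try:
    from PIL import Image
except ImportError:  # Pillow is optional in this sketch
    Image = None


def preprocess(image):
    """Return a float batch in [-1, 1] with layout (N, C, H, W).

    Assumed conventions (hypothetical, for illustration only):
      - PIL images are converted to 8-bit RGB
      - numpy arrays are (H, W, C) or (N, H, W, C), floats in [0, 1]
      - torch tensors are (C, H, W) or (N, C, H, W), floats in [0, 1]
    Torch input stays a torch tensor on its device; everything else is numpy.
    """
    if Image is not None and isinstance(image, Image.Image):
        image = np.asarray(image.convert("RGB"), dtype=np.float32) / 255.0
    if isinstance(image, np.ndarray):
        if image.ndim == 3:
            image = image[None]  # add batch dimension
        image = image.transpose(0, 3, 1, 2)  # NHWC -> NCHW
        return image.astype(np.float32) * 2.0 - 1.0
    if torch is not None and isinstance(image, torch.Tensor):
        if image.ndim == 3:
            image = image.unsqueeze(0)  # add batch dimension
        return image.float() * 2.0 - 1.0
    raise TypeError(f"unsupported image type: {type(image)!r}")


def postprocess(batch, output_type="pil"):
    """Map a [-1, 1] NCHW batch back to 'pt', 'np' or 'pil' output."""
    if torch is not None and isinstance(batch, torch.Tensor):
        batch = (batch / 2 + 0.5).clamp(0, 1)
        if output_type == "pt":
            return batch  # stays on its device, NCHW in [0, 1]
        batch = batch.detach().cpu().permute(0, 2, 3, 1).numpy()
    else:
        batch = np.clip(batch / 2 + 0.5, 0.0, 1.0)
        if output_type == "pt":
            if torch is None:
                raise RuntimeError("output_type='pt' requires torch")
            return torch.from_numpy(np.ascontiguousarray(batch))
        batch = batch.transpose(0, 2, 3, 1)  # NCHW -> NHWC
    if output_type == "np":
        return batch
    if output_type == "pil":
        if Image is None:
            raise RuntimeError("output_type='pil' requires Pillow")
        return [
            Image.fromarray(np.ascontiguousarray((img * 255).round().astype("uint8")))
            for img in batch
        ]
    raise ValueError(f"unknown output_type: {output_type!r}")
```

With these conventions, `postprocess(preprocess(x), output_type="np")` returns an array with the same scale and layout as a batched numpy input `x`, which is the 1-to-1 property from b). A single unbatched image comes back as a batch of one. A test for e) could feed the same picture as PIL, numpy and torch and compare the results after `preprocess`.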

This change will require a more involved PR, but it's time to tackle this! It would greatly help users who use img-2-img to make movies, etc.

Labels

stale
