
StableDiffusionLatentUpscalePipeline - positive/negative prompt embeds support #8947


Merged
merged 20 commits into from
Aug 21, 2024

Conversation

rootonchair
Contributor

What does this PR do?

Fixes #8895

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@yiyixuxu

@rootonchair
Contributor Author

Test script

from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline
import torch

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.to("cuda")

upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained("stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16)
upscaler.to("cuda")

prompt = "a photo of an astronaut high resolution, unreal engine, ultra realistic"
generator = torch.manual_seed(33)

# we stay in latent space! Let's make sure that Stable Diffusion returns the image
# in latent space
low_res_latents = pipeline(prompt, generator=generator, output_type="latent").images

upscaled_image = upscaler(
    prompt=prompt,
    image=low_res_latents,
    num_inference_steps=20,
    guidance_scale=0,
    generator=generator,
).images[0]

# Let's save the upscaled image under "astronaut_1024.png"
upscaled_image.save("astronaut_1024.png")

prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds = upscaler.encode_prompt(
    prompt=prompt,
    device=upscaler._execution_device,
    do_classifier_free_guidance=False,
)

upscaled_image = upscaler(
    image=low_res_latents,
    num_inference_steps=20,
    guidance_scale=0,
    generator=generator,
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
).images[0]

upscaled_image.save("embeds_astronaut_1024.png")

# as a comparison: Let's also save the low-res image
with torch.no_grad():
    image = pipeline.decode_latents(low_res_latents)
image = pipeline.numpy_to_pil(image)[0]

image.save("astronaut_512.png")

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@yiyixuxu yiyixuxu left a comment

oh thanks! I think it looks really nice!

@yiyixuxu yiyixuxu requested a review from sayakpaul July 25, 2024 05:16
Member

@sayakpaul sayakpaul left a comment

LGTM! Could you also share some results stemming from this change?

Let's also add a fast test for this?

@rootonchair
Contributor Author

> Could you also share some results stemming from this change?

Sure
Here is the original result:
[image: org_astronaut_1024]

And the result using embeds:
[image: astronaut_1024]

The two do not differ much.
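
A visual comparison like the one above can be backed by a number. The following is a minimal sketch, not part of the PR: `max_abs_diff` is a hypothetical helper, and the small arrays are synthetic stand-ins for the two saved PNGs.

```python
import numpy as np


def max_abs_diff(image_a: np.ndarray, image_b: np.ndarray) -> float:
    """Largest per-channel absolute difference between two uint8 images, on a [0, 1] scale."""
    if image_a.shape != image_b.shape:
        raise ValueError(f"shape mismatch: {image_a.shape} vs {image_b.shape}")
    # Convert to float first so the subtraction cannot wrap around in uint8.
    a = image_a.astype(np.float64) / 255.0
    b = image_b.astype(np.float64) / 255.0
    return float(np.abs(a - b).max())


# Synthetic stand-ins for the two outputs (uint8, height x width x RGB).
reference = np.full((4, 4, 3), 128, dtype=np.uint8)
candidate = reference.copy()
candidate[0, 0, 0] = 131  # one channel differs by 3 intensity levels

diff = max_abs_diff(reference, candidate)
print(f"max abs diff: {diff:.4f}")
```

With the real files, the arrays could be loaded with Pillow, for example `np.asarray(Image.open("astronaut_1024.png").convert("RGB"))`, and the same for `embeds_astronaut_1024.png`.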

> Let's also add a fast test for this?

Should I open another PR for this @sayakpaul ?

@sayakpaul
Member

> Should I open another PR for this @sayakpaul ?

We can do this in this PR

@rootonchair
Contributor Author

Hi @sayakpaul, could you help me review this PR?

@rootonchair rootonchair requested a review from sayakpaul August 12, 2024 15:35
@@ -0,0 +1,218 @@
# coding=utf-8
Collaborator

oh actually I think the tests for the latent upscaler are here https://github.com/huggingface/diffusers/blob/main/tests/pipelines/stable_diffusion_2/test_stable_diffusion_latent_upscale.py

maybe we add new tests there? very sorry for all the additional work to create a new test from scratch

@rootonchair
Contributor Author

@yiyixuxu I have just updated the existing test. Could you help me review it? Btw, adding a new test was kind of interesting too, so no worries.

@@ -173,11 +173,51 @@ def test_inference(self):

         self.assertEqual(image.shape, (1, 256, 256, 3))
         expected_slice = np.array(
-            [0.47222412, 0.41921633, 0.44717434, 0.46874192, 0.42588258, 0.46150726, 0.4677534, 0.45583832, 0.48579055]
+            [0.3970313, 0.3768756, 0.41147298, 0.4716793, 0.5115408, 0.44601366, 0.43763855, 0.46781355, 0.46358708]
Collaborator

why do we need to update the expected_slice here? the results of existing tests should not change, no?

         if image.shape[1] == 3:
             # encode image if not in latent-space yet
-            image = self.vae.encode(image).latent_dist.sample() * self.vae.config.scaling_factor
+            image = retrieve_latents(self.vae.encode(image), generator=generator) * self.vae.config.scaling_factor
Contributor Author

@yiyixuxu I think it's due to this line. The old code does not take in generator
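
As a general illustration of why routing a generator into a sampling call can change numeric outputs, here is a minimal NumPy sketch. It is an analogy only: `sample_latents` is a hypothetical stand-in, not the diffusers `retrieve_latents` implementation, and `np.random.Generator` plays the role of `torch.Generator`.

```python
import numpy as np


def sample_latents(mean, std, generator=None):
    """Draw mean + std * eps, using `generator` when given and the global RNG otherwise."""
    if generator is None:
        eps = np.random.standard_normal(mean.shape)
    else:
        eps = generator.standard_normal(mean.shape)
    return mean + std * eps


mean = np.zeros((1, 4, 2, 2))
std = np.ones_like(mean)

# Explicit generator with the same seed: the draw is reproducible.
a = sample_latents(mean, std, generator=np.random.default_rng(0))
b = sample_latents(mean, std, generator=np.random.default_rng(0))
assert np.array_equal(a, b)

# No generator: the noise comes from a different random stream,
# so the sample (and anything computed from it) differs.
c = sample_latents(mean, std)
assert not np.array_equal(a, c)
```

Under this assumption, a test that hard-codes expected values would see different numbers once the noise source changes, even if the rest of the computation is unchanged.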

Collaborator

@yiyixuxu yiyixuxu left a comment

I think we got the order of the negative_prompt_embeds and prompt_embeds reversed, that's why the previous test wasn't passing. Let's make the change, and change the test back and make sure it passes :)
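
To see why the order matters, here is a minimal sketch of classifier-free guidance on toy numbers. It assumes the common convention of batching the negative (unconditional) branch first and the positive branch second; `guided_prediction` and the values are hypothetical and do not reproduce the pipeline's actual code.

```python
import numpy as np


def guided_prediction(batched_pred, guidance_scale):
    """Split a [unconditional, conditional] batch and apply classifier-free guidance."""
    uncond, cond = np.split(batched_pred, 2, axis=0)
    return uncond + guidance_scale * (cond - uncond)


# Toy "model outputs" for each branch (hypothetical numbers).
negative_pred = np.array([[0.0]])
positive_pred = np.array([[1.0]])

# Expected order: negative first, positive second.
correct = guided_prediction(np.concatenate([negative_pred, positive_pred]), 3.0)

# Reversed order: the guidance now pushes away from the prompt.
swapped = guided_prediction(np.concatenate([positive_pred, negative_pred]), 3.0)

print(correct.item(), swapped.item())
```

With the branches swapped, the result moves in the opposite direction from the prompt-conditioned prediction, which would be enough to shift an expected output slice in a test.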

@rootonchair
Contributor Author

@yiyixuxu thanks for the catch! Sorry I did not check it thoroughly

Collaborator

@yiyixuxu yiyixuxu left a comment

thanks!

@yiyixuxu yiyixuxu merged commit 867e0c9 into huggingface:main Aug 21, 2024
15 checks passed
sayakpaul added a commit that referenced this pull request Dec 23, 2024
…s support (#8947)

* make latent upscaler accept prompt embeds

---------

Co-authored-by: Dhruv Nair <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: YiYi Xu <[email protected]>
Development

Successfully merging this pull request may close these issues.

StableDiffusionLatentUpscalePipeline - positive/negative prompt embeds support
5 participants