StableDiffusionLatentUpscalePipeline - positive/negative prompt embeds support #8947
Conversation
Test script:

from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline
import torch

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.to("cuda")

upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained("stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16)
upscaler.to("cuda")

prompt = "a photo of an astronaut high resolution, unreal engine, ultra realistic"
generator = torch.manual_seed(33)

# we stay in latent space! Let's make sure that Stable Diffusion returns the image
# in latent space
low_res_latents = pipeline(prompt, generator=generator, output_type="latent").images

upscaled_image = upscaler(
    prompt=prompt,
    image=low_res_latents,
    num_inference_steps=20,
    guidance_scale=0,
    generator=generator,
).images[0]

# Let's save the upscaled image under "astronaut_1024.png"
upscaled_image.save("astronaut_1024.png")

prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds = upscaler.encode_prompt(
    prompt=prompt,
    device=upscaler._execution_device,
    do_classifier_free_guidance=False,
)

upscaled_image = upscaler(
    image=low_res_latents,
    num_inference_steps=20,
    guidance_scale=0,
    generator=generator,
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
).images[0]
upscaled_image.save("embeds_astronaut_1024.png")

# as a comparison: Let's also save the low-res image
with torch.no_grad():
    image = pipeline.decode_latents(low_res_latents)
image = pipeline.numpy_to_pil(image)[0]
image.save("astronaut_512.png")
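For the negative-embeds path that the PR title also covers, a minimal sketch could look like the block below. It is illustrative only: the `negative_prompt` string and `guidance_scale` value are made up, and it assumes `encode_prompt` accepts a `negative_prompt` argument and returns embeds in the same order as the script above; the exact signature may differ.

```python
# Sketch only: re-encode with classifier-free guidance enabled so negative
# embeddings are actually produced, then pass both positive and negative embeds.
(
    prompt_embeds,
    negative_prompt_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds,
) = upscaler.encode_prompt(
    prompt=prompt,
    device=upscaler._execution_device,
    do_classifier_free_guidance=True,
    negative_prompt="blurry, low quality",  # illustrative negative prompt
)

upscaled_image = upscaler(
    image=low_res_latents,
    num_inference_steps=20,
    guidance_scale=7.5,  # CFG must be active for the negative embeds to matter
    generator=generator,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
).images[0]
upscaled_image.save("negative_embeds_astronaut_1024.png")
```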
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
oh thanks! I think it looks really nice!
LGTM! Could you also share some results stemming from this change?
Let's also add a fast test for this?
Sure, the results are not much different.
Should I open another PR for the fast test, @sayakpaul?
We can do this in this PR
…r/diffusers into latent_upscaler_prompt_embeds
…r/diffusers into latent_upscaler_prompt_embeds
Hi @sayakpaul, could you help me review this PR?
…r/diffusers into latent_upscaler_prompt_embeds
@@ -0,0 +1,218 @@
# coding=utf-8
oh actually I think the tests for the latent upscaler are here https://github.com/huggingface/diffusers/blob/main/tests/pipelines/stable_diffusion_2/test_stable_diffusion_latent_upscale.py
maybe we add new tests there? very sorry for all the additional work to create a new test from scratch
@yiyixuxu I have just updated the existing test. Could you help me review? Btw, adding a new test is kinda interesting too, so no worries
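For reference, a minimal sketch of what such a fast test could look like in that file. This is hypothetical: it assumes the `get_dummy_components`/`get_dummy_inputs` helpers used by the existing tests, and the `encode_prompt` return order shown in the script above.

```python
import numpy as np

# Hypothetical fast test: the pipeline should produce the same images whether
# it receives a raw prompt or the corresponding pre-computed embeddings.
def test_latent_upscale_prompt_embeds(self):
    device = "cpu"
    components = self.get_dummy_components()
    pipe = self.pipeline_class(**components).to(device)
    pipe.set_progress_bar_config(disable=None)

    inputs = self.get_dummy_inputs(device)
    output_with_prompt = pipe(**inputs).images

    inputs = self.get_dummy_inputs(device)
    prompt = inputs.pop("prompt")
    (
        prompt_embeds,
        negative_prompt_embeds,
        pooled_prompt_embeds,
        negative_pooled_prompt_embeds,
    ) = pipe.encode_prompt(prompt, device=device, do_classifier_free_guidance=True)
    output_with_embeds = pipe(
        **inputs,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
    ).images

    # the two runs use identically seeded dummy inputs, so outputs should match
    assert np.abs(output_with_prompt - output_with_embeds).max() < 1e-4
```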
@@ -173,11 +173,51 @@ def test_inference(self):
        self.assertEqual(image.shape, (1, 256, 256, 3))
        expected_slice = np.array(
-            [0.47222412, 0.41921633, 0.44717434, 0.46874192, 0.42588258, 0.46150726, 0.4677534, 0.45583832, 0.48579055]
+            [0.3970313, 0.3768756, 0.41147298, 0.4716793, 0.5115408, 0.44601366, 0.43763855, 0.46781355, 0.46358708]
why do we need to update the expected_slice here? the results of existing tests should not change, no?
        if image.shape[1] == 3:
            # encode image if not in latent-space yet
-            image = self.vae.encode(image).latent_dist.sample() * self.vae.config.scaling_factor
+            image = retrieve_latents(self.vae.encode(image), generator=generator) * self.vae.config.scaling_factor
@yiyixuxu I think it's due to this line. The old code does not take in the generator
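For context, the point being made is that encoding an RGB image samples from the VAE's latent distribution, which is a random draw. A rough sketch of the idea (variable names such as `image_tensor` are illustrative, not the pipeline's own):

```python
import torch

# Sampling from the VAE posterior draws Gaussian noise; without a fixed
# generator the draw is not reproducible across runs.
posterior = upscaler.vae.encode(image_tensor)  # image_tensor: a (B, 3, H, W) float tensor
generator = torch.Generator(device="cuda").manual_seed(0)
latents = posterior.latent_dist.sample(generator=generator) * upscaler.vae.config.scaling_factor
```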
I think we got the order of the negative_prompt_embeds and prompt_embeds reversed, that's why the previous test wasn't passing. Let's make the change, and change the test back and make sure it passes :)
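A brief sketch of the convention being referred to (not copied from this pipeline; `unet`, `latent_input`, `t`, and `guidance_scale` stand in for the pipeline's own variables): with classifier-free guidance, diffusers pipelines typically batch the unconditional/negative embeddings first, so the chunked noise predictions line up as (uncond, text).

```python
import torch

# Negative (unconditional) embeddings go first in the batch; if the order is
# reversed, the guidance step below pushes the sample away from the prompt
# instead of toward it.
text_embeddings = torch.cat([negative_prompt_embeds, prompt_embeds])
noise_pred = unet(latent_input, t, encoder_hidden_states=text_embeddings).sample
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
```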
src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_latent_upscale.py
tests/pipelines/stable_diffusion_2/test_stable_diffusion_latent_upscale.py
…sion_latent_upscale.py Co-authored-by: YiYi Xu <[email protected]>
…sion_latent_upscale.py Co-authored-by: YiYi Xu <[email protected]>
…t_upscale.py Co-authored-by: YiYi Xu <[email protected]>
@yiyixuxu thanks for the catch! Sorry I did not check it thoroughly
thanks!
…s support (#8947) * make latent upscaler accept prompt embeds --------- Co-authored-by: Dhruv Nair <[email protected]> Co-authored-by: Sayak Paul <[email protected]> Co-authored-by: YiYi Xu <[email protected]>
What does this PR do?
Fixes #8895
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@yiyixuxu