
[Pipeline] Add TextToVideoZeroPipeline #2954


Merged

Conversation

@19and99 (Contributor) commented on Apr 3, 2023

This pull request adds TextToVideoZeroPipeline to diffusers library.

Materials

Sample code for inference

import torch
import imageio
from diffusers import TextToVideoZeroPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipe = TextToVideoZeroPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A panda is playing guitar on times square"
result = pipe(prompt=prompt).images
imageio.mimsave("video.mp4", result, fps=4)

@HuggingFaceDocBuilderDev commented on Apr 3, 2023

The documentation is not available anymore as the PR was closed or merged.

@19and99 (Contributor, Author) commented on Apr 4, 2023

@sayakpaul

@sayakpaul (Member)

Hey @19and99! Thanks for the PR. We will review it soon.

Could you please ensure "Run code quality checks / check_repository_consistency" tests pass?

For that, I suggest:

  • Head over to the diffusers directory locally (the one you forked).
  • Activate your Python virtual environment for developing diffusers.
  • Run make fix-copies.
  • And then push the changes.

@19and99 (Contributor, Author) commented on Apr 4, 2023

I also added two resource files in the docs/source/en/api/pipelines/res folder. I guess these need to be moved to a Hugging Face dataset. @sayakpaul @patrickvonplaten

Comment on lines 17 to 22
[Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators](https://arxiv.org/abs/2303.13439) <br />
Levon Khachatryan,
Andranik Movsisyan,
Vahram Tadevosyan,
Roberto Henschel,
[Zhangyang Wang](https://www.ece.utexas.edu/people/faculty/atlas-wang), Shant Navasardyan, [Humphrey Shi](https://www.humphreyshi.com)
Member

This might break our doc-builder.

Suggested change
[Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators](https://arxiv.org/abs/2303.13439) <br />
Levon Khachatryan,
Andranik Movsisyan,
Vahram Tadevosyan,
Roberto Henschel,
[Zhangyang Wang](https://www.ece.utexas.edu/people/faculty/atlas-wang), Shant Navasardyan, [Humphrey Shi](https://www.humphreyshi.com)
[Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators](https://arxiv.org/abs/2303.13439) by Levon Khachatryan, Andranik Movsisyan, Vahram Tadevosyan, Roberto Henschel, [Zhangyang Wang](https://www.ece.utexas.edu/people/faculty/atlas-wang), Shant Navasardyan, [Humphrey Shi](https://www.humphreyshi.com).

Contributor Author

Done!

Member

This isn't addressed @19and99

<br />
Results are temporally consistent and follow closely the guidance and textual prompts.

![img](./res/teaser_final.png)
Member

We keep the repository lightweight.

So, please open a PR to https://huggingface.co/datasets/huggingface/documentation-images

Contributor Author

Here it is: https://huggingface.co/datasets/huggingface/documentation-images/discussions/71
What about test resources? I can see that some tests download golden resources from https://huggingface.co/datasets/hf-internal-testing


Comment on lines 123 to 125
reader = imageio.get_reader('path/to/your/video', 'ffmpeg')
frame_count = 8
video = [Image.fromarray(reader.get_data(i)) for i in range(frame_count)]

Contributor Author

Done!


### Dreambooth specialization

Methods **Text-To-Video**, **Text-To-Video with Pose Control** and **Text-To-Video with Edge Control** can run with custom dreambooth models by simply setting the `model_id` to the corresponding model path or URL.
Member

Could you expand this a bit more with a code snippet? That will be useful for the users.

Suggested change
Methods **Text-To-Video**, **Text-To-Video with Pose Control** and **Text-To-Video with Edge Control** can run with custom dreambooth models by simply setting the `model_id` to the corresponding model path or URL.
Methods **Text-To-Video**, **Text-To-Video with Pose Control** and **Text-To-Video with Edge Control** can run with custom [DreamBooth](../training/dreambooth) models by simply setting the `model_id` to the corresponding model path or URL. You can find some available DreamBooth-trained models with [this link](https://huggingface.co/models?search=dreambooth).

Contributor Author

Done!
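
For readers following along, a minimal sketch of that swap; the DreamBooth checkpoint name and prompt below are only illustrative placeholders for any Stable Diffusion-compatible model id or local path:

import torch
import imageio
from diffusers import TextToVideoZeroPipeline

# Illustrative DreamBooth-finetuned checkpoint; substitute any compatible Hub repo id or local path.
model_id = "sd-dreambooth-library/mr-potato-head"
pipe = TextToVideoZeroPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "sks mr potato head is playing guitar on Times Square"
result = pipe(prompt=prompt).images
imageio.mimsave("video.mp4", result, fps=4)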


model_id = "runwayml/stable-diffusion-v1-5"
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
Contributor

This is very cool! Does this work? The CrossFrameAttnProcessor is enough to make it work? Nice!

Contributor Author

I actually added 2 more lines that were missing :)

prompt = "A bear is playing a guitar on Times Square"
result = pipe(prompt=prompt, generator=generator).images
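
For context, a rough sketch of how the cross-frame attention processor might be wired into a ControlNet pipeline, following the pattern this PR documents; the import path, the `pose_images` placeholders, and the prompt are assumptions, not code from the diff:

import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
# Assumed import path for the processor added in this PR; it may differ in the released package.
from diffusers.pipelines.text_to_video_synthesis.pipeline_text_to_video_zero import CrossFrameAttnProcessor

model_id = "runwayml/stable-diffusion-v1-5"
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    model_id, controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Swap in cross-frame attention so every frame's self-attention looks at the first frame.
pipe.unet.set_attn_processor(CrossFrameAttnProcessor(batch_size=2))
pipe.controlnet.set_attn_processor(CrossFrameAttnProcessor(batch_size=2))

# `pose_images` stands in for per-frame pose maps; blank placeholders here.
pose_images = [Image.new("RGB", (512, 512)) for _ in range(8)]
# Share the same initial latents across frames.
latents = torch.randn((1, 4, 64, 64), device="cuda", dtype=torch.float16).repeat(len(pose_images), 1, 1, 1)
prompt = "A bear is playing a guitar on Times Square"
result = pipe(prompt=[prompt] * len(pose_images), image=pose_images, latents=latents).images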

expected_result = torch.load("docs/source/en/api/pipelines/res/A bear is playing a guitar on Times Square.pt")
Contributor

Suggested change
expected_result = torch.load("docs/source/en/api/pipelines/res/A bear is playing a guitar on Times Square.pt")
expected_result = torch.load("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/text-to-video/A%20bear%20is%20playing%20a%20guitar%20on%20Times%20Square.pt")

Contributor

Let's not upload tensors and images directly to the GitHub repo

Contributor Author

Done!

<br />
Results are temporally consistent and follow closely the guidance and textual prompts.

![img](./res/teaser_final.png)
Contributor

Suggested change
![img](./res/teaser_final.png)
![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/text-to-video/teaser_final.png)


Contributor Author

I've uploaded this image to datasets/huggingface/documentation-images. You may remove it from hf-internal-testing

@patrickvonplaten (Contributor)

Besides some final things to change, as @sayakpaul pointed out, from my side we're good to merge this model.
Please make sure to delete the res folder, as we don't want to upload any heavy objects to the GitHub repo.

I've uploaded the data here for you: https://huggingface.co/datasets/hf-internal-testing/diffusers-images/tree/main/text-to-video

Let's make sure the tests pass and I think we're good to go :-)

@19and99 changed the title from "Add TextToVideoZeroPipeline" to "[Pipeline] Add TextToVideoZeroPipeline" on Apr 6, 2023
[Canny edge ControlNet model](https://huggingface.co/lllyasviel/sd-controlnet-canny) and
[Avatar style DreamBooth](https://huggingface.co/PAIR/text2video-zero-controlnet-canny-avatar) model

1. Download demo video from huggingface
Member

Suggested change
1. Download demo video from huggingface
Download a demo video
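
A minimal sketch of that download step, assuming the demo clip lives in a Hub repo; the repo id and filename below are illustrative, not taken from this PR, and any short local video works just as well:

from huggingface_hub import hf_hub_download

# Illustrative source for a short demo clip hosted on the Hub.
video_path = hf_hub_download(
    repo_type="space",
    repo_id="PAIR/Text2Video-Zero",
    filename="__assets__/poses_skeleton_gifs/dance1_corr.mp4",
)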

@sayakpaul (Member) left a comment

Excellent work here @19and99! Really well done and thanks so much for iterating so much.

I went ahead and changed a few nits. Hope that's okay.

@patrickvonplaten I think we should be ready to merge this one!

@patrickvonplaten (Contributor) left a comment

Amazing work @19and99

@patrickvonplaten merged commit ba49272 into huggingface:main on Apr 10, 2023
@Skquark commented on Apr 12, 2023

I got the TextToVideoZeroPipeline working and am able to save the frames to a video with imageio.mimsave; however, I'm struggling to save the individual frames as PNG image files after exporting the mp4. I'm using output_type="tensor", which was the recommended default (it didn't look like output_type np or pil was implemented), and the type shows as numpy.ndarray. I'm doing the standard `for image in images:` and have tried saving each image with imageio.imwrite, .imsave, cv2.imwrite, Image.fromarray, pipe.numpy_to_pil, converting to uint8, and a bunch of other methods that just result in type errors or black images. I couldn't find any examples or posted issues that gave me a working method. I previously struggled with the same thing with TextToVideoSDPipeline, but there the cv2.imwrite method worked. It's probably a simple answer, I'm just not getting it. Any help with saving those tensor frames? Thanks.

@sayakpaul (Member)

#3049 should make this more clear.

But if you do this (from the official documentation):

import torch
import imageio
from diffusers import TextToVideoZeroPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipe = TextToVideoZeroPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A panda is playing guitar on times square"
result = pipe(prompt=prompt).images
imageio.mimsave("video.mp4", result, fps=4)

then it should work.

Also, going forward please open a new issue as it's easier for us to keep track of them that way.

Cc: @19and99

@Skquark commented on Apr 12, 2023

I got that part working, using imageio.mimsave to write the mp4; that wasn't the problem. I'm trying to save those frame images as PNG files as well, and that was the issue...

@sayakpaul (Member)

Ah, sorry. I got lost in the longer message.

If that's the case, you can do:

from PIL import Image

# The images are `np.ndarray`.
result = pipe(prompt=prompt).images

result = [Image.fromarray((image * 255).astype("uint8")) for image in result]
for i, image in enumerate(result):
    image.save(f"{i}.png")

Does this work?

@Skquark commented on Apr 12, 2023

Nice, that worked, thanks. I tried something similar to that solution, but a little differently. Much appreciated.
Side note: when doing the imageio.mimsave, I get a series of these warnings:
WARNING:imageio:Lossy conversion from float32 to uint8. Range [0, 1]. Convert image to uint8 prior to saving to suppress this warning.
It still works, but is there a way around these warnings?
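
The warning text itself points at a likely workaround: cast the frames to uint8 before handing them to imageio. A minimal sketch, assuming `result` holds the float32 frames in [0, 1]:

import imageio
import numpy as np

# Convert the float [0, 1] frames to uint8 ourselves so imageio does not have to.
frames_uint8 = [(frame * 255).astype(np.uint8) for frame in result]
imageio.mimsave("video.mp4", frames_uint8, fps=4)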

w4ffl35 pushed a commit to w4ffl35/diffusers that referenced this pull request Apr 14, 2023
* add TextToVideoZeroPipeline and CrossFrameAttnProcessor

* add docs for text-to-video zero

* add teaser image for text-to-video zero docs

* Fix review changes. Add Documentation. Add test

* clean up the codes in pipeline_text_to_video.py. Add descriptive comments and docstrings

* make style && make quality

* make fix-copies

* make requested changes to docs. use huggingface server links for resources, delete res folder

* make style && make quality && make fix-copies

* make style && make quality

* Apply suggestions from code review

---------

Co-authored-by: Sayak Paul <[email protected]>
dg845 pushed a commit to dg845/diffusers that referenced this pull request May 6, 2023
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024