
[Docs] refactor text-to-video zero #3049


Merged
merged 8 commits on Apr 12, 2023
9 changes: 7 additions & 2 deletions docs/source/en/api/pipelines/text_to_video_zero.mdx
@@ -61,13 +61,15 @@ Resources:
To generate a video from a prompt, run the following Python code:
```python
import torch
import imageio
from diffusers import TextToVideoZeroPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipe = TextToVideoZeroPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A panda is playing guitar on times square"
result = pipe(prompt=prompt).images
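# scale the float frames in [0, 1] to uint8 and write them out as an MP4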
result = [(r * 255).astype("uint8") for r in result]
imageio.mimsave("video.mp4", result, fps=4)
```
You can change these parameters in the pipeline call:
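As a hedged illustration (not part of this diff), a call that overrides a few of these parameters might look as follows; the parameter names are taken from the pipeline's signature and the values are purely illustrative:
```python
# Hypothetical example: parameter names come from TextToVideoZeroPipeline.__call__;
# the values below are illustrative only.
result = pipe(
    prompt=prompt,
    video_length=8,              # number of frames to generate
    motion_field_strength_x=12,  # strength of the simulated motion along x
    motion_field_strength_y=12,  # strength of the simulated motion along y
    t0=44,                       # timesteps bounding the DDPM forward/backward
    t1=47,                       # process used to propagate motion between frames
).images
```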
@@ -95,6 +97,7 @@ To generate a video from prompt with additional pose control

2. Read video containing extracted pose images
```python
from PIL import Image
import imageio

reader = imageio.get_reader(video_path, "ffmpeg")
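# (Not part of this diff: a hypothetical continuation of the collapsed lines.)
# The decoded frames would typically be converted to PIL images for the pipeline;
# `frame_count` is an assumed value here.
frame_count = 8
pose_images = [Image.fromarray(reader.get_data(i)) for i in range(frame_count)]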
@@ -151,6 +154,7 @@ To perform text-guided video editing (with [InstructPix2Pix](./stable_diffusion/

2. Read video from path
```python
from PIL import Image
import imageio

reader = imageio.get_reader(video_path, "ffmpeg")
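# (Not part of this diff: a hypothetical continuation of the collapsed lines.)
# Frames would be read into a list of PIL images; the names `frame_count` and
# `video` are assumptions, not values from the docs.
frame_count = 8
video = [Image.fromarray(reader.get_data(i)) for i in range(frame_count)]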
@@ -174,14 +178,14 @@ To perform text-guided video editing (with [InstructPix2Pix](./stable_diffusion/
```


### Dreambooth specialization
### DreamBooth specialization
Member Author

For this one, we are not using `video` but `pose_images` instead. I think this needs correction. Could you please advise?

Contributor

You mean the title of step 1?

Member Author

No. In the code example, you initialized `video` but never used it; `pose_images` was used instead.


Methods **Text-To-Video**, **Text-To-Video with Pose Control**, and **Text-To-Video with Edge Control**
can run with custom [DreamBooth](../training/dreambooth) models, as shown below for the
[Canny edge ControlNet model](https://huggingface.co/lllyasviel/sd-controlnet-canny) and the
[Avatar style DreamBooth](https://huggingface.co/PAIR/text2video-zero-controlnet-canny-avatar) model.

1. Download demo video from huggingface
1. Download a demo video

```python
from huggingface_hub import hf_hub_download
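# (Not part of this diff: a hypothetical sketch of the collapsed download step.)
# Both repo_id and filename below are placeholders, not the values used in the docs.
video_path = hf_hub_download(repo_id="<user-or-org>/<repo>", filename="demo_video.mp4")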
@@ -193,6 +197,7 @@ can run with custom [DreamBooth](../training/dreambooth) models, as shown below

2. Read video from path
```python
from PIL import Image
import imageio

reader = imageio.get_reader(video_path, "ffmpeg")
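# (Not part of this diff: a hypothetical continuation of the collapsed lines.)
# The review thread above points out that the list built here and the variable
# passed to the pipeline (`video` vs. `pose_images`) should be made consistent.
frame_count = 8
video = [Image.fromarray(reader.get_data(i)) for i in range(frame_count)]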
@@ -374,9 +374,8 @@ def __call__(
Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor will be generated by sampling using the supplied random `generator`.
output_type (`str`, *optional*, defaults to `"pil"`):
The output format of the generate image. Choose between
[PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
output_type (`str`, *optional*, defaults to `"numpy"`):
The output format of the generated image. Choose between `"latent"` and `"numpy"`.
return_dict (`bool`, *optional*, defaults to `True`):
Whether or not to return a [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] instead of a
plain tuple.
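As a hedged illustration of the `return_dict` flag documented above (not part of this diff), disabling it makes the pipeline return a plain tuple whose first element holds the generated frames:
```python
# Hypothetical usage sketch; assumes `pipe` and `prompt` from the examples above.
frames = pipe(prompt=prompt, return_dict=False)[0]
```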