Commit fa736e3

[Docs] refactor text-to-video zero (#3049)
* fix: norm group test for UNet3D.
* refactor text-to-video zero docs.
1 parent: a4b233e (commit fa736e3)

2 files changed: 9 additions, 5 deletions

docs/source/en/api/pipelines/text_to_video_zero.mdx

Lines changed: 7 additions & 2 deletions
@@ -61,13 +61,15 @@ Resources:
 To generate a video from prompt, run the following python command
 ```python
 import torch
+import imageio
 from diffusers import TextToVideoZeroPipeline
 
 model_id = "runwayml/stable-diffusion-v1-5"
 pipe = TextToVideoZeroPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
 
 prompt = "A panda is playing guitar on times square"
 result = pipe(prompt=prompt).images
+result = [(r * 255).astype("uint8") for r in result]
 imageio.mimsave("video.mp4", result, fps=4)
 ```
 You can change these parameters in the pipeline call:
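
For readers applying this hunk: the new `astype("uint8")` line is there because the pipeline's numpy output holds float frames in [0, 1], while `imageio.mimsave` expects 8-bit pixels. A minimal sketch of the snippet as it reads after the commit (model id and prompt taken from the hunk above):

```python
import torch
import imageio
from diffusers import TextToVideoZeroPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipe = TextToVideoZeroPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A panda is playing guitar on times square"
result = pipe(prompt=prompt).images
# Frames come back as float arrays in [0, 1]; scale to uint8 so imageio can encode them.
result = [(r * 255).astype("uint8") for r in result]
imageio.mimsave("video.mp4", result, fps=4)
```
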
@@ -95,6 +97,7 @@ To generate a video from prompt with additional pose control
 
 2. Read video containing extracted pose images
 ```python
+from PIL import Image
 import imageio
 
 reader = imageio.get_reader(video_path, "ffmpeg")
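
The `from PIL import Image` added here (and in the two identical hunks further down) suggests the frames read with imageio are wrapped into PIL images later in the elided part of the snippet. A plausible continuation under that assumption, with a placeholder path and frame count:

```python
from PIL import Image
import imageio

# Placeholder values -- the real path and frame count come from earlier steps in the docs.
video_path = "path/to/pose_video.mp4"
frame_count = 8

reader = imageio.get_reader(video_path, "ffmpeg")
# Wrap each decoded frame in a PIL image, which is presumably why the import was added.
pose_images = [Image.fromarray(reader.get_data(i)) for i in range(frame_count)]
```
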
@@ -151,6 +154,7 @@ To perform text-guided video editing (with [InstructPix2Pix](./stable_diffusion/
 
 2. Read video from path
 ```python
+from PIL import Image
 import imageio
 
 reader = imageio.get_reader(video_path, "ffmpeg")
@@ -174,14 +178,14 @@ To perform text-guided video editing (with [InstructPix2Pix](./stable_diffusion/
 ```
 
 
-### Dreambooth specialization
+### DreamBooth specialization
 
 Methods **Text-To-Video**, **Text-To-Video with Pose Control** and **Text-To-Video with Edge Control**
 can run with custom [DreamBooth](../training/dreambooth) models, as shown below for
 [Canny edge ControlNet model](https://huggingface.co/lllyasviel/sd-controlnet-canny) and
 [Avatar style DreamBooth](https://huggingface.co/PAIR/text2video-zero-controlnet-canny-avatar) model
 
-1. Download demo video from huggingface
+1. Download a demo video
 
 ```python
 from huggingface_hub import hf_hub_download
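
The hunk cuts off right after the `hf_hub_download` import; for context, downloading a demo clip from the Hub typically looks like the sketch below (repo id and filename are placeholders, not values from this commit):

```python
from huggingface_hub import hf_hub_download

# Hypothetical dataset repo and filename -- substitute the ones from the full docs page.
video_path = hf_hub_download(
    repo_type="dataset",
    repo_id="some-org/demo-videos",
    filename="demo.mp4",
)
```
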
@@ -193,6 +197,7 @@ can run with custom [DreamBooth](../training/dreambooth) models, as shown below
 
 2. Read video from path
 ```python
+from PIL import Image
 import imageio
 
 reader = imageio.get_reader(video_path, "ffmpeg")

src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_zero.py

Lines changed: 2 additions & 3 deletions
@@ -374,9 +374,8 @@ def __call__(
             Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
             generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
             tensor will ge generated by sampling using the supplied random `generator`.
-        output_type (`str`, *optional*, defaults to `"pil"`):
-            The output format of the generate image. Choose between
-            [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
+        output_type (`str`, *optional*, defaults to `"numpy"`):
+            The output format of the generated image. Choose between `"latent"` and `"numpy"`.
         return_dict (`bool`, *optional*, defaults to `True`):
             Whether or not to return a [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] instead of a
             plain tuple.
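
As a usage note on the corrected docstring: with the documented default of `output_type="numpy"`, `.images` holds float frames that the docs hunk above converts to uint8 before saving, while `"latent"` is the other documented option and skips decoding to pixel space. A hedged sketch, reusing the model id from the docs diff:

```python
import torch
from diffusers import TextToVideoZeroPipeline

pipe = TextToVideoZeroPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "A panda is playing guitar on times square"

# Default behaviour, passed explicitly: numpy frames ready for the uint8 conversion.
frames = pipe(prompt=prompt, output_type="numpy").images

# Per the updated docstring, "latent" is the alternative and returns un-decoded latents.
```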
