diff --git a/README.md b/README.md
index 9655220a2491..489e0d154af2 100644
--- a/README.md
+++ b/README.md
@@ -47,7 +47,7 @@ limitations under the License.
 
 ## Installation
 
-We recommend installing 🤗 Diffusers in a virtual environment from PyPi or Conda. For more details about installing [PyTorch](https://pytorch.org/get-started/locally/) and [Flax](https://flax.readthedocs.io/en/latest/#installation), please refer to their official documentation.
+We recommend installing 🤗 Diffusers in a virtual environment from PyPI or Conda. For more details about installing [PyTorch](https://pytorch.org/get-started/locally/) and [Flax](https://flax.readthedocs.io/en/latest/#installation), please refer to their official documentation.
 
 ### PyTorch
 
@@ -77,7 +77,7 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi
 
 ## Quickstart
 
-Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 14000+ checkpoints):
+Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 15000+ checkpoints):
 
 ```python
 from diffusers import DiffusionPipeline
@@ -94,14 +94,13 @@ You can also dig into the models and schedulers toolbox to build your own diffus
 from diffusers import DDPMScheduler, UNet2DModel
 from PIL import Image
 import torch
-import numpy as np
 
 scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
 model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda")
 scheduler.set_timesteps(50)
 
 sample_size = model.config.sample_size
-noise = torch.randn((1, 3, sample_size, sample_size)).to("cuda")
+noise = torch.randn((1, 3, sample_size, sample_size), device="cuda")
 input = noise
 
 for t in scheduler.timesteps:
@@ -136,8 +135,7 @@ You can look out for [issues](https://github.com/huggingface/diffusers/issues) y
 - See [New model/pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) to contribute exciting new diffusion models / diffusion pipelines
 - See [New scheduler](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22)
 
-Also, say 👋 in our public Discord channel Join us on Discord. We discuss the hottest trends about diffusion models, help each other with contributions, personal projects or
-just hang out ☕.
+Also, say 👋 in our public Discord channel Join us on Discord. We discuss the hottest trends about diffusion models, help each other with contributions, personal projects or just hang out ☕.
 
 ## Popular Tasks & Pipelines
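A note on the pattern this patch applies throughout: `torch.randn(shape).to("cuda")` first allocates the tensor on the CPU and then copies it to the GPU, while `torch.randn(shape, device="cuda")` allocates it on the GPU directly and skips the host-side intermediate. A minimal sketch of the two forms (illustrative shape, assumes a CUDA device is available):

```python
import torch

# Two-step form: allocate on the CPU, then copy host-to-device.
noise_via_copy = torch.randn((1, 3, 256, 256)).to("cuda")

# One-step form used in this patch: allocate directly on the GPU.
noise_direct = torch.randn((1, 3, 256, 256), device="cuda")

# Both yield CUDA tensors of the same shape; only the allocation path differs.
assert noise_via_copy.shape == noise_direct.shape
assert noise_direct.device.type == "cuda"
```

One caveat: the two forms draw from different random streams (the CPU generator versus the CUDA device generator), so for a fixed seed they produce different values even though both are correct.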
diff --git a/docs/source/en/optimization/memory.md b/docs/source/en/optimization/memory.md
index 281b65df8d8c..42a1bcea8fb5 100644
--- a/docs/source/en/optimization/memory.md
+++ b/docs/source/en/optimization/memory.md
@@ -194,9 +194,9 @@ unet_runs_per_experiment = 50
 
 # load inputs
 def generate_inputs():
-    sample = torch.randn(2, 4, 64, 64).half().cuda()
-    timestep = torch.rand(1).half().cuda() * 999
-    encoder_hidden_states = torch.randn(2, 77, 768).half().cuda()
+    sample = torch.randn((2, 4, 64, 64), device="cuda", dtype=torch.float16)
+    timestep = torch.rand(1, device="cuda", dtype=torch.float16) * 999
+    encoder_hidden_states = torch.randn((2, 77, 768), device="cuda", dtype=torch.float16)
     return sample, timestep, encoder_hidden_states
 
diff --git a/docs/source/en/tutorials/basic_training.md b/docs/source/en/tutorials/basic_training.md
index 3b545cdf572e..c9ce315af41f 100644
--- a/docs/source/en/tutorials/basic_training.md
+++ b/docs/source/en/tutorials/basic_training.md
@@ -321,13 +321,13 @@ Now you can wrap all these components together in a training loop with 🤗 Acce
 ...     for step, batch in enumerate(train_dataloader):
 ...         clean_images = batch["images"]
 ...         # Sample noise to add to the images
-...         noise = torch.randn(clean_images.shape).to(clean_images.device)
+...         noise = torch.randn(clean_images.shape, device=clean_images.device)
 ...         bs = clean_images.shape[0]
 
 ...         # Sample a random timestep for each image
 ...         timesteps = torch.randint(
 ...             0, noise_scheduler.config.num_train_timesteps, (bs,), device=clean_images.device
-...         ).long()
+...         )
 
 ...         # Add noise to the clean images according to the noise magnitude at each timestep
 ...         # (this is the forward diffusion process)
diff --git a/docs/source/en/using-diffusers/write_own_pipeline.md b/docs/source/en/using-diffusers/write_own_pipeline.md
index d3061e36fe63..4ca3fe33223b 100644
--- a/docs/source/en/using-diffusers/write_own_pipeline.md
+++ b/docs/source/en/using-diffusers/write_own_pipeline.md
@@ -71,7 +71,7 @@ tensor([980, 960, 940, 920, 900, 880, 860, 840, 820, 800, 780, 760, 740, 720,
 >>> import torch
 
 >>> sample_size = model.config.sample_size
->>> noise = torch.randn((1, 3, sample_size, sample_size)).to("cuda")
+>>> noise = torch.randn((1, 3, sample_size, sample_size), device="cuda")
 ```
 
 5. Now write a loop to iterate over the timesteps. At each timestep, the model does a [`UNet2DModel.forward`] pass and returns the noisy residual. The scheduler's [`~DDPMScheduler.step`] method takes the noisy residual, timestep, and input and it predicts the image at the previous timestep. This output becomes the next input to the model in the denoising loop, and it'll repeat until it reaches the end of the `timesteps` array.
@@ -216,8 +216,8 @@ Next, generate some initial random noise as a starting point for the diffusion p
 >>> latents = torch.randn(
 ...     (batch_size, unet.config.in_channels, height // 8, width // 8),
 ...     generator=generator,
+...     device=torch_device,
 ... )
->>> latents = latents.to(torch_device)
 ```
 
 ### Denoise the image
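The `.half().cuda()` rewrites above extend the same idea to dtype: PyTorch factory functions accept `device` and `dtype` together, so the tensor is created once in fp16 on the GPU instead of being created in fp32 on the CPU, cast, and copied. The `.long()` removal in `basic_training.md` is safe because `torch.randint` returns `torch.int64` by default. A short sketch (shapes are illustrative, CUDA assumed for the fp16 lines):

```python
import torch

# Single allocation at the target dtype and device ...
sample = torch.randn((2, 4, 64, 64), device="cuda", dtype=torch.float16)
assert sample.dtype == torch.float16 and sample.device.type == "cuda"

# ... versus the old three-step chain: fp32 CPU alloc -> cast -> copy.
# sample = torch.randn(2, 4, 64, 64).half().cuda()

# torch.randint defaults to int64, so the trailing .long() was a no-op.
timesteps = torch.randint(0, 1000, (8,))
assert timesteps.dtype == torch.int64
```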
diff --git a/docs/source/ko/optimization/fp16.md b/docs/source/ko/optimization/fp16.md
index 30197305540c..0f2c487a75ce 100644
--- a/docs/source/ko/optimization/fp16.md
+++ b/docs/source/ko/optimization/fp16.md
@@ -273,9 +273,9 @@ unet_runs_per_experiment = 50
 
 # load inputs
 def generate_inputs():
-    sample = torch.randn(2, 4, 64, 64).half().cuda()
-    timestep = torch.rand(1).half().cuda() * 999
-    encoder_hidden_states = torch.randn(2, 77, 768).half().cuda()
+    sample = torch.randn((2, 4, 64, 64), device="cuda", dtype=torch.float16)
+    timestep = torch.rand(1, device="cuda", dtype=torch.float16) * 999
+    encoder_hidden_states = torch.randn((2, 77, 768), device="cuda", dtype=torch.float16)
     return sample, timestep, encoder_hidden_states
 
diff --git a/docs/source/ko/tutorials/basic_training.md b/docs/source/ko/tutorials/basic_training.md
index a4e5e2a0c8bb..df5e74c22ca8 100644
--- a/docs/source/ko/tutorials/basic_training.md
+++ b/docs/source/ko/tutorials/basic_training.md
@@ -322,13 +322,13 @@ Logging to TensorBoard, gradient accumulation, and mixed precision training
 ...     for step, batch in enumerate(train_dataloader):
 ...         clean_images = batch["images"]
 ...         # Sample noise to add to the images
-...         noise = torch.randn(clean_images.shape).to(clean_images.device)
+...         noise = torch.randn(clean_images.shape, device=clean_images.device)
 ...         bs = clean_images.shape[0]
 
 ...         # Sample a random timestep for each image
 ...         timesteps = torch.randint(
 ...             0, noise_scheduler.config.num_train_timesteps, (bs,), device=clean_images.device
-...         ).long()
+...         )
 
 ...         # Add noise to the clean images according to the noise magnitude at each timestep
 ...         # (this is the forward diffusion process)
diff --git a/docs/source/ko/using-diffusers/write_own_pipeline.md b/docs/source/ko/using-diffusers/write_own_pipeline.md
index a6469644566c..787c8113bf0d 100644
--- a/docs/source/ko/using-diffusers/write_own_pipeline.md
+++ b/docs/source/ko/using-diffusers/write_own_pipeline.md
@@ -71,7 +71,7 @@ specific language governing permissions and limitations under the License.
     >>> import torch
 
     >>> sample_size = model.config.sample_size
-    >>> noise = torch.randn((1, 3, sample_size, sample_size)).to("cuda")
+    >>> noise = torch.randn((1, 3, sample_size, sample_size), device="cuda")
     ```
 
 5. Now write a loop to iterate over the timesteps. At each timestep, the model returns the noisy residual via [`UNet2DModel.forward`]. The scheduler's [`~DDPMScheduler.step`] method takes the noisy residual, timestep, and input, and predicts the image at the previous timestep. This output becomes the next input to the model in the denoising loop, and it repeats until it reaches the end of the `timesteps` array.
@@ -212,8 +212,8 @@ Stable Diffusion is a text-to-image *latent diffusion* model. Latent di
 >>> latents = torch.randn(
 ...     (batch_size, unet.in_channels, height // 8, width // 8),
 ...     generator=generator,
+...     device=torch_device,
 ... )
->>> latents = latents.to(torch_device)
 ```
 
 ### Denoise the image
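One interaction worth flagging in the `write_own_pipeline.md` hunks: when `generator=` and `device=` are passed to the same factory call, PyTorch expects the generator to live on the requested device, and recent releases raise a `RuntimeError` if a CPU generator (such as the one returned by `torch.manual_seed`) is combined with `device="cuda"`. A hedged sketch of a device-matched setup (the latent shape and seed are illustrative, not taken from the docs):

```python
import torch

torch_device = "cuda" if torch.cuda.is_available() else "cpu"

# Create the generator on the same device the latents will live on;
# a bare torch.manual_seed(0) would return a CPU generator instead.
generator = torch.Generator(device=torch_device).manual_seed(0)

latents = torch.randn(
    (1, 4, 64, 64),  # illustrative latent shape
    generator=generator,
    device=torch_device,
)
assert latents.device.type == torch_device
```

Also note that identically seeded CPU and CUDA generators produce different sequences, so creating the latents directly on the GPU changes the exact values relative to the old CPU-then-copy path; if bit-for-bit reproducibility with previously published outputs matters, the CPU generator plus `.to(torch_device)` remains the conservative option.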