Skip to content

Finish docs textual inversion #3068

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 12, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 41 additions & 4 deletions docs/source/en/training/text_inversion.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -157,24 +157,61 @@ If you're interested in following along with your model training progress, you c

## Inference

Once you have trained a model, you can use it for inference with the [`StableDiffusionPipeline`]. Make sure you include the `placeholder_token` in your prompt, in this case, it is `<cat-toy>`.
Once you have trained a model, you can use it for inference with the [`StableDiffusionPipeline`].

The textual inversion script will by default only save the textual inversion embedding vector(s) that have
been added to the text encoder embedding matrix and consequently been trained.
Comment on lines +162 to +163
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also provide the example?


<frameworkcontent>
<pt>
<Tip>

💡 The community has created a large library of different textual inversion embedding vectors, called [sd-concepts-library](https://huggingface.co/sd-concepts-library).
Instead of training textual inversion embeddings from scratch you can also see whether a fitting textual inversion embedding has already been added to the libary.

</Tip>

To load the textual inversion embeddings you first need to load the base model that was used when training
your textual inversion embedding vectors. Here we assume that [`runwayml/stable-diffusion-v1-5`](runwayml/stable-diffusion-v1-5)
was used as a base model so we load it first:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked if it was possible to retrieve the model from the trained concepts, but it looks like not all the models have a base_model metadata tag in the readme file :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I noticed this too, but changed the notebook so that future models have it :-) Sadly we don't really know what previous text inversions were trained on (I'd assume that most were trained on compvis1-4 though: huggingface/notebooks@944dc4b) cc @apolinario

```python
from diffusers import StableDiffusionPipeline
import torch

model_id = "path-to-your-trained-model"
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
```

prompt = "A <cat-toy> backpack"
Next, we need to load the textual inversion embedding vector which can be done via the [`TextualInversionLoaderMixin.load_textual_inversion`]
function. Here we'll load the embeddings of the "<cat-toy>" example from before.
```python
pipe.load_textual_inversion("sd-concepts-library/cat-toy")
```

image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
Now we can run the pipeline making sure that the placeholder token `<cat-toy>` is used in our prompt.

```python
prompt = "A <cat-toy> backpack"

image = pipe(prompt, num_inference_steps=50).images[0]
image.save("cat-backpack.png")
```

The function [`TextualInversionLoaderMixin.load_textual_inversion`] can not only
load textual embedding vectors saved in Diffusers' format, but also embedding vectors
saved in [Automatic1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui) format.
To do so, you can first download an embedding vector from [civitAI](https://civitai.com/models/3036?modelVersionId=8387)
and then load it locally:
```python
pipe.load_textual_inversion("./charturnerv2.pt")
```
</pt>
<jax>
Currently there is no `load_textual_inversion` function for Flax so one has to make sure the textual inversion
embedding vector is saved as part of the model after training.

The model can then be run just like any other Flax model:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentence closes with :. Did you mean to provide any example?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok, there's a python snippet after the paragraph.


```python
import jax
import numpy as np
Expand Down
38 changes: 37 additions & 1 deletion src/diffusers/loaders.py
Original file line number Diff line number Diff line change
Expand Up @@ -368,7 +368,7 @@ def load_textual_inversion(
):
r"""
Load textual inversion embeddings into the text encoder of stable diffusion pipelines. Both `diffusers` and
`Automatic1111` formats are supported.
`Automatic1111` formats are supported (see example below).

<Tip warning={true}>

Expand Down Expand Up @@ -427,6 +427,42 @@ def load_textual_inversion(
models](https://huggingface.co/docs/hub/models-gated#gated-models).

</Tip>

Example:

To load a textual inversion embedding vector in `diffusers` format:
```py
from diffusers import StableDiffusionPipeline
import torch

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

pipe.load_textual_inversion("sd-concepts-library/cat-toy")

prompt = "A <cat-toy> backpack"

image = pipe(prompt, num_inference_steps=50).images[0]
image.save("cat-backpack.png")
```

To load a textual inversion embedding vector in Automatic1111 format, make sure to first download the vector,
e.g. from [civitAI](https://civitai.com/models/3036?modelVersionId=9857) and then load the vector locally:

```py
from diffusers import StableDiffusionPipeline
import torch

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

pipe.load_textual_inversion("./charturnerv2.pt")

prompt = "charturnerv2, multiple views of the same character in the same outfit, a character turnaround of a woman wearing a black jacket and red shirt, best quality, intricate details."

image = pipe(prompt, num_inference_steps=50).images[0]
image.save("character.png")
```
"""
if not hasattr(self, "tokenizer") or not isinstance(self.tokenizer, PreTrainedTokenizer):
raise ValueError(
Expand Down