Skip to content

Rename 'CLIPFeatureExtractor' class to 'CLIPImageProcessor' #2732

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions docs/source/en/api/pipelines/overview.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,9 @@ components - all of which are needed to have a functioning end-to-end diffusion
As an example, [Stable Diffusion](https://huggingface.co/blog/stable_diffusion) has three independently trained models:
- [Autoencoder](./api/models#vae)
- [Conditional Unet](./api/models#UNet2DConditionModel)
- [CLIP text encoder](https://huggingface.co/docs/transformers/v4.21.2/en/model_doc/clip#transformers.CLIPTextModel)
- [CLIP text encoder](https://huggingface.co/docs/transformers/v4.27.1/en/model_doc/clip#transformers.CLIPTextModel)
- a scheduler component, [scheduler](./api/scheduler#pndm),
- a [CLIPFeatureExtractor](https://huggingface.co/docs/transformers/v4.21.2/en/model_doc/clip#transformers.CLIPFeatureExtractor),
- a [CLIPImageProcessor](https://huggingface.co/docs/transformers/v4.27.1/en/model_doc/clip#transformers.CLIPImageProcessor),
- as well as a [safety checker](./stable_diffusion#safety_checker).
All of these components are necessary to run stable diffusion in inference even though they were trained
or created independently from each other.
Expand Down
4 changes: 2 additions & 2 deletions docs/source/en/using-diffusers/custom_pipeline_examples.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -45,11 +45,11 @@ The following code requires roughly 12GB of GPU RAM.

```python
from diffusers import DiffusionPipeline
from transformers import CLIPFeatureExtractor, CLIPModel
from transformers import CLIPImageProcessor, CLIPModel
import torch


feature_extractor = CLIPFeatureExtractor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")
feature_extractor = CLIPImageProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")
clip_model = CLIPModel.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K", torch_dtype=torch.float16)


Expand Down
4 changes: 2 additions & 2 deletions docs/source/en/using-diffusers/custom_pipeline_overview.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -50,11 +50,11 @@ and passing pipeline modules directly.

```python
from diffusers import DiffusionPipeline
from transformers import CLIPFeatureExtractor, CLIPModel
from transformers import CLIPImageProcessor, CLIPModel

clip_model_id = "laion/CLIP-ViT-B-32-laion2B-s34B-b79K"

feature_extractor = CLIPFeatureExtractor.from_pretrained(clip_model_id)
feature_extractor = CLIPImageProcessor.from_pretrained(clip_model_id)
clip_model = CLIPModel.from_pretrained(clip_model_id)

pipeline = DiffusionPipeline.from_pretrained(
Expand Down
6 changes: 3 additions & 3 deletions docs/source/en/using-diffusers/loading.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -415,7 +415,7 @@ print(pipe)
StableDiffusionPipeline {
"feature_extractor": [
"transformers",
"CLIPFeatureExtractor"
"CLIPImageProcessor"
],
"safety_checker": [
"stable_diffusion",
Expand Down Expand Up @@ -445,7 +445,7 @@ StableDiffusionPipeline {
```

First, we see that the official pipeline is the [`StableDiffusionPipeline`], and second we see that the `StableDiffusionPipeline` consists of 7 components:
- `"feature_extractor"` of class `CLIPFeatureExtractor` as defined [in `transformers`](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPFeatureExtractor).
- `"feature_extractor"` of class `CLIPImageProcessor` as defined [in `transformers`](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPImageProcessor).
- `"safety_checker"` as defined [here](https://github.com/huggingface/diffusers/blob/e55687e1e15407f60f32242027b7bb8170e58266/src/diffusers/pipelines/stable_diffusion/safety_checker.py#L32).
- `"scheduler"` of class [`PNDMScheduler`].
- `"text_encoder"` of class `CLIPTextModel` as defined [in `transformers`](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPTextModel).
Expand Down Expand Up @@ -493,7 +493,7 @@ In the case of `runwayml/stable-diffusion-v1-5` the `model_index.json` is theref
"_diffusers_version": "0.6.0",
"feature_extractor": [
"transformers",
"CLIPFeatureExtractor"
"CLIPImageProcessor"
],
"safety_checker": [
"stable_diffusion",
Expand Down
4 changes: 2 additions & 2 deletions examples/community/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,11 +49,11 @@ The following code requires roughly 12GB of GPU RAM.

```python
from diffusers import DiffusionPipeline
from transformers import CLIPFeatureExtractor, CLIPModel
from transformers import CLIPImageProcessor, CLIPModel
import torch


feature_extractor = CLIPFeatureExtractor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")
feature_extractor = CLIPImageProcessor.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K")
clip_model = CLIPModel.from_pretrained("laion/CLIP-ViT-B-32-laion2B-s34B-b79K", torch_dtype=torch.float16)


Expand Down
4 changes: 2 additions & 2 deletions examples/community/clip_guided_stable_diffusion.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
from torch import nn
from torch.nn import functional as F
from torchvision import transforms
from transformers import CLIPFeatureExtractor, CLIPModel, CLIPTextModel, CLIPTokenizer
from transformers import CLIPImageProcessor, CLIPModel, CLIPTextModel, CLIPTokenizer

from diffusers import (
AutoencoderKL,
Expand Down Expand Up @@ -64,7 +64,7 @@ def __init__(
tokenizer: CLIPTokenizer,
unet: UNet2DConditionModel,
scheduler: Union[PNDMScheduler, LMSDiscreteScheduler, DDIMScheduler],
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
):
super().__init__()
self.register_modules(
Expand Down
6 changes: 3 additions & 3 deletions examples/community/composable_stable_diffusion.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@

import torch
from packaging import version
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer

from diffusers import DiffusionPipeline
from diffusers.configuration_utils import FrozenDict
Expand Down Expand Up @@ -64,7 +64,7 @@ class ComposableStableDiffusionPipeline(DiffusionPipeline):
safety_checker ([`StableDiffusionSafetyChecker`]):
Classification module that estimates whether generated images could be considered offensive or harmful.
Please, refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details.
feature_extractor ([`CLIPFeatureExtractor`]):
feature_extractor ([`CLIPImageProcessor`]):
Model that extracts features from generated images to be used as inputs for the `safety_checker`.
"""
_optional_components = ["safety_checker", "feature_extractor"]
Expand All @@ -84,7 +84,7 @@ def __init__(
DPMSolverMultistepScheduler,
],
safety_checker: StableDiffusionSafetyChecker,
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
requires_safety_checker: bool = True,
):
super().__init__()
Expand Down
6 changes: 3 additions & 3 deletions examples/community/imagic_stable_diffusion.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
# TODO: remove and import from diffusers.utils when the new version of diffusers is released
from packaging import version
from tqdm.auto import tqdm
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer

from diffusers import DiffusionPipeline
from diffusers.models import AutoencoderKL, UNet2DConditionModel
Expand Down Expand Up @@ -80,7 +80,7 @@ class ImagicStableDiffusionPipeline(DiffusionPipeline):
safety_checker ([`StableDiffusionSafetyChecker`]):
Classification module that estimates whether generated images could be considered offsensive or harmful.
Please, refer to the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) for details.
feature_extractor ([`CLIPFeatureExtractor`]):
feature_extractor ([`CLIPImageProcessor`]):
Model that extracts features from generated images to be used as inputs for the `safety_checker`.
"""

Expand All @@ -92,7 +92,7 @@ def __init__(
unet: UNet2DConditionModel,
scheduler: Union[DDIMScheduler, PNDMScheduler, LMSDiscreteScheduler],
safety_checker: StableDiffusionSafetyChecker,
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
):
super().__init__()
self.register_modules(
Expand Down
6 changes: 3 additions & 3 deletions examples/community/img2img_inpainting.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
import numpy as np
import PIL
import torch
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer

from diffusers import DiffusionPipeline
from diffusers.configuration_utils import FrozenDict
Expand Down Expand Up @@ -79,7 +79,7 @@ class ImageToImageInpaintingPipeline(DiffusionPipeline):
safety_checker ([`StableDiffusionSafetyChecker`]):
Classification module that estimates whether generated images could be considered offensive or harmful.
Please, refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details.
feature_extractor ([`CLIPFeatureExtractor`]):
feature_extractor ([`CLIPImageProcessor`]):
Model that extracts features from generated images to be used as inputs for the `safety_checker`.
"""

Expand All @@ -91,7 +91,7 @@ def __init__(
unet: UNet2DConditionModel,
scheduler: Union[DDIMScheduler, PNDMScheduler, LMSDiscreteScheduler],
safety_checker: StableDiffusionSafetyChecker,
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
):
super().__init__()

Expand Down
6 changes: 3 additions & 3 deletions examples/community/interpolate_stable_diffusion.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@

import numpy as np
import torch
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer

from diffusers import DiffusionPipeline
from diffusers.configuration_utils import FrozenDict
Expand Down Expand Up @@ -70,7 +70,7 @@ class StableDiffusionWalkPipeline(DiffusionPipeline):
safety_checker ([`StableDiffusionSafetyChecker`]):
Classification module that estimates whether generated images could be considered offensive or harmful.
Please, refer to the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) for details.
feature_extractor ([`CLIPFeatureExtractor`]):
feature_extractor ([`CLIPImageProcessor`]):
Model that extracts features from generated images to be used as inputs for the `safety_checker`.
"""

Expand All @@ -82,7 +82,7 @@ def __init__(
unet: UNet2DConditionModel,
scheduler: Union[DDIMScheduler, PNDMScheduler, LMSDiscreteScheduler],
safety_checker: StableDiffusionSafetyChecker,
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
):
super().__init__()

Expand Down
8 changes: 4 additions & 4 deletions examples/community/lpw_stable_diffusion.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
import PIL
import torch
from packaging import version
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer

import diffusers
from diffusers import SchedulerMixin, StableDiffusionPipeline
Expand Down Expand Up @@ -422,7 +422,7 @@ class StableDiffusionLongPromptWeightingPipeline(StableDiffusionPipeline):
safety_checker ([`StableDiffusionSafetyChecker`]):
Classification module that estimates whether generated images could be considered offensive or harmful.
Please, refer to the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) for details.
feature_extractor ([`CLIPFeatureExtractor`]):
feature_extractor ([`CLIPImageProcessor`]):
Model that extracts features from generated images to be used as inputs for the `safety_checker`.
"""

Expand All @@ -436,7 +436,7 @@ def __init__(
unet: UNet2DConditionModel,
scheduler: SchedulerMixin,
safety_checker: StableDiffusionSafetyChecker,
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
requires_safety_checker: bool = True,
):
super().__init__(
Expand All @@ -461,7 +461,7 @@ def __init__(
unet: UNet2DConditionModel,
scheduler: SchedulerMixin,
safety_checker: StableDiffusionSafetyChecker,
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
):
super().__init__(
vae=vae,
Expand Down
6 changes: 3 additions & 3 deletions examples/community/lpw_stable_diffusion_onnx.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
import PIL
import torch
from packaging import version
from transformers import CLIPFeatureExtractor, CLIPTokenizer
from transformers import CLIPImageProcessor, CLIPTokenizer

import diffusers
from diffusers import OnnxRuntimeModel, OnnxStableDiffusionPipeline, SchedulerMixin
Expand Down Expand Up @@ -441,7 +441,7 @@ def __init__(
unet: OnnxRuntimeModel,
scheduler: SchedulerMixin,
safety_checker: OnnxRuntimeModel,
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
requires_safety_checker: bool = True,
):
super().__init__(
Expand All @@ -468,7 +468,7 @@ def __init__(
unet: OnnxRuntimeModel,
scheduler: SchedulerMixin,
safety_checker: OnnxRuntimeModel,
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
):
super().__init__(
vae_encoder=vae_encoder,
Expand Down
6 changes: 3 additions & 3 deletions examples/community/multilingual_stable_diffusion.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

import torch
from transformers import (
CLIPFeatureExtractor,
CLIPImageProcessor,
CLIPTextModel,
CLIPTokenizer,
MBart50TokenizerFast,
Expand Down Expand Up @@ -79,7 +79,7 @@ class MultilingualStableDiffusion(DiffusionPipeline):
safety_checker ([`StableDiffusionSafetyChecker`]):
Classification module that estimates whether generated images could be considered offensive or harmful.
Please, refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details.
feature_extractor ([`CLIPFeatureExtractor`]):
feature_extractor ([`CLIPImageProcessor`]):
Model that extracts features from generated images to be used as inputs for the `safety_checker`.
"""

Expand All @@ -94,7 +94,7 @@ def __init__(
unet: UNet2DConditionModel,
scheduler: Union[DDIMScheduler, PNDMScheduler, LMSDiscreteScheduler],
safety_checker: StableDiffusionSafetyChecker,
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
):
super().__init__()

Expand Down
2 changes: 1 addition & 1 deletion examples/community/sd_text2img_k_diffusion.py
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ class StableDiffusionPipeline(DiffusionPipeline):
safety_checker ([`StableDiffusionSafetyChecker`]):
Classification module that estimates whether generated images could be considered offensive or harmful.
Please, refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details.
feature_extractor ([`CLIPFeatureExtractor`]):
feature_extractor ([`CLIPImageProcessor`]):
Model that extracts features from generated images to be used as inputs for the `safety_checker`.
"""
_optional_components = ["safety_checker", "feature_extractor"]
Expand Down
6 changes: 3 additions & 3 deletions examples/community/seed_resize_stable_diffusion.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
from typing import Callable, List, Optional, Union

import torch
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer

from diffusers import DiffusionPipeline
from diffusers.models import AutoencoderKL, UNet2DConditionModel
Expand Down Expand Up @@ -42,7 +42,7 @@ class SeedResizeStableDiffusionPipeline(DiffusionPipeline):
safety_checker ([`StableDiffusionSafetyChecker`]):
Classification module that estimates whether generated images could be considered offensive or harmful.
Please, refer to the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) for details.
feature_extractor ([`CLIPFeatureExtractor`]):
feature_extractor ([`CLIPImageProcessor`]):
Model that extracts features from generated images to be used as inputs for the `safety_checker`.
"""

Expand All @@ -54,7 +54,7 @@ def __init__(
unet: UNet2DConditionModel,
scheduler: Union[DDIMScheduler, PNDMScheduler, LMSDiscreteScheduler],
safety_checker: StableDiffusionSafetyChecker,
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
):
super().__init__()
self.register_modules(
Expand Down
4 changes: 2 additions & 2 deletions examples/community/speech_to_image_diffusion.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

import torch
from transformers import (
CLIPFeatureExtractor,
CLIPImageProcessor,
CLIPTextModel,
CLIPTokenizer,
WhisperForConditionalGeneration,
Expand Down Expand Up @@ -37,7 +37,7 @@ def __init__(
unet: UNet2DConditionModel,
scheduler: Union[DDIMScheduler, PNDMScheduler, LMSDiscreteScheduler],
safety_checker: StableDiffusionSafetyChecker,
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
):
super().__init__()

Expand Down
6 changes: 3 additions & 3 deletions examples/community/stable_diffusion_comparison.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
from typing import Any, Callable, Dict, List, Optional, Union

import torch
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer

from diffusers import (
AutoencoderKL,
Expand Down Expand Up @@ -46,7 +46,7 @@ class StableDiffusionComparisonPipeline(DiffusionPipeline):
safety_checker ([`StableDiffusionMegaSafetyChecker`]):
Classification module that estimates whether generated images could be considered offensive or harmful.
Please, refer to the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5) for details.
feature_extractor ([`CLIPFeatureExtractor`]):
feature_extractor ([`CLIPImageProcessor`]):
Model that extracts features from generated images to be used as inputs for the `safety_checker`.
"""

Expand All @@ -58,7 +58,7 @@ def __init__(
unet: UNet2DConditionModel,
scheduler: Union[DDIMScheduler, PNDMScheduler, LMSDiscreteScheduler],
safety_checker: StableDiffusionSafetyChecker,
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
requires_safety_checker: bool = True,
):
super()._init_()
Expand Down
4 changes: 2 additions & 2 deletions examples/community/stable_diffusion_controlnet_img2img.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
import numpy as np
import PIL.Image
import torch
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer

from diffusers import AutoencoderKL, ControlNetModel, DiffusionPipeline, UNet2DConditionModel, logging
from diffusers.pipelines.stable_diffusion import StableDiffusionPipelineOutput, StableDiffusionSafetyChecker
Expand Down Expand Up @@ -135,7 +135,7 @@ def __init__(
controlnet: ControlNetModel,
scheduler: KarrasDiffusionSchedulers,
safety_checker: StableDiffusionSafetyChecker,
feature_extractor: CLIPFeatureExtractor,
feature_extractor: CLIPImageProcessor,
requires_safety_checker: bool = True,
):
super().__init__()
Expand Down
Loading