StableCascadeDecoderPipeline Error with num_images_per_prompt > 1 #7377

### Describe the bug

I get a `RuntimeError` from `StableCascadeDecoderPipeline` when `num_images_per_prompt` is set to a value greater than 1. `StableCascadePriorPipeline` generates several images per prompt without problems, so I suspect the issue is in how the batched latents (image embeddings) are passed to the decoder pipeline.

```cmd
Traceback (most recent call last):
  File "E:\stable-cascade-one-click-installer\issue.py", line 18, in <module>
    decoder_output = decoder(
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\diffusers\pipelines\stable_cascade\pipeline_stable_cascade.py", line 429, in __call__
    predicted_latents = self.decoder(
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\diffusers\models\unets\unet_stable_cascade.py", line 604, in forward
    level_outputs = self._down_encode(x, timestep_ratio_embed, clip)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\diffusers\models\unets\unet_stable_cascade.py", line 496, in _down_encode
    x = block(x, clip)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\diffusers\models\unets\unet_stable_cascade.py", line 109, in forward
    kv = torch.cat([norm_x.view(batch_size, channel, -1).transpose(1, 2), kv], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 2 but got size 1 for tensor number 1 in the list.
```
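The error message is consistent with a batch-size mismatch at the `torch.cat` call: `norm_x` appears to carry a batch of 2 (one entry per image from the prior), while `kv` carries a batch of 1 (the single prompt). Below is a minimal sketch of that shape rule using plain tuples instead of tensors, so it runs without a GPU or model download. The helper `cat_shape` and the token and channel sizes are hypothetical and only illustrate the rule; they are not taken from diffusers.

```python
def cat_shape(shapes, dim):
    """Return the shape produced by concatenating tensors with the given
    shapes along `dim`; raise ValueError if any other dimension differs."""
    first = shapes[0]
    total = 0
    for index, shape in enumerate(shapes):
        if len(shape) != len(first):
            raise ValueError("All tensors must have the same number of dimensions.")
        for axis, (expected, got) in enumerate(zip(first, shape)):
            if axis != dim and expected != got:
                raise ValueError(
                    f"Sizes of tensors must match except in dimension {dim}. "
                    f"Expected size {expected} but got size {got} "
                    f"for tensor number {index} in the list."
                )
        total += shape[dim]
    result = list(first)
    result[dim] = total
    return tuple(result)


# Hypothetical sizes, chosen only for illustration.
IMAGE_TOKENS, TEXT_TOKENS, CHANNELS = 64, 77, 320

norm_x_shape = (2, IMAGE_TOKENS, CHANNELS)  # two latents, one per requested image
kv_shape = (1, TEXT_TOKENS, CHANNELS)       # conditioning for a single prompt

try:
    cat_shape([norm_x_shape, kv_shape], dim=1)
    mismatch = None
except ValueError as err:
    mismatch = str(err)  # same wording as the RuntimeError in the traceback

# Expanding the conditioning to the latent batch size makes the shapes compatible.
expanded_kv_shape = (norm_x_shape[0],) + kv_shape[1:]
fixed_shape = cat_shape([norm_x_shape, expanded_kv_shape], dim=1)
```

If this reading is right, the decoder pipeline would need to repeat the prompt conditioning to match the batch size of `image_embeddings` before the attention block concatenates them.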

### Reproduction

```Python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "an image of a smiley, donning a spacesuit and helmet"

prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", variant="bf16", torch_dtype=torch.bfloat16)
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", variant="bf16", torch_dtype=torch.float16)

prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    num_images_per_prompt=2,
)

decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt=prompt,
    output_type="pil",
).images[0]
decoder_output.save("cascade.png")
```

### Logs

```shell
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "GET /api/models/stabilityai/stable-cascade-prior HTTP/1.1" 200 3491
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /stabilityai/stable-cascade-prior/resolve/main/model_index.json HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "GET /api/models/stabilityai/stable-cascade HTTP/1.1" 200 9845
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /stabilityai/stable-cascade/resolve/main/model_index.json HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "GET /api/models/stabilityai/stable-cascade-prior HTTP/1.1" 200 3491
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /stabilityai/stable-cascade-prior/resolve/main/model_index.json HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "GET /api/models/stabilityai/stable-cascade HTTP/1.1" 200 9845
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /stabilityai/stable-cascade/resolve/main/model_index.json HTTP/1.1" 200 0
ERROR:__main__:Error occurred during image generation:
Traceback (most recent call last):
  File "E:\stable-cascade-one-click-installer\issue.py", line 23, in <module>
    decoder_output = decoder(
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\diffusers\pipelines\stable_cascade\pipeline_stable_cascade.py", line 429, in __call__
    predicted_latents = self.decoder(
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\accelerate\hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\diffusers\models\unets\unet_stable_cascade.py", line 604, in forward
    level_outputs = self._down_encode(x, timestep_ratio_embed, clip)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\diffusers\models\unets\unet_stable_cascade.py", line 496, in _down_encode
    x = block(x, clip)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\stable-cascade-one-click-installer\venv\lib\site-packages\diffusers\models\unets\unet_stable_cascade.py", line 109, in forward
    kv = torch.cat([norm_x.view(batch_size, channel, -1).transpose(1, 2), kv], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 2 but got size 1 for tensor number 1 in the list.
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "GET /api/models/stabilityai/stable-cascade-prior HTTP/1.1" 200 3491
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /stabilityai/stable-cascade-prior/resolve/main/model_index.json HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "GET /api/models/stabilityai/stable-cascade HTTP/1.1" 200 9845
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /stabilityai/stable-cascade/resolve/main/model_index.json HTTP/1.1" 200 0
ERROR:__main__:Error occurred during image generation:
Traceback (most recent call last):
  File "E:\stable-cascade-one-click-installer\issue.py", line 21, in <module>
    logger.debug(f"norm_x shape: {norm_x.shape}")
NameError: name 'norm_x' is not defined
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "GET /api/models/stabilityai/stable-cascade-prior HTTP/1.1" 200 3491
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /stabilityai/stable-cascade-prior/resolve/main/model_index.json HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "GET /api/models/stabilityai/stable-cascade HTTP/1.1" 200 9845
DEBUG:urllib3.connectionpool:https://huggingface.co:443 "HEAD /stabilityai/stable-cascade/resolve/main/model_index.json HTTP/1.1" 200 0
ERROR:__main__:Error occurred during image generation:
Traceback (most recent call last):
  File "E:\stable-cascade-one-click-installer\issue.py", line 22, in <module>
    logger.debug(f"norm_x shape: {norm_x.shape}")
NameError: name 'norm_x' is not defined
```


### System Info

- `diffusers` version: 0.27.0
- Platform: Windows-10-10.0.22631-SP0
- Python version: 3.10.9
- PyTorch version (GPU?): 2.2.1+cu121 (True)
- Huggingface_hub version: 0.21.4
- Transformers version: 4.38.2
- Accelerate version: 0.28.0
- xFormers version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No

### Who can help?

@DN6 @yiyi
