ControlNet-xs with depth Control

### Describe the bug

I tried to use the ControlNet-XS pipeline with depth control, but it fails with the runtime error shown under Logs. I cannot find any instructions in diffusers on using it with a depth map (only with a canny image). It would be great if the authors could provide instructions for using ControlNet-XS with a depth map. @sayakpaul
My diffusers version: 2a111bc9 [origin/main] [Advanced Training Script] Fix pipe example (#6106)


### Reproduction

```python
import numpy as np
import torch
from PIL import Image
from transformers import pipeline

# Import names as exposed by the diffusers version listed under System Info
from diffusers import AutoencoderKL, ControlNetXSModel, StableDiffusionXLControlNetXSPipeline

prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"

depth_estimator = pipeline('depth-estimation')

image = Image.open('images_stormtrooper.png')
depth_image = depth_estimator(image)['depth']

# Replicate the single-channel depth map into three channels
depth_array = np.array(depth_image)
depth_array = depth_array[:, :, None]
depth_array = np.concatenate([depth_array, depth_array, depth_array], axis=2)
depth_image = Image.fromarray(depth_array)
depth_image.save('depth.png')

controlnet_conditioning_scale = 0.5  # recommended for good generalization

# initialize the models and pipeline
controlnet = ControlNetXSModel.from_pretrained("UmerHA/ConrolNetXS-SDXL-depth", torch_dtype=torch.float16)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetXSPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, vae=vae, torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt, controlnet_conditioning_scale=controlnet_conditioning_scale, image=depth_image
).images[0]
image.save('test.png')
```

### Logs

```shell
File "/home/josha/reference/diffusers/src/diffusers/models/controlnetxs.py", line 741, in forward
    h_base = h_base + next(it_up_convs_out)(hs_ctrl.pop()) * next(scales)  # add info from ctrl encoder
RuntimeError: The size of tensor a (30) must match the size of tensor b (29) at non-singleton dimension 3
```
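
A possible cause, not confirmed in this report: an off-by-one size mismatch like the one above typically appears in U-Net style models when the input width or height is not divisible by the network's total downsampling factor. The sketch below computes a size rounded down to a multiple before the depth image is passed to the pipeline. The helper name `snap_size` and the multiple of 64 are assumptions for illustration, not part of diffusers.

```python
def snap_size(width: int, height: int, multiple: int = 64) -> tuple[int, int]:
    """Round each dimension down to the nearest multiple (hypothetical helper).

    The default of 64 is an assumed safe value, not taken from diffusers.
    Dimensions smaller than `multiple` are raised to `multiple`.
    """
    snapped_width = max(multiple, (width // multiple) * multiple)
    snapped_height = max(multiple, (height // multiple) * multiple)
    return snapped_width, snapped_height


if __name__ == "__main__":
    # Example with an arbitrary image size that is not divisible by 64
    print(snap_size(937, 701))
```

With Pillow, the result could be applied as `depth_image = depth_image.resize(snap_size(*depth_image.size))` before calling the pipeline. Whether this removes the error here is untested.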


### System Info

```
- `diffusers` version: 0.25.0.dev0
- Platform: Linux-6.2.0-34-generic-x86_64-with-glibc2.35
- Python version: 3.10.9
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Huggingface_hub version: 0.19.4
- Transformers version: 4.33.2
- Accelerate version: 0.21.0
- xFormers version: 0.0.21
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
```

### Who can help?

@sayakpaul Hi Sayak, thanks for your support on ControlNet-XS. It would be great if you could reply to this issue.

ControlNet-xs with depth Control #6109