[Pipeline] Marigold depth and normals estimation #7847

Merged 111 commits on May 27, 2024
Commits
32d5131
implement marigold depth and normals pipelines in diffusers core
toshas May 2, 2024
23255c2
remove bibtex
toshas May 3, 2024
59ec93b
Merge branch 'main' into pipeline_marigold_pr
toshas May 12, 2024
4bde96c
remove deprecations
toshas May 8, 2024
f12d94c
remove save_memory argument
toshas May 8, 2024
bf87789
remove validate_vae
toshas May 9, 2024
bce8dd6
remove config output
toshas May 9, 2024
b299011
remove batch_size autodetection
toshas May 9, 2024
e9d6bdf
remove presets logic
toshas May 9, 2024
9750185
remove no_grad
toshas May 9, 2024
ef881c6
add fp16 to the example usage
toshas May 9, 2024
c1c7520
implement is_matplotlib_available
toshas May 9, 2024
04c807d
move colormap, visualize_depth, and visualize_normals into export_uti…
toshas May 9, 2024
3de290f
make the denoising loop more lucid
toshas May 12, 2024
a013e2f
style
toshas May 8, 2024
142df36
Merge branch 'main' into pipeline_marigold_pr
toshas May 13, 2024
2e7e747
rename denoising_steps into num_inference_steps
toshas May 13, 2024
405feb3
rename input_image into image
toshas May 13, 2024
b72334b
rename input_latent into latents
toshas May 13, 2024
0878b3e
remove decode_image
toshas May 13, 2024
601bfc8
move clean_latent outside of progress_bar
toshas May 13, 2024
a21da45
refactor marigold-reusable image processing bits into MarigoldImagePr…
toshas May 13, 2024
6b48618
Merge branch 'main' into pipeline_marigold_pr
toshas May 15, 2024
5847bbd
clean up the usage example docstring
toshas May 14, 2024
a1fd551
make ensemble functions members of the pipelines
toshas May 14, 2024
6f17e2c
add early checks in check_inputs
toshas May 14, 2024
832cbd8
fix vae_scale_factor computation
toshas May 14, 2024
c4ff406
better compatibility with torch.compile
toshas May 14, 2024
f3ae063
move export_depth_to_png to export_utils
toshas May 14, 2024
706ff6d
remove encode_prediction
toshas May 14, 2024
b699d76
improve visualize_depth and visualize_normals to accept multi-dimensi…
toshas May 15, 2024
c72c5b7
do not shortcut vae.config variables
toshas May 13, 2024
0d09a0c
change all asserts to raise ValueError
toshas May 16, 2024
b597ee8
rename output_prediction_type to output_type
toshas May 16, 2024
91cba30
better variable names
toshas May 16, 2024
99213c3
better variable names
toshas May 16, 2024
bca2571
pass desc and leave kwargs into the diffusers progress_bar
toshas May 16, 2024
30a15fa
implement scale_invariant and shift_invariant flags in the ensemble_d…
toshas May 17, 2024
2b276f6
fix generator device placement checks
toshas May 17, 2024
d8a2550
move encode_empty_text body into the pipeline call
toshas May 17, 2024
24ff896
minor empty text encoding simplifications
toshas May 17, 2024
d9f9ca5
adjust pipelines' class docstrings to explain the added construction …
toshas May 17, 2024
ee56c35
improve the scipy failure condition
toshas May 17, 2024
1b2c027
make input image values range check configurable in the preprocessor
toshas May 19, 2024
a06bf1f
remove forgotten print
toshas May 19, 2024
2b3261e
add prediction_type model config
toshas May 19, 2024
0627efc
add uncertainty visualization into export utils
toshas May 19, 2024
7c16b07
change default of output_uncertainty to False
toshas May 19, 2024
9767ddd
Merge branch 'main' into pipeline_marigold_pr
toshas May 19, 2024
6e6e132
fix `output_uncertainty=False`
toshas May 19, 2024
3634eb0
remove kwargs
toshas May 20, 2024
f4813c7
rename prepare_latent into prepare_latents as in other pipelines
toshas May 20, 2024
566e894
move nested-capable `progress_bar` method into the pipelines
toshas May 20, 2024
b1a151e
minor message improvement
toshas May 20, 2024
03eb96e
fix cpu offloading
toshas May 20, 2024
f4845ba
move colormap, visualize_depth, export_depth_to_16bit_png, visualize_…
toshas May 21, 2024
b5052e3
fix missing comma
toshas May 21, 2024
0f308bb
change torch.FloatTensor to torch.Tensor
toshas May 21, 2024
9db8c17
fix importing of MarigoldImageProcessor
toshas May 21, 2024
c56d2fa
fix vae offloading
toshas May 21, 2024
f5cfeaf
implement marigold's intial tests
toshas May 21, 2024
662e1d5
fix num_images computation
toshas May 21, 2024
f03e27d
remove MarigoldImageProcessor and outputs from import structure
toshas May 21, 2024
6507f1b
update docstrings
toshas May 21, 2024
596ada3
update init
yiyixuxu May 21, 2024
1b89bf6
update
yiyixuxu May 21, 2024
9814973
style
May 21, 2024
f2a74ac
fix
yiyixuxu May 21, 2024
b35c6ef
fix
yiyixuxu May 21, 2024
412aafc
up
May 21, 2024
6497802
up
yiyixuxu May 21, 2024
a4d321c
up
May 21, 2024
a778f4d
add simple test
yiyixuxu May 21, 2024
d0da66c
up
May 21, 2024
107c1da
update expected np input/output to be channel last
May 22, 2024
2345f6e
move expand_tensor_or_array into the MarigoldImageProcessor
toshas May 23, 2024
b3f3c88
rewrite tests to follow conventions - hardcoded slices instead of ima…
toshas May 23, 2024
80dcdf5
Merge branch 'main' into pipeline_marigold_pr
toshas May 23, 2024
16e07a1
add basic docs.
sayakpaul May 24, 2024
4aa8ce5
add anton's contribution statement
sayakpaul May 24, 2024
0b495c1
Merge branch 'main' into pipeline_marigold_pr
sayakpaul May 24, 2024
a2df838
remove todos.
sayakpaul May 24, 2024
00082c8
fix assertion values for marigold depth slow tests
sayakpaul May 24, 2024
09276fb
fix assertion values for depth normals.
sayakpaul May 24, 2024
c044f2e
remove print
sayakpaul May 24, 2024
699976c
Merge branch 'main' into pipeline_marigold_pr
sayakpaul May 24, 2024
b0fc7f3
support AutoencoderTiny in the pipelines
toshas May 25, 2024
10fc97d
update documentation page
toshas May 26, 2024
6e053b4
fix missing import in docstring
toshas May 26, 2024
56f7980
[doc] add marigold to pipelines overview
toshas May 26, 2024
2cbe30b
[doc] add section "usage examples"
toshas May 26, 2024
35925b9
fix an issue with latents check in the pipelines
toshas May 27, 2024
f808e88
add "Frame-by-frame Video Processing with Consistency" section
toshas May 27, 2024
183327a
grammarly
toshas May 27, 2024
4b286f8
replace tables with images with css-styled images (blindly)
toshas May 27, 2024
59fe25c
Merge branch 'main' into pipeline_marigold_pr
sayakpaul May 27, 2024
c587f5d
style
sayakpaul May 27, 2024
9dc2f31
print
sayakpaul May 27, 2024
304ab64
fix the assertions.
sayakpaul May 27, 2024
ab8d5ce
Merge branch 'main' into pipeline_marigold_pr
sayakpaul May 27, 2024
0494131
take from the github runner.
sayakpaul May 27, 2024
bfaf002
take the slices from action artifacts
sayakpaul May 27, 2024
ad3e484
style.
sayakpaul May 27, 2024
cab0507
update with the slices from the runner.
sayakpaul May 27, 2024
427637c
remove unnecessary code blocks.
sayakpaul May 27, 2024
037fa29
Merge branch 'main' into pipeline_marigold_pr
toshas May 27, 2024
2da329e
Revert "[doc] add marigold to pipelines overview"
toshas May 27, 2024
bdd21be
remove invitation for new modalities
toshas May 27, 2024
3815778
split out marigold usage examples
toshas May 27, 2024
20b08ba
doc cleanup
sayakpaul May 27, 2024
8aae163
Merge branch 'main' into pipeline_marigold_pr
sayakpaul May 27, 2024
6 changes: 5 additions & 1 deletion docs/source/en/_toctree.yml
@@ -93,6 +93,8 @@
title: Trajectory Consistency Distillation-LoRA
- local: using-diffusers/svd
title: Stable Video Diffusion
- local: using-diffusers/marigold_usage
title: Marigold Computer Vision
title: Specific pipeline examples
- sections:
- local: training/overview
@@ -295,6 +297,8 @@
title: Latent Diffusion
- local: api/pipelines/ledits_pp
title: LEDITS++
- local: api/pipelines/marigold
title: Marigold
- local: api/pipelines/panorama
title: MultiDiffusion
- local: api/pipelines/musicldm
@@ -445,4 +449,4 @@
title: Video Processor
title: Internal classes
isExpanded: false
title: API
title: API
76 changes: 76 additions & 0 deletions docs/source/en/api/pipelines/marigold.md
@@ -0,0 +1,76 @@
<!--Copyright 2024 Marigold authors and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Marigold Pipelines for Computer Vision Tasks

![marigold](https://marigoldmonodepth.github.io/images/teaser_collage_compressed.jpg)

Marigold was proposed in [Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation](https://huggingface.co/papers/2312.02145), a CVPR 2024 Oral paper by [Bingxin Ke](http://www.kebingxin.com/), [Anton Obukhov](https://www.obukhov.ai/), [Shengyu Huang](https://shengyuh.github.io/), [Nando Metzger](https://nandometzger.github.io/), [Rodrigo Caye Daudt](https://rcdaudt.github.io/), and [Konrad Schindler](https://scholar.google.com/citations?user=FZuNgqIAAAAJ&hl=en).
The idea is to repurpose the rich generative prior of Text-to-Image Latent Diffusion Models (LDMs) for traditional computer vision tasks.
Initially, this idea was explored to fine-tune Stable Diffusion for Monocular Depth Estimation, as shown in the teaser above.
Later,
- [Tianfu Wang](https://tianfwang.github.io/) trained the first Latent Consistency Model (LCM) of Marigold, which unlocked fast single-step inference;
- [Kevin Qu](https://www.linkedin.com/in/kevin-qu-b3417621b/?locale=en_US) extended the approach to Surface Normals Estimation;
- [Anton Obukhov](https://www.obukhov.ai/) contributed the pipelines and documentation into diffusers (enabled and supported by [YiYi Xu](https://yiyixuxu.github.io/) and [Sayak Paul](https://sayak.dev/)).

The abstract from the paper is:

*Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth from a single image is geometrically ill-posed and requires scene understanding, so it is not surprising that the rise of deep learning has led to a breakthrough. The impressive progress of monocular depth estimators has mirrored the growth in model capacity, from relatively modest CNNs to large Transformer architectures. Still, monocular depth estimators tend to struggle when presented with images with unfamiliar content and layout, since their knowledge of the visual world is restricted by the data seen during training, and challenged by zero-shot generalization to new domains. This motivates us to explore whether the extensive priors captured in recent generative diffusion models can enable better, more generalizable depth estimation. We introduce Marigold, a method for affine-invariant monocular depth estimation that is derived from Stable Diffusion and retains its rich prior knowledge. The estimator can be fine-tuned in a couple of days on a single GPU using only synthetic training data. It delivers state-of-the-art performance across a wide range of datasets, including over 20% performance gains in specific cases. Project page: https://marigoldmonodepth.github.io.*

## Available Pipelines

Each pipeline supports one Computer Vision task: it takes an RGB image as input and produces a *prediction* of the modality of interest, such as a depth map of the input image.
Currently, the following tasks are implemented:

| Pipeline | Predicted Modalities | Demos |
|---------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------:|
| [MarigoldDepthPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/marigold/pipeline_marigold_depth.py) | [Depth](https://en.wikipedia.org/wiki/Depth_map), [Disparity](https://en.wikipedia.org/wiki/Binocular_disparity) | [Fast Demo (LCM)](https://huggingface.co/spaces/prs-eth/marigold-lcm), [Slow Original Demo (DDIM)](https://huggingface.co/spaces/prs-eth/marigold) |
| [MarigoldNormalsPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/marigold/pipeline_marigold_normals.py) | [Surface normals](https://en.wikipedia.org/wiki/Normal_mapping) | [Fast Demo (LCM)](https://huggingface.co/spaces/prs-eth/marigold-normals-lcm) |


## Available Checkpoints

The original checkpoints can be found under the [PRS-ETH](https://huggingface.co/prs-eth/) Hugging Face organization.

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines. To learn more about reducing the memory usage of this pipeline, refer to the [Reduce memory usage](../../using-diffusers/svd#reduce-memory-usage) section.

</Tip>

<Tip warning={true}>

Marigold pipelines were designed and tested only with `DDIMScheduler` and `LCMScheduler`.
Depending on the scheduler, the number of inference steps required to get reliable predictions varies, and there is no universal value that works best across schedulers.
Because of that, the default value of `num_inference_steps` in the `__call__` method of the pipeline is set to `None` (see the API reference).
Unless set explicitly, its value will be taken from the checkpoint configuration `model_index.json`.
This is done to ensure high-quality predictions when calling the pipeline with just the `image` argument.

</Tip>
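
The fallback described in the warning above can be sketched in plain Python. This is a minimal illustration of the "`None` means use the checkpoint default" pattern, not the pipeline's actual code: the helper `resolve_num_inference_steps`, the configuration key `default_denoising_steps`, and the step counts are all assumptions made for the example.

```python
from typing import Mapping, Optional

# Hypothetical key name: the text above only says the default is read from the
# checkpoint's model_index.json, not what the field is called.
DEFAULT_STEPS_KEY = "default_denoising_steps"


def resolve_num_inference_steps(
    num_inference_steps: Optional[int],
    checkpoint_config: Mapping[str, object],
) -> int:
    """Return the explicit step count, or fall back to the checkpoint default."""
    if num_inference_steps is None:
        # The caller did not choose a value: use the one shipped with the checkpoint.
        default = checkpoint_config.get(DEFAULT_STEPS_KEY)
        if default is None:
            raise ValueError(
                "num_inference_steps was not given and the checkpoint "
                "configuration does not define a default."
            )
        num_inference_steps = int(default)
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be a positive integer.")
    return num_inference_steps


# Illustrative configurations; the step counts are made up for this example.
lcm_like_config = {DEFAULT_STEPS_KEY: 4}
ddim_like_config = {DEFAULT_STEPS_KEY: 10}

steps_lcm = resolve_num_inference_steps(None, lcm_like_config)      # checkpoint default
steps_ddim = resolve_num_inference_steps(None, ddim_like_config)    # checkpoint default
steps_explicit = resolve_num_inference_steps(25, lcm_like_config)   # explicit value wins
```

Keeping the default in the checkpoint rather than in the method signature lets each checkpoint carry a step count suited to its own scheduler, while an explicit argument still overrides it.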

See also Marigold [usage examples](marigold_usage).

## MarigoldDepthPipeline
[[autodoc]] MarigoldDepthPipeline
- all
- __call__

## MarigoldNormalsPipeline
[[autodoc]] MarigoldNormalsPipeline
- all
- __call__

## MarigoldDepthOutput
[[autodoc]] pipelines.marigold.pipeline_marigold_depth.MarigoldDepthOutput

## MarigoldNormalsOutput
[[autodoc]] pipelines.marigold.pipeline_marigold_normals.MarigoldNormalsOutput