Commit 4ddf8fb

nikhilaravi authored and facebook-github-bot committed
updates to renderer doc text
Summary: Pull Request resolved: #3

Reviewed By: gkioxari

Differential Revision: D19523239

Pulled By: nikhilaravi

fbshipit-source-id: 46b3d52cdabbddb4fecba6038f6ca23a0b736571
1 parent f6c2987 commit 4ddf8fb

3 files changed

+33
-29
lines changed

docs/figs/subset_batch_size_128.png

-51.3 KB
Binary file not shown.

docs/notes/renderer.md

Lines changed: 32 additions & 28 deletions
@@ -1,12 +1,18 @@
11
# Differentiable Rendering
22

3-
Differentiable rendering is an exciting research area but it is not yet clear what the best practices are. We extensively researched existing codebases and found that:
3+
Differentiable rendering is a relatively new and exciting research area in computer vision, bridging the gap between 2D and 3D by allowing 2D image pixels to be related back to 3D properties of a scene.
4+
5+
For example, by rendering an image from a 3D shape predicted by a neural network, it is possible to compute a 2D loss with a target image. Inverting the rendering step means we can relate the 2D loss from the pixels back to the 3D properties of the shape such as the positions of mesh vertices, enabling 3D shapes to be learnt without any explicit 3D supervision.
6+
7+
We extensively researched existing codebases for differentiable rendering and found that:
48
- the rendering pipeline is complex with more than 7 separate components which need to interoperate and be differentiable
5-
- popular existing approaches [[1](#1), [2](#2)] are based on the same core implementation which bundles many of the key components into large CUDA kernels which require significant expertise to understand and has limited scope for extensions
9+
- popular existing approaches [[1](#1), [2](#2)] are based on the same core implementation which bundles many of the key components into large CUDA kernels which require significant expertise to understand, and has limited scope for extensions
610
- existing methods either do not support batching or assume that meshes in a batch have the same number of vertices and faces
711
- existing projects only provide CUDA implementations so they cannot be used without GPUs
812

9-
In order to experiment with different approaches we wanted a modular implementation that is easy to use and extend and supports [heterogeneous batching](batching.md). Taking inspiration from existing work in this area [[1](#1), [2](#2)] we have created a new modular, differentiable renderer with **parallel implementations in PyTorch, C++ and CUDA**.
13+
In order to experiment with different approaches, we wanted a modular implementation that is easy to use and extend, and supports [heterogeneous batching](batching.md).
14+
15+
Taking inspiration from existing work [[1](#1), [2](#2)], we have created a new, modular, differentiable renderer with **parallel implementations in PyTorch, C++ and CUDA**, as well as comprehensive documentation and tests, with the aim of helping to further research in this field.
1016

1117
Our implementation decouples the rasterization and shading steps of rendering. The core rasterization step (based on [[2]](#2)) returns several intermediate variables and has an optimized implementation in CUDA. The rest of the pipeline is implemented purely in PyTorch, and is designed to be customized and extended. With this approach, the PyTorch3d differentiable renderer can be imported as a library.
1218

@@ -15,32 +21,39 @@ Our implementation decouples the rasterization and shading steps of rendering. T
1521
To learn more about the implementation and start using the renderer, refer to [renderer_getting_started.md](renderer_getting_started.md), which also contains the [architecture overview](../figs/architecture_overview.png) and [coordinate transformation conventions](../figs/transformations_overview.png).
1622

1723

18-
##<u>Key features</u>
24+
## <u>Key features</u>
1925

2026
### 1. CUDA support for fast rasterization of large meshes
2127

22-
We implemented modular CUDA kernels for the forward and backward pass of rasterization, adaptating a traditional graphics approach known as "coarse to fine" rasterization.
28+
We implemented modular CUDA kernels for the forward and backward pass of rasterization, adapting a traditional graphics approach known as "coarse-to-fine" rasterization.
2329

24-
First, the image is divided into a coarse grid and mesh faces are allocated to the grid cell in which they occur. This is followed by a refinement step which does pixel wise rasterization of the reduced subset of faces per grid cell. The grid size is a parameter which can be varied.
30+
First, the image is divided into a coarse grid and mesh faces are allocated to the grid cell in which they occur. This is followed by a refinement step which does pixel-wise rasterization of the reduced subset of faces per grid cell. The grid cell size is a parameter which can be varied (`bin_size`).
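
The two steps described above can be illustrated with a minimal pure-Python sketch. It is not the PyTorch3d CUDA kernel: the names `coarse_bins`, `rasterize` and `cell_size` are hypothetical, binning by bounding box is an assumption, and depth ordering is ignored (the first covering face wins).

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def edge(a: Point, b: Point, p: Point) -> float:
    """Signed area term telling which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def inside(tri: Triangle, p: Point) -> bool:
    """True if p is inside or on the boundary of tri, for either winding order."""
    w = [edge(tri[0], tri[1], p), edge(tri[1], tri[2], p), edge(tri[2], tri[0], p)]
    return all(v >= 0 for v in w) or all(v <= 0 for v in w)


def coarse_bins(
    faces: List[Triangle], image_size: int, cell_size: int
) -> Dict[Tuple[int, int], List[int]]:
    """Coarse step: assign each face to every grid cell its bounding box overlaps."""
    n_cells = (image_size + cell_size - 1) // cell_size
    bins: Dict[Tuple[int, int], List[int]] = {}
    for idx, tri in enumerate(faces):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        x0 = max(math.floor(min(xs) / cell_size), 0)
        x1 = min(math.floor(max(xs) / cell_size), n_cells - 1)
        y0 = max(math.floor(min(ys) / cell_size), 0)
        y1 = min(math.floor(max(ys) / cell_size), n_cells - 1)
        for by in range(y0, y1 + 1):
            for bx in range(x0, x1 + 1):
                bins.setdefault((bx, by), []).append(idx)
    return bins


def rasterize(faces: List[Triangle], image_size: int, cell_size: int) -> List[List[int]]:
    """Fine step: per pixel, test only the faces binned into that pixel's cell."""
    bins = coarse_bins(faces, image_size, cell_size)
    image = [[-1] * image_size for _ in range(image_size)]
    for y in range(image_size):
        for x in range(image_size):
            center = (x + 0.5, y + 0.5)
            for idx in bins.get((x // cell_size, y // cell_size), []):
                if inside(faces[idx], center):
                    image[y][x] = idx  # no depth test in this sketch
                    break
    return image
```

Calling `rasterize` with `cell_size == image_size` puts every face in a single cell, which behaves like the naive per-pixel loop over all faces; smaller cells reduce the number of faces tested per pixel.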
2531

26-
We additionally introduce a parameter `faces_per_pixel` which allows users to specify the top K faces which should be returned per pixel in the image (as opposed to traditional rasterization which returns only the index of the closest face in the mesh per pixel). The top K face properties can then be aggregated using different methods (such as the sigmoid/softmax approach proposed by Li et at in for SoftRasterizer [[2]](#2)).
32+
We additionally introduce a parameter `faces_per_pixel` which allows users to specify the top K faces which should be returned per pixel in the image (as opposed to traditional rasterization which returns only the index of the closest face in the mesh per pixel). The top K face properties can then be aggregated using different methods (such as the sigmoid/softmax approach proposed by Li et al. in SoftRasterizer [[2]](#2)).
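
As a rough illustration of what "top K faces per pixel" means, the sketch below selects the K nearest faces for a single pixel from a list of candidates. The function name `top_k_faces`, the `(depth, face_index)` input layout and the `-1` padding value are assumptions made for this example, not the library's interface.

```python
import heapq
from typing import List, Tuple


def top_k_faces(candidates: List[Tuple[float, int]], k: int) -> List[int]:
    """Return the indices of the k nearest faces covering one pixel.

    candidates holds (depth, face_index) pairs for every face that covers
    the pixel. The result is sorted nearest-first and padded with -1 when
    fewer than k faces cover the pixel (padding value is an assumption).
    """
    nearest = heapq.nsmallest(k, candidates)  # sorts by depth first
    indices = [face for _, face in nearest]
    return indices + [-1] * (k - len(indices))
```

With `k = 1` this reduces to traditional rasterization, which keeps only the closest face; larger `k` keeps more faces available for soft aggregation.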
2733

28-
We compared with the SoftRasterizer, to measure the effect of both these design changes on the speed of rasterizing a set of meshes of different sizes from ShapeNetV1 core. We rasterize one mesh in each batch to produce images of different sizes and measure the speed of the forward and backward passes.
34+
We compared PyTorch3d with SoftRasterizer to measure the effect of both these design changes on the speed of rasterization. We selected a set of meshes of different sizes from ShapeNetV1 core, and rasterized one mesh in each batch to produce images of different sizes. We report the speed of the forward and backward passes.
2935

3036
**Fig 1: PyTorch3d Naive vs Coarse-to-fine**
3137

32-
This figure shows how the coarse to fine strategy for rasterization results in significant speed up compared to naive rasterization. This is especially clear in large images.
38+
This figure shows how the coarse-to-fine strategy for rasterization results in a significant speedup compared to naive rasterization for large image sizes and large mesh sizes.
3339

3440
<img src="../figs/p3d_naive_vs_coarse.png" width="1000">
3541

3642

43+
For small mesh and image sizes, the naive approach is slightly faster. We advise that you understand the data you are using and choose the rasterization setting which suits your performance requirements. It is easy to switch between the naive and coarse-to-fine options by adjusting the `bin_size` value when initializing the [rasterization settings](https://github.com/facebookresearch/pytorch3d/blob/master/pytorch3d/renderer/mesh/rasterizer.py#L26).
44+
45+
Setting `bin_size = 0` will enable naive rasterization. If `bin_size > 0`, the coarse-to-fine approach is used. The default is `bin_size = None` in which case we set the bin size based on [heuristics](https://github.com/facebookresearch/pytorch3d/blob/master/pytorch3d/renderer/mesh/rasterize_meshes.py#L92).
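
The `bin_size` rules above can be summarized in a small helper. This is only a sketch of the documented behaviour: `rasterization_mode` is a hypothetical name, and rejecting negative values is a choice made for this example rather than something the text specifies.

```python
from typing import Optional


def rasterization_mode(bin_size: Optional[int]) -> str:
    """Map a bin_size value to the rasterization strategy described above."""
    if bin_size is None:
        # Default: the bin size is chosen by heuristics.
        return "heuristic"
    if bin_size < 0:
        # Not covered by the text; this sketch simply rejects it.
        raise ValueError("bin_size must be None, 0, or a positive integer")
    if bin_size == 0:
        return "naive"
    return "coarse-to-fine"


# Hedged usage note: in PyTorch3d the value is passed when building the
# rasterization settings, along the lines of
#   RasterizationSettings(image_size=256, faces_per_pixel=1, bin_size=0)
# Check the linked rasterizer.py for the exact signature in your version.
```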
46+
3747
**Fig 2: PyTorch3d Coarse-to-fine vs SoftRasterizer**
3848

39-
This figure shows the speed up in the full forward and backward pass enabled by the combination of coarse-to-fine approach and caching the faces rasterized per pixel returned from the forward pass. In the SoftRasterizer implementation, in both the forward and backward pass, there is a loop over every single face in the mesh for every pixel in the image. Therefore, the time for the full forward plus backward pass is ~2x the time for the forward pass.
49+
This figure shows the effect of the _combination_ of coarse-to-fine rasterization and caching the faces rasterized per pixel returned from the forward pass. For large meshes and image sizes, we again observe that the PyTorch3d rasterizer is significantly faster.
50+
51+
In the SoftRasterizer implementation, in both the forward and backward pass, there is a loop over every single face in the mesh for every pixel in the image. Therefore, the time for the full forward plus backward pass is ~2x the time for the forward pass. For small mesh and image sizes, the SoftRasterizer approach is slightly faster.
4052

4153
<img src="../figs/p3d_vs_softras.png" width="1000">
4254

4355

56+
4457
### 2. Support for Heterogeneous Batches
4558

4659
PyTorch3d supports efficient rendering of batches of meshes where each mesh has different numbers of vertices and faces. This is done without using padded inputs.
@@ -49,37 +62,28 @@ We again compare with SoftRasterizer which only supports batches of homogeneous
4962

5063
We group meshes from Shapenet into bins based on the number of faces in the mesh, and sample to compose a batch. We then render images of fixed size and measure the speed of the forward and backward passes.
5164

52-
We tested with a range of increasingly large bin sizes.
65+
We tested with a range of increasingly large meshes and bin sizes.
5366

5467
**Fig 3: PyTorch3d heterogeneous batching compared with SoftRasterizer**
5568

5669
<img src="../figs/fullset_batch_size_16.png" width="700"/>
5770

58-
**Fig 3(a):** This shows that for large meshes and large bin width (i.e. more variation in mesh size in the batch) the heterogeneous batching approach in PyTorch3d is faster than either of the workarounds with SoftRasterizer.
59-
(settings: batch size = 16, mesh sizes in bins ranging from 500-350k faces, image size = 64)
60-
61-
<img src="../figs/subset_batch_size_128.png" width="700"/>
62-
63-
**Fig 3(b):** For larger batch sizes with smaller mesh sizes and bin sizes, PyTorch3d is still comparably fast with improved performance again in the cases of larger meshes and larger bin width.
64-
(settings: batch size in [64, 128] for mesh sizes in bins from 500-10k faces, image size = 128)
71+
This shows that for large meshes and large bin width (i.e. more variation in mesh size in the batch) the heterogeneous batching approach in PyTorch3d is faster than either of the workarounds with SoftRasterizer.
6572

73+
(settings: batch size = 16, mesh sizes in bins ranging from 500-350k faces, image size = 64, faces per pixel = 100)
6674

6775
---
6876

6977
**NOTE: CUDA Memory usage**
7078

71-
The SoftRasterizer forward CUDA kernel only outputs one (N, H, W, 4) FloatTensor compared with the PyTorch3d rasterizer forward CUDA kernel which outputs 4 tensors:
72-
- `pix_to_face`, LongTensor `(N, H, W, K)`
73-
- `zbuf`, FloatTensor `(N, H, W, K)`
74-
- `dist`, FloatTensor `(N, H, W, K)`
75-
- `bary_coords`, FloatTensor `(N, H, W, K, 3)`
79+
The SoftRasterizer forward CUDA kernel only outputs one `(N, H, W, 4)` FloatTensor compared with the PyTorch3d rasterizer forward CUDA kernel which outputs 4 tensors:
7680

77-
The PyTorch3d backward pass returns gradients for:
78-
- `zbuf`, FloatTensor `(N, H, W, K)`
79-
- `dist`, FloatTensor `(N, H, W, K)`
80-
- `bary_coords`, FloatTensor `(N, H, W, K, 3)`
81+
- `pix_to_face`, LongTensor `(N, H, W, K)`
82+
- `zbuf`, FloatTensor `(N, H, W, K)`
83+
- `dist`, FloatTensor `(N, H, W, K)`
84+
- `bary_coords`, FloatTensor `(N, H, W, K, 3)`
8185

82-
where **N** = batch size, **H/W** are image height/width, **K** is the faces per pixel.
86+
where **N** = batch size, **H/W** are image height/width, **K** is the faces per pixel. The PyTorch3d backward pass returns gradients for `zbuf`, `dist` and `bary_coords`.
8387

8488
Returning intermediate variables from rasterization has an associated memory cost. We can calculate the theoretical lower bound on the memory usage for the forward and backward pass as follows:
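
As a back-of-the-envelope sketch (not the authors' own calculation, which lies outside this hunk), the sizes of the tensors listed above can be summed directly. It assumes a LongTensor is int64 (8 bytes per element) and a FloatTensor is float32 (4 bytes per element), and it counts only the listed outputs and gradients; the function names are hypothetical.

```python
INT64_BYTES = 8    # LongTensor element size
FLOAT32_BYTES = 4  # FloatTensor element size


def forward_output_bytes(n: int, h: int, w: int, k: int) -> int:
    """Bytes for pix_to_face, zbuf, dist and bary_coords together."""
    per_entry = (
        INT64_BYTES          # pix_to_face (N, H, W, K)
        + FLOAT32_BYTES      # zbuf        (N, H, W, K)
        + FLOAT32_BYTES      # dist        (N, H, W, K)
        + 3 * FLOAT32_BYTES  # bary_coords (N, H, W, K, 3)
    )
    return n * h * w * k * per_entry


def backward_grad_bytes(n: int, h: int, w: int, k: int) -> int:
    """Bytes for the gradients of zbuf, dist and bary_coords."""
    per_entry = FLOAT32_BYTES + FLOAT32_BYTES + 3 * FLOAT32_BYTES
    return n * h * w * k * per_entry


if __name__ == "__main__":
    # Settings quoted for Fig 3: batch size 16, image size 64, 100 faces per pixel.
    fwd = forward_output_bytes(16, 64, 64, 100)
    bwd = backward_grad_bytes(16, 64, 64, 100)
    print(f"forward: {fwd / 2**20:.1f} MiB, backward: {bwd / 2**20:.1f} MiB")
```

The cost grows linearly in each of N, H, W and K, so increasing `faces_per_pixel` is as expensive as increasing the batch size by the same factor.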
8589

docs/notes/renderer_getting_started.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Rendering requires transformations between several different coordinate frames:
4141

4242
While we tried to emulate several aspects of OpenGL, the NDC coordinate system in PyTorch3d is **right-handed** compared with a **left-handed** NDC coordinate system in OpenGL (the projection matrix switches the handedness).
4343

44-
In OpenGL, the camera at the origin is looking along -Z axis in camera space, but it is looking along +Z axis in NDC space.
44+
In OpenGL, the camera at the origin is looking along the `-z` axis in camera space, but it is looking along the `+z` axis in NDC space.
4545

4646
<img src="../figs/opengl_coordframes.png" width="200">
4747
