Commit 15c72be

nikhilaravi authored and facebook-github-bot committed
Fix coordinate system conventions in renderer
Summary:

## Updates
- Defined the world and camera coordinates according to this figure. The world coordinates are defined as having +Y up, +X left and +Z in. {F230888499}
- Removed all flipping from blending functions.
- Updated the rasterizer to return images with +Y up and +X left.
- Updated all the mesh rasterizer tests
  - The expected values are now defined in terms of the default +Y up, +X left
  - Added tests where the triangles in the meshes are non symmetrical so that it is clear which direction +X and +Y are

## Questions:
- Should we have **scene settings** instead of raster settings?
- To be more correct we should be [z clipping in the rasterizer based on the far/near clipping planes](https://github.com/ShichenLiu/SoftRas/blob/master/soft_renderer/cuda/soft_rasterize_cuda_kernel.cu#L400) - these values are also required in the blending functions so should we make these scene level parameters and have a scene settings tuple which is available to the rasterizer and shader?

Reviewed By: gkioxari

Differential Revision: D20208604

fbshipit-source-id: 55787301b1bffa0afa9618f0a0886cc681da51f3
1 parent 767d68a commit 15c72be

27 files changed: +522 -482 lines changed
docs/notes/renderer_getting_started.md

Lines changed: 9 additions & 6 deletions
@@ -34,19 +34,22 @@ The differentiable renderer API is experimental and subject to change!.

 ### Coordinate transformation conventions

-Rendering requires transformations between several different coordinate frames: world space, view/camera space, NDC space and screen space. At each step it is important to know where the camera is located, how the x,y,z axes are aligned and the possible range of values. The following figure outlines the conventions used PyTorch3d.
+Rendering requires transformations between several different coordinate frames: world space, view/camera space, NDC space and screen space. At each step it is important to know where the camera is located, how the +X, +Y, +Z axes are aligned and the possible range of values. The following figure outlines the conventions used PyTorch3d.

 <img src="assets/transformations_overview.png" width="1000">


+For example, given a teapot mesh, the world coordinate frame, camera coordiante frame and image are show in the figure below. Note that the world and camera coordinate frames have the +z direction pointing in to the page.
+
+<img src="assets/world_camera_image.png" width="1000">

 ---

 **NOTE: PyTorch3d vs OpenGL**

-While we tried to emulate several aspects of OpenGL, the NDC coordinate system in PyTorch3d is **right-handed** compared with a **left-handed** NDC coordinate system in OpenGL (the projection matrix switches the handedness).
-
-In OpenGL, the camera at the origin is looking along `-z` axis in camera space, but it is looking along the `+z` axis in NDC space.
+While we tried to emulate several aspects of OpenGL, there are differences in the coordinate frame conventions.
+- The default world coordinate frame in PyTorch3D has +Z pointing in to the screen whereas in OpenGL, +Z is pointing out of the screen. Both are right handed.
+- The NDC coordinate system in PyTorch3d is **right-handed** compared with a **left-handed** NDC coordinate system in OpenGL (the projection matrix switches the handedness).

 <img align="center" src="assets/opengl_coordframes.png" width="300">
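
The statement in the note above that both world frames are right handed can be checked with a cross product. The sketch below is illustrative only and is not PyTorch3D code: the axes are written in a viewer-aligned basis (right, up, toward the viewer) that is an assumption of this example, and the helper names are hypothetical.

```python
# Illustrative check, not PyTorch3D code. Axes are written in a
# viewer-aligned basis (right, up, toward the viewer) chosen for this sketch.

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def is_right_handed(x_axis, y_axis, z_axis):
    """A frame is right-handed when X cross Y equals Z."""
    return cross(x_axis, y_axis) == z_axis

# OpenGL world frame: +X right, +Y up, +Z out of the screen.
opengl_world = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

# World frame described in this commit: +X left, +Y up, +Z into the screen.
pytorch3d_world = ((-1, 0, 0), (0, 1, 0), (0, 0, -1))

print(is_right_handed(*opengl_world))
print(is_right_handed(*pytorch3d_world))
```

Both frames pass the check; in this basis the two differ by a 180 degree rotation about +Y, which flips X and Z together and so preserves handedness.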

@@ -60,7 +63,7 @@ A renderer in PyTorch3d is composed of a **rasterizer** and a **shader**. Create
 from pytorch3d.renderer import (
 OpenGLPerspectiveCameras, look_at_view_transform,
 RasterizationSettings, BlendParams,
-MeshRenderer, MeshRasterizer, PhongShader
+MeshRenderer, MeshRasterizer, HardPhongShader
 )

 # Initialize an OpenGL perspective camera.
@@ -81,7 +84,7 @@ raster_settings = RasterizationSettings(
 # PhongShader, passing in the device on which to initialize the default parameters
 renderer = MeshRenderer(
 rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings),
-shader=PhongShader(device=device, cameras=cameras)
+shader=HardPhongShader(device=device, cameras=cameras)
 )
 ```

docs/tutorials/camera_position_optimization_with_differentiable_rendering.ipynb

Lines changed: 78 additions & 63 deletions
Large diffs are not rendered by default.

docs/tutorials/render_textured_meshes.ipynb

Lines changed: 47 additions & 98 deletions
Large diffs are not rendered by default.

pytorch3d/csrc/rasterize_meshes/rasterize_meshes.cu

Lines changed: 46 additions & 31 deletions
@@ -189,12 +189,12 @@ __global__ void RasterizeMeshesNaiveCudaKernel(
 const float* face_verts,
 const int64_t* mesh_to_face_first_idx,
 const int64_t* num_faces_per_mesh,
-float blur_radius,
-bool perspective_correct,
-int N,
-int H,
-int W,
-int K,
+const float blur_radius,
+const bool perspective_correct,
+const int N,
+const int H,
+const int W,
+const int K,
 int64_t* face_idxs,
 float* zbuf,
 float* pix_dists,
@@ -207,8 +207,10 @@ __global__ void RasterizeMeshesNaiveCudaKernel(
 // Convert linear index to 3D index
 const int n = i / (H * W); // batch index.
 const int pix_idx = i % (H * W);
-const int yi = pix_idx / H;
-const int xi = pix_idx % W;
+
+// Determine ordering based on axis convention.
+const int yi = H - 1 - pix_idx / W;
+const int xi = W - 1 - pix_idx % W;

 // screen coordinates to ndc coordiantes of pixel.
 const float xf = PixToNdc(xi, W);
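
The index arithmetic in this hunk can be reproduced outside the kernel. The following is a minimal Python sketch, not the CUDA implementation: `pix_to_ndc` is a hypothetical stand-in for `PixToNdc`, assumed here to return the NDC coordinate of a pixel centre as `-1 + (2 * i + 1) / S`, since the diff does not show its definition.

```python
def pix_to_ndc(i, size):
    # Assumed behaviour of PixToNdc (not shown in the diff): centre of
    # pixel i on an axis of `size` pixels spanning the NDC range [-1, 1].
    return -1.0 + (2 * i + 1.0) / size

def output_pixel_to_ndc(pix_idx, H, W):
    """Map a row-major output pixel index to the NDC point it samples."""
    # Same arithmetic as the added lines: reverse both axes so that
    # +Y points up and +X points left in the output image.
    yi = H - 1 - pix_idx // W
    xi = W - 1 - pix_idx % W
    return pix_to_ndc(xi, W), pix_to_ndc(yi, H)

H, W = 2, 4
for pix_idx in (0, W - 1, H * W - 1):
    print(pix_idx, output_pixel_to_ndc(pix_idx, H, W))
```

Under these assumptions the first output pixel (top-left of the image) samples the largest NDC x and y, which is what "+Y up, +X left" means for the returned image.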
@@ -254,7 +256,7 @@ __global__ void RasterizeMeshesNaiveCudaKernel(

 // TODO: make sorting an option as only top k is needed, not sorted values.
 BubbleSort(q, q_size);
-int idx = n * H * W * K + yi * H * K + xi * K;
+int idx = n * H * W * K + pix_idx * K;
 for (int k = 0; k < q_size; ++k) {
 face_idxs[idx + k] = q[k].idx;
 zbuf[idx + k] = q[k].z;
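
One way to see why the output offset no longer needs `yi` and `xi`: for a contiguous `(N, H, W, K)` buffer, `pix_idx` already is the row-major position of the output pixel. The Python sketch below (helper names are hypothetical) compares the new expression with the explicit row-major formula. It also shows that the removed expression, which used `H * K` as the row stride, agrees with row-major order only when `H == W`.

```python
def flat_index(n, row, col, k, H, W, K):
    """Row-major offset into a contiguous (N, H, W, K) buffer."""
    return ((n * H + row) * W + col) * K + k

def new_offset(n, pix_idx, H, W, K):
    # Expression added in this commit (before adding k).
    return n * H * W * K + pix_idx * K

def old_offset(n, row, col, H, W, K):
    # Expression removed in this commit: the row stride is H * K.
    return n * H * W * K + row * H * K + col * K

def offsets_agree(H, W, K, N=2):
    """Return (new matches row-major, old matches row-major) over all pixels."""
    new_ok = old_ok = True
    for n in range(N):
        for pix_idx in range(H * W):
            row, col = divmod(pix_idx, W)
            expected = flat_index(n, row, col, 0, H, W, K)
            new_ok = new_ok and new_offset(n, pix_idx, H, W, K) == expected
            old_ok = old_ok and old_offset(n, row, col, H, W, K) == expected
    return new_ok, old_ok

print(offsets_agree(H=4, W=4, K=3))
print(offsets_agree(H=2, W=4, K=3))
```

The flipped `yi` and `xi` decide which NDC point a thread samples, while `pix_idx` decides where the result is written, so the flip shows up in the image without any extra indexing on the output side.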
@@ -274,7 +276,7 @@ RasterizeMeshesNaiveCuda(
 const int image_size,
 const float blur_radius,
 const int num_closest,
-bool perspective_correct) {
+const bool perspective_correct) {
 if (face_verts.ndimension() != 3 || face_verts.size(1) != 3 ||
 face_verts.size(2) != 3) {
 AT_ERROR("face_verts must have dimensions (num_faces, 3, 3)");
@@ -331,12 +333,12 @@ RasterizeMeshesNaiveCuda(
 __global__ void RasterizeMeshesBackwardCudaKernel(
 const float* face_verts, // (F, 3, 3)
 const int64_t* pix_to_face, // (N, H, W, K)
-bool perspective_correct,
-int N,
-int F,
-int H,
-int W,
-int K,
+const bool perspective_correct,
+const int N,
+const int F,
+const int H,
+const int W,
+const int K,
 const float* grad_zbuf, // (N, H, W, K)
 const float* grad_bary, // (N, H, W, K, 3)
 const float* grad_dists, // (N, H, W, K)
@@ -351,17 +353,20 @@ __global__ void RasterizeMeshesBackwardCudaKernel(
 // Convert linear index to 3D index
 const int n = t_i / (H * W); // batch index.
 const int pix_idx = t_i % (H * W);
-const int yi = pix_idx / H;
-const int xi = pix_idx % W;
+
+// Determine ordering based on axis convention.
+const int yi = H - 1 - pix_idx / W;
+const int xi = W - 1 - pix_idx % W;
+
 const float xf = PixToNdc(xi, W);
 const float yf = PixToNdc(yi, H);
 const float2 pxy = make_float2(xf, yf);

 // Loop over all the faces for this pixel.
 for (int k = 0; k < K; k++) {
 // Index into (N, H, W, K, :) grad tensors
-const int i =
-n * H * W * K + yi * H * K + xi * K + k; // pixel index + face index
+// pixel index + top k index
+int i = n * H * W * K + pix_idx * K + k;

 const int f = pix_to_face[i];
 if (f < 0) {
@@ -451,7 +456,7 @@ torch::Tensor RasterizeMeshesBackwardCuda(
 const torch::Tensor& grad_zbuf, // (N, H, W, K)
 const torch::Tensor& grad_bary, // (N, H, W, K, 3)
 const torch::Tensor& grad_dists, // (N, H, W, K)
-bool perspective_correct) {
+const bool perspective_correct) {
 const int F = face_verts.size(0);
 const int N = pix_to_face.size(0);
 const int H = pix_to_face.size(1);
@@ -509,6 +514,7 @@ __global__ void RasterizeMeshesCoarseCudaKernel(
 // Have each block handle a chunk of faces
 const int chunks_per_batch = 1 + (F - 1) / chunk_size;
 const int num_chunks = N * chunks_per_batch;
+
 for (int chunk = blockIdx.x; chunk < num_chunks; chunk += gridDim.x) {
 const int batch_idx = chunk / chunks_per_batch; // batch index
 const int chunk_idx = chunk % chunks_per_batch;
@@ -551,17 +557,21 @@ __global__ void RasterizeMeshesCoarseCudaKernel(
 // Y coordinate of the top and bottom of the bin.
 // PixToNdc gives the location of the center of each pixel, so we
 // need to add/subtract a half pixel to get the true extent of the bin.
-const float bin_y_min = PixToNdc(by * bin_size, H) - half_pix;
-const float bin_y_max = PixToNdc((by + 1) * bin_size - 1, H) + half_pix;
+// Reverse ordering of Y axis so that +Y is upwards in the image.
+const int yidx = num_bins - by;
+float bin_y_max = PixToNdc(yidx * bin_size - 1, H) + half_pix;
+float bin_y_min = PixToNdc((yidx - 1) * bin_size, H) - half_pix;
+
 const bool y_overlap = (ymin <= bin_y_max) && (bin_y_min < ymax);

 for (int bx = 0; bx < num_bins; ++bx) {
 // X coordinate of the left and right of the bin.
-const float bin_x_min = PixToNdc(bx * bin_size, W) - half_pix;
-const float bin_x_max =
-PixToNdc((bx + 1) * bin_size - 1, W) + half_pix;
-const bool x_overlap = (xmin <= bin_x_max) && (bin_x_min < xmax);
+// Reverse ordering of x axis so that +X is left.
+const int xidx = num_bins - bx;
+float bin_x_max = PixToNdc(xidx * bin_size - 1, W) + half_pix;
+float bin_x_min = PixToNdc((xidx - 1) * bin_size, W) - half_pix;

+const bool x_overlap = (xmin <= bin_x_max) && (bin_x_min < xmax);
 if (y_overlap && x_overlap) {
 binmask.set(by, bx, f);
 }
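
The reversed bin extents can be sanity-checked with a few lines of Python. This is a sketch under stated assumptions rather than the kernel itself: `pix_to_ndc` is a hypothetical stand-in for `PixToNdc` (assumed to return `-1 + (2 * i + 1) / S`), `half_pix` is assumed to be `1 / S` (half of a `2 / S` wide NDC pixel), and the image size is assumed to be a multiple of `bin_size`.

```python
def pix_to_ndc(i, size):
    # Assumed behaviour of PixToNdc (not shown in the diff).
    return -1.0 + (2 * i + 1.0) / size

def bin_extent(b, num_bins, bin_size, size):
    """NDC (min, max) covered by bin b along one axis, with reversed order."""
    half_pix = 1.0 / size  # assumed: half of the 2 / size NDC pixel width
    idx = num_bins - b     # same reversal as `yidx` / `xidx` in the diff
    hi = pix_to_ndc(idx * bin_size - 1, size) + half_pix
    lo = pix_to_ndc((idx - 1) * bin_size, size) - half_pix
    return lo, hi

size, bin_size = 8, 4
num_bins = size // bin_size  # assumes size is a multiple of bin_size
for b in range(num_bins):
    print(b, bin_extent(b, num_bins, bin_size, size))
```

With these values bin 0 covers the positive half of the NDC range and the last bin covers the negative half, so bin row 0 holds the faces nearest +Y (top of the image) and bin column 0 holds those nearest +X (left of the image).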
@@ -654,7 +664,6 @@ torch::Tensor RasterizeMeshesCoarseCuda(
 // ****************************************************************************
 // * FINE RASTERIZATION *
 // ****************************************************************************
-
 __global__ void RasterizeMeshesFineCudaKernel(
 const float* face_verts, // (F, 3, 3)
 const int32_t* bin_faces, // (N, B, B, T)
@@ -695,8 +704,14 @@ __global__ void RasterizeMeshesFineCudaKernel(

 if (yi >= H || xi >= W)
 continue;
-const float xf = PixToNdc(xi, W);
-const float yf = PixToNdc(yi, H);
+
+// Reverse ordering of the X and Y axis so that
+// in the image +Y is pointing up and +X is pointing left.
+const int yidx = H - 1 - yi;
+const int xidx = W - 1 - xi;
+
+const float xf = PixToNdc(xidx, W);
+const float yf = PixToNdc(yidx, H);
 const float2 pxy = make_float2(xf, yf);

 // This part looks like the naive rasterization kernel, except we use
@@ -751,7 +766,7 @@ RasterizeMeshesFineCuda(
 const float blur_radius,
 const int bin_size,
 const int faces_per_pixel,
-bool perspective_correct) {
+const bool perspective_correct) {
 if (face_verts.ndimension() != 3 || face_verts.size(1) != 3 ||
 face_verts.size(2) != 3) {
 AT_ERROR("face_verts must have dimensions (num_faces, 3, 3)");

pytorch3d/csrc/rasterize_meshes/rasterize_meshes.h

Lines changed: 44 additions & 43 deletions
@@ -14,21 +14,21 @@ RasterizeMeshesNaiveCpu(
 const torch::Tensor& face_verts,
 const torch::Tensor& mesh_to_face_first_idx,
 const torch::Tensor& num_faces_per_mesh,
-int image_size,
-float blur_radius,
-int faces_per_pixel,
-bool perspective_correct);
+const int image_size,
+const float blur_radius,
+const int faces_per_pixel,
+const bool perspective_correct);

 #ifdef WITH_CUDA
 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>
 RasterizeMeshesNaiveCuda(
 const at::Tensor& face_verts,
 const at::Tensor& mesh_to_face_first_idx,
 const at::Tensor& num_faces_per_mesh,
-int image_size,
-float blur_radius,
-int num_closest,
-bool perspective_correct);
+const int image_size,
+const float blur_radius,
+const int num_closest,
+const bool perspective_correct);
 #endif
 // Forward pass for rasterizing a batch of meshes.
 //
@@ -77,10 +77,10 @@ RasterizeMeshesNaive(
 const torch::Tensor& face_verts,
 const torch::Tensor& mesh_to_face_first_idx,
 const torch::Tensor& num_faces_per_mesh,
-int image_size,
-float blur_radius,
-int faces_per_pixel,
-bool perspective_correct) {
+const int image_size,
+const float blur_radius,
+const int faces_per_pixel,
+const bool perspective_correct) {
 // TODO: Better type checking.
 if (face_verts.type().is_cuda()) {
 #ifdef WITH_CUDA
@@ -117,7 +117,7 @@ torch::Tensor RasterizeMeshesBackwardCpu(
 const torch::Tensor& grad_bary,
 const torch::Tensor& grad_zbuf,
 const torch::Tensor& grad_dists,
-bool perspective_correct);
+const bool perspective_correct);

 #ifdef WITH_CUDA
 torch::Tensor RasterizeMeshesBackwardCuda(
@@ -126,7 +126,7 @@ torch::Tensor RasterizeMeshesBackwardCuda(
 const torch::Tensor& grad_bary,
 const torch::Tensor& grad_zbuf,
 const torch::Tensor& grad_dists,
-bool perspective_correct);
+const bool perspective_correct);
 #endif

 // Args:
@@ -159,7 +159,7 @@ torch::Tensor RasterizeMeshesBackward(
 const torch::Tensor& grad_zbuf,
 const torch::Tensor& grad_bary,
 const torch::Tensor& grad_dists,
-bool perspective_correct) {
+const bool perspective_correct) {
 if (face_verts.type().is_cuda()) {
 #ifdef WITH_CUDA
 return RasterizeMeshesBackwardCuda(
@@ -191,20 +191,20 @@ torch::Tensor RasterizeMeshesCoarseCpu(
 const torch::Tensor& face_verts,
 const at::Tensor& mesh_to_face_first_idx,
 const at::Tensor& num_faces_per_mesh,
-int image_size,
-float blur_radius,
-int bin_size,
-int max_faces_per_bin);
+const int image_size,
+const float blur_radius,
+const int bin_size,
+const int max_faces_per_bin);

 #ifdef WITH_CUDA
 torch::Tensor RasterizeMeshesCoarseCuda(
 const torch::Tensor& face_verts,
 const torch::Tensor& mesh_to_face_first_idx,
 const torch::Tensor& num_faces_per_mesh,
-int image_size,
-float blur_radius,
-int bin_size,
-int max_faces_per_bin);
+const int image_size,
+const float blur_radius,
+const int bin_size,
+const int max_faces_per_bin);
 #endif
 // Args:
 // face_verts: Tensor of shape (F, 3, 3) giving (packed) vertex positions for
@@ -232,10 +232,10 @@ torch::Tensor RasterizeMeshesCoarse(
 const torch::Tensor& face_verts,
 const torch::Tensor& mesh_to_face_first_idx,
 const torch::Tensor& num_faces_per_mesh,
-int image_size,
-float blur_radius,
-int bin_size,
-int max_faces_per_bin) {
+const int image_size,
+const float blur_radius,
+const int bin_size,
+const int max_faces_per_bin) {
 if (face_verts.type().is_cuda()) {
 #ifdef WITH_CUDA
 return RasterizeMeshesCoarseCuda(
@@ -270,11 +270,11 @@ std::tuple<torch::Tensor, torch::Tensor, torch::Tensor, torch::Tensor>
 RasterizeMeshesFineCuda(
 const torch::Tensor& face_verts,
 const torch::Tensor& bin_faces,
-int image_size,
-float blur_radius,
-int bin_size,
-int faces_per_pixel,
-bool perspective_correct);
+const int image_size,
+const float blur_radius,
+const int bin_size,
+const int faces_per_pixel,
+const bool perspective_correct);
 #endif
 // Args:
 // face_verts: Tensor of shape (F, 3, 3) giving (packed) vertex positions for
@@ -317,11 +317,11 @@ std::tuple<torch::Tensor, torch::Tensor, torch::Tensor, torch::Tensor>
 RasterizeMeshesFine(
 const torch::Tensor& face_verts,
 const torch::Tensor& bin_faces,
-int image_size,
-float blur_radius,
-int bin_size,
-int faces_per_pixel,
-bool perspective_correct) {
+const int image_size,
+const float blur_radius,
+const int bin_size,
+const int faces_per_pixel,
+const bool perspective_correct) {
 if (face_verts.type().is_cuda()) {
 #ifdef WITH_CUDA
 return RasterizeMeshesFineCuda(
@@ -373,6 +373,7 @@ RasterizeMeshesFine(
 // this function instead returns screen-space
 // barycentric coordinates for each pixel.
 //
+//
 // Returns:
 // A 4 element tuple of:
 // pix_to_face: int64 tensor of shape (N, H, W, K) giving the face index of
@@ -394,12 +395,12 @@ RasterizeMeshes(
 const torch::Tensor& face_verts,
 const torch::Tensor& mesh_to_face_first_idx,
 const torch::Tensor& num_faces_per_mesh,
-int image_size,
-float blur_radius,
-int faces_per_pixel,
-int bin_size,
-int max_faces_per_bin,
-bool perspective_correct) {
+const int image_size,
+const float blur_radius,
+const int faces_per_pixel,
+const int bin_size,
+const int max_faces_per_bin,
+const bool perspective_correct) {
 if (bin_size > 0 && max_faces_per_bin > 0) {
 // Use coarse-to-fine rasterization
 auto bin_faces = RasterizeMeshesCoarse(
