Commit 57a22e7

gkioxari authored and facebook-github-bot committed
camera refactoring
Summary: Refactor cameras

* `CamerasBase` was enhanced with `transform_points_screen`, which transforms projected points from NDC to screen space
* OpenGLPerspective, OpenGLOrthographic -> FoVPerspective, FoVOrthographic
* SfMPerspective, SfMOrthographic -> Perspective, Orthographic
* `PerspectiveCameras` can optionally be constructed with screen-space parameters
* A note on cameras and coordinate systems was added

Reviewed By: nikhilaravi

Differential Revision: D23168525

fbshipit-source-id: dd138e2b2cc7e0e0d9f34c45b8251c01266a2063
1 parent 9242e7e commit 57a22e7

File tree: 65 files changed, +897 −280 lines


docs/notes/cameras.md

(new file, +63 lines)
# Cameras

## Camera Coordinate Systems

When working with 3D data, there are four coordinate systems users need to know:
* **World coordinate system**
This is the system in which the object/scene lives: the world.
* **Camera view coordinate system**
This is the system that has its origin on the image plane and the `Z`-axis perpendicular to the image plane. In PyTorch3D, we assume that `+X` points left, `+Y` points up, and `+Z` points out from the image plane. The transformation from world to view is obtained by applying a rotation (`R`) and a translation (`T`).
* **NDC coordinate system**
This is the normalized coordinate system that confines the rendered part of the object/scene in a volume, also known as the view volume. Under the PyTorch3D convention, `(+1, +1, znear)` is the top left near corner and `(-1, -1, zfar)` is the bottom right far corner of the volume. The transformation from view to NDC is obtained by applying the camera projection matrix (`P`).
* **Screen coordinate system**
This is another representation of the view volume, with the `XY` coordinates defined in pixel space instead of a normalized space.

An illustration of the four coordinate systems is shown below:
![cameras](https://user-images.githubusercontent.com/4369065/90317960-d9b8db80-dee1-11ea-8088-39c414b1e2fa.png)
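To make the world-to-view step concrete, here is a minimal sketch of applying `R` and `T` to a single point, using the row-vector convention PyTorch3D transforms follow (`x_view = x_world @ R + T`). The helper name is illustrative only, not part of the API:

```python
def world_to_view(point, R, T):
    # Row-vector convention: x_view = x_world @ R + T.
    x, y, z = point
    return tuple(
        x * R[0][j] + y * R[1][j] + z * R[2][j] + T[j] for j in range(3)
    )

# Identity rotation, camera translated 5 units along +Z:
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(world_to_view((1.0, 2.0, 3.0), identity, (0.0, 0.0, 5.0)))  # (1.0, 2.0, 8.0)
```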
## Defining Cameras in PyTorch3D

Cameras in PyTorch3D transform an object/scene from world to NDC by first transforming the object/scene to view (via the transforms `R` and `T`) and then projecting the 3D object/scene to NDC (via the projection matrix `P`, also known as the camera matrix). Thus, the camera parameters in `P` are assumed to be in NDC space. If the user has camera parameters in screen space, which is a common use case, the parameters should be transformed to NDC (see below for an example).

We describe the camera types in PyTorch3D and the conventions for the camera parameters provided at construction time.
### Camera Types

All cameras inherit from `CamerasBase`. PyTorch3D provides four different camera types. `CamerasBase` defines methods that are common to all camera models:
* `get_camera_center`, which returns the optical center of the camera in world coordinates
* `get_world_to_view_transform`, which returns a 3D transform from world coordinates to camera view coordinates (`R`, `T`)
* `get_full_projection_transform`, which composes the projection transform (`P`) with the world-to-view transform (`R`, `T`)
* `transform_points`, which takes a set of input points in world coordinates and projects them to NDC coordinates, ranging from `[-1, -1, znear]` to `[+1, +1, zfar]`
* `transform_points_screen`, which takes a set of input points in world coordinates and projects them to screen coordinates, ranging from `[0, 0, znear]` to `[W-1, H-1, zfar]`

Users can easily customize their own cameras. For each new camera, users should implement the `get_projection_transform` routine, which returns the mapping `P` from camera view coordinates to NDC coordinates.
#### FoVPerspectiveCameras, FoVOrthographicCameras

These two cameras follow the OpenGL convention for perspective and orthographic cameras respectively. The user provides the near (`znear`) and far (`zfar`) fields, which confine the view volume along the `Z` axis. The view volume in the `XY` plane is defined by the field-of-view angle (`fov`) in the case of `FoVPerspectiveCameras`, and by `min_x, min_y, max_x, max_y` in the case of `FoVOrthographicCameras`.
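For intuition, the `fov` parameter plays the role that a focal length plays for the cameras described below: under the standard pinhole relation (an assumption here, not a quote of the library's code), the NDC scale factor is the cotangent of half the field of view. A small illustrative calculation:

```python
import math

def fov_to_scale(fov_degrees):
    # Pinhole relation: points at the edge of the field of view project
    # to x = +/-1 in NDC, so the scale factor is 1 / tan(fov / 2).
    return 1.0 / math.tan(math.radians(fov_degrees) / 2.0)

print(fov_to_scale(90.0))  # ~1.0 -- a 90 degree fov gives unit scale
```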
#### PerspectiveCameras, OrthographicCameras

These two cameras follow the Multi-View Geometry convention for cameras. The user provides the focal length (`fx`, `fy`) and the principal point (`px`, `py`). For example, `camera = PerspectiveCameras(focal_length=((fx, fy),), principal_point=((px, py),))`.

As mentioned above, the focal length and principal point are used to convert a point `(X, Y, Z)` from view coordinates to NDC coordinates, as follows:
```
# for perspective
x_ndc = fx * X / Z + px
y_ndc = fy * Y / Z + py
z_ndc = 1 / Z

# for orthographic
x_ndc = fx * X + px
y_ndc = fy * Y + py
z_ndc = Z
```
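These mappings can be checked numerically in plain Python. A minimal sketch, with function names that mirror the formulas rather than the PyTorch3D API:

```python
def perspective_ndc(X, Y, Z, fx, fy, px, py):
    # Perspective projection from view coordinates to NDC,
    # following the formulas above (note the division by depth Z).
    return (fx * X / Z + px, fy * Y / Z + py, 1.0 / Z)

def orthographic_ndc(X, Y, Z, fx, fy, px, py):
    # Orthographic projection: no division by depth.
    return (fx * X + px, fy * Y + py, Z)

# A point 2 units in front of the camera, unit focal length,
# principal point at the NDC origin:
print(perspective_ndc(1.0, 1.0, 2.0, 1.0, 1.0, 0.0, 0.0))   # (0.5, 0.5, 0.5)
print(orthographic_ndc(1.0, 1.0, 2.0, 1.0, 1.0, 0.0, 0.0))  # (1.0, 1.0, 2.0)
```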
Commonly, users have access to the focal length (`fx_screen`, `fy_screen`) and the principal point (`px_screen`, `py_screen`) in screen space. In that case, to construct the camera, the user additionally needs to provide `image_size = ((image_width, image_height),)`. More precisely, `camera = PerspectiveCameras(focal_length=((fx_screen, fy_screen),), principal_point=((px_screen, py_screen),), image_size=((image_width, image_height),))`. Internally, the camera parameters are converted from screen to NDC as follows:

```
fx = fx_screen * 2.0 / image_width
fy = fy_screen * 2.0 / image_height

px = - (px_screen - image_width / 2.0) * 2.0 / image_width
py = - (py_screen - image_height / 2.0) * 2.0 / image_height
```
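The conversion above is straightforward to verify numerically. A minimal sketch in plain Python, mirroring the formulas rather than the library internals (one axis at a time; run it once for `x` and once for `y`):

```python
def screen_to_ndc(f_screen, p_screen, size):
    # Convert a focal length and principal point coordinate for one axis
    # from screen (pixel) space to NDC space, per the formulas above.
    f_ndc = f_screen * 2.0 / size
    p_ndc = -(p_screen - size / 2.0) * 2.0 / size
    return f_ndc, p_ndc

# A principal point at the image center maps to the NDC origin:
fx, px = screen_to_ndc(100.0, 320.0, 640.0)
print(fx, abs(px))  # 0.3125 0.0
```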

docs/notes/renderer_getting_started.md (+5 −5)

````diff
@@ -39,16 +39,16 @@ Rendering requires transformations between several different coordinate frames:
 <img src="assets/transformations_overview.png" width="1000">
 
-For example, given a teapot mesh, the world coordinate frame, camera coordiante frame and image are show in the figure below. Note that the world and camera coordinate frames have the +z direction pointing in to the page.
+For example, given a teapot mesh, the world coordinate frame, camera coordiante frame and image are show in the figure below. Note that the world and camera coordinate frames have the +z direction pointing in to the page.
 
 <img src="assets/world_camera_image.png" width="1000">
 
 ---
 
 **NOTE: PyTorch3D vs OpenGL**
 
-While we tried to emulate several aspects of OpenGL, there are differences in the coordinate frame conventions.
-- The default world coordinate frame in PyTorch3D has +Z pointing in to the screen whereas in OpenGL, +Z is pointing out of the screen. Both are right handed.
+While we tried to emulate several aspects of OpenGL, there are differences in the coordinate frame conventions.
+- The default world coordinate frame in PyTorch3D has +Z pointing in to the screen whereas in OpenGL, +Z is pointing out of the screen. Both are right handed.
 - The NDC coordinate system in PyTorch3D is **right-handed** compared with a **left-handed** NDC coordinate system in OpenGL (the projection matrix switches the handedness).
 
 <img align="center" src="assets/opengl_coordframes.png" width="300">
@@ -61,14 +61,14 @@ A renderer in PyTorch3D is composed of a **rasterizer** and a **shader**. Create
 ```
 # Imports
 from pytorch3d.renderer import (
-    OpenGLPerspectiveCameras, look_at_view_transform,
+    FoVPerspectiveCameras, look_at_view_transform,
     RasterizationSettings, BlendParams,
     MeshRenderer, MeshRasterizer, HardPhongShader
 )
 
 # Initialize an OpenGL perspective camera.
 R, T = look_at_view_transform(2.7, 10, 20)
-cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)
+cameras = FoVPerspectiveCameras(device=device, R=R, T=T)
 
 # Define the settings for rasterization and shading. Here we set the output image to be of size
 # 512x512. As we are rendering images for visualization purposes only we will set faces_per_pixel=1
````

docs/tutorials/camera_position_optimization_with_differentiable_rendering.ipynb (+3 −3)

```diff
@@ -102,7 +102,7 @@
 "\n",
 "# rendering components\n",
 "from pytorch3d.renderer import (\n",
-" OpenGLPerspectiveCameras, look_at_view_transform, look_at_rotation, \n",
+" FoVPerspectiveCameras, look_at_view_transform, look_at_rotation, \n",
 " RasterizationSettings, MeshRenderer, MeshRasterizer, BlendParams,\n",
 " SoftSilhouetteShader, HardPhongShader, PointLights\n",
 ")"
@@ -217,8 +217,8 @@
 },
 "outputs": [],
 "source": [
-"# Initialize an OpenGL perspective camera.\n",
-"cameras = OpenGLPerspectiveCameras(device=device)\n",
+"# Initialize a perspective camera.\n",
+"cameras = FoVPerspectiveCameras(device=device)\n",
 "\n",
 "# To blend the 100 faces we set a few parameters which control the opacity and the sharpness of \n",
 "# edges. Refer to blending.py for more details. \n",
```

docs/tutorials/fit_textured_mesh.ipynb (+6 −6)

```diff
@@ -129,7 +129,7 @@
 "from pytorch3d.structures import Meshes, Textures\n",
 "from pytorch3d.renderer import (\n",
 " look_at_view_transform,\n",
-" OpenGLPerspectiveCameras, \n",
+" FoVPerspectiveCameras, \n",
 " PointLights, \n",
 " DirectionalLights, \n",
 " Materials, \n",
@@ -309,16 +309,16 @@
 "# the cow is facing the -z direction. \n",
 "lights = PointLights(device=device, location=[[0.0, 0.0, -3.0]])\n",
 "\n",
-"# Initialize an OpenGL perspective camera that represents a batch of different \n",
+"# Initialize a camera that represents a batch of different \n",
 "# viewing angles. All the cameras helper methods support mixed type inputs and \n",
 "# broadcasting. So we can view the camera from the a distance of dist=2.7, and \n",
 "# then specify elevation and azimuth angles for each viewpoint as tensors. \n",
 "R, T = look_at_view_transform(dist=2.7, elev=elev, azim=azim)\n",
-"cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)\n",
+"cameras = FoVPerspectiveCameras(device=device, R=R, T=T)\n",
 "\n",
 "# We arbitrarily choose one particular view that will be used to visualize \n",
 "# results\n",
-"camera = OpenGLPerspectiveCameras(device=device, R=R[None, 1, ...], \n",
+"camera = FoVPerspectiveCameras(device=device, R=R[None, 1, ...], \n",
 " T=T[None, 1, ...]) \n",
 "\n",
 "# Define the settings for rasterization and shading. Here we set the output \n",
@@ -361,7 +361,7 @@
 "# Our multi-view cow dataset will be represented by these 2 lists of tensors,\n",
 "# each of length num_views.\n",
 "target_rgb = [target_images[i, ..., :3] for i in range(num_views)]\n",
-"target_cameras = [OpenGLPerspectiveCameras(device=device, R=R[None, i, ...], \n",
+"target_cameras = [FoVPerspectiveCameras(device=device, R=R[None, i, ...], \n",
 " T=T[None, i, ...]) for i in range(num_views)]"
 ],
 "execution_count": null,
@@ -925,4 +925,4 @@
 ]
 }
 ]
-}
+}
```

docs/tutorials/render_colored_points.ipynb (+5 −5)

```diff
@@ -64,7 +64,7 @@
 "from pytorch3d.structures import Pointclouds\n",
 "from pytorch3d.renderer import (\n",
 " look_at_view_transform,\n",
-" OpenGLOrthographicCameras, \n",
+" FoVOrthographicCameras, \n",
 " PointsRasterizationSettings,\n",
 " PointsRenderer,\n",
 " PointsRasterizer,\n",
@@ -147,9 +147,9 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# Initialize an OpenGL perspective camera.\n",
+"# Initialize a camera.\n",
 "R, T = look_at_view_transform(20, 10, 0)\n",
-"cameras = OpenGLOrthographicCameras(device=device, R=R, T=T, znear=0.01)\n",
+"cameras = FoVOrthographicCameras(device=device, R=R, T=T, znear=0.01)\n",
 "\n",
 "# Define the settings for rasterization and shading. Here we set the output image to be of size\n",
 "# 512x512. As we are rendering images for visualization purposes only we will set faces_per_pixel=1\n",
@@ -195,9 +195,9 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# Initialize an OpenGL perspective camera.\n",
+"# Initialize a camera.\n",
 "R, T = look_at_view_transform(20, 10, 0)\n",
-"cameras = OpenGLOrthographicCameras(device=device, R=R, T=T, znear=0.01)\n",
+"cameras = FoVOrthographicCameras(device=device, R=R, T=T, znear=0.01)\n",
 "\n",
 "# Define the settings for rasterization and shading. Here we set the output image to be of size\n",
 "# 512x512. As we are rendering images for visualization purposes only we will set faces_per_pixel=1\n",
```

docs/tutorials/render_textured_meshes.ipynb (+5 −5)

```diff
@@ -90,7 +90,7 @@
 "from pytorch3d.structures import Meshes, Textures\n",
 "from pytorch3d.renderer import (\n",
 " look_at_view_transform,\n",
-" OpenGLPerspectiveCameras, \n",
+" FoVPerspectiveCameras, \n",
 " PointLights, \n",
 " DirectionalLights, \n",
 " Materials, \n",
@@ -286,11 +286,11 @@
 },
 "outputs": [],
 "source": [
-"# Initialize an OpenGL perspective camera.\n",
+"# Initialize a camera.\n",
 "# With world coordinates +Y up, +X left and +Z in, the front of the cow is facing the -Z direction. \n",
 "# So we move the camera by 180 in the azimuth direction so it is facing the front of the cow. \n",
 "R, T = look_at_view_transform(2.7, 0, 180) \n",
-"cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)\n",
+"cameras = FoVPerspectiveCameras(device=device, R=R, T=T)\n",
 "\n",
 "# Define the settings for rasterization and shading. Here we set the output image to be of size\n",
 "# 512x512. As we are rendering images for visualization purposes only we will set faces_per_pixel=1\n",
@@ -444,7 +444,7 @@
 "source": [
 "# Rotate the object by increasing the elevation and azimuth angles\n",
 "R, T = look_at_view_transform(dist=2.7, elev=10, azim=-150)\n",
-"cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)\n",
+"cameras = FoVPerspectiveCameras(device=device, R=R, T=T)\n",
 "\n",
 "# Move the light location so the light is shining on the cow's face. \n",
 "lights.location = torch.tensor([[2.0, 2.0, -2.0]], device=device)\n",
@@ -519,7 +519,7 @@
 "# view the camera from the same distance and specify dist=2.7 as a float,\n",
 "# and then specify elevation and azimuth angles for each viewpoint as tensors. \n",
 "R, T = look_at_view_transform(dist=2.7, elev=elev, azim=azim)\n",
-"cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)\n",
+"cameras = FoVPerspectiveCameras(device=device, R=R, T=T)\n",
 "\n",
 "# Move the light back in front of the cow which is facing the -z direction.\n",
 "lights.location = torch.tensor([[0.0, 0.0, -3.0]], device=device)"
```

pytorch3d/datasets/shapenet_base.py (+2 −2)

```diff
@@ -10,7 +10,7 @@
     HardPhongShader,
     MeshRasterizer,
     MeshRenderer,
-    OpenGLPerspectiveCameras,
+    FoVPerspectiveCameras,
     PointLights,
     RasterizationSettings,
     TexturesVertex,
@@ -125,7 +125,7 @@ def render(
         meshes.textures = TexturesVertex(
             verts_features=torch.ones_like(meshes.verts_padded(), device=device)
         )
-        cameras = kwargs.get("cameras", OpenGLPerspectiveCameras()).to(device)
+        cameras = kwargs.get("cameras", FoVPerspectiveCameras()).to(device)
         if len(cameras) != 1 and len(cameras) % len(meshes) != 0:
             raise ValueError("Mismatch between batch dims of cameras and meshes.")
         if len(cameras) > 1:
```

pytorch3d/renderer/__init__.py (+8 −4)

```diff
@@ -6,11 +6,15 @@
     sigmoid_alpha_blend,
     softmax_rgb_blend,
 )
+from .cameras import OpenGLOrthographicCameras  # deprecated
+from .cameras import OpenGLPerspectiveCameras  # deprecated
+from .cameras import SfMOrthographicCameras  # deprecated
+from .cameras import SfMPerspectiveCameras  # deprecated
 from .cameras import (
-    OpenGLOrthographicCameras,
-    OpenGLPerspectiveCameras,
-    SfMOrthographicCameras,
-    SfMPerspectiveCameras,
+    FoVOrthographicCameras,
+    FoVPerspectiveCameras,
+    OrthographicCameras,
+    PerspectiveCameras,
     camera_position_from_spherical_angles,
     get_world_to_view_transform,
     look_at_rotation,
```
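The old names are kept importable for backward compatibility. One common way to implement such aliases (a sketch of the general pattern, not necessarily what `pytorch3d.renderer.cameras` does) is a thin subclass that warns on construction:

```python
import warnings

class FoVPerspectiveCameras:
    """Stand-in for the real camera class (illustrative only)."""
    def __init__(self, **kwargs):
        self.kwargs = kwargs

def deprecated_alias(new_cls, old_name):
    # Build a subclass that behaves like new_cls but emits a
    # DeprecationWarning whenever it is instantiated.
    class _Deprecated(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_name} is deprecated; use {new_cls.__name__} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)
    _Deprecated.__name__ = old_name
    return _Deprecated

OpenGLPerspectiveCameras = deprecated_alias(FoVPerspectiveCameras, "OpenGLPerspectiveCameras")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cam = OpenGLPerspectiveCameras(fov=60)
print(isinstance(cam, FoVPerspectiveCameras), len(caught))  # True 1
```

Because the alias is a subclass, existing `isinstance` checks and constructor calls keep working while nudging callers toward the new name.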
