facebookresearch
diff --git a/docs/notes/cameras.md b/docs/notes/cameras.md
+63
diff --git a/docs/notes/renderer_getting_started.md b/docs/notes/renderer_getting_started.md
+5 -5
diff --git a/docs/tutorials/camera_position_optimization_with_differentiable_rendering.ipynb b/docs/tutorials/camera_position_optimization_with_differentiable_rendering.ipynb
+3 -3
diff --git a/docs/tutorials/fit_textured_mesh.ipynb b/docs/tutorials/fit_textured_mesh.ipynb
+6 -6
diff --git a/docs/tutorials/render_colored_points.ipynb b/docs/tutorials/render_colored_points.ipynb
+5 -5
diff --git a/docs/tutorials/render_textured_meshes.ipynb b/docs/tutorials/render_textured_meshes.ipynb
+5 -5
diff --git a/pytorch3d/datasets/shapenet_base.py b/pytorch3d/datasets/shapenet_base.py
+2 -2
diff --git a/pytorch3d/renderer/__init__.py b/pytorch3d/renderer/__init__.py
+8 -4
@@ -0,0 +1,63 @@
+# Cameras
+
+## Camera Coordinate Systems
+
+When working with 3D data, there are 4 coordinate systems users need to know:
+* **World coordinate system**
+This is the system in which the object/scene lives: the world.
+* **Camera view coordinate system**
+This is the system that has its origin on the image plane and the `Z`-axis perpendicular to the image plane. In PyTorch3D, we assume that `+X` points left, `+Y` points up and `+Z` points out from the image plane. The transformation from world to view is obtained by applying a rotation (`R`) and translation (`T`).
+* **NDC coordinate system**
+This is the normalized coordinate system that confines the rendered part of the object/scene to a volume, also known as the view volume. Under the PyTorch3D convention, `(+1, +1, znear)` is the top left near corner, and `(-1, -1, zfar)` is the bottom right far corner of the volume. The transformation from view to NDC is obtained by applying the camera projection matrix (`P`).
+* **Screen coordinate system**
+This is another representation of the view volume with the `XY` coordinates defined in pixel space instead of a normalized space.
+
+An illustration of the 4 coordinate systems is shown below:
+![cameras](https://user-images.githubusercontent.com/4369065/90317960-d9b8db80-dee1-11ea-8088-39c414b1e2fa.png)
+
+## Defining Cameras in PyTorch3D
+
+Cameras in PyTorch3D transform an object/scene from world to NDC by first transforming the object/scene to view (via transforms `R` and `T`) and then projecting the 3D object/scene to NDC (via the projection matrix `P`, also known as the camera matrix). Thus, the camera parameters in `P` are assumed to be in NDC space. If the user has camera parameters in screen space, which is a common use case, the parameters should be transformed to NDC (see below for an example).
+
+We describe the camera types in PyTorch3D and the convention for the camera parameters provided at construction time. 
+
+### Camera Types
+
+All cameras inherit from `CamerasBase`, the base class for all cameras. PyTorch3D provides four different camera types. `CamerasBase` defines methods that are common to all camera models:
+* `get_camera_center`, which returns the optical center of the camera in world coordinates.
+* `get_world_to_view_transform`, which returns a 3D transform from world coordinates to the camera view coordinates (`R`, `T`).
+* `get_full_projection_transform`, which composes the projection transform (`P`) with the world-to-view transform (`R`, `T`).
+* `transform_points`, which takes a set of input points in world coordinates and projects them to NDC coordinates ranging from [-1, -1, znear] to [+1, +1, zfar].
+* `transform_points_screen`, which takes a set of input points in world coordinates and projects them to screen coordinates ranging from [0, 0, znear] to [W-1, H-1, zfar].
+
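One way to picture how the NDC range of `transform_points` relates to the pixel range of `transform_points_screen` is a linear remapping of the `XY` coordinates. The sketch below is plain Python and is not taken from the PyTorch3D source; the helper name `ndc_to_screen_xy` is hypothetical, and it assumes that screen pixel `(0, 0)` is the top left corner, so that the NDC corner `(+1, +1)` maps to `(0, 0)` and `(-1, -1)` maps to `(W-1, H-1)`.

```python
def ndc_to_screen_xy(x_ndc, y_ndc, image_width, image_height):
    """Map NDC XY in [-1, +1] to pixel coordinates in [0, W-1] x [0, H-1].

    Assumption (not confirmed by the source): the screen origin is the
    top left pixel. Because +X points left and +Y points up in NDC,
    both axes are flipped by the (1 - value) term.
    """
    x_screen = (image_width - 1) / 2.0 * (1.0 - x_ndc)
    y_screen = (image_height - 1) / 2.0 * (1.0 - y_ndc)
    return x_screen, y_screen


# The two extreme corners and the center of a 640x480 image.
top_left = ndc_to_screen_xy(1.0, 1.0, 640, 480)
bottom_right = ndc_to_screen_xy(-1.0, -1.0, 640, 480)
center = ndc_to_screen_xy(0.0, 0.0, 640, 480)
```

The `Z` coordinate is left out because, per the ranges listed above, it spans `znear` to `zfar` in both spaces.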
+Users can easily customize their own cameras. For each new camera, users should implement the `get_projection_transform` routine that returns the mapping `P` from camera view coordinates to NDC coordinates.
+
+#### FoVPerspectiveCameras, FoVOrthographicCameras
+These two cameras follow the OpenGL convention for perspective and orthographic cameras respectively. The user provides the near (`znear`) and far (`zfar`) fields, which confine the view volume along the `Z` axis. The view volume in the `XY` plane is defined by the field of view angle (`fov`) in the case of `FoVPerspectiveCameras` and by `min_x, min_y, max_x, max_y` in the case of `FoVOrthographicCameras`.
+
+#### PerspectiveCameras, OrthographicCameras
+These two cameras follow the Multi-View Geometry convention for cameras. The user provides the focal length (`fx`, `fy`) and the principal point (`px`, `py`). For example, `camera = PerspectiveCameras(focal_length=((fx, fy),), principal_point=((px, py),))`
+
+As mentioned above, the focal length and principal point are used to convert a point `(X, Y, Z)` from view coordinates to NDC coordinates, as follows:
+
+```
+# for perspective
+x_ndc = fx * X / Z + px
+y_ndc = fy * Y / Z + py
+z_ndc = 1 / Z
+
+# for orthographic
+x_ndc = fx * X + px
+y_ndc = fy * Y + py
+z_ndc = Z
+```
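The two projections above can be written as small functions to check values by hand. This is a minimal sketch in plain Python that transcribes the formulas for a single point; it is not the PyTorch3D implementation, and the function names `project_perspective` and `project_orthographic` are hypothetical.

```python
def project_perspective(point, focal_length, principal_point):
    """Project a view-space point (X, Y, Z) to NDC with a perspective camera."""
    X, Y, Z = point
    fx, fy = focal_length
    px, py = principal_point
    # Assumes Z != 0, i.e. the point does not lie in the camera plane.
    return (fx * X / Z + px, fy * Y / Z + py, 1.0 / Z)


def project_orthographic(point, focal_length, principal_point):
    """Project a view-space point (X, Y, Z) to NDC with an orthographic camera."""
    X, Y, Z = point
    fx, fy = focal_length
    px, py = principal_point
    # No division by depth: the projected XY does not depend on Z.
    return (fx * X + px, fy * Y + py, Z)


point_view = (1.0, 2.0, 4.0)
persp = project_perspective(point_view, (2.0, 2.0), (0.0, 0.0))
ortho = project_orthographic(point_view, (2.0, 2.0), (0.0, 0.0))
```

With these inputs the perspective projection divides `X` and `Y` by the depth `Z = 4`, while the orthographic projection only scales and shifts them.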
+
+Commonly, users have access to the focal length (`fx_screen`, `fy_screen`) and the principal point (`px_screen`, `py_screen`) in screen space. In that case, to construct the camera, the user must additionally provide `image_size=((image_width, image_height),)`. More precisely, `camera = PerspectiveCameras(focal_length=((fx_screen, fy_screen),), principal_point=((px_screen, py_screen),), image_size=((image_width, image_height),))`. Internally, the camera parameters are converted from screen to NDC as follows:
+
+```
+fx = fx_screen * 2.0 / image_width
+fy = fy_screen * 2.0 / image_height
+
+px = - (px_screen - image_width / 2.0) * 2.0 / image_width
+py = - (py_screen - image_height / 2.0) * 2.0 / image_height
+```
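The screen-to-NDC conversion above can be tried on concrete numbers with a small helper. This is a sketch in plain Python that copies the four formulas; the function name `screen_to_ndc_intrinsics` is hypothetical and is not part of PyTorch3D.

```python
def screen_to_ndc_intrinsics(
    fx_screen, fy_screen, px_screen, py_screen, image_width, image_height
):
    """Convert screen-space focal length and principal point to NDC space."""
    fx = fx_screen * 2.0 / image_width
    fy = fy_screen * 2.0 / image_height
    # The leading minus sign reflects that +X points left and +Y points up
    # in NDC, opposite to the usual pixel axes.
    px = -(px_screen - image_width / 2.0) * 2.0 / image_width
    py = -(py_screen - image_height / 2.0) * 2.0 / image_height
    return fx, fy, px, py


# A 640x480 image with the principal point at the left edge (x = 0)
# and the bottom edge (y = 480) of the image.
fx, fy, px, py = screen_to_ndc_intrinsics(500.0, 500.0, 0.0, 480.0, 640, 480)
```

Here a principal point on the left image edge yields `px = 1.0` and one on the bottom edge yields `py = -1.0`, which matches the convention that `(+1, +1)` is the top left corner in NDC.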
@@ -39,16 +39,16 @@ Rendering requires transformations between several different coordinate frames:
 <img src="assets/transformations_overview.png" width="1000">
 
 
-For example, given a teapot mesh, the world coordinate frame, camera coordiante frame and image are show in the figure below. Note that the world and camera coordinate frames have the +z direction pointing in to the page. 
+For example, given a teapot mesh, the world coordinate frame, camera coordinate frame and image are shown in the figure below. Note that the world and camera coordinate frames have the +z direction pointing in to the page.
 
 <img src="assets/world_camera_image.png" width="1000">
 
 ---
 
 **NOTE: PyTorch3D vs OpenGL**
 
-While we tried to emulate several aspects of OpenGL, there are differences in the coordinate frame conventions. 
-- The default world coordinate frame in PyTorch3D has +Z pointing in to the screen whereas in OpenGL, +Z is pointing out of the screen.  Both are right handed. 
+While we tried to emulate several aspects of OpenGL, there are differences in the coordinate frame conventions.
+- The default world coordinate frame in PyTorch3D has +Z pointing in to the screen whereas in OpenGL, +Z is pointing out of the screen.  Both are right handed.
 - The NDC coordinate system in PyTorch3D is **right-handed** compared with a **left-handed** NDC coordinate system in OpenGL (the projection matrix switches the handedness).
 
 <img align="center" src="assets/opengl_coordframes.png" width="300">
@@ -61,14 +61,14 @@ A renderer in PyTorch3D is composed of a **rasterizer** and a **shader**. Create
 ```
 # Imports
 from pytorch3d.renderer import (
-    OpenGLPerspectiveCameras, look_at_view_transform,
+    FoVPerspectiveCameras, look_at_view_transform,
     RasterizationSettings, BlendParams,
     MeshRenderer, MeshRasterizer, HardPhongShader
 )
 
 # Initialize an OpenGL perspective camera.
 R, T = look_at_view_transform(2.7, 10, 20)
-cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)
+cameras = FoVPerspectiveCameras(device=device, R=R, T=T)
 
 # Define the settings for rasterization and shading. Here we set the output image to be of size
 # 512x512. As we are rendering images for visualization purposes only we will set faces_per_pixel=1
 
@@ -102,7 +102,7 @@
     "\n",
     "# rendering components\n",
     "from pytorch3d.renderer import (\n",
-    "    OpenGLPerspectiveCameras, look_at_view_transform, look_at_rotation, \n",
+    "    FoVPerspectiveCameras, look_at_view_transform, look_at_rotation, \n",
     "    RasterizationSettings, MeshRenderer, MeshRasterizer, BlendParams,\n",
     "    SoftSilhouetteShader, HardPhongShader, PointLights\n",
     ")"
@@ -217,8 +217,8 @@
    },
    "outputs": [],
    "source": [
-    "# Initialize an OpenGL perspective camera.\n",
-    "cameras = OpenGLPerspectiveCameras(device=device)\n",
+    "# Initialize a perspective camera.\n",
+    "cameras = FoVPerspectiveCameras(device=device)\n",
     "\n",
     "# To blend the 100 faces we set a few parameters which control the opacity and the sharpness of \n",
     "# edges. Refer to blending.py for more details. \n",
 
@@ -129,7 +129,7 @@
         "from pytorch3d.structures import Meshes, Textures\n",
         "from pytorch3d.renderer import (\n",
         "    look_at_view_transform,\n",
-        "    OpenGLPerspectiveCameras, \n",
+        "    FoVPerspectiveCameras, \n",
         "    PointLights, \n",
         "    DirectionalLights, \n",
         "    Materials, \n",
@@ -309,16 +309,16 @@
         "# the cow is facing the -z direction. \n",
         "lights = PointLights(device=device, location=[[0.0, 0.0, -3.0]])\n",
         "\n",
-        "# Initialize an OpenGL perspective camera that represents a batch of different \n",
+        "# Initialize a camera that represents a batch of different \n",
         "# viewing angles. All the cameras helper methods support mixed type inputs and \n",
         "# broadcasting. So we can view the camera from the a distance of dist=2.7, and \n",
         "# then specify elevation and azimuth angles for each viewpoint as tensors. \n",
         "R, T = look_at_view_transform(dist=2.7, elev=elev, azim=azim)\n",
-        "cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)\n",
+        "cameras = FoVPerspectiveCameras(device=device, R=R, T=T)\n",
         "\n",
         "# We arbitrarily choose one particular view that will be used to visualize \n",
         "# results\n",
-        "camera = OpenGLPerspectiveCameras(device=device, R=R[None, 1, ...], \n",
+        "camera = FoVPerspectiveCameras(device=device, R=R[None, 1, ...], \n",
         "                                  T=T[None, 1, ...]) \n",
         "\n",
         "# Define the settings for rasterization and shading. Here we set the output \n",
@@ -361,7 +361,7 @@
         "# Our multi-view cow dataset will be represented by these 2 lists of tensors,\n",
         "# each of length num_views.\n",
         "target_rgb = [target_images[i, ..., :3] for i in range(num_views)]\n",
-        "target_cameras = [OpenGLPerspectiveCameras(device=device, R=R[None, i, ...], \n",
+        "target_cameras = [FoVPerspectiveCameras(device=device, R=R[None, i, ...], \n",
         "                                           T=T[None, i, ...]) for i in range(num_views)]"
       ],
       "execution_count": null,
@@ -925,4 +925,4 @@
       ]
     }
   ]
-}
+}
@@ -64,7 +64,7 @@
     "from pytorch3d.structures import Pointclouds\n",
     "from pytorch3d.renderer import (\n",
     "    look_at_view_transform,\n",
-    "    OpenGLOrthographicCameras, \n",
+    "    FoVOrthographicCameras, \n",
     "    PointsRasterizationSettings,\n",
     "    PointsRenderer,\n",
     "    PointsRasterizer,\n",
@@ -147,9 +147,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Initialize an OpenGL perspective camera.\n",
+    "# Initialize a camera.\n",
     "R, T = look_at_view_transform(20, 10, 0)\n",
-    "cameras = OpenGLOrthographicCameras(device=device, R=R, T=T, znear=0.01)\n",
+    "cameras = FoVOrthographicCameras(device=device, R=R, T=T, znear=0.01)\n",
     "\n",
     "# Define the settings for rasterization and shading. Here we set the output image to be of size\n",
     "# 512x512. As we are rendering images for visualization purposes only we will set faces_per_pixel=1\n",
@@ -195,9 +195,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Initialize an OpenGL perspective camera.\n",
+    "# Initialize a camera.\n",
     "R, T = look_at_view_transform(20, 10, 0)\n",
-    "cameras = OpenGLOrthographicCameras(device=device, R=R, T=T, znear=0.01)\n",
+    "cameras = FoVOrthographicCameras(device=device, R=R, T=T, znear=0.01)\n",
     "\n",
     "# Define the settings for rasterization and shading. Here we set the output image to be of size\n",
     "# 512x512. As we are rendering images for visualization purposes only we will set faces_per_pixel=1\n",
 
@@ -90,7 +90,7 @@
     "from pytorch3d.structures import Meshes, Textures\n",
     "from pytorch3d.renderer import (\n",
     "    look_at_view_transform,\n",
-    "    OpenGLPerspectiveCameras, \n",
+    "    FoVPerspectiveCameras, \n",
     "    PointLights, \n",
     "    DirectionalLights, \n",
     "    Materials, \n",
@@ -286,11 +286,11 @@
    },
    "outputs": [],
    "source": [
-    "# Initialize an OpenGL perspective camera.\n",
+    "# Initialize a camera.\n",
     "# With world coordinates +Y up, +X left and +Z in, the front of the cow is facing the -Z direction. \n",
     "# So we move the camera by 180 in the azimuth direction so it is facing the front of the cow. \n",
     "R, T = look_at_view_transform(2.7, 0, 180) \n",
-    "cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)\n",
+    "cameras = FoVPerspectiveCameras(device=device, R=R, T=T)\n",
     "\n",
     "# Define the settings for rasterization and shading. Here we set the output image to be of size\n",
     "# 512x512. As we are rendering images for visualization purposes only we will set faces_per_pixel=1\n",
@@ -444,7 +444,7 @@
    "source": [
     "# Rotate the object by increasing the elevation and azimuth angles\n",
     "R, T = look_at_view_transform(dist=2.7, elev=10, azim=-150)\n",
-    "cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)\n",
+    "cameras = FoVPerspectiveCameras(device=device, R=R, T=T)\n",
     "\n",
     "# Move the light location so the light is shining on the cow's face.  \n",
     "lights.location = torch.tensor([[2.0, 2.0, -2.0]], device=device)\n",
@@ -519,7 +519,7 @@
     "# view the camera from the same distance and specify dist=2.7 as a float,\n",
     "# and then specify elevation and azimuth angles for each viewpoint as tensors. \n",
     "R, T = look_at_view_transform(dist=2.7, elev=elev, azim=azim)\n",
-    "cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)\n",
+    "cameras = FoVPerspectiveCameras(device=device, R=R, T=T)\n",
     "\n",
     "# Move the light back in front of the cow which is facing the -z direction.\n",
     "lights.location = torch.tensor([[0.0, 0.0, -3.0]], device=device)"
 
@@ -10,7 +10,7 @@
     HardPhongShader,
     MeshRasterizer,
     MeshRenderer,
-    OpenGLPerspectiveCameras,
+    FoVPerspectiveCameras,
     PointLights,
     RasterizationSettings,
     TexturesVertex,
@@ -125,7 +125,7 @@ def render(
         meshes.textures = TexturesVertex(
             verts_features=torch.ones_like(meshes.verts_padded(), device=device)
         )
-        cameras = kwargs.get("cameras", OpenGLPerspectiveCameras()).to(device)
+        cameras = kwargs.get("cameras", FoVPerspectiveCameras()).to(device)
         if len(cameras) != 1 and len(cameras) % len(meshes) != 0:
             raise ValueError("Mismatch between batch dims of cameras and meshes.")
         if len(cameras) > 1:
 
@@ -6,11 +6,15 @@
     sigmoid_alpha_blend,
     softmax_rgb_blend,
 )
+from .cameras import OpenGLOrthographicCameras  # deprecated
+from .cameras import OpenGLPerspectiveCameras  # deprecated
+from .cameras import SfMOrthographicCameras  # deprecated
+from .cameras import SfMPerspectiveCameras  # deprecated
 from .cameras import (
-    OpenGLOrthographicCameras,
-    OpenGLPerspectiveCameras,
-    SfMOrthographicCameras,
-    SfMPerspectiveCameras,
+    FoVOrthographicCameras,
+    FoVPerspectiveCameras,
+    OrthographicCameras,
+    PerspectiveCameras,
     camera_position_from_spherical_angles,
     get_world_to_view_transform,
     look_at_rotation,