`README.md` (94 additions, 1 deletion)
<div align='center'>

# SceneRF (Fork Edition)
**Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields**
_Original Authors:_ [Anh-Quan Cao](https://anhquancao.github.io), [Raoul de Charette](https://team.inria.fr/rits/membres/raoul-de-charette/)
_Inria, Paris, France._

</div>

**This repository is a personal fork of the official [SceneRF](https://github.com/astra-vision/SceneRF) repository.**
Please note that the changes described below are _not_ part of the official SceneRF repository and are _not_ endorsed by the original authors.

---
## Fork Changelog

This project was completed as part of the **Machine Learning for 3D Geometry (IN2392)** course at **TUM**.

### **Enhancements in SceneRF Performance**

- Implemented **Random Fourier Features positional encoding** and **Hierarchical Sampling** (alongside the existing sampling techniques) to significantly enhance **novel depth synthesis, novel view synthesis, and scene reconstruction** in SceneRF.
- We also tried **multi-head self-attention** in the Spherical U-Net, but it did not improve the results, likely because we lacked the data and compute to train it for longer.
- Please check out the project report [here](docs/BetterSceNeRF.pdf).
- These improvements yield better performance, as shown in the following table:

<img src="assets/outputResults.png">

- The **best results** are highlighted in **bold**.
- **Original results** are taken from the SceneRF paper.
- **Scaled-down results** correspond to a scaled-down model using the configuration in `train_eval_bash_scripts/train_bundlefusion_scaled_down.sh`.

### **Additional Modifications**
Below is a summary of the modifications introduced in **this fork** to support additional features and datasets. **All credit for the original work goes to the original authors.**

1. **Dataset Argument for TUM RGB-D**
- A new `--dataset` argument has been introduced to:
- `scenerf/scripts/train_bundlefusion.py`
- `scenerf/data/bundlefusion_dm.py`
- `scenerf/data/bundlefusion_dataset.py`
- This allows selecting between **BundleFusion** (`bf`) and **TUM RGB-D** (`tum_rgbd`) during training and data loading (see the sketch below).
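
The sketch shows roughly how such a flag is declared with `argparse`; the flag name and choices come from this fork's description, while the surrounding parser setup is an assumption rather than the fork's exact code.

```python
import argparse

# Hypothetical sketch: the --dataset flag described above; the actual
# wiring in train_bundlefusion.py may differ.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--dataset",
    choices=["bf", "tum_rgbd"],
    default="bf",
    help="Select BundleFusion ('bf') or TUM RGB-D ('tum_rgbd').",
)
args = parser.parse_args()
print(f"Using dataset: {args.dataset}")
```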

2. **Modified Evaluation and Reconstruction Scripts**
- Added a `--dataset` argument to:
- `scenerf/scripts/evaluation/save_depth_metrics_bf.py`
- `scenerf/scripts/evaluation/agg_depth_metrics_bf.py`
- `scenerf/scripts/evaluation/render_colors_bf.py`
- `scenerf/scripts/reconstruction/generate_novel_depths_bf.py`
- `scenerf/scripts/reconstruction/depth2tsdf_bf.py`
- `scenerf/scripts/reconstruction/generate_sc_gt_bf.py`
- `scenerf/scripts/evaluation/eval_sc_bf.py`
- This makes it possible to perform the same depth/TSDF/color metrics evaluations on the TUM RGB-D dataset using a BundleFusion-like format.

3. **TUM RGB-D to BundleFusion Conversion**
- **New File:** `convert_tum_to_bf/tum_to_bf.py`
- Script to convert the **TUM RGB-D** dataset into a BundleFusion-like directory structure, including:
- Pose conversion
- Depth scaling
- Conversion of `color.png` images to `color.jpg`

4. **Random Fourier Features Positional Encoding**
- **New File:** `scenerf/models/pe_rff.py`
- **Modified Files:** `scenerf/models/scenerf_bf.py` (to integrate RFF positional encoding).
- Implements **Random Fourier Features** for positional encoding as an alternative to standard positional encodings; see the sketch below.
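
This minimal PyTorch encoder follows the Tancik et al. formulation: inputs are projected through a fixed random Gaussian matrix and mapped to sine/cosine features. The class name, feature count, and `sigma` default are assumptions, not necessarily what `pe_rff.py` contains.

```python
import torch
import torch.nn as nn

class RFFPositionalEncoding(nn.Module):
    """Hypothetical sketch of Random Fourier Features positional encoding."""

    def __init__(self, in_dim: int = 3, n_features: int = 128, sigma: float = 10.0):
        super().__init__()
        # Fixed random projection B ~ N(0, sigma^2); stored as a buffer
        # so it is saved with the model but never trained.
        self.register_buffer("B", torch.randn(in_dim, n_features) * sigma)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., in_dim) -> gamma(x): (..., 2 * n_features)
        proj = 2.0 * torch.pi * x @ self.B
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
```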

5. **Hierarchical Sampling**
- **Modified Files:** `scenerf/scripts/train_bundlefusion.py` (added a `--n_pts_hier` argument) and `scenerf/models/scenerf_bf.py` (to implement hierarchical sampling).
- Implements **Hierarchical Sampling** alongside uniform and probabilistic sampling.
- Allows specifying the number of points for hierarchical sampling directly from the command line.
- Probabilistic sampling can overly concentrate samples on specific surface regions, leading to an imbalanced focus. Hierarchical sampling instead refines the uniform sampling points, ensuring a more even distribution near surfaces and improving overall reconstruction quality (see the sketch below).
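
This sketch shows NeRF-style hierarchical (inverse-transform) sampling, the standard technique this feature builds on: new depths are drawn from the CDF implied by the coarse sample weights. The function name and tensor shapes are illustrative, not the fork's exact code.

```python
import torch

def sample_hierarchical(bins: torch.Tensor, weights: torch.Tensor, n_pts_hier: int) -> torch.Tensor:
    """Draw n_pts_hier depths per ray from the PDF implied by coarse weights.

    bins:    (n_rays, n_coarse + 1) depth bin edges along each ray
    weights: (n_rays, n_coarse) weights of the coarse samples
    """
    # Normalize weights into a PDF, then build the CDF per ray.
    pdf = (weights + 1e-5) / (weights + 1e-5).sum(-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], dim=-1)

    # Invert the CDF at uniformly random positions.
    u = torch.rand(cdf.shape[0], n_pts_hier, device=cdf.device)
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, cdf.shape[-1] - 1)

    cdf_lo = torch.gather(cdf, -1, idx - 1)
    cdf_hi = torch.gather(cdf, -1, idx)
    bin_lo = torch.gather(bins, -1, idx - 1)
    bin_hi = torch.gather(bins, -1, idx)

    # Linear interpolation within the selected bin.
    t = (u - cdf_lo) / (cdf_hi - cdf_lo).clamp(min=1e-5)
    return bin_lo + t * (bin_hi - bin_lo)
```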

6. **Self Attention**
- **Modified Files:** `scenerf/models/unet2d_sphere.py` (to add multi-head self-attention in the U-Net bottleneck); a sketch of the idea follows.
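
The module below flattens the bottleneck feature map into a token sequence, applies `nn.MultiheadAttention`, and reshapes back; the module name and defaults are assumptions, not the fork's exact code.

```python
import torch
import torch.nn as nn

class BottleneckSelfAttention(nn.Module):
    """Hypothetical sketch: multi-head self-attention on a 2D feature map."""

    def __init__(self, channels: int, n_heads: int = 4):
        super().__init__()
        # channels must be divisible by n_heads.
        self.attn = nn.MultiheadAttention(channels, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)        # residual + layer norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)
```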

7. **Training and Evaluation Bash Scripts**
- **New File:** `train_eval_bash_scripts/train_bundlefusion_scaled_down.sh` (to train the model with scaled down configuration)
- **New File:** `train_eval_bash_scripts/eval_bundlefusion_scaled_down.sh` (to evaluate the model)
- Adjust the paths in the bash scripts to your setup.
- Train on either the **BundleFusion** (`bf`) or **TUM RGB-D** (`tum_rgbd`) dataset by selecting `bf` or `tum_rgbd` in the bash scripts.

8. **Assets**
- **New Directory:** `assets` (to save evaluation results)

---

<div align='center'>

# Original SceneRF README

</div>

Please refer to the original [SceneRF repository](https://github.com/astra-vision/SceneRF) for the most up-to-date official code and instructions. The following sections are from the original SceneRF README (with minor adaptations to reflect the presence of the fork).

----

<div align='center'>

# SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields

ICCV 2023
[Anh-Quan Cao](https://anhquancao.github.io), [Raoul de Charette](https://team.inria.fr/rits/membres/raoul-de-charette/)

Inria, Paris, France.
</div>

If you find this work or code useful, please cite our [paper](https://arxiv.org/abs/2212.02501) and [give this repo a star](https://github.com/astra-vision/SceneRF/stargazers):
```bibtex
@InProceedings{cao2023scenerf,
author = {Cao, Anh-Quan and de Charette, Raoul},
title = {SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields},
    booktitle = {ICCV},
    year      = {2023},
}
```
Binary file added assets/outputResults.png
`convert_tum_to_bf/tum_to_bf.py` (new file, 256 additions)
import os
import numpy as np
from PIL import Image
from scipy.spatial.transform import Rotation as R
import argparse


def combine_and_rename_files(
tum_folder: str,
output_folder: str,
margin: float = 0.02
) -> None:
"""
Match and rename RGB/depth files and write them to the output folder with
the BundleFusion naming convention. Additionally, load poses from TUM's
groundtruth.txt, match them by timestamp within a given margin, and save
them as 4x4 transformation matrices. Color images are re-encoded as JPG at
maximum quality, and depth values are divided by 5 because BundleFusion
stores depth at 1 unit per millimeter while TUM RGB-D uses 5 units per
millimeter (5000 units per meter).

Parameters
----------
tum_folder : str
Path to the TUM scene folder containing 'rgb', 'depth', and
'groundtruth.txt'.
output_folder : str
Path to the output folder where converted data will be stored.
margin : float, optional
Maximum allowed time difference (in seconds) for matching frames
between RGB, depth, and pose data. Defaults to 0.02.

Returns
-------
None
"""

# Create output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Paths to subfolders and files
rgb_path = os.path.join(tum_folder, "rgb")
depth_path = os.path.join(tum_folder, "depth")
pose_file = os.path.join(tum_folder, "groundtruth.txt")

# If any required directory or file doesn't exist, return early
if not (os.path.isdir(rgb_path) and os.path.isdir(depth_path) and
os.path.exists(pose_file)):
print(f"Skipping {tum_folder} because it lacks required folders/files.")
return

# Get sorted lists of RGB and depth files
rgb_files = sorted(f for f in os.listdir(rgb_path)
if f.lower().endswith((".png", ".jpg")))
depth_files = sorted(f for f in os.listdir(depth_path)
if f.lower().endswith(".png"))

# Extract timestamps from filenames (assuming `timestamp.ext`)
rgb_entries = [(float(f.rsplit(".", 1)[0]), f) for f in rgb_files]
depth_entries = [(float(f.rsplit(".", 1)[0]), f) for f in depth_files]

# Load pose entries (timestamp tx ty tz qx qy qz qw) ignoring commented lines
with open(pose_file, "r") as f:
pose_lines = [
line.strip() for line in f
if line.strip() and not line.startswith("#")
]
pose_entries = []
for line in pose_lines:
parts = line.split()
ts = float(parts[0])
data = parts[1:] # [tx, ty, tz, qx, qy, qz, qw]
pose_entries.append((ts, data))

frame_counter = 0

# Iterate over RGB frames and find matching depth and pose
for rgb_ts, rgb_filename in rgb_entries:
frame_id = f"frame-{frame_counter:06d}"

# Find closest depth frame
if not depth_entries:
break
closest_depth = min(depth_entries, key=lambda x: abs(rgb_ts - x[0]))
if abs(rgb_ts - closest_depth[0]) > margin:
continue

# Find closest pose
if not pose_entries:
break
closest_pose = min(pose_entries, key=lambda x: abs(rgb_ts - x[0]))
if abs(rgb_ts - closest_pose[0]) > margin:
continue

# We have matched depth and pose; remove them from the pool
depth_entries.remove(closest_depth)
pose_entries.remove(closest_pose)

# Increment the frame counter now that we have a valid match
frame_counter += 1

# -- Process and save color image as JPG with max quality --
rgb_src = os.path.join(rgb_path, rgb_filename)
rgb_dst = os.path.join(output_folder, f"{frame_id}.color.jpg")

rgb_img = Image.open(rgb_src).convert("RGB")
rgb_img.save(rgb_dst, "JPEG", quality=100)

# -- Process and save depth image (divide by 5) as PNG --
depth_src = os.path.join(depth_path, closest_depth[1])
depth_dst = os.path.join(output_folder, f"{frame_id}.depth.png")

depth_img = np.array(Image.open(depth_src))
# Convert to uint16 and divide by 5 (integer division)
depth_img = depth_img.astype(np.uint16)
depth_img //= 5

depth_img_pil = Image.fromarray(depth_img)
depth_img_pil.save(depth_dst)

# -- Process and save pose as .pose.txt --
tx, ty, tz, qx, qy, qz, qw = map(float, closest_pose[1])
rotation = R.from_quat([qx, qy, qz, qw]).as_matrix()

pose_matrix = np.eye(4, dtype=np.float64)
pose_matrix[:3, :3] = rotation
pose_matrix[:3, 3] = [tx, ty, tz]

pose_dst = os.path.join(output_folder, f"{frame_id}.pose.txt")
np.savetxt(pose_dst, pose_matrix, fmt="%.6f")


def generate_info_txt(output_folder: str, folder_name: str) -> None:
"""
Generate an 'info.txt' file suitable for BundleFusion. This file includes
camera intrinsics and extrinsics for the color and depth sensors. The
intrinsics are selected based on the TUM dataset prefix.

Parameters
----------
output_folder : str
Path to the output folder where 'info.txt' will be saved.
folder_name : str
Name of the scene folder. Used to determine if the dataset is
'freiburg1', 'freiburg2', or 'freiburg3'.

Returns
-------
None
"""

# Intrinsics for different TUM prefixes
intrinsics = {
"freiburg1": "517.3 0 318.6 0 0 516.5 255.3 0 0 0 1 0 0 0 0 1",
"freiburg2": "520.9 0 325.1 0 0 521.0 249.7 0 0 0 1 0 0 0 0 1",
"freiburg3": "535.4 0 320.1 0 0 539.2 247.6 0 0 0 1 0 0 0 0 1"
}

# Default values
color_intrinsic = "525.0 0 319.5 0 0 525.0 239.5 0 0 0 1 0 0 0 0 1"
depth_intrinsic = "525.0 0 319.5 0 0 525.0 239.5 0 0 0 1 0 0 0 0 1"

folder_name_lower = folder_name.lower()
if "freiburg1" in folder_name_lower:
color_intrinsic = intrinsics["freiburg1"]
depth_intrinsic = intrinsics["freiburg1"]
elif "freiburg2" in folder_name_lower:
color_intrinsic = intrinsics["freiburg2"]
depth_intrinsic = intrinsics["freiburg2"]
elif "freiburg3" in folder_name_lower:
color_intrinsic = intrinsics["freiburg3"]
depth_intrinsic = intrinsics["freiburg3"]

info_path = os.path.join(output_folder, "info.txt")
with open(info_path, "w") as f:
f.write("m_versionNumber = 4\n")
f.write("m_sensorName = Kinect\n")
f.write("m_colorWidth = 640\n")
f.write("m_colorHeight = 480\n")
f.write("m_depthWidth = 640\n")
f.write("m_depthHeight = 480\n")
f.write("m_depthShift = 5000\n")
f.write(f"m_calibrationColorIntrinsic = {color_intrinsic}\n")
f.write("m_calibrationColorExtrinsic = "
"1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1\n")
f.write(f"m_calibrationDepthIntrinsic = {depth_intrinsic}\n")
f.write("m_calibrationDepthExtrinsic = "
"1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1\n")


def main(source_dir: str, dest_dir: str) -> None:
"""
Main function that processes multiple TUM scene folders within a source
directory and saves the converted data to a destination directory.

For each scene in `source_dir`, this function:
1. Calls `combine_and_rename_files` to match and convert images/poses.
2. Calls `generate_info_txt` to create the BundleFusion 'info.txt'.

Parameters
----------
source_dir : str
Path to the directory containing multiple TUM scenes (subdirectories).
dest_dir : str
Path to the directory where the converted scenes will be stored.

Returns
-------
None
"""

if not os.path.isdir(source_dir):
print(f"Source directory '{source_dir}' is not valid.")
return

os.makedirs(dest_dir, exist_ok=True)

# Process each subdirectory (scene) within the source directory
for scene_name in sorted(os.listdir(source_dir)):
scene_path = os.path.join(source_dir, scene_name)
if not os.path.isdir(scene_path):
# Skip files; only process directories
continue

# Create corresponding directory in destination
output_scene_path = os.path.join(dest_dir, scene_name)
os.makedirs(output_scene_path, exist_ok=True)

print(f"Processing scene: {scene_name}")

# Perform matching, renaming, and saving
combine_and_rename_files(scene_path, output_scene_path)

# Generate info.txt
generate_info_txt(output_scene_path, scene_name)


if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Convert multiple TUM RGB-D dataset scenes to BundleFusion format."
)
parser.add_argument(
"--source_dir",
type=str,
required=True,
help="Path to the directory containing multiple TUM scene folders."
)
parser.add_argument(
"--dest_dir",
type=str,
required=True,
help="Path to the directory where converted scenes will be stored."
)

args = parser.parse_args()
main(args.source_dir, args.dest_dir)

Binary file added docs/BetterSceNeRF.pdf
Binary file not shown.