
Allow for >1 batch size in Splatfacto #3582


Open · akristoffersen wants to merge 6 commits into main

Conversation

akristoffersen
Contributor

WIP; preliminary testing makes it look like it's working, but I want to make sure.

@kerrj
Collaborator

kerrj commented Jan 31, 2025

Hey Alex! This is super cool, especially in MCMC, which doesn't require gradient thresholds at all. #3216 might have mildly broken parts of this PR since it merged in parallel dataloading, but it shouldn't be too bad; let us know if you want any help fixing conflicts!

@hardikdava
Contributor

hardikdava commented Feb 7, 2025

@akristoffersen I think you might want to modify step_post_backward to support batch processing, so that densification and pruning remain intact as they are.

@akristoffersen
Contributor Author

Works with masks now

As expected, I noticed an almost 2x increase in rays/s with a batch size of two, and a very slight performance drop with a batch size of 1 compared to baseline (50.1 M rays/sec -> 48 M rays/sec)

@akristoffersen
Contributor Author

@hardikdava do you mean that the tuning might be different for the thresholds? Yeah, I don't know exactly what to do there. Maybe someone else has an opinion?

Some quick stats on the poster dataset.
Orange: Baseline (batch size of 1)
Blue: BS = 5
Red: BS = 10

Screenshot 2025-02-08 at 9 42 13 PM

So the splitting / densification outcomes are affected by batch size.

Screenshot 2025-02-08 at 9 43 48 PM

Similarly, train rays/sec do start higher due to the larger batch size, but go down as you'd expect with the higher number of gaussians.

Screenshot 2025-02-08 at 9 43 02 PM

Some good news: I do see the training loss hitting better values more quickly as the batch size increases.

@hardikdava
Contributor

@akristoffersen Currently, densification, splitting, and culling are implemented inside the strategy, and the logic is keyed on the step count. Batched training can therefore skip some of those functions. I am not sure, but you might want to modify the strategy call to account for the batch size, e.g. loop over the batch size, so that all the densification logic is performed for every step; otherwise some of the functions will be skipped.

In simple words, suppose the batch size is 2 and the opacity reset needs to be applied at every 3000th step. It should then happen at every 1500th step to account for the batch size. With your current implementation it will still be applied at every 3000th step, which is effectively the 6000th step of single-image training (batch size * step).

@akristoffersen
Contributor Author

akristoffersen commented Feb 10, 2025

@hardikdava I think dividing those parameters by the batch size assumes that every image produces gradients for a unique set of gaussians. If there's any overlap, then the gaussians seen by 2 images would only get a single gradient descent update applied to them (albeit of possibly better quality because of the signal from both images), while with single-image batches those gaussians would have gotten 2 gradient descent updates.

I think that dividing those params by the batch size could still be a good approximation; I'll try it and see how the losses look.
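
For reference, the "divide the step-based intervals by the batch size" approximation discussed above could be expressed as a small helper like the sketch below. The field names (refine_every, reset_alpha_every, stop_refine_at) are illustrative placeholders, not the actual gsplat/Splatfacto strategy API.

    from dataclasses import dataclass, replace


    @dataclass
    class StrategyIntervals:
        refine_every: int = 100        # densify/prune every N optimizer steps
        reset_alpha_every: int = 3000  # opacity reset interval, in optimizer steps
        stop_refine_at: int = 15000    # step after which refinement stops


    def scale_intervals_for_batch(cfg: StrategyIntervals, batch_size: int) -> StrategyIntervals:
        # Divide step-based intervals by the batch size so that each trigger fires
        # after roughly the same number of images seen, rather than optimizer steps.
        s = max(1, batch_size)
        return replace(
            cfg,
            refine_every=max(1, cfg.refine_every // s),
            reset_alpha_every=max(1, cfg.reset_alpha_every // s),
            stop_refine_at=max(1, cfg.stop_refine_at // s),
        )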

Contributor

@AntonioMacaronio left a comment


I tested these changes on 2 of my datasets with the following commands

  • ns-train
  • ns-render
  • ns-eval
  • ns-export

These all worked well! @jeffreyhuparallel do you have any comments about the hyperparameter strategy stuff?

@abrahamezzeddine

abrahamezzeddine commented Mar 17, 2025

How would batch size work with a dataset of several thousand images? With opacity reset in mind etc... :)

I suspect that scenes with lots of images would inherently benefit from a larger batch size because more images are part of the training per step.

How is the memory usage? Are we able to train with, say, 20, 50 or 100 images per step? How would that affect training speed and memory usage? Is it linear?

@akristoffersen
Contributor Author

All good questions @abrahamezzeddine , unfortunately I haven't had cycles as of late to finish up the implementation and do the necessary benchmarking. @AntonioMacaronio has been helping on that front.

On the large dataset question, I think you're right. Something that has always bothered me about gs is that a batch isn't representative of the full objective-- with NeRFs the batch is a random collection of rays from all images so that helps.

I do suspect that at the moment, splatfacto is memory bound, so increasing the batch size may not improve things as you'd expect. But I think it should make the scene converge better / faster.

@abrahamezzeddine

abrahamezzeddine commented Mar 17, 2025


Thanks for the quick response!

Just food for thought:
If memory is indeed a bottleneck, what if we initially load all images at aggressively downsampled resolutions to dynamically fit larger batches?

As we reach a satisfactory loss at the initial stage, an upsampling phase starts and splatfacto progressively reduces the batch size while increasing the image resolution, iteratively refining until we achieve the desired quality at full resolution. Essentially, we start with the maximum batch size on lower-quality images, and gradually trade off batch size (down to the desired batch size) for higher resolution during training.

Do you think this would help converge complex scenes initially as we upsample and reduce batch size?
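
To make the proposal above concrete, a coarse schedule could look like the sketch below: maximum batch size on heavily downscaled images early in training, trading batch size for resolution as training progresses. The thresholds and function name are made up for illustration only.

    def resolution_batch_schedule(step: int, max_steps: int) -> tuple[int, int]:
        # Illustrative schedule returning (downscale_factor, batch_size) for a step:
        # start heavily downscaled with a large batch, end at full res with a small one.
        progress = step / max(1, max_steps)
        if progress < 0.25:
            return 8, 32   # aggressive downscale, maximum batch
        elif progress < 0.5:
            return 4, 16
        elif progress < 0.75:
            return 2, 8
        else:
            return 1, 4    # full resolution, modest batch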

@akristoffersen
Contributor Author

iteratively refining until we achieve the desired quality at full resolution.

We sort of already do this-- we initially train on downsampled images and then increase the resolution as training continues. But inversely scaling the batch size as training goes on also sounds like a good idea.

Something I've also wanted to try (probably in a separate PR) is to load in a large batch of patches, so the training acts more like NeRFs. You could do this by keeping the focal lengths the same, but augmenting the principal points for each patch.
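
For reference, cropping a pinhole image is equivalent to rendering with the same focal lengths and a shifted principal point, so a "batch of patches" could be assembled roughly as in the sketch below. This is only an illustration of the idea, not part of this PR; the PatchCamera container is made up.

    from dataclasses import dataclass

    import torch


    @dataclass
    class PatchCamera:
        fx: float
        fy: float
        cx: float
        cy: float
        width: int
        height: int


    def make_patch_camera(fx: float, fy: float, cx: float, cy: float,
                          x0: int, y0: int, patch_w: int, patch_h: int) -> PatchCamera:
        # Intrinsics for a (patch_w x patch_h) crop whose top-left corner is (x0, y0):
        # focal lengths are unchanged, the principal point shifts by the crop offset.
        return PatchCamera(fx=fx, fy=fy, cx=cx - x0, cy=cy - y0, width=patch_w, height=patch_h)


    def crop_gt(image: torch.Tensor, x0: int, y0: int, patch_w: int, patch_h: int) -> torch.Tensor:
        # Matching ground-truth crop for the patch camera above (image is [H, W, 3]).
        return image[y0:y0 + patch_h, x0:x0 + patch_w, :]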

@abrahamezzeddine

abrahamezzeddine commented Mar 18, 2025

Thanks. Dynamic batching is indeed an interesting possibility.

Two thoughts: 💭

How would batching process images? Randomly or sequentially?

One option is to simply process images in the order they were captured—for example, taking the first n images as one batch, then the next n, and so on. The idea here is that sequential ordering might naturally preserve temporal continuity, so adjacent frames (which are likely to have similar viewpoints) get processed together. But that can perhaps be difficult to rely on, depending on how the user matched the images: exhaustive, sequential, or vocabulary tree.

Another approach is to order the sparse point cloud using a Hilbert curve. Since a Hilbert curve is a space-filling curve that preserves locality, it essentially divides your scene into “patches” of points that are spatially close together. For instance, if you select 10,000 consecutive points from this 1D Hilbert index, you’re effectively picking a coherent patch of the scene. If you divide the ordering into groups of n points, you essentially create patches of local regions. You can then choose the images that see these points for your batch based on the COLMAP input data. Since the images are already ordered according to the Hilbert curve, it’s easy to keep track of which images belong to which patch. This strategy explicitly enforces spatial coherence, ensuring that each batch is focused on a local region of the scene as it is training.

Would love to hear your thoughts about this.
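
As a rough illustration of the locality idea above, the sketch below orders sparse points along a Morton (Z-order) curve, a simpler stand-in for a Hilbert curve, and chunks them into spatially coherent groups; mapping each group back to the images that observe its points would use the COLMAP tracks (not shown here).

    import numpy as np


    def morton_order(points: np.ndarray, bits: int = 10) -> np.ndarray:
        # Return indices that sort 3D points along a Morton (Z-order) curve.
        # points: [N, 3] float array. Coordinates are quantized to `bits` bits per
        # axis and their bits are interleaved; nearby codes mean nearby points.
        mins, maxs = points.min(axis=0), points.max(axis=0)
        scaled = (points - mins) / np.maximum(maxs - mins, 1e-9)
        q = np.clip(scaled * (2 ** bits - 1), 0, 2 ** bits - 1).astype(np.uint64)

        codes = np.zeros(len(points), dtype=np.uint64)
        for b in range(bits):
            for axis in range(3):
                codes |= ((q[:, axis] >> np.uint64(b)) & np.uint64(1)) << np.uint64(3 * b + axis)
        return np.argsort(codes)


    # Usage: split the ordered points into spatially local chunks, then pick the
    # images that see each chunk to form a batch.
    # points3d = ...  # [N, 3] sparse COLMAP points
    # order = morton_order(points3d)
    # chunks = np.array_split(order, 10)  # 10 local "patches" of the scene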

@akristoffersen
Contributor Author

You might want to check out https://arxiv.org/abs/2501.13975; they have a similar "locality" heuristic that they use to pull multiple images seeing the same region of the scene. They say that this helps prevent overshoot/overfitting to a single image, which could happen since they use a second-order optimization algorithm.

My take is that with a suitably large and diverse batch, this might not be a problem? But I agree that with smaller batches, a local neighborhood might work out better. I imagine the batch-building heuristic doesn't have to be super complicated to get the behavior you'd want.

@abrahamezzeddine

abrahamezzeddine commented Mar 19, 2025

Trying this out now myself, and it seems to converge much faster initially with a batch size of 50. Using 2K-resolution images and around 2500 images; 18 GB of 48 GB VRAM is used.

Step 750 (2.50%) · train iter 699.992 ms · ETA 5 h, 41 m, 14 s · 494.09 M rays/s

Not the fastest but as long as it produces a high quality output, it's fine I guess. =)

@abrahamezzeddine

Works with masks now

As expected, I noticed an almost 2x increase in rays/s with a batch size of two, and a very slight performance drop with a batch size of 1 compared to baseline (50.1 M rays/sec -> 48 M rays/sec)

I am not seeing the linear increase in rays/s with larger batch sizes. Is there a diminishing effect after a certain batch?

@akristoffersen
Contributor Author

Is there a diminishing effect after a certain batch?

Yes, please see the initial wandb results in an earlier comment. Initially the ray throughput scales, but I think because the splitting behavior currently assumes a single image per patch, we are getting many more gaussians with higher batch sizes.

@abrahamezzeddine

abrahamezzeddine commented Mar 19, 2025

Is there a diminishing effect after a certain batch?

Yes, please see the initial wandb results in an earlier comment. Initially the ray throughput scales, but I think because the splitting behavior currently assumes a single image per patch, we are getting many more gaussians with higher batch sizes.

Ok, thanks.

Regarding the learning rates, should one consider square-root batch scaling due to the larger batch size?
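
If one did want to try the square-root scaling rule mentioned above, it would look something like the snippet below; this is just the standard heuristic, not something the PR implements.

    import math

    def scaled_lr(base_lr: float, batch_size: int) -> float:
        # Square-root scaling heuristic: grow the learning rate more slowly than
        # the batch size to keep the gradient-noise scale roughly comparable.
        return base_lr * math.sqrt(batch_size)

    # e.g. means_lr = scaled_lr(1.6e-4, batch_size)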

Bilagrid was also not working, but I made these changes to have it work again with batched training. Not sure, however, if this is "compatible" with bilagrid.

    def _apply_bilateral_grid(self, rgb: torch.Tensor, cam_idx, H: int, W: int) -> torch.Tensor:
        # Get batch size
        batch_size = rgb.shape[0]
        
        grid_y, grid_x = torch.meshgrid(
            torch.linspace(0, 1.0, H, device=self.device),
            torch.linspace(0, 1.0, W, device=self.device),
            indexing="ij",
        )
        grid_xy = torch.stack([grid_x, grid_y], dim=-1)  # [H, W, 2]
        grid_xy = grid_xy.expand(batch_size, H, W, 2)  # [B, H, W, 2]
        
        if isinstance(cam_idx, torch.Tensor):
            if cam_idx.dim() > 1:
                grid_idx = cam_idx[0, 0].clone().detach().to(device=self.device, dtype=torch.long)
            else:
                grid_idx = cam_idx.clone().detach().to(device=self.device, dtype=torch.long)
        else:
            grid_idx = torch.tensor(cam_idx, device=self.device, dtype=torch.long)
        
        grid_idx = grid_idx.expand(batch_size)
        
        out = slice(
            bil_grids=self.bil_grids,
            rgb=rgb,
            xy=grid_xy,
            grid_idx=grid_idx,
        )
        return out["rgb"]

@ichsan2895

ichsan2895 commented Mar 27, 2025

Recently I tested Nerfstudio v1.1.4 + gsplat 1.3.0 against this PR (commit d5bdd45) + gsplat 1.4.0 on a dataset.

Why are the metrics (PSNR, SSIM, LPIPS) fluctuating up and down for the blue line (Nerfstudio commit "fix alpha compositing"), while the pink line (Nerfstudio commit 194b5d4) is more stable?

image

@ichsan2895

ichsan2895 commented Mar 28, 2025

I see the difference in quality when adjusting batch-size 🎉

ns-train splatfacto --pipeline.datamanager.batch-size 1
(PSNR: 28.559, SSIM: 0.9312, LPIPS: 0.2408)
01_ABSGRAD-8-SML-CLS

ns-train splatfacto --pipeline.datamanager.batch-size 2
(PSNR: 29.026, SSIM: 0.9355, LPIPS: 0.218)
01_ABSBS2-8-SML-CLS

ns-train splatfacto-mcmc --pipeline.datamanager.batch-size 1
(PSNR: 27.069, SSIM: 0.9225, LPIPS: 0.265)
01_MCMC1M-8-SML-CLS

ns-train splatfacto-mcmc --pipeline.datamanager.batch-size 2
(PSNR: 29.005, SSIM: 0.9363, LPIPS: 0.233)
01_MCMC1M-8-BS2-CLS

Bilagrid with batch-size > 1 seems not to work well; TV_loss is always zero.
image

@ichsan2895

Found another bug.

Batch-size > 1 does not work with a multi-camera setup (for example, when the image resolutions are not the same: some images are landscape and others are portrait). You can test my dataset here: https://drive.google.com/file/d/1NWZSDU9tEmrAtpKxntTw6YBge_AZ66mf/view?usp=sharing

## This command works IF I set batch-size 1 (it fails with batch-size 2, as shown below):
>> ns-train splatfacto --logging.steps-per-log 200 --vis viewer+wandb --viewer.websocket-port 7007 \
    --pipeline.datamanager.batch-size 2 \
    nerfstudio-data \
    --data path/to/dataset --downscale-factor 1

wandb: 🚀 View run at https://wandb.ai/muhammad-ichsan/nerfstudio-project/runs/l4d8d829
logging events to: outputs/unnamed/splatfacto/2025-03-30_062154
[06:22:05] Caching / undistorting train images                    full_images_datamanager.py:241
Caching / undistorting train images ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 0.2992              
VanillaPipeline.get_train_loss_dict: 0.2976              
Traceback (most recent call last):
  File "/usr/local/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/workspace/NERFSTUDIO_v115a1/nerfstudio/nerfstudio/scripts/train.py", line 272, in entrypoint
    main(
  File "/workspace/NERFSTUDIO_v115a1/nerfstudio/nerfstudio/scripts/train.py", line 257, in main
    launch(
  File "/workspace/NERFSTUDIO_v115a1/nerfstudio/nerfstudio/scripts/train.py", line 190, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/workspace/NERFSTUDIO_v115a1/nerfstudio/nerfstudio/scripts/train.py", line 101, in train_loop
    trainer.train()
  File "/workspace/NERFSTUDIO_v115a1/nerfstudio/nerfstudio/engine/trainer.py", line 266, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "/workspace/NERFSTUDIO_v115a1/nerfstudio/nerfstudio/utils/profiler.py", line 111, in inner
    out = func(*args, **kwargs)
  File "/workspace/NERFSTUDIO_v115a1/nerfstudio/nerfstudio/engine/trainer.py", line 502, in train_iteration
    _, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
  File "/workspace/NERFSTUDIO_v115a1/nerfstudio/nerfstudio/utils/profiler.py", line 111, in inner
    out = func(*args, **kwargs)
  File "/workspace/NERFSTUDIO_v115a1/nerfstudio/nerfstudio/pipelines/base_pipeline.py", line 298, in get_train_loss_dict
    ray_bundle, batch = self.datamanager.next_train(step)
  File "/workspace/NERFSTUDIO_v115a1/nerfstudio/nerfstudio/data/datamanagers/full_images_datamanager.py", line 406, in next_train
    data = nerfstudio_collate(
  File "/workspace/NERFSTUDIO_v115a1/nerfstudio/nerfstudio/data/utils/nerfstudio_collate.py", line 122, in nerfstudio_collate
    {key: nerfstudio_collate([d[key] for d in batch], extra_mappings=extra_mappings) for key in elem}
  File "/workspace/NERFSTUDIO_v115a1/nerfstudio/nerfstudio/data/utils/nerfstudio_collate.py", line 122, in <dictcomp>
    {key: nerfstudio_collate([d[key] for d in batch], extra_mappings=extra_mappings) for key in elem}
  File "/workspace/NERFSTUDIO_v115a1/nerfstudio/nerfstudio/data/utils/nerfstudio_collate.py", line 103, in nerfstudio_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [767, 1035, 3] at entry 0 and [1049, 778, 3] at entry 1

And yeah, the _apply_bilateral_grid patch posted above works, but the TV_loss is still zero when I activated bilagrid.

@AntonioMacaronio
Contributor

AntonioMacaronio commented Mar 30, 2025

@ichsan2895 thank you for the beautiful testing! The batching with cameras of different resolutions is concerning, and I suspect this is something that just can't be supported until PyTorch supports jagged tensors. Afaik, they are called NestedTensors and are currently in beta, but it will likely be some time before they are supported.

Perhaps the current best solution is to just not allow batching when images of varying resolution are given.
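
Until jagged/NestedTensor support lands, a simple guard along the lines of the sketch below could fail fast with a clear message instead of crashing inside the collate function; where exactly to hook it into the datamanager is left open.

    import torch


    def check_uniform_resolution(images: list[torch.Tensor], batch_size: int) -> None:
        # Raise a clear error when batching is requested but image shapes differ.
        if batch_size <= 1:
            return
        shapes = {tuple(img.shape) for img in images}
        if len(shapes) > 1:
            raise ValueError(
                f"batch-size={batch_size} requires all images to share one resolution, "
                f"but found shapes {sorted(shapes)}. Use batch-size 1 for multi-resolution captures."
            )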

@abrahamezzeddine


I can also mention that camera optimization does not work with a batch size over 1. With a few modifications, I made it work again.

@abrahamezzeddine

Found another bug. Batch-size > 1 does not work with a multi-camera setup … (full error log and bilagrid patch quoted from the comments above)

How do you activate the train/validation loss in the console log output? Maybe I can check and see what I find.

@ichsan2895

ichsan2895 commented Mar 30, 2025

Preliminary Benchmark

Just benchmarking the Mip-360 dataset with various values of batch-size.

This time, for each scene, I ran only 1000 steps. Maybe when I have more free time, I will set it to 30k steps.

FYI, Mip-360's pre-downscaled images are not compatible with nerfstudio, since nerfstudio downscales with floor-rounded dimensions, so I resized them manually. See #1438 for discussion.

ns-train splatfacto --pipeline.datamanager.batch-size 1, Nerfstudio commit d5bdd45, Python 3.10, RTX4090

Index Scene PSNR SSIM LPIPS Duration (seconds) Duration (minutes)
0 garden 20.855 0.432 0.592 43.000 0.717
1 bicycle 19.842 0.377 0.727 39.000 0.650
2 stump 21.623 0.452 0.651 33.000 0.550
3 bonsai 23.668 0.789 0.293 64.000 1.067
4 counter 22.757 0.746 0.374 46.000 0.767
5 kitchen 20.818 0.697 0.304 53.000 0.883
6 room 24.931 0.801 0.366 47.000 0.783
7 Average 22.071 0.613 0.472 46.429 0.774

ns-train splatfacto --pipeline.datamanager.batch-size 2, Nerfstudio commit d5bdd45, Python 3.10, RTX4090

Scene PSNR SSIM LPIPS Duration (seconds) Duration (minutes)
garden 20.911 0.441 0.558 42.000 0.700
bicycle 19.982 0.390 0.688 37.000 0.617
stump 21.833 0.468 0.612 34.000 0.567
bonsai 23.759 0.797 0.271 54.000 0.900
counter 23.310 0.760 0.333 48.000 0.800
kitchen 20.116 0.698 0.284 54.000 0.900
room 25.750 0.817 0.329 48.000 0.800
Average 22.237 0.624 0.439 45.286 0.755

ns-train splatfacto --pipeline.datamanager.batch-size 3, Nerfstudio commit d5bdd45, Python 3.10, RTX4090

Index Scene PSNR SSIM LPIPS Duration (seconds) Duration (minutes)
0 garden 20.920 0.447 0.540 43.000 0.717
1 bicycle 20.095 0.398 0.667 39.000 0.650
2 stump 21.949 0.477 0.593 36.000 0.600
3 bonsai 24.043 0.803 0.260 55.000 0.917
4 counter 23.539 0.768 0.314 48.000 0.800
5 kitchen 19.972 0.709 0.273 66.000 1.100
6 room 26.261 0.825 0.311 62.000 1.033
7 Average 22.397 0.632 0.423 49.857 0.831

SPLATFACTO-BIG

ns-train splatfacto-big --pipeline.datamanager.batch-size 1, Nerfstudio commit d5bdd45, Python 3.10, RTX4090

Index Scene PSNR SSIM LPIPS Duration (seconds) Duration (minutes)
0 garden 21.040 0.432 0.587 46.000 0.767
1 bicycle 19.903 0.377 0.727 47.000 0.783
2 stump 21.701 0.456 0.648 38.000 0.633
3 bonsai 23.915 0.793 0.293 175.000 2.917
4 counter 22.931 0.749 0.374 162.000 2.700
5 kitchen 21.051 0.702 0.303 108.000 1.800
6 room 25.014 0.804 0.365 170.000 2.833
7 Average 22.222 0.616 0.471 106.571 1.776

ns-train splatfacto-big --pipeline.datamanager.batch-size 2, Nerfstudio commit d5bdd45, Python 3.10, RTX4090

Index Scene PSNR SSIM LPIPS Duration (seconds) Duration (minutes)
0 garden 21.076 0.440 0.551 52.000 0.867
1 bicycle 20.072 0.391 0.682 63.000 1.050
2 stump 21.979 0.473 0.605 36.000 0.600
3 bonsai 24.170 0.802 0.269 74.000 1.233
4 counter 23.511 0.764 0.331 123.000 2.050
5 kitchen 20.531 0.706 0.281 129.000 2.150
6 room 25.776 0.820 0.326 112.000 1.867
7 Average 22.445 0.628 0.435 84.143 1.402

ns-train splatfacto-big --pipeline.datamanager.batch-size 3, Nerfstudio commit d5bdd45, Python 3.10, RTX4090

Index Scene PSNR SSIM LPIPS Duration (seconds) Duration (minutes)
0 garden 21.121 0.447 0.531 43.000 0.717
1 bicycle 20.205 0.401 0.658 39.000 0.650
2 stump 22.155 0.483 0.581 34.000 0.567
3 bonsai 24.312 0.806 0.260 61.000 1.017
4 counter 23.686 0.772 0.311 49.000 0.817
5 kitchen 20.286 0.716 0.269 56.000 0.933
6 room 26.332 0.827 0.308 48.000 0.800
7 Average 22.585 0.636 0.417 47.143 0.786

MCMC

ns-train splatfacto-mcmc --pipeline.datamanager.batch-size 1, Nerfstudio commit d5bdd45, Python 3.10, RTX4090

Index Scene PSNR SSIM LPIPS Duration (seconds) Duration (minutes)
0 garden 21.171 0.434 0.627 42.000 0.700
1 bicycle 20.050 0.376 0.769 38.000 0.633
2 stump 21.447 0.438 0.707 34.000 0.567
3 bonsai 24.029 0.797 0.310 52.000 0.867
4 counter 23.161 0.754 0.384 47.000 0.783
5 kitchen 21.002 0.703 0.311 54.000 0.900
6 room 25.264 0.809 0.375 48.000 0.800
7 Average 22.303 0.616 0.498 45.000 0.750

ns-train splatfacto-mcmc --pipeline.datamanager.batch-size 2, Nerfstudio commit d5bdd45, Python 3.10, RTX4090

Index Scene PSNR SSIM LPIPS Duration (seconds) Duration (minutes)
0 garden 21.281 0.444 0.596 42.000 0.700
1 bicycle 20.175 0.386 0.739 39.000 0.650
2 stump 21.501 0.447 0.683 35.000 0.583
3 bonsai 24.289 0.807 0.286 52.000 0.867
4 counter 23.645 0.767 0.347 48.000 0.800
5 kitchen 20.754 0.709 0.290 53.000 0.883
6 room 26.017 0.822 0.344 48.000 0.800
7 Average 22.523 0.626 0.469 45.286 0.755

ns-train splatfacto-mcmc --pipeline.datamanager.batch-size 2 --pipeline.datamanager.train-cameras-sampling-strategy fps, Nerfstudio commit d5bdd45, Python 3.10, RTX4090

Index Scene PSNR SSIM LPIPS Duration (seconds) Duration (minutes)
0 garden 21.284 0.445 0.595 42.000 0.700
1 bicycle 20.269 0.389 0.736 39.000 0.650
2 stump 21.587 0.447 0.684 35.000 0.583
3 bonsai 24.378 0.810 0.282 54.000 0.900
4 counter 23.813 0.773 0.342 47.000 0.783
5 kitchen 20.814 0.713 0.286 53.000 0.883
6 room 25.957 0.823 0.341 50.000 0.833
7 Average 22.586 0.629 0.467 45.714 0.762

@ichsan2895

ichsan2895 commented Mar 30, 2025

How do you activate the train/validation loss in the console log output? Maybe I can check and see what I find.

@abrahamezzeddine I use wandb for logging.

>> ns-train splatfacto --vis viewer+wandb so on......

wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: muhammad-ichsan. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.18.0
wandb: Run data is saved locally in outputs/unnamed/splatfacto/2025-03-29_180941/wandb/run-20250329_180951-wkvxyz93
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run unnamed
wandb: ⭐️ View project at https://wandb.ai/your-account/nerfstudio-project
wandb: 🚀 View run at https://wandb.ai/your-account/nerfstudio-project/runs/wkvxyz93

@akristoffersen
Contributor Author

Sorry for the neglect of this PR, all. @ichsan2895 thank you so much for the benchmarking, it's a real help.

I will try to crush the bugs re: bilagrid this weekend. Sorry again for the delay here.

Regarding multi-res camera support, I think I'm okay limiting that ability at the moment, though if I go through with the "large patch sampling" technique described above, that limit can go away.

@ichsan2895

ichsan2895 commented Apr 17, 2025

Personally, this feature is a big game changer for splatfacto.

ns-train splatfacto --pipeline.datamanager.batch-size 1
1_bs1_16k
2_bs1_20k

ns-train splatfacto --pipeline.datamanager.batch-size 3
1_bs3_16k
2_bs3_20k

ns-train splatfacto --pipeline.datamanager.batch-size 5
1_bs5_16k
2_bs5_20k

Hopefully the bilateral_grid bugs and multi-camera support will be fixed in the future.

@ichsan2895

ichsan2895 commented Apr 22, 2025

I confirm that bilagrid works properly. TV_loss, cc_psnr, cc_ssim, and cc_lpips are calculated properly. 🎉
image

Please note, sometimes the metrics (PSNR, LPIPS, SSIM) go up and down for unknown reasons on some datasets. But in the end, they become higher and stable after >15k iterations.
image

Next, I will test bilagrid+RGBA images.

@ichsan2895

ichsan2895 commented Apr 22, 2025

Bilagrid+RGBA dataset does not work:

>> ns-train splatfacto --vis viewer+wandb \
    --pipeline.model.use-bilateral-grid True --pipeline.model.color-corrected-metrics True \
    --pipeline.datamanager.batch-size 2 \
    nerfstudio-data \
    --data path/to/scene --downscale-factor 1
.
.
.
Step (% Done)       Train Iter (time)    ETA (time)           
--------------------------------------------------------------
Step (% Done)       Train Iter (time)    ETA (time)                                                  
--------------------------------------------------------------                                       
0 (0.00%)           1 m, 7 s             23 d, 6 h, 36 m, 54 s                                       
---------------------------------------------------------------------------------------------------- 
Viewer running locally at: http://localhost:7007/ (listening on 0.0.0.0)                              
[06:48:33] Caching / undistorting eval images                     full_images_datamanager.py:241
Caching / undistorting eval images ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
Printing profiling stats, from longest to shortest duration in seconds
VanillaPipeline.get_eval_image_metrics_and_images: 7.3795              
Trainer.train_iteration: 0.7054              
VanillaPipeline.get_train_loss_dict: 0.6988              
Trainer.eval_iteration: 0.0731              
Traceback (most recent call last):
  File "/usr/local/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/scripts/train.py", line 272, in entrypoint
    main(
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/scripts/train.py", line 257, in main
    launch(
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/scripts/train.py", line 190, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/scripts/train.py", line 101, in train_loop
    trainer.train()
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/engine/trainer.py", line 304, in train
    self.eval_iteration(step)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/utils/decorators.py", line 71, in wrapper
    ret = func(self, *args, **kwargs)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/utils/profiler.py", line 111, in inner
    out = func(*args, **kwargs)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/engine/trainer.py", line 551, in eval_iteration
    metrics_dict, images_dict = self.pipeline.get_eval_image_metrics_and_images(step=step)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/utils/profiler.py", line 111, in inner
    out = func(*args, **kwargs)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/pipelines/base_pipeline.py", line 339, in get_eval_image_metrics_and_images
    metrics_dict, images_dict = self.model.get_image_metrics_and_images(outputs, batch)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/models/splatfacto.py", line 763, in get_image_metrics_and_images
    combined_rgb = torch.cat([gt_rgb, predicted_rgb], dim=1)
RuntimeError: Tensors must have same number of dimensions: got 4 and 3

@ichsan2895

ichsan2895 commented Apr 22, 2025

Another error:

Images and Mask does not work well

>> ns-train splatfacto --vis viewer+wandb \
    --pipeline.model.use-bilateral-grid True --pipeline.model.color-corrected-metrics True \
    --pipeline.datamanager.batch-size 2 \
    colmap \
    --data path/to/scene --downscale-factor 1 --colmap-path "sparse/0" \
    --images-path "images" --masks-path "masks"
[08:31:21] Caching / undistorting train images                    full_images_datamanager.py:241
Caching / undistorting train images ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100%
/usr/local/lib/python3.10/dist-packages/torch/_inductor/compile_fx.py:135: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 47.2053             
VanillaPipeline.get_train_loss_dict: 47.2035             
Traceback (most recent call last):
  File "/usr/local/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/scripts/train.py", line 272, in entrypoint
    main(
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/scripts/train.py", line 257, in main
    launch(
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/scripts/train.py", line 190, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/scripts/train.py", line 101, in train_loop
    trainer.train()
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/engine/trainer.py", line 266, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/utils/profiler.py", line 111, in inner
    out = func(*args, **kwargs)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/engine/trainer.py", line 502, in train_iteration
    _, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/utils/profiler.py", line 111, in inner
    out = func(*args, **kwargs)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/pipelines/base_pipeline.py", line 301, in get_train_loss_dict
    loss_dict = self.model.get_loss_dict(model_outputs, batch, metrics_dict)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/models/splatfacto.py", line 687, in get_loss_dict
    mask = self._downscale_if_required(batch["mask"])
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/models/splatfacto.py", line 452, in _downscale_if_required
    return resize_image(image, d)
  File "/workspace/NERFSTUDIO_v115a2/nerfstudio/nerfstudio/models/splatfacto.py", line 66, in resize_image
    downscaled = tf.conv2d(image, weight, stride=d)
RuntimeError: Input type (CUDABoolType) and weight type (torch.cuda.FloatTensor) should be the same
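
The mask error above comes from feeding a boolean tensor into conv2d; a possible workaround is to cast the mask to float before downscaling and threshold it back afterwards. This is a sketch of the idea (assuming a [H, W, 1] mask), not the exact splatfacto resize_image code.

    import torch
    import torch.nn.functional as F


    def resize_bool_mask(mask: torch.Tensor, d: int) -> torch.Tensor:
        # Downscale a boolean mask of shape [H, W, 1] by an integer factor d.
        # Cast to float for average pooling (pooling kernels don't accept bool),
        # then threshold back so the result is boolean again.
        if d == 1:
            return mask
        m = mask.float().permute(2, 0, 1).unsqueeze(0)  # [1, 1, H, W]
        m = F.avg_pool2d(m, kernel_size=d, stride=d)    # [1, 1, H//d, W//d]
        return m.squeeze(0).permute(1, 2, 0) > 0.5      # [H//d, W//d, 1], bool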

@ichsan2895

ichsan2895 commented Apr 27, 2025

I have tried padding the images that do not have the same resolution as the first image in the batch with an alpha channel. It worked, but the result is not good.

Now I have an idea, @akristoffersen, for using multiple cameras with batching: only stack images that have the same resolution as the first image of the batch. If no other image in the batch has a matching resolution, return the first image by itself. To preserve the batch shape, I fill the remaining slots with clones of the first image in the batch.

Add this code in nerfstudio/nerfstudio/data/utils/nerfstudio_collate.py

    def stacked_batches(batch, dim=0, out=None):
        if not batch:
            raise ValueError("Batch cannot be empty")

        # Reference size from the first tensor
        ref_h, ref_w, ref_c = batch[0].shape

        # Collect tensors that match the reference size
        matching = [tensor for tensor in batch if tensor.shape == (ref_h, ref_w, ref_c)]

        if not matching:
            raise ValueError("No tensors with matching resolution found")

        # Create output list, starting with matching tensors
        result = matching.copy()

        # Fill remaining slots with duplicates of the first tensor to match original batch length
        while len(result) < len(batch):
            result.append(batch[0])

        # Stack all tensors
        return torch.stack(result, dim=dim, out=out)

    if isinstance(elem, torch.Tensor):
        out = None
        if torch.utils.data.get_worker_info() is not None:
            # If in a background process, use shared memory
            numel = sum(x.numel() for x in batch)
            storage = elem.untyped_storage()._new_shared(numel, device=str(elem.device))
            out = elem.new(storage).resize_(len(batch), *list(elem.size()))
        return stacked_batches(batch, 0, out=out)
