diff --git a/images/rvm_feature.gif b/images/rvm_feature.gif
new file mode 100644
index 00000000..838695df
Binary files /dev/null and b/images/rvm_feature.gif differ
diff --git a/images/rvm_logo.gif b/images/rvm_logo.gif
new file mode 100644
index 00000000..03b51227
Binary files /dev/null and b/images/rvm_logo.gif differ
diff --git a/peterl1n_robustvideomatting.md b/peterl1n_robustvideomatting.md
new file mode 100644
index 00000000..a8e8f743
--- /dev/null
+++ b/peterl1n_robustvideomatting.md
@@ -0,0 +1,99 @@
+---
+layout: hub_detail
+background-class: hub-background
+body-class: hub
+category: researchers
+title: Robust Video Matting
+summary: RVM is a matting network for background replacement tasks. It is trained on human subjects and supports high resolutions such as 4K/HD.
+image: rvm_logo.gif
+author: Peter Lin
+tags: [vision, scriptable]
+github-link: https://github.com/PeterL1n/RobustVideoMatting
+github-id: PeterL1n/RobustVideoMatting
+featured_image_1: rvm_feature.gif
+featured_image_2: no-image
+accelerator: cuda-optional
+---
+
+### Model Description
+
+Robust Video Matting (RVM) is a human matting network proposed in the paper ["Robust High-Resolution Video Matting with Temporal Guidance"](https://peterl1n.github.io/RobustVideoMatting/). It can process high resolutions such as 4K/HD in real time (depending on the GPU). It is also natively designed for video processing, using a recurrent architecture to exploit temporal information across frames.
+
+### Demo
+
+Watch the showreel video on [YouTube](https://youtu.be/Jvzltozpbpk) or [Bilibili](https://www.bilibili.com/video/BV1Z3411B7g7/) to see how it performs!
+
+### Load the Model
+
+Make sure you have `torch>=1.8.1` and `torchvision>=0.9.1` installed.
+
+```python
+import torch
+
+model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3").cuda().eval()
+```
+
+We recommend the `mobilenetv3` backbone for most use cases. We also provide a `resnet50` backbone with a slight improvement in performance; a loading sketch for it appears at the end of the next section.
+
+### Use Inference API
+
+We provide a simple inference function `convert_video` that applies the model to your videos. First install its extra dependencies:
+
+```sh
+pip install av pims tqdm
+```
+
+```python
+convert_video = torch.hub.load("PeterL1n/RobustVideoMatting", "converter")
+```
+
+```python
+convert_video(
+    model,                           # The loaded model, can be on any device (cpu or cuda).
+    input_source='input.mp4',        # A video file or an image sequence directory.
+    input_resize=None,               # [Optional] Resize the input to (width, height).
+    downsample_ratio=None,           # [Optional] Advanced hyperparameter. See the inference doc.
+    output_type='video',             # Choose "video" or "png_sequence".
+    output_composition='com.mp4',    # File path if video; directory path if png sequence.
+    output_alpha="pha.mp4",          # [Optional] Output the raw alpha prediction.
+    output_foreground="fgr.mp4",     # [Optional] Output the raw foreground prediction.
+    output_video_mbps=4,             # Output video bitrate in Mbps. Not needed for png sequence.
+    seq_chunk=4,                     # Process n frames at once for better parallelism.
+    num_workers=0,                   # Only for image sequence input.
+    progress=True                    # Print conversion progress.
+)
+```
+
+For more advanced usage, see the [inference documentation](https://github.com/PeterL1n/RobustVideoMatting/blob/master/documentation/inference.md). Note that the inference function will likely not run in real time for HD/4K, because video encoding and decoding are not hardware accelerated. You can implement your own inference loop as shown below.
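+
+One more option before the custom loop: the `resnet50` variant mentioned earlier loads the same way, only the entrypoint name changes. A minimal sketch (this assumes the Torch Hub entrypoint is named `resnet50`, matching the backbone name, and that a CUDA device is available; drop `.cuda()` for CPU use):
+
+```python
+import torch
+
+# Heavier backbone, same interface: the returned model is a drop-in
+# replacement for the mobilenetv3 variant in convert_video and in the
+# custom loop shown in the next section.
+model = torch.hub.load("PeterL1n/RobustVideoMatting", "resnet50").cuda().eval()
+```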
+
+### Implement Custom Inference Loops
+
+```python
+bgr = torch.tensor([.47, 1, .6]).view(3, 1, 1).cuda()  # Green background.
+rec = [None] * 4                                        # Initial recurrent states.
+downsample_ratio = 0.25                                 # Adjust based on your video.
+
+with torch.no_grad():
+    for src in __YOUR_VIDEO_READER__:                   # RGB video frame normalized to 0 ~ 1.
+        fgr, pha, *rec = model(src.cuda(), *rec, downsample_ratio)  # Cycle the recurrent states.
+        out = fgr * pha + bgr * (1 - pha)               # Composite to green background.
+        __YOUR_VIDEO_WRITER__(out)
+```
+
+* Since RVM uses a recurrent architecture, we initialize the recurrent states `rec` to `None` and cycle them back into the model on every frame.
+* `src` can be `BCHW`, or `BTCHW` where `T` is a chunk of frames given to the model at once to improve parallelism.
+* `downsample_ratio` is an important hyperparameter. See the [inference documentation](https://github.com/PeterL1n/RobustVideoMatting/blob/master/documentation/inference.md) for more details.
+
+### References
+
+```
+@misc{rvm,
+    title={Robust High-Resolution Video Matting with Temporal Guidance},
+    author={Shanchuan Lin and Linjie Yang and Imran Saleemi and Soumyadip Sengupta},
+    year={2021},
+    eprint={2108.11515},
+    archivePrefix={arXiv},
+    primaryClass={cs.CV}
+}
+```
\ No newline at end of file
diff --git a/test_run_python_code.py b/test_run_python_code.py
index f44a2856..d20d5f91 100644
--- a/test_run_python_code.py
+++ b/test_run_python_code.py
@@ -23,6 +23,8 @@ def test_run_file(file_path):
         pytest.skip("temporarily disabled")
     if 'pytorch_fairseq_roberta' in file_path:
         pytest.skip("temporarily disabled")
+    if 'robustvideomatting' in file_path:
+        pytest.skip("temporarily disabled")
 
     # We just run the python files in a separate sub-process. We really want a
     # subprocess here because otherwise we might run into package versions