
Reduce GPU memory consumption #3165

@mthrok

Description


Running GPU decoding/encoding with the ffmpeg command below takes about 300 MB of GPU memory.

ffmpeg -hide_banner -y -hwaccel cuvid -hwaccel_output_format cuda -c:v h264_cuvid  -i "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4" -c:a copy -c:v h264_nvenc test.mp4
$ nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.max,pcie.link.gen.current,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 1
2023/03/10 11:04:18.960, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 35, 34 %, 19 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:04:20.984, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 34, 22 %, 19 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:04:23.003, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 3 %, 6 %, 10240 MiB, 7926 MiB, 2151 MiB
2023/03/10 11:04:25.017, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 3 %, 5 %, 10240 MiB, 7923 MiB, 2154 MiB
2023/03/10 11:04:27.027, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 4 %, 6 %, 10240 MiB, 7926 MiB, 2151 MiB
2023/03/10 11:04:29.043, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 3 %, 5 %, 10240 MiB, 7926 MiB, 2151 MiB
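For a more scriptable baseline than eyeballing nvidia-smi, the peak GPU memory usage during the ffmpeg run could be sampled, for example with pynvml. This is only a sketch, assuming pynvml is installed and the same input/output files as above; it reports total device usage, including other processes.

import subprocess
import time

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def used_mib():
    # Total used memory on GPU 0 in MiB (includes all processes).
    return pynvml.nvmlDeviceGetMemoryInfo(handle).used // 2**20

baseline = used_mib()
proc = subprocess.Popen([
    "ffmpeg", "-hide_banner", "-y",
    "-hwaccel", "cuvid", "-hwaccel_output_format", "cuda",
    "-c:v", "h264_cuvid",
    "-i", "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
    "-c:a", "copy", "-c:v", "h264_nvenc", "test.mp4",
])
peak = baseline
while proc.poll() is None:
    # Poll device memory while ffmpeg is running to catch the peak.
    peak = max(peak, used_mib())
    time.sleep(0.5)
print(f"baseline: {baseline} MiB, peak during ffmpeg: {peak} MiB, delta: {peak - baseline} MiB")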

Running the following script, which involves GPU decode/encode and YUV444P conversion, takes about 1.3 GB (!) of GPU memory (about 200 MB of which comes from the PyTorch CUDA tensors).
It looks like some 700 MB is still occupied even when decoding/encoding is not happening. We need to look into what is consuming so much memory.

The 600 MB of active memory might be the CUDA device context, which we might be able to reuse between the decoder and encoder. (#3160)

import time
from datetime import datetime

import torchaudio
from torchaudio.io import StreamReader, StreamWriter

# torchaudio.utils.ffmpeg_utils.set_log_level(36)

input = "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4"
output = "foo.mp4"

def test():
    # Decode the video on GPU (NVDEC) and keep the decoded frames on cuda:0.
    r = StreamReader(input)
    i = r.get_src_stream_info(r.default_video_stream)
    r.add_video_stream(1, decoder="h264_cuvid", hw_accel="cuda:0")

    # Encode on GPU (NVENC), converting the frames to YUV444P.
    w = StreamWriter(output)
    w.add_video_stream(
        height=i.height,
        width=i.width,
        frame_rate=i.frame_rate,
        format="yuv444p",
        encoder_format="yuv444p",
        encoder="h264_nvenc",
        hw_accel="cuda:0",
    )

    with w.open():
        num_frames = 0
        for chunk, in r.stream():
            num_frames += chunk.size(0)
            w.write_video_chunk(0, chunk)
    return num_frames


total_num_frames = 0
while True:
    t0 = time.monotonic()
    num_frames = test()
    elapsed = time.monotonic() - t0
    total_num_frames += num_frames
    # Sleep between iterations so that idle GPU memory usage shows up in nvidia-smi.
    time.sleep(10)
    print(f"{datetime.now()}: {elapsed} [sec], {num_frames} frames ({total_num_frames})")

2023/03/10 11:06:34.270, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 18 %, 18 %, 10240 MiB, 8292 MiB, 1785 MiB // BEFORE LAUNCH
2023/03/10 11:06:36.283, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 14 %, 26 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:06:38.297, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 4 %, 58 %, 10240 MiB, 8289 MiB, 1788 MiB
2023/03/10 11:06:40.322, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 35, 52 %, 10 %, 10240 MiB, 7023 MiB, 3054 MiB // ENCODE/DECODE
2023/03/10 11:06:42.346, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 60 %, 11 %, 10240 MiB, 7023 MiB, 3054 MiB
2023/03/10 11:06:44.358, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 56 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:46.378, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 57 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:48.389, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 53 %, 10 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:50.407, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 58 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:52.421, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 40, 59 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:54.445, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 7 %, 2 %, 10240 MiB, 7623 MiB, 2454 MiB // SLEEP
2023/03/10 11:06:56.464, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P3, 4, 4, 37, 0 %, 1 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:06:58.477, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P5, 4, 4, 36, 1 %, 12 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:00.489, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 36, 0 %, 33 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:02.502, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 36, 0 %, 25 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:04.564, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 5 %, 22 %, 10240 MiB, 7155 MiB, 2922 MiB // ENCODE/DECODE
2023/03/10 11:07:06.580, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 40, 60 %, 11 %, 10240 MiB, 7004 MiB, 3073 MiB
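
To check how much of the resident memory belongs to PyTorch versus the FFmpeg decoder/encoder contexts, PyTorch-side counters could be logged after each test() run. This is only a sketch (log_torch_cuda_memory is a hypothetical helper); torch.cuda.memory_allocated / memory_reserved report only memory managed by PyTorch's caching allocator, so the remainder of the nvidia-smi figure would be held by FFmpeg/CUDA.

import torch

def log_torch_cuda_memory(tag):
    # Memory currently occupied by live PyTorch tensors on cuda:0.
    allocated = torch.cuda.memory_allocated("cuda:0") / 2**20
    # Memory reserved by PyTorch's caching allocator (allocated plus cached blocks).
    reserved = torch.cuda.memory_reserved("cuda:0") / 2**20
    print(f"{tag}: allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")

# For example, call log_torch_cuda_memory("after test()") right after
# num_frames = test() in the loop above.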
