
Reduce GPU memory consumption #3165

@mthrok

Description


Running GPU decoding/encoding with the ffmpeg command below takes about 300 MB of GPU memory.

ffmpeg -hide_banner -y -hwaccel cuvid -hwaccel_output_format cuda -c:v h264_cuvid  -i "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4" -c:a copy -c:v h264_nvenc test.mp4
$ nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.max,pcie.link.gen.current,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 1
2023/03/10 11:04:18.960, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 35, 34 %, 19 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:04:20.984, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 34, 22 %, 19 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:04:23.003, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 3 %, 6 %, 10240 MiB, 7926 MiB, 2151 MiB
2023/03/10 11:04:25.017, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 3 %, 5 %, 10240 MiB, 7923 MiB, 2154 MiB
2023/03/10 11:04:27.027, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 4 %, 6 %, 10240 MiB, 7926 MiB, 2151 MiB
2023/03/10 11:04:29.043, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 3 %, 5 %, 10240 MiB, 7926 MiB, 2151 MiB
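For a more scriptable baseline than eyeballing nvidia-smi, the peak GPU memory usage during the ffmpeg run could be sampled, for example with pynvml. This is only a sketch, assuming pynvml is installed and the same input/output files as above; it reports total device usage, including other processes.

import subprocess
import time

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def used_mib():
    # Total used memory on GPU 0 in MiB (includes all processes).
    return pynvml.nvmlDeviceGetMemoryInfo(handle).used // 2**20

baseline = used_mib()
proc = subprocess.Popen([
    "ffmpeg", "-hide_banner", "-y",
    "-hwaccel", "cuvid", "-hwaccel_output_format", "cuda",
    "-c:v", "h264_cuvid",
    "-i", "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4",
    "-c:a", "copy", "-c:v", "h264_nvenc", "test.mp4",
])
peak = baseline
while proc.poll() is None:
    # Poll device memory while ffmpeg is running to catch the peak.
    peak = max(peak, used_mib())
    time.sleep(0.5)
print(f"baseline: {baseline} MiB, peak during ffmpeg: {peak} MiB, delta: {peak - baseline} MiB")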

Running the following script, which involves GPU decode/encode and YUV444P conversion, takes about 1.3 GB (!) of GPU memory (about 200 MB of which comes from the PyTorch CUDA tensors).
It looks like some 700 MB is still occupied even when decoding/encoding is not happening. We need to look into what is consuming so much memory.

The 600 MB of active memory might be the CUDA device context, which we might be able to reuse between the decoder and encoder. (#3160)

import time
from datetime import datetime

import torchaudio
from torchaudio.io import StreamReader, StreamWriter

# torchaudio.utils.ffmpeg_utils.set_log_level(36)

input = "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4"
output = "foo.mp4"

def test():
    # Decode the video on GPU (NVDEC) and keep the decoded frames on cuda:0.
    r = StreamReader(input)
    i = r.get_src_stream_info(r.default_video_stream)
    r.add_video_stream(1, decoder="h264_cuvid", hw_accel="cuda:0")

    # Encode on GPU (NVENC), converting the frames to YUV444P.
    w = StreamWriter(output)
    w.add_video_stream(
        height=i.height,
        width=i.width,
        frame_rate=i.frame_rate,
        format="yuv444p",
        encoder_format="yuv444p",
        encoder="h264_nvenc",
        hw_accel="cuda:0",
    )

    with w.open():
        num_frames = 0
        for chunk, in r.stream():
            num_frames += chunk.size(0)
            w.write_video_chunk(0, chunk)
    return num_frames


total_num_frames = 0
while True:
    t0 = time.monotonic()
    num_frames = test()
    elapsed = time.monotonic() - t0
    total_num_frames += num_frames
    # Sleep between iterations so that idle GPU memory usage shows up in nvidia-smi.
    time.sleep(10)
    print(f"{datetime.now()}: {elapsed} [sec], {num_frames} frames ({total_num_frames})")

2023/03/10 11:06:34.270, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 18 %, 18 %, 10240 MiB, 8292 MiB, 1785 MiB // BEFORE LAUNCH
2023/03/10 11:06:36.283, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 14 %, 26 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:06:38.297, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 4 %, 58 %, 10240 MiB, 8289 MiB, 1788 MiB
2023/03/10 11:06:40.322, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 35, 52 %, 10 %, 10240 MiB, 7023 MiB, 3054 MiB // ENCODE/DECODE
2023/03/10 11:06:42.346, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 60 %, 11 %, 10240 MiB, 7023 MiB, 3054 MiB
2023/03/10 11:06:44.358, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 56 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:46.378, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 57 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:48.389, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 53 %, 10 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:50.407, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 58 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:52.421, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 40, 59 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:54.445, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 7 %, 2 %, 10240 MiB, 7623 MiB, 2454 MiB // SLEEP
2023/03/10 11:06:56.464, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P3, 4, 4, 37, 0 %, 1 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:06:58.477, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P5, 4, 4, 36, 1 %, 12 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:00.489, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 36, 0 %, 33 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:02.502, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 36, 0 %, 25 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:04.564, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 5 %, 22 %, 10240 MiB, 7155 MiB, 2922 MiB // ENCODE/DECODE
2023/03/10 11:07:06.580, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 40, 60 %, 11 %, 10240 MiB, 7004 MiB, 3073 MiB
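
To check how much of the resident memory belongs to PyTorch versus the FFmpeg decoder/encoder contexts, PyTorch-side counters could be logged after each test() run. This is only a sketch (log_torch_cuda_memory is a hypothetical helper); torch.cuda.memory_allocated / memory_reserved report only memory managed by PyTorch's caching allocator, so the remainder of the nvidia-smi figure would be held by FFmpeg/CUDA.

import torch

def log_torch_cuda_memory(tag):
    # Memory currently occupied by live PyTorch tensors on cuda:0.
    allocated = torch.cuda.memory_allocated("cuda:0") / 2**20
    # Memory reserved by PyTorch's caching allocator (allocated plus cached blocks).
    reserved = torch.cuda.memory_reserved("cuda:0") / 2**20
    print(f"{tag}: allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")

# For example, call log_torch_cuda_memory("after test()") right after
# num_frames = test() in the loop above.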
