Running GPU decoding/encoding with the following ffmpeg command takes about 300 MB of GPU memory.
ffmpeg -hide_banner -y -hwaccel cuvid -hwaccel_output_format cuda -c:v h264_cuvid -i "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4" -c:a copy -c:v h264_nvenc test.mp4
$ nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.max,pcie.link.gen.current,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 1
2023/03/10 11:04:18.960, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 35, 34 %, 19 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:04:20.984, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 34, 22 %, 19 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:04:23.003, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 3 %, 6 %, 10240 MiB, 7926 MiB, 2151 MiB
2023/03/10 11:04:25.017, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 3 %, 5 %, 10240 MiB, 7923 MiB, 2154 MiB
2023/03/10 11:04:27.027, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 4 %, 6 %, 10240 MiB, 7926 MiB, 2151 MiB
2023/03/10 11:04:29.043, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 3 %, 5 %, 10240 MiB, 7926 MiB, 2151 MiB
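For reference, the memory figures quoted here are just the rise of memory.used over the idle baseline in these logs. A minimal helper for computing that delta from the CSV output is sketched below; it assumes the column order of the --query-gpu command above, and the log file name gpu_mem.csv is hypothetical.

import csv

def memory_used_delta(path):
    """Report how far memory.used rises above the first (idle) sample in an
    nvidia-smi CSV log produced by the --query-gpu command above."""
    used = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            # memory.used is the last queried field, e.g. " 1785 MiB".
            value = row[-1].strip().split()[0]
            if value.isdigit():  # skip the CSV header row
                used.append(int(value))
    baseline = used[0]
    return max(used) - baseline

# e.g. nvidia-smi --query-gpu=... --format=csv -l 1 > gpu_mem.csv
print(f"peak delta: +{memory_used_delta('gpu_mem.csv')} MiB")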
Running the following script, which involves GPU decode/encode and YUV444P conversion, takes about 1.3 GB (!) of GPU memory (about 200 MB of which is the PyTorch CUDA tensor).
It looks like some 700 MB is still occupied even when decoding/encoding is not happening. We need to look into what is consuming so much memory; a measurement sketch follows the second nvidia-smi log below.
The remaining ~600 MB of active memory might come from the device contexts, which we might be able to share between the decoder and encoder. (#3160)
import time
from datetime import datetime

import torchaudio
from torchaudio.io import StreamReader, StreamWriter

# torchaudio.utils.ffmpeg_utils.set_log_level(36)

input = "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4"
output = "foo.mp4"


def test():
    # Decode on the GPU; frames are returned as CUDA tensors on cuda:0.
    r = StreamReader(input)
    i = r.get_src_stream_info(r.default_video_stream)
    r.add_video_stream(1, decoder="h264_cuvid", hw_accel="cuda:0")

    # Encode on the GPU, converting the CUDA frames to yuv444p.
    w = StreamWriter(output)
    w.add_video_stream(
        height=i.height,
        width=i.width,
        frame_rate=i.frame_rate,
        format="yuv444p",
        encoder_format="yuv444p",
        encoder="h264_nvenc",
        hw_accel="cuda:0",
    )

    with w.open():
        num_frames = 0
        for chunk, in r.stream():
            num_frames += chunk.size(0)
            w.write_video_chunk(0, chunk)
    return num_frames


total_num_frames = 0
while True:
    t0 = time.monotonic()
    num_frames = test()
    elapsed = time.monotonic() - t0
    total_num_frames += num_frames
    time.sleep(10)
    print(f"{datetime.now()}: {elapsed} [sec], {num_frames} frames ({total_num_frames})")
2023/03/10 11:06:34.270, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 18 %, 18 %, 10240 MiB, 8292 MiB, 1785 MiB // BEFORE LAUNCH
2023/03/10 11:06:36.283, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 14 %, 26 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:06:38.297, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 4 %, 58 %, 10240 MiB, 8289 MiB, 1788 MiB
2023/03/10 11:06:40.322, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 35, 52 %, 10 %, 10240 MiB, 7023 MiB, 3054 MiB // ENCODE/DECODE
2023/03/10 11:06:42.346, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 60 %, 11 %, 10240 MiB, 7023 MiB, 3054 MiB
2023/03/10 11:06:44.358, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 56 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:46.378, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 57 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:48.389, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 53 %, 10 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:50.407, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 58 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:52.421, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 40, 59 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:54.445, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 7 %, 2 %, 10240 MiB, 7623 MiB, 2454 MiB // SLEEP
2023/03/10 11:06:56.464, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P3, 4, 4, 37, 0 %, 1 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:06:58.477, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P5, 4, 4, 36, 1 %, 12 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:00.489, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 36, 0 %, 33 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:02.502, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 36, 0 %, 25 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:04.564, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 5 %, 22 %, 10240 MiB, 7155 MiB, 2922 MiB // ENCODE/DECODE
2023/03/10 11:07:06.580, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 40, 60 %, 11 %, 10240 MiB, 7004 MiB, 3073 MiB
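To attribute the residual memory, it may help to compare PyTorch's own accounting with the device-level numbers from nvidia-smi: whatever torch.cuda reports as allocated/reserved belongs to the caching allocator (the ~200 MB CUDA tensor), and the rest must be held outside it, e.g. by the CUDA/NVDEC/NVENC contexts that the FFmpeg decoder and encoder create. A minimal sketch, assuming it is called from the sleep phase of the loop above:

import torch

def report_torch_cuda_memory(device="cuda:0"):
    # Memory handed out to live tensors by PyTorch's caching allocator;
    # this should roughly match the ~200 MB CUDA tensor mentioned above.
    allocated = torch.cuda.memory_allocated(device)
    # Memory the allocator holds in its cache, whether or not it is in use.
    reserved = torch.cuda.memory_reserved(device)
    print(f"allocated: {allocated / 2**20:.0f} MiB, reserved: {reserved / 2**20:.0f} MiB")

report_torch_cuda_memory()
# Releasing the allocator's cache only returns PyTorch-managed memory; if
# nvidia-smi's memory.used barely moves afterwards, the ~700 MB residual is
# held outside PyTorch (decoder/encoder device contexts).
torch.cuda.empty_cache()
report_torch_cuda_memory()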