Re-use HW device/frames context 

Currently when performing HW decoding / encoding, HW contexts are recreated at each invocation.

Looking at the design of [AVBufferRef](https://ffmpeg.org/doxygen/4.1/structAVBufferRef.html) and how it is used for `AVCodecContext::hw_device_ctx` in [hw_decode.c](https://ffmpeg.org/doxygen/4.1/hw_decode_8c-example.html), they should be reusable across different invocations (and perhaps encoding/decoding.)

Reusing should reduce the GPU memory usage and initialization time.

Note: Currently, launching GPU decoder/encoder at the same time consumes about 600 MB of GPU memory.

![335029419_1175635926473827_4534254760631136073_n](https://user-images.githubusercontent.com/855818/224187967-71402f0e-5b5a-4cb4-a22b-fa06e49f82fe.png)

<details><summary>code</summary>

```python
import time
from datetime import datetime

import torchaudio
from [torchaudio.io](http://torchaudio.io/) import StreamReader, StreamWriter

# torchaudio.utils.ffmpeg_utils.set_log_level(36)

input = "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4"
output = "foo.mp4"

def test():
    r = StreamReader(input)
    i = r.get_src_stream_info(r.default_video_stream)
    r.add_video_stream(10, decoder="h264_cuvid", hw_accel="cuda:0")

    w = StreamWriter(output)
    w.add_video_stream(
        height=i.height,
        width=i.width,
        frame_rate=i.frame_rate,
        format="yuv444p",
        encoder_format="yuv444p",
        encoder="h264_nvenc",
        hw_accel="cuda:0",
    )

    with [w.open](http://w.open/)():
        num_frames = 0
        for chunk, in [r.stream](http://r.stream/)():
            num_frames += chunk.size(0)
            w.write_video_chunk(0, chunk)
    return num_frames


total_num_frames = 0
while True:
    t0 = time.monotonic()
    num_frames = test()
    elapsed = time.monotonic() - t0
    total_num_frames += num_frames
    time.sleep(10)
    print(f"{[datetime.now](http://datetime.now/)()}: {elapsed} [sec], {num_frames} frames ({total_num_frames})")
```

</details>

## Implementation Idea

1. When GPU decode/encode is firstly launched, store a reference to HW [device|frames] context to cache store.
2. Subsequent call will reuse the cache.
3. Should have utility to remove cache, (and disable reuse?)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Re-use HW device/frames context #3160

Implementation Idea

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Re-use HW device/frames context #3160

Description

Implementation Idea

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions