Add hardware acceleration to video decoding #331

rvillalba-novetta · 2018-05-28T13:30:12Z

Add optional dictionary parameter to input open allowing the user to pass setting for hardware acceleration
Add nvidia libraries to the ffmpeg build if available

mikeboers · 2018-05-28T13:31:42Z

Note that this is related to #307.

mikeboers · 2018-09-20T03:01:53Z

I've rebased this onto the current develop. It is in the hwaccel branch.

hunterjm · 2019-01-10T02:54:45Z

Any ideas on when this will be updated with the current branch and merged?

jlaine · 2019-04-12T16:40:48Z

@mikeboers I'm totally confused about this PR. If the most up-to-date code is on your branch, let's close this PR and open a new one based on your branch?

jlaine · 2019-04-12T21:47:54Z

I have rebased this PR on top of develop, pushed it to @rvillalba-novetta 's repo and killed the hwaccel branch on the main repo. The status seems to be : it only builds with ffmpeg 4.x

mikeboers · 2019-04-13T13:31:02Z

Thanks for that @jlaine. Without putting much effort in (because as you can tell I'm not at the moment), can you identify why it is FFmpeg 4.x only?

jlaine · 2019-04-13T13:37:25Z

It doesn't even compile on older ffmpegs so I'm wondering if it's an entirely new API

mikeboers · 2019-04-13T13:51:37Z

Looks like the headers go back to at least 3.3.

mikeboers · 2019-04-13T13:53:26Z

avcodec_get_hw_config does not exist < 4.

mikeboers · 2019-04-13T13:57:52Z

As far as the example is concerned, it replaces this (annoying) function -> https://www.ffmpeg.org/doxygen/3.4/hw__decode_8c_source.html#l00047

mabl · 2019-06-13T18:43:56Z

I have had great results with this patch. I additionally added hwaccel to CodecContext.create and Codec.create, to allow creation of hardware accelerated Codecs without a container.

mikeboers · 2019-10-18T16:04:10Z

Now that we've dropped FFmpeg < 4.0, this is more approachable as is. The tests pass. So now (at some point) we can decide if we like the API and how it is implemented.

mikeboers · 2019-10-30T21:01:42Z

Actually... I think I'm done with this at this point because:

the ffmpeg CLI isn't even able to use hardware decoding on my workstation, so this is a pain to work on, and
man ffmpeg says:

Note that most acceleration methods are intended for playback and will not be faster than software decoding on modern CPUs. Additionally, ffmpeg will usually need to copy the decoded frames from the GPU memory into the system memory, resulting in further performance loss. This option is thus mainly useful for testing.

Since it is unreasonable to think that anyone is decoding with PyAV just for playback, this seems pretty pointless to keep fighting.

Thanks, everyone, for your time on this, but I'm getting off here.

We will not likely continue down this path, as `man ffmpeg` makes the point that CPU decoding is about the same speed as GPU decoding, and so is only really of use for playback. I don't think PyAVs target is such high performance playback, so we don't need to make the design concesions required for this branch. NOTE: This has not been tested to work. Two commits back is the original PR squashed into a single commit and is more likely to work, although if there is any hope of this being merged it will have to look more like this commit does. See (and further any discussion) #331 on GitHub.

See #331.

We will not likely continue down this path, as `man ffmpeg` makes the point that CPU decoding is about the same speed as GPU decoding, and so is only really of use for playback. I don't think PyAVs target is such high performance playback, so we don't need to make the design concesions required for this branch. NOTE: This has not been tested to work. Two commits back is the original PR squashed into a single commit and is more likely to work, although if there is any hope of this being merged it will have to look more like this commit does. See (and further any discussion) #331 on GitHub.

wegel · 2019-11-05T15:33:35Z

Hmm, hardware acceleration in the context of PyAV for me is mainly for encoding, not decoding. For encoding there's a substantial performance gain, and you can do multiple encodes in parallel if you have the correct hardware.

See PyAV-Org#331.

razvanphp · 2020-05-29T22:37:54Z

Not to open a new "question-like" issue, is it expected we can use hardware accelerated encoders? For example, h264_v4l2m2m or h264_omx?

With PyAV 8.0.1 the first one triggers:

...
    packages = self.codec.encode(frame)
  File "av/codec/context.pyx", line 458, in av.codec.context.CodecContext.encode
  File "av/codec/context.pyx", line 464, in av.codec.context.CodecContext.encode
  File "av/codec/context.pyx", line 275, in av.codec.context.CodecContext.open
  File "av/error.pyx", line 336, in av.error.err_check
av.error.ValueError: [Errno 22] Invalid argument; last error log: [h264_v4l2m2m] can't configure encoder

and it's used like:

codec = av.CodecContext.create("h264_v4l2m2m", "w")
codec.width = frame.width
codec.height = frame.height
codec.pix_fmt = "yuv420p"
codec.time_base = fractions.Fraction(1, MAX_FRAME_RATE)
codec.options = {
    "h264_profile": "baseline",
    "level": "31",
    "tune": "zerolatency",
}

packages = self.codec.encode(frame)

I also tried h264_omx on raspberry Pi4, and while it works in ffmpeg directly, on pyav it hangs the app unresponsive.

Any suggestions? There is also nothing in the documentation about this flow...

Thank you!

pallas · 2020-07-03T23:16:05Z

FWIW, I do want to receive a h264 stream from a remote machine and directly render on a Raspberry Pi. I may take a shot at this if PyAV sees it as out of scope.

vade · 2020-08-25T21:14:01Z

Theres other use cases where hardware decode is very useful outside of playback - decode to hardware accelerated memory.

For my use case I would like to be able to decode into memory backed by the GPU to send as a tensor for inference for machine learning. PyAV is perfect as a shim to libAV to have finer grained frame access and in process control - however without GPU accelerated memory its not quite as appealing.

Note this is also very helpful for server side on demand rendering on headless GPU instances - decoding directly to a texture is quite nice. We did used C++ for this project using LibAV with nvdec on AWS hardware : https://rarevolume.com/work/reuterstv/ - it would have been much slower (and more expensive) sans libAVs nvdec implementation.

Being able to use Python + libav + nvdec / nvenc would allow a a lot of nice optimal code paths outside of rendering:

encoding (as mentioned)
rendering
machine learning inference
machine learning training
automated quality assessment
hardware accelerated transcoding
GPU accelerated computer vision
(and so on)

Thank you.

vade · 2020-08-26T16:19:46Z

Actually... I think I'm done with this at this point because:

the ffmpeg CLI isn't even able to use hardware decoding on my workstation, so this is a pain to work on, and

man ffmpeg says:

Note that most acceleration methods are intended for playback and will not be faster than software decoding on modern CPUs. Additionally, ffmpeg will usually need to copy the decoded frames from the GPU memory into the system memory, resulting in further performance loss. This option is thus mainly useful for testing.

Since it is unreasonable to think that anyone is decoding with PyAV just for playback, this seems pretty pointless to keep fighting.

Thanks, everyone, for your time on this, but I'm getting off here.

Hi - I was able to do some tests using VPF (Nvidia's) new video performance framework in Python and test these claims.

They dont hold up. Not even close.

PyAV on a Google Colab machine for a h.264 QuickTime has the following performance:

One comment for those in the public - while the shared Google Colab environment can have different load so performance benchmarking is difficult, for the same h.264 QuickTime mov file -

LibAV CPU:

Decode took: 0.9556884765625 seconds
fps 92.08021458679269
3.8401634946991954 x realtime

VPF try on a T4 on collab GPU instance try one:

Decode took: 0.42092251777648926 seconds
fps 209.06460520301312
8.71894433062566 x realtime

Try 2 (less contention perhaps?)

Decode took:  0.2164461612701416 seconds
fps 406.5676170166362
16.955717664216532 x realtime

This is rendering back to the CPU, so it includes GPU to CPU transfer. That is a non trivial performance increase with NVDec.

vade · 2020-08-26T16:23:57Z

my suggestion to the VPF authors at Nvidia is to leverage PyAV for Demuxing and have them build a special NVEnc NVDec packet decoder which can vend surfaces (on the gpu) or frames (CPU backed memory which has been read back)

This would mean HWAccel is ignored - so other existing back ends dont get support.
However, it does mean that Python could vend GPU backed frames using PyAV and VPF together - and keep PyAV clean of the complexities of HWAccel support (which is admittedly tricky)

I do feel however its a murky fix and not really inline with having LibAV fully ported to python. Given the above perf delta - which is real, demonstrable (look at new updates from Adobe switching to use NV encoder and decoder) as well as Apple using HW Accel in Video Toolbox via the T2 and AMD chipsets - the gains are tremendous for pro workflows.

I highly suggest this is reconsidered.

Can I help somehow?

philipp-schmidt · 2021-01-18T15:41:28Z

Any news regarding the hardware acceleration? I'm currently using VPF to do this but I'm having issues with rtsp transport of IP cameras.

Could someone outline how a solution with VPF and PyAV in tandem would look like?

Essentially I want PyAV to handle transport and container related stuff and then let VPF handle the H.264 decoding. Is this a valid use-case for PyAV?

NVIDIA/VideoProcessingFramework#132

philipp-schmidt · 2021-01-18T17:55:22Z

VPF authors have found a way to combine this and are waiting for bitstream support in PyAV

NVIDIA/VideoProcessingFramework#99 (comment)

kennethtang4 · 2021-12-18T13:48:11Z

I think one of the main reasons to use hardware acceleration is to offload the processing power to GPU and keep the CPU free, not only about the processing time only. CPU workload dropped from 30+% to less than 2% after enabling hardware acceleration for decoding 1080P@30. I did that by using C++ to call FFmpeg libraries and I would say it rather simple after all as long as you know the concept behind it and which function to call. Though, I am using the precompiled FFmpeg libraries (libavcodec, libavformat etc…) as it is tedious to compile every dependencies for each different hardware.

I am more than happy to help if Mike is up to it. Could setup a remote workstation for you to test things out if that is you concern.

mikeboers added the enhancement label May 28, 2018

notedit mentioned this pull request May 29, 2018

How to improve decode performance? #326

Closed

jlaine force-pushed the hwaccel branch 2 times, most recently from 437df6e to 18dc455 Compare April 12, 2019 21:45

mikeboers force-pushed the hwaccel branch from 18dc455 to a85347b Compare October 18, 2019 00:57

Add hardware acceleration to video decoding

94033ed

mikeboers force-pushed the hwaccel branch from a85347b to 94033ed Compare October 23, 2019 14:38

mikeboers closed this Oct 30, 2019

mikeboers mentioned this pull request Oct 30, 2019

Add hardware acceleration to video decoding #568

Closed

mikeboers mentioned this pull request Oct 30, 2019

Set hwaccel ? #307

Closed

mikeboers added a commit that referenced this pull request Oct 31, 2019

Declare hardware decoding off-limits in the docs.

c14e6ef

See #331.

pscholl pushed a commit to pscholl/PyAV that referenced this pull request Feb 27, 2020

Declare hardware decoding off-limits in the docs.

d1aefd1

See PyAV-Org#331.

razvanphp mentioned this pull request May 24, 2020

Video playback delay in server example aiortc/aiortc#104

Closed

PyAV-Org locked as too heated and limited conversation to collaborators Mar 25, 2022

Add hardware acceleration to video decoding #331

Add hardware acceleration to video decoding #331

Uh oh!

Conversation

rvillalba-novetta commented May 28, 2018

Uh oh!

mikeboers commented May 28, 2018

Uh oh!

mikeboers commented Sep 20, 2018

Uh oh!

hunterjm commented Jan 10, 2019

Uh oh!

jlaine commented Apr 12, 2019

Uh oh!

jlaine commented Apr 12, 2019

Uh oh!

mikeboers commented Apr 13, 2019

Uh oh!

jlaine commented Apr 13, 2019

Uh oh!

mikeboers commented Apr 13, 2019

Uh oh!

mikeboers commented Apr 13, 2019

Uh oh!

mikeboers commented Apr 13, 2019

Uh oh!

mabl commented Jun 13, 2019

Uh oh!

mikeboers commented Oct 18, 2019

Uh oh!

mikeboers commented Oct 30, 2019

Uh oh!

wegel commented Nov 5, 2019

Uh oh!

razvanphp commented May 29, 2020

Uh oh!

pallas commented Jul 3, 2020

Uh oh!

vade commented Aug 25, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vade commented Aug 26, 2020

Uh oh!

vade commented Aug 26, 2020

Uh oh!

philipp-schmidt commented Jan 18, 2021

Uh oh!

philipp-schmidt commented Jan 18, 2021

Uh oh!

kennethtang4 commented Dec 18, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

11 participants

vade commented Aug 25, 2020 •

edited

Loading

kennethtang4 commented Dec 18, 2021 •

edited

Loading