Audio-video synchronization conceptual issue #1221

bjuncek · 2019-08-09T10:12:50Z

It seems that the _read_from_stream function assumes pts to be globally consistent across streams, which is not necessarily the case.

Specifically, the PyAV docs define pts as "the presentation timestamp in time_base units for this frame". This means that if we take one stream as the reference, say video, then when reading the corresponding audio, the offsets would have to be converted into the audio stream's time base, roughly like this:

offset = round(end_offset * reference_time_base / current_time_base)
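To make the conversion above concrete, here is a minimal sketch using exact rational arithmetic. The helper name `convert_pts` and the two time bases (1/15360 for video, 1/44100 for audio) are assumptions chosen for illustration; they are not taken from torchvision or from any particular file.

```python
from fractions import Fraction


def convert_pts(pts, from_time_base, to_time_base):
    """Rescale a timestamp between time bases, rounding to the nearest tick.

    Hypothetical helper: pts * from_time_base is the time in seconds,
    and dividing by to_time_base expresses it in the target stream's ticks.
    """
    return round(pts * from_time_base / to_time_base)


# Assumed example values, not read from a real container.
video_tb = Fraction(1, 15360)   # seconds per video tick
audio_tb = Fraction(1, 44100)   # seconds per audio tick

end_offset_video = 30720        # 2 seconds, expressed in video ticks
end_offset_audio = convert_pts(end_offset_video, video_tb, audio_tb)
print(end_offset_audio)         # 88200, i.e. 2 seconds in audio ticks
```

Using `Fraction` keeps the intermediate value exact, so the only rounding happens once, in the final `round`.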

One option that I'd propose is to have:

def _read_from_stream(container, start_offset, end_offset, stream, stream_name, reference_tb=None):
    # ...

    # Assumes `stream` is the PyAV stream object, which exposes time_base directly
    current_tb = stream.time_base
    seek_offset = start_offset
    if reference_tb is not None:
        # Offsets are given in the reference stream's time_base units;
        # convert them into this stream's time_base units
        seek_offset = round(seek_offset * reference_tb / current_tb)
        end_offset = round(end_offset * reference_tb / current_tb)

    # ...

    return result, current_tb

Subsequently, we'd need to change _align_audio_frames as well to match this.

Any thoughts?
If there is agreement, I'll try to send out a PR tomorrow evening BST.

cc @fmassa @iyah4888

Docs on pyav streams:
https://docs.mikeboers.com/pyav/develop/api/stream.html

iyah4888 · 2019-08-09T14:43:27Z

Thank you, @bjuncek, for moving this issue to the official repository.

Another option could be to use an absolute time scale (seconds) as a unified unit.
Since each stream has its own time base, multiplying a stream's PTS by that stream's time base, (PTS for a stream) * (TIME_BASE of that stream), gives every timestamp the same meaning: absolute time in seconds.
Wouldn't that be more intuitive, and make debugging easier as well?
Thanks for working on this!
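As a rough illustration of the seconds-based approach, the sketch below multiplies each stream's pts by its own time base. The function name `pts_to_seconds` and the time-base values are assumptions for the example only.

```python
from fractions import Fraction


def pts_to_seconds(pts, time_base):
    """Return the absolute time in seconds as an exact Fraction (hypothetical helper)."""
    return pts * time_base


# Assumed example values: the same instant expressed in two different time bases.
video_seconds = pts_to_seconds(30720, Fraction(1, 15360))
audio_seconds = pts_to_seconds(88200, Fraction(1, 44100))

# Both timestamps map to the same point on the shared time axis.
print(float(video_seconds), float(audio_seconds))
```

Once both streams are expressed in seconds, they can be compared directly without knowing which stream was the reference.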

fmassa · 2019-08-12T19:16:36Z

This sounds like a sensible thing to do, and we should probably convert to a global reference (for example, seconds or a converted pts wrt video).

I'd be glad to accept a PR fixing this. But it would be great as well to have proper tests for this, so that we make sure we don't break this in the future.
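One possible shape for such a test, sketched under assumptions: it checks only the arithmetic of the time-base conversion, not real decoding. The helper `rescale` and the time bases are hypothetical and do not come from the torchvision test suite.

```python
from fractions import Fraction


def rescale(pts, from_tb, to_tb):
    """Hypothetical helper: convert pts between time bases, rounding to the nearest tick."""
    return round(pts * from_tb / to_tb)


def test_rescale_stays_within_half_a_tick():
    video_tb = Fraction(1, 15360)  # assumed video time base
    audio_tb = Fraction(1, 44100)  # assumed audio time base
    # Sample video timestamps across roughly three seconds.
    for video_pts in range(0, 15360 * 3, 257):
        audio_pts = rescale(video_pts, video_tb, audio_tb)
        # Compare both timestamps on the shared axis of seconds.
        drift = abs(audio_pts * audio_tb - video_pts * video_tb)
        # Rounding to the nearest tick can be off by at most half an audio tick.
        assert drift <= audio_tb / 2


test_rescale_stays_within_half_a_tick()
```

A test against real media would additionally need a fixture with known audio and video content, which this sketch does not attempt.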

bjuncek · 2019-08-19T12:42:49Z

Added a simple PR to fix it. Let's maybe discuss ways to test sync (and audio in general) offline?

fmassa · 2019-09-30T13:38:13Z

Fixed via #1331

fmassa added bug help wanted module: io module: video labels Aug 12, 2019

bjuncek mentioned this issue Aug 19, 2019

[DISCUSSION NEEDED] AV-sync fundamental issues #1248

Closed

fmassa closed this as completed Sep 30, 2019

tmabraham mentioned this issue Mar 2, 2020

torchvision.io.video.read_video pts units for video only #1931

Closed

v-iashin mentioned this issue Jul 16, 2020

VideoClips: audio clips do not correspond to video clips #2474

Open
