Skip to content

Audio-video synchronization conceptual issue #1221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bjuncek opened this issue Aug 9, 2019 · 4 comments
Closed

Audio-video synchronization conceptual issue #1221

bjuncek opened this issue Aug 9, 2019 · 4 comments

Comments

@bjuncek
Copy link
Contributor

bjuncek commented Aug 9, 2019

It seems that _read_from_stream function assumes pts to be globally consistent across the streams when that might not necessarily be the case.

Specifically the pyav docs state that pts is The presentation timestamp in time_base units for this frame. What that would mean is that if we have a reference frame, let's say video, then when reading the corresponding audio, audio pts would have to be transformed somewhat like this

offset = round(end_offset * reference_time_base / current_time_base)

One option that I'd propose is to have:

def _read_from_stream(container, start_offset, end_offset, stream, stream_name, reference_tb=None):

# ...


    current_tb = container.get(stream).time_base
    seek_offset = start_offset
    if referece_tb is not None:
        # Make sure that we have time_base dependent presentation time
        seek_offset = round(seek_offset * referece_tb / current_tb)
        end_offset  = round(end_offset * referece_tb / current_tb)

# ...

    return result, current_tb

then, subsequently, we'd need to change the _align_audio_frames as well to match this.

Any thoughts?
If there is agreement, and try to send out PR tomorrow evening BST.

cc @fmassa @iyah4888


Docs on pyav streams:
https://docs.mikeboers.com/pyav/develop/api/stream.html

@iyah4888
Copy link

iyah4888 commented Aug 9, 2019

Thank you, @bjuncek, for moving this issue to the official repository.

Another option could be to use an absolute time scale (second) as a unified unit.
Since each stream has their own time base, by multiplying PTS and its time base, (PTS for a stream)*(TIME_BASE corresponding to the stream), we can unify the semantic meaning of time stamp into the absolute time, seconds.
Would it be more intuitive, so that it may lead to easy debugging as well?
Thanks for working on this!

@fmassa
Copy link
Member

fmassa commented Aug 12, 2019

This sounds like a sensible thing to do, and we should probably convert to a global reference (for example, seconds or a converted pts wrt video).

I'd be glad to accept a PR fixing this. But it would be great as well to have proper tests for this, so that we make sure we don't break this in the future.

@bjuncek
Copy link
Contributor Author

bjuncek commented Aug 19, 2019

Added a simple PR to fix it - lets maybe discuss ways to test sych (and audio in general) offline?

@fmassa
Copy link
Member

fmassa commented Sep 30, 2019

Fixed via #1331

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants