Skip to content

[DISCUSSION NEEDED] AV-sync fundamental issues #1248

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

[DISCUSSION NEEDED] AV-sync fundamental issues #1248

wants to merge 2 commits into from

Conversation

bjuncek
Copy link
Contributor

@bjuncek bjuncek commented Aug 19, 2019

As pointed out in #1221 _read_from_stream function assumes that all the offsets are global, which is fundamentally wrong as pts is stream-specific representation.

Here, I propose to make precomputed pts a global offset in the form of k/fps and type fraction.Fraction where k is k-th frame. Then, in every function which operates on stream level, global pts would be converted to a stream specific measure by dividing it by stream specific time-base, i.e. int(round(float(global_offset / stream_time_base))).

Obviously, tests are modified to reflect this change.


This is an early version and a suggestion on how to tackle formerly mentioned issue. If anyone has better Idea, please let me know.

cc @fmassa @iyah4888

Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR Bruno!

This is indeed a problem with the current audio synchronization.

I've made a few comments, let me know what you think.
My main concern is that we introduce a slight backwards-incompatibility for the timestamps, so we should think about proper deprecation strategies for it.

@@ -250,4 +268,4 @@ def read_video_timestamps(filename):
container.streams.video[0], {'video': 0})
video_fps = float(container.streams.video[0].average_rate)
container.close()
return [x.pts for x in video_frames], video_fps
return [x.pts * x.time_base for x in video_frames], video_fps
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a BC-breaking change, and I wonder if there would be a way of keeping backwards-compatibility for one version before removing the old behavior, with a loud warning.

Maybe we should add an extra option to read_video_timestamps, something like output_format=None, and raise a warning is it is None, and take the current behavior in this case. Similar to what has been done in interpolate, with the align_corners flag.
Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes a lot of sense.
Is there a preferred warning system in torchvision?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just use warnings.warn for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S.G.
what do we want to keep as a default behaviour?

for start in range(5):
for l in range(1, 4):
lv, _, _ = io.read_video(f_name, pts[start], pts[start + l - 1])
s_data = data[start:(start + l)]
self.assertEqual(len(lv), l)
self.assertTrue(s_data.equal(lv))

lv, _, _ = io.read_video(f_name, pts[4] + 1, pts[7])
lv, _, _ = io.read_video(f_name, pts[4] + 1 / (fps + 1), pts[7])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the need of the +1 in (fps + 1)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that the original test was here to make sure that if we try to decode from a pts that doesn't exist (pts[4]+1) that we return the closest possible frame to that pts - then we check and get 4 frames (from pts[4] to pts[7]).

In the same way, 1/(fps+1) is a non-existing pts that is the closest to the existing (pts[4]).

raise unittest.SkipTest(msg)

lv, la, info = io.read_video(f_name, pts[3], pts[7])
# FIXME: add Another video - this one doesn't have audio
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable - I'll take a look.

@fmassa
Copy link
Member

fmassa commented Sep 30, 2019

Subsumed by #1331

@fmassa fmassa closed this Sep 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants