-
Notifications
You must be signed in to change notification settings - Fork 7.1k
[DISCUSSION NEEDED] AV-sync fundamental issues #1248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the PR Bruno!
This is indeed a problem with the current audio synchronization.
I've made a few comments, let me know what you think.
My main concern is that we introduce a slight backwards-incompatibility for the timestamps, so we should think about proper deprecation strategies for it.
@@ -250,4 +268,4 @@ def read_video_timestamps(filename): | |||
container.streams.video[0], {'video': 0}) | |||
video_fps = float(container.streams.video[0].average_rate) | |||
container.close() | |||
return [x.pts for x in video_frames], video_fps | |||
return [x.pts * x.time_base for x in video_frames], video_fps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a BC-breaking change, and I wonder if there would be a way of keeping backwards-compatibility for one version before removing the old behavior, with a loud warning.
Maybe we should add an extra option to read_video_timestamps
, something like output_format=None
, and raise a warning is it is None
, and take the current behavior in this case. Similar to what has been done in interpolate
, with the align_corners
flag.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that makes a lot of sense.
Is there a preferred warning system in torchvision?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's just use warnings.warn
for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S.G.
what do we want to keep as a default behaviour?
for start in range(5): | ||
for l in range(1, 4): | ||
lv, _, _ = io.read_video(f_name, pts[start], pts[start + l - 1]) | ||
s_data = data[start:(start + l)] | ||
self.assertEqual(len(lv), l) | ||
self.assertTrue(s_data.equal(lv)) | ||
|
||
lv, _, _ = io.read_video(f_name, pts[4] + 1, pts[7]) | ||
lv, _, _ = io.read_video(f_name, pts[4] + 1 / (fps + 1), pts[7]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the need of the +1
in (fps + 1)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding was that the original test was here to make sure that if we try to decode from a pts that doesn't exist (pts[4]+1
) that we return the closest possible frame to that pts - then we check and get 4 frames (from pts[4]
to pts[7]
).
In the same way, 1/(fps+1)
is a non-existing pts that is the closest to the existing (pts[4]
).
raise unittest.SkipTest(msg) | ||
|
||
lv, la, info = io.read_video(f_name, pts[3], pts[7]) | ||
# FIXME: add Another video - this one doesn't have audio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option for getting the video would be to use av.datasets
https://github.com/mikeboers/PyAV/blob/cd458ffe89988b0feca44da6c56ef29f17555962/tests/common.py#L11
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable - I'll take a look.
Subsumed by #1331 |
As pointed out in #1221
_read_from_stream
function assumes that all the offsets are global, which is fundamentally wrong aspts
is stream-specific representation.Here, I propose to make precomputed
pts
a global offset in the form ofk/fps
and typefraction.Fraction
wherek
is k-th frame. Then, in every function which operates on stream level, globalpts
would be converted to a stream specific measure by dividing it by stream specific time-base, i.e.int(round(float(global_offset / stream_time_base)))
.Obviously, tests are modified to reflect this change.
This is an early version and a suggestion on how to tackle formerly mentioned issue. If anyone has better Idea, please let me know.
cc @fmassa @iyah4888