VideoClips: audio clips do not correspond to video clips #2474

v-iashin · 2020-07-15T14:02:02Z

🐛 Bug

The audio stream does not correspond to the visual stream when torchvision.datasets.video_utils.VideoClips is used.

To Reproduce

Steps to reproduce the behavior:

Here are the two videos I tested on: Archive.zip
The code to reproduce:

from torchvision.io import read_video
from torchvision.datasets.video_utils import VideoClips

# Two test videos; the last assignment wins, so comment one out to switch.
VIDEO_PATH = './4fpkD4A_t1s_35000_45000.mp4'
VIDEO_PATH = './small.mp4'

if __name__ == "__main__":
    print(f'I am using: {VIDEO_PATH}')
    print('Output using torchvision.io.read_video:')
    # Reference: decode the whole file, with timestamps interpreted as seconds.
    visual, audio, info = read_video(VIDEO_PATH, pts_unit='sec')
    print('Visual:', visual.shape, 'Audio:', audio.shape, info)

    print('Output using torchvision.datasets.video_utils.VideoClips:')
    # Non-overlapping clips of 30 frames each (1 second at 30 fps).
    vclips = VideoClips([VIDEO_PATH], clip_length_in_frames=30, frames_between_clips=30)
    for i in range(vclips.num_clips()):
        visual, audio, info, vid_idx = vclips.get_clip(i)
        print(f'Clip #{i}', 'Visual:', visual.shape, 'Audio:', audio.shape, info)

The output I see

I am using: ./small.mp4
Output using torchvision.io.read_video:
Visual: torch.Size([166, 320, 560, 3]) Audio: torch.Size([1, 266240]) {'video_fps': 30.0, 'audio_fps': 48000}
Output using torchvision.datasets.video_utils.VideoClips:
/home/vladimir/miniconda3/envs/bug_report_video_clips/lib/python3.8/site-packages/torchvision/io/video.py:103: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  warnings.warn(
100.0%
Clip #0 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 86016]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #1 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 87142]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #2 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 87255]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #3 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #4 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}

I am using: ./4fpkD4A_t1s_35000_45000.mp4
Output using torchvision.io.read_video:
Visual: torch.Size([300, 720, 1280, 3]) Audio: torch.Size([2, 440320]) {'video_fps': 30.0, 'audio_fps': 44100}
Output using torchvision.datasets.video_utils.VideoClips:
/home/vladimir/miniconda3/envs/bug_report_video_clips/lib/python3.8/site-packages/torchvision/io/video.py:103: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  warnings.warn(
100.0%
Clip #0 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 14336]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #1 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 15360]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #2 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 15360]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #3 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 15360]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #4 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 15360]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #5 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 15360]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #6 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 15360]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #7 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 15360]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #8 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 15360]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #9 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 15360]) {'video_fps': 30.0, 'audio_fps': 44100}

Expected behavior

The output of torchvision.io.read_video is ok, and it is as expected. I provide it here for reference. Also, the visual streams returned from torchvision.datasets.video_utils.VideoClips are ok.
I expect the output of torchvision.datasets.video_utils.VideoClips().get_clip() to have a comparable number of audio samples, i.e. about 48k or 44.1k for 1 second of 30 fps video. Instead, it returns either more samples than expected or only a fraction of them. Specifically, for ./small.mp4 it returns ~87k samples in each of the first three clips and 0 in the last two (expected 48k per clip), while for ./4fpkD4A_t1s_35000_45000.mp4 it returns ~14k for the first clip and ~15k for each of the rest (expected 44.1k per clip). The latter does not even come close to the expected ~440k samples for the whole 10 s video. Similarly, the former totals (86016 + 87142 + 87255) = 260413, which does not match the 266240 samples loaded by torchvision.io.read_video.
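As a quick sanity check of the numbers above, the expected audio length of a clip is the clip duration multiplied by the audio sample rate. The sketch below uses only the values printed in the report; the helper name expected_audio_samples is made up for illustration.

```python
def expected_audio_samples(clip_frames: int, video_fps: float, audio_fps: int) -> int:
    """Number of audio samples that cover `clip_frames` video frames."""
    return round(clip_frames / video_fps * audio_fps)


# Clip settings and rates as printed in the report.
small_expected = expected_audio_samples(30, 30.0, 48000)
other_expected = expected_audio_samples(30, 30.0, 44100)

# Audio sizes per clip as printed by VideoClips.get_clip().
small_reported = [86016, 87142, 87255, 0, 0]
other_reported = [14336] + [15360] * 9

print("small.mp4: expected per clip", small_expected,
      "| reported total", sum(small_reported), "vs 266240 from read_video")
print("4fpkD4A_t1s_35000_45000.mp4: expected per clip", other_expected,
      "| reported total", sum(other_reported), "vs 440320 from read_video")
```

Neither the per-clip sizes nor the totals line up with what read_video returns for the whole file.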

Environment

Collecting environment information...
PyTorch version: 1.5.1
Is debug build: No
CUDA used to build PyTorch: 10.2

OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: Could not collect

Nvidia driver version: 440.44
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] torch==1.5.1
[pip3] torchvision==0.6.0a0+35d732a
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               10.2.89              hfd86e86_1  
[conda] mkl                       2020.1                      217  
[conda] mkl-service               2.3.0            py38he904b0f_0  
[conda] mkl_fft                   1.1.0            py38h23d657b_0  
[conda] mkl_random                1.1.1            py38h0573a6f_0  
[conda] numpy                     1.18.5           py38ha1c710e_0  
[conda] numpy-base                1.18.5           py38hde5b4d6_0  
[conda] pytorch                   1.5.1           py3.8_cuda10.2.89_cudnn7.6.5_0    pytorch
[conda] torchvision               0.6.1                py38_cu102    pytorch

Additional context

Currently, VideoClips is not documented on the website, so my misunderstanding might stem from the missing documentation.

v-iashin · 2020-07-16T13:15:34Z

OK, after some debugging I found that the warning "The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'." actually relates to this bug, and that the problem is already known (e.g. #1221, #1672, #1931).
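For context, a pts value is an integer count of ticks in a stream's time base, so the same number stands for different durations in different streams. The sketch below illustrates this with the standard library only. The video time base of 1/90000 is an assumed, typical value that was not read from the test files, and the audio time base of 1/48000 assumes one tick per sample at the reported audio_fps.

```python
from fractions import Fraction

# Assumed time bases in seconds per tick; NOT read from the actual files.
video_time_base = Fraction(1, 90000)
audio_time_base = Fraction(1, 48000)


def pts_to_sec(pts: int, time_base: Fraction) -> Fraction:
    """Convert a tick count to seconds."""
    return pts * time_base


def sec_to_pts(sec: Fraction, time_base: Fraction) -> int:
    """Convert seconds to a tick count, truncating toward zero."""
    return int(sec / time_base)


# pts of the last frame (index 29) of a 30-frame clip at 30 fps.
last_frame_pts = sec_to_pts(Fraction(29, 30), video_time_base)

# Going through seconds gives the matching position in the audio stream.
audio_pts_correct = sec_to_pts(pts_to_sec(last_frame_pts, video_time_base), audio_time_base)

# Reusing the video pts directly as an audio pts points somewhere else.
audio_pts_reused = last_frame_pts

print(last_frame_pts, audio_pts_correct, audio_pts_reused)
```

Under these assumed values, the reused number lands close to the ~87k samples reported for small.mp4, which would be consistent with the warning. This is a guess, not something confirmed in the thread.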

bjuncek · 2020-07-23T16:24:06Z

Hi @v-iashin - have you been able to resolve this issue when using 'sec' as the default pts_unit?
I have done some testing of that on torchvision 0.5/0.6 but not on the latest master, so if needed let me know and I'll take a look into this :)

v-iashin · 2020-07-24T04:58:24Z

Hi @bjuncek, there is no pts_unit argument in VideoClips. read_video works as I expect it to: it reads the entire video.

bjuncek · 2020-07-24T20:19:55Z

Ah, I see that it's still set up as "pts" by default.
If you simply replace the default to "sec" on lines here and here, does that solve your issue?

If so, I'll send a PR later today or over the weekend. I can also test it out for you on Monday if that works.

v-iashin · 2020-07-25T08:55:36Z

Nope, it does not. It fails:

Traceback (most recent call last):
  File "/home/user/project4/main.py", line 14, in <module>
    vclips = VideoClips([VIDEO_PATH], clip_length_in_frames=30, frames_between_clips=30, )
  File "/home/user/miniconda3/envs/bug_report_video_clips_nightly/lib/python3.8/site-packages/torchvision/datasets/video_utils.py", line 122, in __init__
    self._compute_frame_pts()
  File "/home/user/miniconda3/envs/bug_report_video_clips_nightly/lib/python3.8/site-packages/torchvision/datasets/video_utils.py", line 150, in _compute_frame_pts
    clips = [torch.as_tensor(c) for c in clips]
  File "/home/user/miniconda3/envs/bug_report_video_clips_nightly/lib/python3.8/site-packages/torchvision/datasets/video_utils.py", line 150, in <listcomp>
    clips = [torch.as_tensor(c) for c in clips]
RuntimeError: Could not infer dtype of Fraction

because read_video_timestamps now returns Fraction values that come from video_time_base. To work around this, I replaced

vision/torchvision/io/video.py

Line 347 in 1aef87d

pts = [x * video_time_base for x in pts]

with

pts = [x * float(video_time_base) for x in pts]
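The effect of that one-line change can be seen with the standard library alone: multiplying an integer pts by a Fraction time base yields a Fraction, whereas converting the time base to float first yields a plain float. The time base value below is an assumed example, not the one from the test files.

```python
from fractions import Fraction

video_time_base = Fraction(1, 15360)  # assumed example value
pts = [0, 512, 1024]

# Original line: every element stays a Fraction.
as_fraction = [x * video_time_base for x in pts]

# Modified line: every element becomes a built-in float.
as_float = [x * float(video_time_base) for x in pts]

print([type(v).__name__ for v in as_fraction])
print([type(v).__name__ for v in as_float])
```

Floats give up the exactness of rational timestamps, so this only sidesteps the dtype error; it does not change which unit the timestamps are in.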

Still, this gives me

I am using: ./small.mp4
Output using torchvision.io.read_video:
Visual: torch.Size([166, 320, 560, 3]) Audio: torch.Size([1, 266240]) {'video_fps': 30.0, 'audio_fps': 48000}
Output using torchvision.datasets.video_utils.VideoClips:
100.0%
Clip #0 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #1 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #2 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #3 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #4 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}

The audio is torch.Size([1, 0]); I expect it to be torch.Size([1, ~<audio_fps>]).

I then looked into it further and found that aframes look reasonable up to

vision/torchvision/io/video.py

Line 274 in 1aef87d

aframes = _align_audio_frames(aframes, audio_frames, start_pts, end_pts)

where pts are used again.
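One way the empty audio tensors could arise is a unit mismatch in a trimming step like this one: if the decoded audio frames carry integer pts while the reference window is given in seconds, almost every sample falls outside the window. The function below is a simplified, hypothetical stand-in written for illustration. It is not the torchvision implementation of _align_audio_frames, and all numbers are assumed.

```python
def trim_to_window(num_samples, first_pts, last_pts, ref_start, ref_end):
    """Return (start, stop) sample indices that fall inside [ref_start, ref_end].

    Hypothetical helper: all four timestamps must share one unit for the
    result to be meaningful.
    """
    ticks_per_sample = (last_pts - first_pts + 1) / num_samples
    start = 0
    stop = num_samples
    if first_pts < ref_start:
        # Drop samples that come before the window.
        start = int((ref_start - first_pts) / ticks_per_sample)
    if last_pts > ref_end:
        # Drop samples that come after the window.
        stop = max(start, num_samples - int((last_pts - ref_end) / ticks_per_sample))
    return start, stop


NUM_SAMPLES = 49152              # 48 decoded frames of 1024 samples (assumed)
FIRST_PTS, LAST_PTS = 0, 49151   # audio pts, one tick per sample (assumed)

# Window expressed in audio pts: roughly 29/30 s of audio survives.
window_in_pts = trim_to_window(NUM_SAMPLES, FIRST_PTS, LAST_PTS, 0, 46400)

# Same window expressed in seconds but compared against pts: almost nothing survives.
window_in_sec = trim_to_window(NUM_SAMPLES, FIRST_PTS, LAST_PTS, 0.0, 29 / 30)

print(window_in_pts, window_in_sec)
```

If something similar happens after switching the defaults to 'sec', it would explain why the decoded aframes look reasonable but come out empty after alignment.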

MJoodaki · 2021-04-19T13:18:50Z

Hi, I have almost the same problem with VideoClips.
I used VideoClips to create clips (15 frames each) from a reference video (without audio) and saved the clips to a directory using torchvision.io.write_video. I then realized that instead of 15 frames per clip I got 12 frames per clip, with every two consecutive frames being duplicates.
Is that also caused by the pts_unit="pts" problem you mentioned before, or could it be something else? Do you have any suggestions for solving this issue? Apparently "pts" is still the default in the latest update!
I was wondering if anybody could help me.
Many thanks

v-iashin · 2021-04-19T13:53:18Z

Hi, I haven't figured out how to solve it yet. Let me know if you find out anything.

v-iashin · 2022-01-20T09:23:32Z

Currently on Google Colab I get:

I am using: ./small.mp4
Output using torchvision.io.read_video:
Visual: torch.Size([166, 320, 560, 3]) Audio: torch.Size([1, 266240]) {'video_fps': 30.0, 'audio_fps': 48000}
Output using torchvision.datasets.video_utils.VideoClips:
100%
1/1 [00:00<00:00, 6.29it/s]
Clip #0 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 46080]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #1 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 47213]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #2 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 47344]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #3 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 46450]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #4 Visual: torch.Size([30, 320, 560, 3]) Audio: torch.Size([1, 46581]) {'video_fps': 30.0, 'audio_fps': 48000}

and

I am using: ./4fpkD4A_t1s_35000_45000.mp4
Output using torchvision.io.read_video:
Visual: torch.Size([300, 720, 1280, 3]) Audio: torch.Size([2, 440320]) {'video_fps': 30.0, 'audio_fps': 44100}
Output using torchvision.datasets.video_utils.VideoClips:
100%
1/1 [00:01<00:00, 1.33s/it]
Clip #0 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 41984]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #1 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 42939]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #2 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 42869]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #3 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 42800]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #4 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 42730]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #5 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 42660]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #6 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 43615]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #7 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 43545]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #8 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 43476]) {'video_fps': 30.0, 'audio_fps': 44100}
Clip #9 Visual: torch.Size([30, 720, 1280, 3]) Audio: torch.Size([2, 43406]) {'video_fps': 30.0, 'audio_fps': 44100}

Here is the environment

Colab packages

absl-py==0.12.0
alabaster==0.7.12
albumentations==0.1.12
altair==4.1.0
appdirs==1.4.4
argcomplete==1.12.3
argon2-cffi==21.1.0
arviz==0.11.4
astor==0.8.1
astropy==4.3.1
astunparse==1.6.3
atari-py==0.2.9
atomicwrites==1.4.0
attrs==21.2.0
audioread==2.1.9
autograd==1.3
av==8.0.2
Babel==2.9.1
backcall==0.2.0
beautifulsoup4==4.6.3
bleach==4.1.0
blis==0.4.1
bokeh==2.3.3
Bottleneck==1.3.2
branca==0.4.2
bs4==0.0.1
CacheControl==0.12.10
cached-property==1.5.2
cachetools==4.2.4
catalogue==1.0.0
certifi==2021.10.8
cffi==1.15.0
cftime==1.5.1.1
chardet==3.0.4
charset-normalizer==2.0.8
click==7.1.2
cloudpickle==1.3.0
cmake==3.12.0
cmdstanpy==0.9.5
colorcet==2.0.6
colorlover==0.3.0
community==1.0.0b1
contextlib2==0.5.5
convertdate==2.3.2
coverage==3.7.1
coveralls==0.5
crcmod==1.7
cufflinks==0.17.3
cupy-cuda111==9.4.0
cvxopt==1.2.7
cvxpy==1.0.31
cycler==0.11.0
cymem==2.0.6
Cython==0.29.24
daft==0.0.4
dask==2.12.0
datascience==0.10.6
debugpy==1.0.0
decorator==4.4.2
defusedxml==0.7.1
descartes==1.1.0
dill==0.3.4
distributed==1.25.3
dlib @ file:///dlib-19.18.0-cp37-cp37m-linux_x86_64.whl
dm-tree==0.1.6
docopt==0.6.2
docutils==0.17.1
dopamine-rl==1.0.5
earthengine-api==0.1.290
easydict==1.9
ecos==2.0.7.post1
editdistance==0.5.3
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz
entrypoints==0.3
ephem==4.1
et-xmlfile==1.1.0
fa2==0.3.5
fastai==1.0.61
fastdtw==0.3.4
fastprogress==1.0.0
fastrlock==0.8
fbprophet==0.7.1
feather-format==0.4.1
filelock==3.4.0
firebase-admin==4.4.0
fix-yahoo-finance==0.0.22
Flask==1.1.4
flatbuffers==2.0
folium==0.8.3
future==0.16.0
gast==0.4.0
GDAL==2.2.2
gdown==3.6.4
gensim==3.6.0
geographiclib==1.52
geopy==1.17.0
gin-config==0.5.0
glob2==0.7
google==2.0.3
google-api-core==1.26.3
google-api-python-client==1.12.8
google-auth==1.35.0
google-auth-httplib2==0.0.4
google-auth-oauthlib==0.4.6
google-cloud-bigquery==1.21.0
google-cloud-bigquery-storage==1.1.0
google-cloud-core==1.0.3
google-cloud-datastore==1.8.0
google-cloud-firestore==1.7.0
google-cloud-language==1.2.0
google-cloud-storage==1.18.1
google-cloud-translate==1.5.0
google-colab @ file:///colabtools/dist/google-colab-1.0.0.tar.gz
google-pasta==0.2.0
google-resumable-media==0.4.1
googleapis-common-protos==1.53.0
googledrivedownloader==0.4
graphviz==0.10.1
greenlet==1.1.2
grpcio==1.42.0
gspread==3.0.1
gspread-dataframe==3.0.8
gym==0.17.3
h5py==3.1.0
HeapDict==1.0.1
hijri-converter==2.2.2
holidays==0.10.5.2
holoviews==1.14.6
html5lib==1.0.1
httpimport==0.5.18
httplib2==0.17.4
httplib2shim==0.0.3
humanize==0.5.1
hyperopt==0.1.2
ideep4py==2.0.0.post3
idna==2.10
imageio==2.4.1
imagesize==1.3.0
imbalanced-learn==0.8.1
imblearn==0.0
imgaug==0.2.9
importlib-metadata==4.8.2
importlib-resources==5.4.0
imutils==0.5.4
inflect==2.1.0
iniconfig==1.1.1
intel-openmp==2021.4.0
intervaltree==2.1.0
ipykernel==4.10.1
ipython==5.5.0
ipython-genutils==0.2.0
ipython-sql==0.3.9
ipywidgets==7.6.5
itsdangerous==1.1.0
jax==0.2.25
jaxlib @ https://storage.googleapis.com/jax-releases/cuda111/jaxlib-0.1.71+cuda111-cp37-none-manylinux2010_x86_64.whl
jdcal==1.4.1
jedi==0.18.1
jieba==0.42.1
Jinja2==2.11.3
joblib==1.1.0
jpeg4py==0.1.4
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.3.5
jupyter-console==5.2.0
jupyter-core==4.9.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.2
kaggle==1.5.12
kapre==0.3.6
keras==2.7.0
Keras-Preprocessing==1.1.2
keras-vis==0.4.1
kiwisolver==1.3.2
korean-lunar-calendar==0.2.1
libclang==12.0.0
librosa==0.8.1
lightgbm==2.2.3
llvmlite==0.34.0
lmdb==0.99
LunarCalendar==0.0.9
lxml==4.2.6
Markdown==3.3.6
MarkupSafe==2.0.1
matplotlib==3.2.2
matplotlib-inline==0.1.3
matplotlib-venn==0.11.6
missingno==0.5.0
mistune==0.8.4
mizani==0.6.0
mkl==2019.0
mlxtend==0.14.0
more-itertools==8.12.0
moviepy==0.2.3.5
mpmath==1.2.1
msgpack==1.0.3
multiprocess==0.70.12.2
multitasking==0.0.10
murmurhash==1.0.6
music21==5.5.0
natsort==5.5.0
nbclient==0.5.9
nbconvert==5.6.1
nbformat==5.1.3
nest-asyncio==1.5.4
netCDF4==1.5.8
networkx==2.6.3
nibabel==3.0.2
nltk==3.2.5
notebook==5.3.1
numba==0.51.2
numexpr==2.7.3
numpy==1.19.5
nvidia-ml-py3==7.352.0
oauth2client==4.1.3
oauthlib==3.1.1
okgrade==0.4.3
omegaconf==2.0.6
opencv-contrib-python==4.1.2.30
opencv-python==4.1.2.30
openpyxl==2.5.9
opt-einsum==3.3.0
osqp==0.6.2.post0
packaging==21.3
palettable==3.3.0
pandas==1.1.5
pandas-datareader==0.9.0
pandas-gbq==0.13.3
pandas-profiling==1.4.1
pandocfilters==1.5.0
panel==0.12.1
param==1.12.0
parso==0.8.3
pathlib==1.0.1
patsy==0.5.2
pep517==0.12.0
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.1.2
pip-tools==6.2.0
plac==1.1.3
plotly==4.4.1
plotnine==0.6.0
pluggy==0.7.1
pooch==1.5.2
portpicker==1.3.9
prefetch-generator==1.0.1
preshed==3.0.6
prettytable==2.4.0
progressbar2==3.38.0
prometheus-client==0.12.0
promise==2.3
prompt-toolkit==1.0.18
protobuf==3.17.3
psutil==5.4.8
psycopg2==2.7.6.1
ptyprocess==0.7.0
py==1.11.0
pyarrow==3.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycocotools==2.0.3
pycparser==2.21
pyct==0.4.8
pydata-google-auth==1.2.0
pydot==1.3.0
pydot-ng==2.0.0
pydotplus==2.0.2
PyDrive==1.3.1
pyemd==0.5.1
pyerfa==2.0.0.1
pyglet==1.5.0
Pygments==2.6.1
pygobject==3.26.1
pymc3==3.11.4
PyMeeus==0.5.11
pymongo==3.12.1
pymystem3==0.2.0
PyOpenGL==3.1.5
pyparsing==3.0.6
pyrsistent==0.18.0
pysndfile==1.3.8
PySocks==1.7.1
pystan==2.19.1.1
pytest==3.6.4
python-apt==0.0.0
python-chess==0.23.11
python-dateutil==2.8.2
python-louvain==0.15
python-slugify==5.0.2
python-utils==2.5.6
pytz==2018.9
pyviz-comms==2.1.0
PyWavelets==1.2.0
PyYAML==6.0
pyzmq==22.3.0
qdldl==0.1.5.post0
qtconsole==5.2.1
QtPy==1.11.2
regex==2019.12.20
requests==2.23.0
requests-oauthlib==1.3.0
resampy==0.2.2
retrying==1.3.3
rpy2==3.4.5
rsa==4.8
scikit-image==0.18.3
scikit-learn==1.0.1
scipy==1.4.1
screen-resolution-extra==0.0.0
scs==2.1.4
seaborn==0.11.2
semver==2.13.0
Send2Trash==1.8.0
setuptools-git==1.2
Shapely==1.8.0
simplegeneric==0.8.1
six==1.15.0
sklearn==0.0
sklearn-pandas==1.8.0
smart-open==5.2.1
snowballstemmer==2.2.0
sortedcontainers==2.4.0
SoundFile==0.10.3.post1
spacy==2.2.4
Sphinx==1.8.6
sphinxcontrib-serializinghtml==1.1.5
sphinxcontrib-websupport==1.2.4
SQLAlchemy==1.4.27
sqlparse==0.4.2
srsly==1.0.5
statsmodels==0.10.2
sympy==1.7.1
tables==3.4.4
tabulate==0.8.9
tblib==1.7.0
tensorboard==2.7.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorflow @ file:///tensorflow-2.7.0-cp37-cp37m-linux_x86_64.whl
tensorflow-datasets==4.0.1
tensorflow-estimator==2.7.0
tensorflow-gcs-config==2.7.0
tensorflow-hub==0.12.0
tensorflow-io-gcs-filesystem==0.22.0
tensorflow-metadata==1.4.0
tensorflow-probability==0.15.0
termcolor==1.1.0
terminado==0.12.1
testpath==0.5.0
text-unidecode==1.3
textblob==0.15.3
Theano-PyMC==1.1.2
thinc==7.4.0
threadpoolctl==3.0.0
tifffile==2021.11.2
toml==0.10.2
tomli==1.2.2
toolz==0.11.2
torch @ https://download.pytorch.org/whl/cu111/torch-1.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl
torchaudio @ https://download.pytorch.org/whl/cu111/torchaudio-0.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl
torchsummary==1.5.1
torchtext==0.11.0
torchvision @ https://download.pytorch.org/whl/cu111/torchvision-0.11.1%2Bcu111-cp37-cp37m-linux_x86_64.whl
tornado==5.1.1
tqdm==4.62.3
traitlets==5.1.1
tweepy==3.10.0
typeguard==2.7.1
typing-extensions==3.10.0.2
tzlocal==1.5.1
uritemplate==3.0.1
urllib3==1.24.3
vega-datasets==0.9.0
wasabi==0.8.2
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==1.0.1
widgetsnbextension==3.5.2
wordcloud==1.5.0
wrapt==1.13.3
xarray==0.18.2
xgboost==0.90
xkit==0.0.0
xlrd==1.1.0
xlwt==1.3.0
yellowbrick==1.3.post1
zict==2.0.0
zipp==3.6.0

Strangely, the number of audio samples in each clip is not equal to the audio sample rate, even though each clip has 30 frames, which equals the video fps (i.e. one second of video). Moreover, the audio clips are inconsistent in size. I would expect 48000 and 44100 samples respectively or, at least, some explainable consistency.
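One possible explanation for these sizes, offered as a hypothesis rather than a confirmed diagnosis: if a clip is cut from the timestamp of its first frame to the timestamp of its last frame, a 30-frame clip spans 29 frame intervals, not 30. The arithmetic below compares that estimate with the reported sizes. The assumption that the audio is decoded in 1024-sample frames is likewise not verified for these files.

```python
def samples_first_to_last_frame(clip_frames: int, video_fps: float, audio_fps: int) -> int:
    """Audio samples between the first and the last frame timestamp of a clip."""
    return round((clip_frames - 1) / video_fps * audio_fps)


est_48k = samples_first_to_last_frame(30, 30.0, 48000)
est_44k = samples_first_to_last_frame(30, 30.0, 44100)

# Audio sizes per clip as printed in the Colab output above.
reported_48k = [46080, 47213, 47344, 46450, 46581]
reported_44k = [41984, 42939, 42869, 42800, 42730, 42660, 43615, 43545, 43476, 43406]

FRAME = 1024  # assumed decoder frame size in samples

max_dev_48k = max(abs(n - est_48k) for n in reported_48k)
max_dev_44k = max(abs(n - est_44k) for n in reported_44k)

print("small.mp4: estimate", est_48k, "| max deviation", max_dev_48k)
print("4fpkD4A_t1s_35000_45000.mp4: estimate", est_44k, "| max deviation", max_dev_44k)
print("all within one frame:", max_dev_48k < FRAME and max_dev_44k < FRAME)
```

Every reported size is within one assumed 1024-sample frame of the 29-interval estimate, so the remaining variation may simply come from where the audio frame boundaries fall.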

Another, possibly related, observation: if you make clips with clip_length_in_frames=1, frames_between_clips=1, the video clips are as expected (1 frame per clip), but the audio is read in batches, similar to the torchvision.io.VideoReader API (see Clip #16 and compare its shape to the shapes of the other clips):

Clip #0 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #1 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #2 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #3 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #4 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #5 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #6 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #7 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #8 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #9 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #10 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #11 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #12 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #13 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #14 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #15 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 0]) {'video_fps': 30.0, 'audio_fps': 48000}
Clip #16 Visual: torch.Size([1, 320, 560, 3]) Audio: torch.Size([1, 1024]) {'video_fps': 30.0, 'audio_fps': 48000}
...

v-iashin · 2022-01-20T09:31:15Z

Also, for some videos the number of audio samples returned by torchvision.io.read_video and by torchvision.io.VideoReader differs by 1024 (one batch):

import torch
import torchvision
from torchvision.io import read_video

VIDEO_PATH = '4fpkD4A_t1s_35000_45000.mp4'
visual, audio, info = read_video(VIDEO_PATH, pts_unit='sec')
print('Audio:', audio.shape, 'Visual:', visual.shape, info)

Audio: torch.Size([2, 440320]) Visual: torch.Size([300, 720, 1280, 3])  {'video_fps': 30.0, 'audio_fps': 44100}

vs

audio_reader = torchvision.io.VideoReader(VIDEO_PATH, stream='audio')
audio = []
for frame in audio_reader:
    audio.append(frame['data'])
# Concatenate once, after all frames have been collected.
audio = torch.cat(audio)
print(audio.shape)

torch.Size([441344, 2])
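The two totals differ by exactly one block of 1024 samples, and both are whole multiples of 1024. That is what one would see if the audio is decoded in 1024-sample frames and one reader returns one frame more than the other. This reading is an assumption; the check below only uses the numbers printed above. Note also that the two outputs use different layouts: [channels, samples] from read_video and [samples, channels] from the concatenated VideoReader frames.

```python
FRAME = 1024  # assumed decoder frame size in samples

read_video_samples = 440320     # from read_video: torch.Size([2, 440320])
video_reader_samples = 441344   # from VideoReader: torch.Size([441344, 2])

diff = video_reader_samples - read_video_samples
frames_read_video, rest_read_video = divmod(read_video_samples, FRAME)
frames_video_reader, rest_video_reader = divmod(video_reader_samples, FRAME)

print("difference in samples:", diff)
print("read_video:", frames_read_video, "frames, remainder", rest_read_video)
print("VideoReader:", frames_video_reader, "frames, remainder", rest_video_reader)
```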

bjuncek added the "module: video" and "bug" labels on Jul 24, 2020
