VideoClips: audio clips do not correspond to video clips #2474
Comments
Hi @v-iashin - have you been able to resolve this issue when using 'sec' as the default pts_unit?
Hi @bjuncek there is no
Nope, it does not. It fails:
    Traceback (most recent call last):
      File "/home/user/project4/main.py", line 14, in <module>
        vclips = VideoClips([VIDEO_PATH], clip_length_in_frames=30, frames_between_clips=30, )
      File "/home/user/miniconda3/envs/bug_report_video_clips_nightly/lib/python3.8/site-packages/torchvision/datasets/video_utils.py", line 122, in __init__
        self._compute_frame_pts()
      File "/home/user/miniconda3/envs/bug_report_video_clips_nightly/lib/python3.8/site-packages/torchvision/datasets/video_utils.py", line 150, in _compute_frame_pts
        clips = [torch.as_tensor(c) for c in clips]
      File "/home/user/miniconda3/envs/bug_report_video_clips_nightly/lib/python3.8/site-packages/torchvision/datasets/video_utils.py", line 150, in <listcomp>
        clips = [torch.as_tensor(c) for c in clips]
    RuntimeError: Could not infer dtype of Fraction
because now vision/torchvision/io/video.py Line 347 in 1aef87d
with pts = [x * float(video_time_base) for x in pts] Still this returns me
Then, I inspected it a bit and found that vision/torchvision/io/video.py Line 274 in 1aef87d
where pts are used again.
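The effect of the float(video_time_base) change quoted above can be illustrated without torchvision. This is a minimal sketch, assuming integer timestamps and a Fraction time base; the helper name pts_to_seconds and the example values are hypothetical and are not taken from torchvision/io/video.py.

```python
from fractions import Fraction


def pts_to_seconds(pts, time_base):
    """Convert integer presentation timestamps to float seconds.

    Hypothetical helper for illustration; not part of torchvision.
    """
    # int * Fraction yields a Fraction, the type named in the error
    # "Could not infer dtype of Fraction". Converting the time base
    # with float() first keeps every element a plain Python float.
    return [x * float(time_base) for x in pts]


# Assumed example values: a 1/15360 time base, one frame every 512 ticks.
time_base = Fraction(1, 15360)
pts = [0, 512, 1024, 1536]

as_fractions = [x * time_base for x in pts]
as_floats = pts_to_seconds(pts, time_base)

print(type(as_fractions[1]).__name__)  # Fraction
print(type(as_floats[1]).__name__)     # float
```

Whether floats are acceptable at every place where pts are consumed is a separate question, as the observation about line 274 suggests.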
Hi, I have almost the same problem with
Hi, I haven't figured out how to solve it yet. Also, let me know if you find out anything.
Currently on Google Colab I get:
and
Here is the environment:
Colab packages
absl-py==0.12.0
alabaster==0.7.12
albumentations==0.1.12
altair==4.1.0
appdirs==1.4.4
argcomplete==1.12.3
argon2-cffi==21.1.0
arviz==0.11.4
astor==0.8.1
astropy==4.3.1
astunparse==1.6.3
atari-py==0.2.9
atomicwrites==1.4.0
attrs==21.2.0
audioread==2.1.9
autograd==1.3
av==8.0.2
Babel==2.9.1
backcall==0.2.0
beautifulsoup4==4.6.3
bleach==4.1.0
blis==0.4.1
bokeh==2.3.3
Bottleneck==1.3.2
branca==0.4.2
bs4==0.0.1
CacheControl==0.12.10
cached-property==1.5.2
cachetools==4.2.4
catalogue==1.0.0
certifi==2021.10.8
cffi==1.15.0
cftime==1.5.1.1
chardet==3.0.4
charset-normalizer==2.0.8
click==7.1.2
cloudpickle==1.3.0
cmake==3.12.0
cmdstanpy==0.9.5
colorcet==2.0.6
colorlover==0.3.0
community==1.0.0b1
contextlib2==0.5.5
convertdate==2.3.2
coverage==3.7.1
coveralls==0.5
crcmod==1.7
cufflinks==0.17.3
cupy-cuda111==9.4.0
cvxopt==1.2.7
cvxpy==1.0.31
cycler==0.11.0
cymem==2.0.6
Cython==0.29.24
daft==0.0.4
dask==2.12.0
datascience==0.10.6
debugpy==1.0.0
decorator==4.4.2
defusedxml==0.7.1
descartes==1.1.0
dill==0.3.4
distributed==1.25.3
dlib @ file:///dlib-19.18.0-cp37-cp37m-linux_x86_64.whl
dm-tree==0.1.6
docopt==0.6.2
docutils==0.17.1
dopamine-rl==1.0.5
earthengine-api==0.1.290
easydict==1.9
ecos==2.0.7.post1
editdistance==0.5.3
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz
entrypoints==0.3
ephem==4.1
et-xmlfile==1.1.0
fa2==0.3.5
fastai==1.0.61
fastdtw==0.3.4
fastprogress==1.0.0
fastrlock==0.8
fbprophet==0.7.1
feather-format==0.4.1
filelock==3.4.0
firebase-admin==4.4.0
fix-yahoo-finance==0.0.22
Flask==1.1.4
flatbuffers==2.0
folium==0.8.3
future==0.16.0
gast==0.4.0
GDAL==2.2.2
gdown==3.6.4
gensim==3.6.0
geographiclib==1.52
geopy==1.17.0
gin-config==0.5.0
glob2==0.7
google==2.0.3
google-api-core==1.26.3
google-api-python-client==1.12.8
google-auth==1.35.0
google-auth-httplib2==0.0.4
google-auth-oauthlib==0.4.6
google-cloud-bigquery==1.21.0
google-cloud-bigquery-storage==1.1.0
google-cloud-core==1.0.3
google-cloud-datastore==1.8.0
google-cloud-firestore==1.7.0
google-cloud-language==1.2.0
google-cloud-storage==1.18.1
google-cloud-translate==1.5.0
google-colab @ file:///colabtools/dist/google-colab-1.0.0.tar.gz
google-pasta==0.2.0
google-resumable-media==0.4.1
googleapis-common-protos==1.53.0
googledrivedownloader==0.4
graphviz==0.10.1
greenlet==1.1.2
grpcio==1.42.0
gspread==3.0.1
gspread-dataframe==3.0.8
gym==0.17.3
h5py==3.1.0
HeapDict==1.0.1
hijri-converter==2.2.2
holidays==0.10.5.2
holoviews==1.14.6
html5lib==1.0.1
httpimport==0.5.18
httplib2==0.17.4
httplib2shim==0.0.3
humanize==0.5.1
hyperopt==0.1.2
ideep4py==2.0.0.post3
idna==2.10
imageio==2.4.1
imagesize==1.3.0
imbalanced-learn==0.8.1
imblearn==0.0
imgaug==0.2.9
importlib-metadata==4.8.2
importlib-resources==5.4.0
imutils==0.5.4
inflect==2.1.0
iniconfig==1.1.1
intel-openmp==2021.4.0
intervaltree==2.1.0
ipykernel==4.10.1
ipython==5.5.0
ipython-genutils==0.2.0
ipython-sql==0.3.9
ipywidgets==7.6.5
itsdangerous==1.1.0
jax==0.2.25
jaxlib @ https://storage.googleapis.com/jax-releases/cuda111/jaxlib-0.1.71+cuda111-cp37-none-manylinux2010_x86_64.whl
jdcal==1.4.1
jedi==0.18.1
jieba==0.42.1
Jinja2==2.11.3
joblib==1.1.0
jpeg4py==0.1.4
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.3.5
jupyter-console==5.2.0
jupyter-core==4.9.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.2
kaggle==1.5.12
kapre==0.3.6
keras==2.7.0
Keras-Preprocessing==1.1.2
keras-vis==0.4.1
kiwisolver==1.3.2
korean-lunar-calendar==0.2.1
libclang==12.0.0
librosa==0.8.1
lightgbm==2.2.3
llvmlite==0.34.0
lmdb==0.99
LunarCalendar==0.0.9
lxml==4.2.6
Markdown==3.3.6
MarkupSafe==2.0.1
matplotlib==3.2.2
matplotlib-inline==0.1.3
matplotlib-venn==0.11.6
missingno==0.5.0
mistune==0.8.4
mizani==0.6.0
mkl==2019.0
mlxtend==0.14.0
more-itertools==8.12.0
moviepy==0.2.3.5
mpmath==1.2.1
msgpack==1.0.3
multiprocess==0.70.12.2
multitasking==0.0.10
murmurhash==1.0.6
music21==5.5.0
natsort==5.5.0
nbclient==0.5.9
nbconvert==5.6.1
nbformat==5.1.3
nest-asyncio==1.5.4
netCDF4==1.5.8
networkx==2.6.3
nibabel==3.0.2
nltk==3.2.5
notebook==5.3.1
numba==0.51.2
numexpr==2.7.3
numpy==1.19.5
nvidia-ml-py3==7.352.0
oauth2client==4.1.3
oauthlib==3.1.1
okgrade==0.4.3
omegaconf==2.0.6
opencv-contrib-python==4.1.2.30
opencv-python==4.1.2.30
openpyxl==2.5.9
opt-einsum==3.3.0
osqp==0.6.2.post0
packaging==21.3
palettable==3.3.0
pandas==1.1.5
pandas-datareader==0.9.0
pandas-gbq==0.13.3
pandas-profiling==1.4.1
pandocfilters==1.5.0
panel==0.12.1
param==1.12.0
parso==0.8.3
pathlib==1.0.1
patsy==0.5.2
pep517==0.12.0
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.1.2
pip-tools==6.2.0
plac==1.1.3
plotly==4.4.1
plotnine==0.6.0
pluggy==0.7.1
pooch==1.5.2
portpicker==1.3.9
prefetch-generator==1.0.1
preshed==3.0.6
prettytable==2.4.0
progressbar2==3.38.0
prometheus-client==0.12.0
promise==2.3
prompt-toolkit==1.0.18
protobuf==3.17.3
psutil==5.4.8
psycopg2==2.7.6.1
ptyprocess==0.7.0
py==1.11.0
pyarrow==3.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycocotools==2.0.3
pycparser==2.21
pyct==0.4.8
pydata-google-auth==1.2.0
pydot==1.3.0
pydot-ng==2.0.0
pydotplus==2.0.2
PyDrive==1.3.1
pyemd==0.5.1
pyerfa==2.0.0.1
pyglet==1.5.0
Pygments==2.6.1
pygobject==3.26.1
pymc3==3.11.4
PyMeeus==0.5.11
pymongo==3.12.1
pymystem3==0.2.0
PyOpenGL==3.1.5
pyparsing==3.0.6
pyrsistent==0.18.0
pysndfile==1.3.8
PySocks==1.7.1
pystan==2.19.1.1
pytest==3.6.4
python-apt==0.0.0
python-chess==0.23.11
python-dateutil==2.8.2
python-louvain==0.15
python-slugify==5.0.2
python-utils==2.5.6
pytz==2018.9
pyviz-comms==2.1.0
PyWavelets==1.2.0
PyYAML==6.0
pyzmq==22.3.0
qdldl==0.1.5.post0
qtconsole==5.2.1
QtPy==1.11.2
regex==2019.12.20
requests==2.23.0
requests-oauthlib==1.3.0
resampy==0.2.2
retrying==1.3.3
rpy2==3.4.5
rsa==4.8
scikit-image==0.18.3
scikit-learn==1.0.1
scipy==1.4.1
screen-resolution-extra==0.0.0
scs==2.1.4
seaborn==0.11.2
semver==2.13.0
Send2Trash==1.8.0
setuptools-git==1.2
Shapely==1.8.0
simplegeneric==0.8.1
six==1.15.0
sklearn==0.0
sklearn-pandas==1.8.0
smart-open==5.2.1
snowballstemmer==2.2.0
sortedcontainers==2.4.0
SoundFile==0.10.3.post1
spacy==2.2.4
Sphinx==1.8.6
sphinxcontrib-serializinghtml==1.1.5
sphinxcontrib-websupport==1.2.4
SQLAlchemy==1.4.27
sqlparse==0.4.2
srsly==1.0.5
statsmodels==0.10.2
sympy==1.7.1
tables==3.4.4
tabulate==0.8.9
tblib==1.7.0
tensorboard==2.7.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorflow @ file:///tensorflow-2.7.0-cp37-cp37m-linux_x86_64.whl
tensorflow-datasets==4.0.1
tensorflow-estimator==2.7.0
tensorflow-gcs-config==2.7.0
tensorflow-hub==0.12.0
tensorflow-io-gcs-filesystem==0.22.0
tensorflow-metadata==1.4.0
tensorflow-probability==0.15.0
termcolor==1.1.0
terminado==0.12.1
testpath==0.5.0
text-unidecode==1.3
textblob==0.15.3
Theano-PyMC==1.1.2
thinc==7.4.0
threadpoolctl==3.0.0
tifffile==2021.11.2
toml==0.10.2
tomli==1.2.2
toolz==0.11.2
torch @ https://download.pytorch.org/whl/cu111/torch-1.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl
torchaudio @ https://download.pytorch.org/whl/cu111/torchaudio-0.10.0%2Bcu111-cp37-cp37m-linux_x86_64.whl
torchsummary==1.5.1
torchtext==0.11.0
torchvision @ https://download.pytorch.org/whl/cu111/torchvision-0.11.1%2Bcu111-cp37-cp37m-linux_x86_64.whl
tornado==5.1.1
tqdm==4.62.3
traitlets==5.1.1
tweepy==3.10.0
typeguard==2.7.1
typing-extensions==3.10.0.2
tzlocal==1.5.1
uritemplate==3.0.1
urllib3==1.24.3
vega-datasets==0.9.0
wasabi==0.8.2
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==1.0.1
widgetsnbextension==3.5.2
wordcloud==1.5.0
wrapt==1.13.3
xarray==0.18.2
xgboost==0.90
xkit==0.0.0
xlrd==1.1.0
xlwt==1.3.0
yellowbrick==1.3.post1
zict==2.0.0
zipp==3.6.0
Strangely, the number of audio samples in each clip is not equal to the audio frame rate, even though a clip has 30 frames, which is equal to the video fps. Moreover, the audio clips are inconsistent in size. I would expect to have 48000 and 44100 respectively or, at least, some explainable consistency. Another, maybe related, observation is that if you try to make clips with
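For reference, the expectation of 48000 and 44100 samples follows from the clip duration alone. Below is a small sketch of that arithmetic, assuming the video is exactly 30 fps and the audio clip should span the same duration as the video clip; expected_audio_samples is an illustrative name, not a torchvision function.

```python
def expected_audio_samples(clip_length_in_frames, video_fps, audio_rate):
    """Number of audio samples covering the same duration as a video clip.

    Illustrative helper; assumes a constant video frame rate.
    """
    clip_seconds = clip_length_in_frames / video_fps
    return round(clip_seconds * audio_rate)


# 30 frames at 30 fps is one second, so the result equals the audio rate.
for audio_rate in (48000, 44100):
    print(audio_rate, expected_audio_samples(30, 30, audio_rate))
```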
Also, loading some videos results in a 1024 (one batch) difference in the number of audio samples from:
    import torch
    import numpy as np
    import torchvision
    from torchvision.datasets.video_utils import read_video, VideoClips
    VIDEO_PATH = '4fpkD4A_t1s_35000_45000.mp4'
    visual, audio, info = read_video(VIDEO_PATH, pts_unit='sec')
    print('Audio:', audio.shape, 'Visual:', visual.shape, info)
vs
    audio_reader = torchvision.io.VideoReader(VIDEO_PATH, stream='audio')
    audio = []
    for frame in audio_reader:
        audio.append(frame['data'])
    audio = torch.cat(audio)
    print(audio.shape)
🐛 Bug
The audio stream does not correspond to the visual stream when torchvision.datasets.video_utils.VideoClips is used.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The output of torchvision.io.read_video is ok and as expected; I provide it here for reference. Also, the visual streams returned from torchvision.datasets.video_utils.VideoClips are ok.
I expected the audio returned by torchvision.datasets.video_utils.VideoClips().get_clip() to have a comparable number of samples, i.e. 48k or 44.1k for 1 second of 30 fps video. Instead, it outputs more samples than expected or just a fraction of them. Specifically, for ./small.mp4, it outputs ~87k in the first three clips and 0 in the last two (expected 48k at each clip), while for 4fpkD4A_t1s_35000_45000.mp4 it outputs ~14k for the first one and ~15k for the rest of them (expected 44.1k at each). The latter does not even reach the expected 440k samples for the whole 10 s video. Similarly, the former totals (86016 + 87142 + 87255) = 260413, which does not correspond to the 266240 samples loaded by torchvision.io.read_video.
Environment
Additional context
Currently, VideoClips is not documented on the website, so my misunderstanding might stem from the missing documentation.
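As a quick sanity check on the ./small.mp4 sample counts quoted under Expected behavior, the arithmetic can be reproduced as follows. This is only a sketch using the figures from this report; the variable names are illustrative, and the two trailing zeros stand for the last two clips that returned no audio.

```python
# Audio samples returned by get_clip() for the five clips of ./small.mp4,
# as reported above (the last two clips contained no audio).
clip_samples = [86016, 87142, 87255, 0, 0]

# Audio samples returned by read_video for the whole file, as reported above.
read_video_total = 266240

# Expected samples per one-second clip at a 48 kHz audio rate.
expected_per_clip = 48000

clips_total = sum(clip_samples)
print(clips_total)                     # 260413
print(read_video_total - clips_total)  # 5827

# Per-clip deviation from the expected 48000 samples.
deviation = [n - expected_per_clip for n in clip_samples]
print(deviation)
```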