I'm using the VAD feature with Whisper to recognize audio, and I'm using the following command. It seems strange that in the generated JSON file, the start time of the tokens begins at 0, which doesn't correspond to the timestamps.
It is currently reporting VAD processed tokens directly when it should be resolving/mapping these to original input audio timestamps. I'll open a pull request to handle this situation. Thanks for reporting this and bringing it to our attention!
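To illustrate the kind of mapping described above, here is a minimal sketch in Python. It is not the project's actual implementation: the function names `build_offsets` and `map_to_original` are hypothetical, and it assumes the VAD step keeps a list of speech segments as `(start, end)` times in the original audio and concatenates them back-to-back, with no padding, before transcription.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(segments):
    """Return where each speech segment starts on the VAD-processed timeline.

    `segments` is a list of (start, end) times in seconds, measured on the
    original audio, sorted and non-overlapping (an assumption of this sketch).
    """
    if not segments:
        raise ValueError("at least one speech segment is required")
    durations = [end - start for start, end in segments]
    # Segment i starts at the summed duration of segments 0..i-1.
    return [0.0] + list(accumulate(durations))[:-1]


def map_to_original(t_vad, segments, offsets):
    """Map a time on the VAD-processed timeline back to the original audio."""
    # Find the last segment whose processed start is <= t_vad.
    i = max(bisect_right(offsets, t_vad) - 1, 0)
    start, end = segments[i]
    # Shift by the segment's original start; clamp to the segment's end.
    return min(start + (t_vad - offsets[i]), end)


# Hypothetical example: speech at 5-8 s and 12-14 s in the original audio.
segments = [(5.0, 8.0), (12.0, 14.0)]
offsets = build_offsets(segments)
first_token_start = map_to_original(0.0, segments, offsets)
```

With these example segments, a token that starts at 0 on the processed timeline maps to 5.0 s in the original audio, which is the behavior the reporter expected to see in the JSON output.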
We have also opened #3173, which is loosely related to this issue, so that you are aware of it.