vad : revisit timestamp alignment/mapping #3173

Merged: 9 commits merged on May 30, 2025

@danbev (Collaborator) commented May 20, 2025

This commit improves the timestamp alignment by introducing a mapping table, adding intermediate reference points for longer segments, and using binary search for lookups.

The motivation for these changes is to address issues with the current solution, where zero-length segments are possible, and also to improve the precision of the VAD timestamps.

Refs: #3162


Notes regarding the changes can be found here.
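A minimal sketch of how a mapping-table lookup with binary search could work. It assumes a non-empty table of `vad_time_mapping` entries sorted by `processed_time`, and linear interpolation between the two neighbouring reference points. The function name `map_to_original`, the interpolation step, and the time unit (centiseconds) are assumptions for illustration, not taken from the PR's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Mapping entry; field names follow the PR's `vad_time_mapping`.
// Time unit assumed to be centiseconds for this sketch.
struct vad_time_mapping {
    uint64_t processed_time; // time in the VAD-filtered (speech-only) audio
    uint64_t original_time;  // corresponding time in the original audio
};

// Hypothetical lookup: map a filtered-audio timestamp back to the original audio.
// Assumes `table` is sorted by processed_time and original_time never decreases.
uint64_t map_to_original(const std::vector<vad_time_mapping> & table, uint64_t t) {
    assert(!table.empty());

    // Clamp timestamps outside the table to its first/last entry.
    if (t <= table.front().processed_time) return table.front().original_time;
    if (t >= table.back().processed_time)  return table.back().original_time;

    // Binary search for the first entry whose processed_time is greater than t.
    auto upper = std::upper_bound(table.begin(), table.end(), t,
        [](uint64_t value, const vad_time_mapping & m) { return value < m.processed_time; });
    const vad_time_mapping & hi = *upper;
    const vad_time_mapping & lo = *(upper - 1);

    // Linear interpolation between the two surrounding reference points
    // (lo.processed_time <= t < hi.processed_time, so span_p > 0).
    const uint64_t span_p = hi.processed_time - lo.processed_time;
    const uint64_t span_o = hi.original_time  - lo.original_time;
    return lo.original_time + (t - lo.processed_time) * span_o / span_p;
}
```

For example, with the table `{{0, 100}, {50, 150}, {100, 350}}`, a processed time of 75 falls between the second and third entries and maps to 250. The binary search keeps each lookup at O(log n) even when long segments contribute many reference points.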

danbev added 2 commits May 22, 2025 06:29
This commit improves the timestamp alignment by introducing a mapping
table, adding intermediate reference points for longer segments, and
using binary search for lookups.

The motivation for these changes is to address issues with the current
solution, where zero-length segments are possible, and also to improve
the precision of the VAD timestamps.

Refs: ggml-org#3162
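The intermediate reference points for longer segments could be generated along these lines. This is a hypothetical sketch: `speech_segment`, `build_mapping` and the `step` parameter are invented for illustration, segments are assumed to be concatenated back to back in the filtered audio with no padding, and times are assumed to be in centiseconds.

```cpp
#include <cstdint>
#include <vector>

// Same shape as the PR's mapping entry (time unit assumed to be centiseconds).
struct vad_time_mapping {
    uint64_t processed_time; // time in the filtered (speech-only) audio
    uint64_t original_time;  // corresponding time in the original audio
};

// Hypothetical speech segment in the original audio, with end >= start.
struct speech_segment {
    uint64_t start;
    uint64_t end;
};

// Build a mapping table for segments concatenated back to back (no padding assumed).
// Each segment contributes a start point and an end point; segments longer than
// `step` also get intermediate points every `step` units. A `step` of 0 disables them.
std::vector<vad_time_mapping> build_mapping(const std::vector<speech_segment> & segs, uint64_t step) {
    std::vector<vad_time_mapping> table;
    uint64_t processed = 0;
    for (const speech_segment & s : segs) {
        const uint64_t len = s.end - s.start;
        table.push_back({processed, s.start});
        for (uint64_t off = step; step > 0 && off < len; off += step) {
            table.push_back({processed + off, s.start + off});
        }
        processed += len;
        table.push_back({processed, s.end});
    }
    return table;
}
```

Because the segments are joined without padding in this sketch, the end point of one segment and the start point of the next share the same `processed_time`; a lookup over such a table has to resolve that boundary consistently (for example, by always taking the later entry).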

This commit changes the type of the `processed_time` and `original_time`
fields in the `vad_time_mapping` struct from `double` to `uint64_t`.

This change is made to improve precision, to avoid floating-point
inaccuracies, and to be consistent with other parts of the code base
that use `uint64_t` for time representation.

This is part of a refactoring where I'm also going to change the
`vad_segment_info` struct to use `uint64_t` for the start and end times.
This is the reason for the somewhat awkward conversions and casts in the
code at the moment.
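A small illustration of the floating-point concern behind this change: accumulating time steps in a `double` can drift, while integer ticks stay exact. The helper names and the centisecond unit are assumptions made for this example only.

```cpp
#include <cstdint>

// Accumulate `steps` increments of `step_s` seconds in a double.
double accumulate_seconds(int steps, double step_s) {
    double t = 0.0;
    for (int i = 0; i < steps; ++i) {
        t += step_s;
    }
    return t;
}

// Accumulate `steps` increments of `step_cs` centiseconds as an unsigned integer.
uint64_t accumulate_cs(int steps, uint64_t step_cs) {
    uint64_t t = 0;
    for (int i = 0; i < steps; ++i) {
        t += step_cs;
    }
    return t;
}
```

Ten 100 ms steps give exactly 100 with `accumulate_cs(10, 10)`, whereas `accumulate_seconds(10, 0.1)` is not exactly `1.0` with IEEE 754 doubles, because 0.1 has no exact binary representation. Equality and ordering checks on derived timestamps are therefore more predictable with integer time.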
@danbev force-pushed the vad-timestamp-mapping-table branch from 82d6980 to 4c5ca93 May 22, 2025 07:49
@danbev marked this pull request as ready for review May 22, 2025 11:56
@danbev requested a review from ggerganov May 23, 2025 03:48
@ggerganov (Member)

This change does not seem to be compatible with the `-p` parameter. For example:

./bin/whisper-cli -m ../models/ggml-large-v3-turbo.bin -f ../samples/gb0.wav --vad --vad-model ../models/silero-v5.1.2-ggml.bin -fa -p 2

In the second half of the transcription, every segment gets the same timestamp (00:01:03.680 --> 00:01:03.780):

[00:00:00.000 --> 00:00:03.250]   - Good morning, this Tuesday is election day.
[00:00:03.250 --> 00:00:05.950]   After months of spirited debate and vigorous campaigning,
[00:00:05.950 --> 00:00:08.570]   the time has come for Americans to make important decisions
[00:00:08.570 --> 00:00:10.150]   about our nation's future.
[00:00:10.150 --> 00:00:13.750]   I encourage all Americans to go to the polls and vote.
[00:00:13.750 --> 00:00:16.100]   Election season brings out the spirit of competition
[00:00:16.100 --> 00:00:18.020]   between our political parties.
[00:00:18.020 --> 00:00:20.210]   And that competition is an essential part
[00:00:20.210 --> 00:00:21.760]   of a healthy democracy.
[00:00:21.760 --> 00:00:23.520]   But as the campaigns come to a close,
[00:00:23.520 --> 00:00:25.920]   Republicans, Democrats, and Independents
[00:00:25.920 --> 00:00:29.120]   can find common ground on at least one point.
[00:00:29.120 --> 00:00:31.510]   Our system of representative democracy
[00:00:31.510 --> 00:00:34.440]   is one of America's greatest strengths.
[00:00:34.440 --> 00:00:36.220]   The United States was founded on the belief
[00:00:36.220 --> 00:00:38.280]   that all men are created equal.
[00:00:38.280 --> 00:00:41.440]   Every election day, millions of Americans of all races,
[00:00:41.440 --> 00:00:43.810]   religions, and backgrounds step into voting booths
[00:00:43.810 --> 00:00:45.300]   throughout the nation.
[00:00:45.300 --> 00:00:47.730]   Whether they are rich or poor, old or young,
[00:00:47.730 --> 00:00:50.640]   each of them has an equal share in choosing the path
[00:00:50.640 --> 00:00:52.450]   that our country will take.
[00:00:52.450 --> 00:00:54.870]   And every ballot they cast is a reminder
[00:00:54.870 --> 00:00:58.290]   that our founding principles are alive and well.
[00:00:58.290 --> 00:00:59.720]   Voting is one of the great privileges
[00:00:59.720 --> 00:01:01.780]   of American citizenship.
[00:01:01.780 --> 00:01:03.680]   and it has always required brave defenders.
[00:01:03.680 --> 00:01:03.780]   As you head to the polls next week, remember the sacrifices that have been made by generations of Americans in uniform to preserve our way of life.
[00:01:03.680 --> 00:01:03.780]   From Bunker Hill to Baghdad, the men and women of American armed forces have been devoted guardians of our democracy.
[00:01:03.680 --> 00:01:03.780]   All of us owe them and their families a special debt of gratitude on Election Day.
[00:01:03.680 --> 00:01:03.780]   Americans should also remember the important example that our elections set throughout the world.
[00:01:03.680 --> 00:01:03.780]   Young democracies from Georgia and Ukraine to Afghanistan and Iraq can look to the United States for proof that self-government can endure.
[00:01:03.680 --> 00:01:03.780]   And nations that still live under tyranny and oppression can find hope and inspiration in our commitment to liberty.
[00:01:03.680 --> 00:01:03.780]   For more than two centuries, Americans have demonstrated the ability of free people to choose their own leaders.
[00:01:03.680 --> 00:01:03.780]   Our nation has flourished because of its commitment to trusting the wisdom of our citizenry.
[00:01:03.680 --> 00:01:03.780]   In this year's election, we will see this tradition continue.
[00:01:03.680 --> 00:01:03.780]   And we will be reminded once again that we are blessed to live in a free nation guided by the will of the people.
[00:01:03.680 --> 00:01:03.780]   Thank you for listening.
ggml_metal_free: deallocating
whisper_full_parallel: the audio has been split into 2 chunks at the following times:
whisper_full_parallel: split 1 - 00:01:03.670
whisper_full_parallel: the transcription quality may be degraded near these boundaries

@danbev (Collaborator, Author) commented May 27, 2025

@ggerganov I had not tried this with `-p`/`--processors`, so I did not notice this. Thanks, I'll look into it.

danbev added 2 commits May 27, 2025 19:41
This commit extracts the VAD processing from the
`whisper_full_with_state` function into the `whisper_full` and
`whisper_full_parallel` functions.

The motivation for this is that I did not take into account that, when
`whisper_full_parallel` is called with `n_processors > 1`, the VAD
processing would not be applied correctly. Instead, in the case of
`whisper_full_parallel`, the VAD processing should be done before the
parallel processing starts.
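The order of operations described in this commit (run VAD once on the whole input, then divide the filtered audio between processors) can be sketched as follows. Everything here is hypothetical: a simple amplitude threshold stands in for the Silero VAD model, and `split_for_processors` only approximates how `whisper_full_parallel` might divide samples, with the last chunk taking any remainder.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for real VAD: keep only samples whose magnitude exceeds `threshold`.
std::vector<float> vad_filter_stub(const std::vector<float> & pcm, float threshold) {
    std::vector<float> speech;
    for (float s : pcm) {
        if (std::fabs(s) > threshold) {
            speech.push_back(s);
        }
    }
    return speech;
}

// Divide samples into n_processors contiguous chunks; the last chunk gets the remainder.
std::vector<std::vector<float>> split_for_processors(const std::vector<float> & samples, int n_processors) {
    assert(n_processors > 0);
    const size_t n     = samples.size();
    const size_t chunk = n / static_cast<size_t>(n_processors);
    std::vector<std::vector<float>> chunks;
    for (int i = 0; i < n_processors; ++i) {
        const size_t begin = static_cast<size_t>(i) * chunk;
        const size_t end   = (i == n_processors - 1) ? n : begin + chunk;
        chunks.emplace_back(samples.begin() + begin, samples.begin() + end);
    }
    return chunks;
}

// VAD runs once on the full input, before splitting, so every processor
// receives speech-only audio taken from a single filtered timeline.
std::vector<std::vector<float>> prepare_parallel_input(const std::vector<float> & pcm, int n_processors, float threshold) {
    const std::vector<float> filtered = vad_filter_stub(pcm, threshold);
    return split_for_processors(filtered, n_processors);
}
```

Filtering first means there is one mapping from filtered time to original time for the whole input, rather than a separate one per chunk.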

This commit removes the `filtered_n_samples` parameter from the
`whisper_vad` function signature and its usage, as it is no longer
needed now that the filtered samples are stored in a vector (previously
they were a `float*`).

The motivation for this is to simplify the usage of this function.
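With the filtered samples held in a `std::vector<float>`, the caller can read the length from `size()`, which is why a separate count parameter becomes redundant. The function below is a hypothetical illustration of that signature shape; its name, its parameters and the use of sample-index ranges are invented here and are not the actual `whisper_vad` signature.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical: copy the speech ranges [begin, end) (sample indices) into `filtered`.
// The output vector carries its own length, so no separate count output is needed.
// Returns false on an invalid range; `filtered` may then hold a partial result.
bool copy_speech_ranges(const float * samples, size_t n_samples,
                        const std::vector<std::pair<size_t, size_t>> & ranges,
                        std::vector<float> & filtered) {
    filtered.clear();
    for (const auto & r : ranges) {
        if (r.first > r.second || r.second > n_samples) {
            return false; // range out of bounds or reversed
        }
        filtered.insert(filtered.end(), samples + r.first, samples + r.second);
    }
    return true;
}
```

A caller then uses `filtered.size()` wherever it previously needed the separate sample count.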
@danbev (Collaborator, Author) commented May 28, 2025

Example using -p 2:

$ ./build/bin/whisper-cli -m models/ggml-large-v3-turbo.bin -f samples/gb0.wav --vad --vad-model models/silero-v5.1.2-ggml.bin -fa -p 2
Output:
[00:00:00.000 --> 00:00:03.250]   - Good morning, this Tuesday is election day.
[00:00:03.250 --> 00:00:05.950]   After months of spirited debate and vigorous campaigning,
[00:00:05.950 --> 00:00:08.570]   the time has come for Americans to make important decisions
[00:00:08.570 --> 00:00:10.150]   about our nation's future.
[00:00:10.150 --> 00:00:13.750]   I encourage all Americans to go to the polls and vote.
[00:00:13.750 --> 00:00:16.100]   Election season brings out the spirit of competition
[00:00:16.100 --> 00:00:18.020]   between our political parties.
[00:00:18.020 --> 00:00:20.210]   And that competition is an essential part
[00:00:20.210 --> 00:00:21.760]   of a healthy democracy.
[00:00:21.760 --> 00:00:23.520]   But as the campaigns come to a close,
[00:00:23.520 --> 00:00:25.920]   Republicans, Democrats, and Independents
[00:00:25.920 --> 00:00:29.120]   can find common ground on at least one point.
[00:00:29.120 --> 00:00:31.510]   Our system of representative democracy
[00:00:31.510 --> 00:00:34.440]   is one of America's greatest strengths.
[00:00:34.440 --> 00:00:36.220]   The United States was founded on the belief
[00:00:36.220 --> 00:00:38.280]   that all men are created equal.
[00:00:38.280 --> 00:00:41.440]   Every election day, millions of Americans of all races,
[00:00:41.440 --> 00:00:43.810]   religions, and backgrounds step into voting booths
[00:00:43.810 --> 00:00:45.300]   throughout the nation.
[00:00:45.300 --> 00:00:47.730]   Whether they are rich or poor, old or young,
[00:00:47.730 --> 00:00:50.640]   each of them has an equal share in choosing the path
[00:00:50.640 --> 00:00:52.450]   that our country will take.
[00:00:52.450 --> 00:00:54.870]   And every ballot they cast is a reminder
[00:00:54.870 --> 00:00:58.290]   that our founding principles are alive and well.
[00:00:58.290 --> 00:00:59.720]   Voting is one of the great privileges
[00:00:59.720 --> 00:01:01.770]   of American citizenship.
[00:01:01.770 --> 00:01:03.460]   And it has always required brave defenders.
[00:01:03.460 --> 00:01:09.140]   As you head to the polls next week, remember the sacrifices that have been made by generations
[00:01:09.140 --> 00:01:13.100]   of Americans in uniform to preserve our way of life.
[00:01:13.100 --> 00:01:17.390]   From Bunker Hill to Baghdad, the men and women of American armed forces have been devoted
[00:01:17.390 --> 00:01:20.060]   guardians of our democracy.
[00:01:20.060 --> 00:01:25.590]   All of us owe them and their families a special debt of gratitude on Election Day.
[00:01:25.590 --> 00:01:28.670]   Americans should also remember the important example that our elections set throughout
[00:01:28.670 --> 00:01:30.260]   the world.
[00:01:30.260 --> 00:01:34.140]   Young democracies from Georgia and Ukraine to Afghanistan and Iraq can look to the United
[00:01:34.140 --> 00:01:39.190]   States for proof that self-government can endure, and nations that still live under tyranny
[00:01:39.190 --> 00:01:44.160]   and oppression can find hope and inspiration in our commitment to liberty.
[00:01:44.160 --> 00:01:48.220]   For more than two centuries, Americans have demonstrated the ability of free people to choose their
[00:01:48.220 --> 00:01:49.680]   own leaders.
[00:01:49.680 --> 00:01:54.720]   Our nation has flourished because of its commitment to trusting the wisdom of our citizenry.
[00:01:54.720 --> 00:02:00.200]   In this year's election, we will see this tradition continue, and we will be reminded once again
[00:02:00.200 --> 00:02:05.510]   that we are blessed to live in a free nation guided by the will of the people.
[00:02:05.510 --> 00:02:06.230]   Thank you for listening.

whisper_full_parallel: the audio has been split into 2 chunks at the following times:
whisper_full_parallel: split 1 - 00:00:59.210
whisper_full_parallel: the transcription quality may be degraded near these boundaries

@danbev merged commit 98dfe8d into ggml-org:master May 30, 2025
102 of 103 checks passed