bug: Whisper VAD - Token Timestamp Issue

I'm transcribing audio with `whisper-cli` with VAD enabled, using the command below. In the generated JSON file, the token timestamps start at 0 and don't correspond to the segment timestamps: the segment shown spans 4480 ms to 7860 ms, but its tokens run from 0 ms to 2360 ms.

```
./build/bin/whisper-cli -m /Users/pilot/.aicutpro/whisper_cpp/models/ggml-tiny.bin -f test.wav -of test -oj -pp -l en -t 4 -bo 5 -bs 5 --vad --vad-model vad_models/ggml-silero-v5.1.2.bin -fa -sow -ojf
```

```json
	"transcription": [
		{
			"timestamps": {
				"from": "00:00:04,480",
				"to": "00:00:07,860"
			},
			"offsets": {
				"from": 4480,
				"to": 7860
			},
			"text": " I want to tell you what I see coming.",
			"tokens": [
				{
					"text": "[_BEG_]",
					"timestamps": {
						"from": "00:00:00,000",
						"to": "00:00:00,000"
					},
					"offsets": {
						"from": 0,
						"to": 0
					},
					"id": 50363,
					"p": 0.995989,
					"t_dtw": -1
				},
				{
					"text": " I",
					"timestamps": {
						"from": "00:00:00,070",
						"to": "00:00:00,070"
					},
					"offsets": {
						"from": 70,
						"to": 70
					},
					"id": 314,
					"p": 0.928097,
					"t_dtw": -1
				},
				{
					"text": " want",
					"timestamps": {
						"from": "00:00:00,130",
						"to": "00:00:00,370"
					},
					"offsets": {
						"from": 130,
						"to": 370
					},
					"id": 765,
					"p": 0.985233,
					"t_dtw": -1
				},
				{
					"text": " to",
					"timestamps": {
						"from": "00:00:00,370",
						"to": "00:00:00,520"
					},
					"offsets": {
						"from": 370,
						"to": 520
					},
					"id": 284,
					"p": 0.997866,
					"t_dtw": -1
				},
				{
					"text": " tell",
					"timestamps": {
						"from": "00:00:00,520",
						"to": "00:00:00,820"
					},
					"offsets": {
						"from": 520,
						"to": 820
					},
					"id": 1560,
					"p": 0.999005,
					"t_dtw": -1
				},
				{
					"text": " you",
					"timestamps": {
						"from": "00:00:00,820",
						"to": "00:00:01,040"
					},
					"offsets": {
						"from": 820,
						"to": 1040
					},
					"id": 345,
					"p": 0.996679,
					"t_dtw": -1
				},
				{
					"text": " what",
					"timestamps": {
						"from": "00:00:01,040",
						"to": "00:00:01,340"
					},
					"offsets": {
						"from": 1040,
						"to": 1340
					},
					"id": 644,
					"p": 0.993718,
					"t_dtw": -1
				},
				{
					"text": " I",
					"timestamps": {
						"from": "00:00:01,340",
						"to": "00:00:01,410"
					},
					"offsets": {
						"from": 1340,
						"to": 1410
					},
					"id": 314,
					"p": 0.993655,
					"t_dtw": -1
				},
				{
					"text": " see",
					"timestamps": {
						"from": "00:00:01,410",
						"to": "00:00:01,630"
					},
					"offsets": {
						"from": 1410,
						"to": 1630
					},
					"id": 766,
					"p": 0.997687,
					"t_dtw": -1
				},
				{
					"text": " coming",
					"timestamps": {
						"from": "00:00:01,630",
						"to": "00:00:02,080"
					},
					"offsets": {
						"from": 1630,
						"to": 2080
					},
					"id": 2406,
					"p": 0.995338,
					"t_dtw": -1
				},
				{
					"text": ".",
					"timestamps": {
						"from": "00:00:02,080",
						"to": "00:00:02,360"
					},
					"offsets": {
						"from": 2080,
						"to": 2360
					},
					"id": 13,
					"p": 0.919863,
					"t_dtw": -1
				},
				{
					"text": "[_TT_118]",
					"timestamps": {
						"from": "00:00:02,360",
						"to": "00:00:02,360"
					},
					"offsets": {
						"from": 2360,
						"to": 2360
					},
					"id": 50481,
					"p": 0.291442,
					"t_dtw": -1
				}
			]
		},
```
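
To make the mismatch easier to spot across a whole file, here is a minimal diagnostic sketch in Python. It is not part of whisper.cpp: the function name `find_misaligned_tokens` and the `tolerance_ms` parameter are hypothetical, and it assumes only the JSON layout shown above (a `transcription` list whose segments and tokens each carry `offsets.from` and `offsets.to` in milliseconds). It reports tokens that fall outside their segment's range and does not try to correct them, since the reference point of the token timestamps under VAD is not established here.

```python
def find_misaligned_tokens(segments, tolerance_ms=0):
    """Return (segment_index, token_text, token_from, token_to) for every
    token whose offsets fall outside its parent segment's offsets."""
    problems = []
    for i, seg in enumerate(segments):
        seg_from = seg["offsets"]["from"]
        seg_to = seg["offsets"]["to"]
        for tok in seg.get("tokens", []):
            t_from = tok["offsets"]["from"]
            t_to = tok["offsets"]["to"]
            # A token is flagged if it starts before or ends after its segment.
            if t_from < seg_from - tolerance_ms or t_to > seg_to + tolerance_ms:
                problems.append((i, tok["text"], t_from, t_to))
    return problems


# Trimmed copy of the segment from this report (first two tokens only).
sample = {
    "transcription": [
        {
            "offsets": {"from": 4480, "to": 7860},
            "text": " I want to tell you what I see coming.",
            "tokens": [
                {"text": "[_BEG_]", "offsets": {"from": 0, "to": 0}},
                {"text": " I", "offsets": {"from": 70, "to": 70}},
            ],
        }
    ]
}

misaligned = find_misaligned_tokens(sample["transcription"])
for seg_index, text, t_from, t_to in misaligned:
    print(f"segment {seg_index}: token {text!r} at {t_from}-{t_to} ms is outside the segment range")
```

To run it on real output, load the JSON file written by the command above with `json.load` and pass its `transcription` list to the function.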

bug: Whisper VAD - Token Timestamp Issue #3174
