Description
Your current environment
==============================
System Info
==============================
OS : macOS 15.5 (arm64)
GCC version : Could not collect
Clang version : 17.0.0 (clang-1700.0.13.5)
CMake version : Could not collect
Libc version : N/A
==============================
PyTorch Info
==============================
PyTorch version : 2.7.0
Is debug build : False
CUDA used to build PyTorch : None
ROCM used to build PyTorch : N/A
==============================
Python Environment
==============================
Python version : 3.12.10 (main, Apr 8 2025, 11:35:47) [Clang 16.0.0 (clang-1600.0.26.6)] (64-bit runtime)
Python platform : macOS-15.5-arm64-arm-64bit
==============================
CUDA / GPU Info
==============================
Is CUDA available : False
CUDA runtime version : No CUDA
CUDA_MODULE_LOADING set to : N/A
GPU models and configuration : No CUDA
Nvidia driver version : No CUDA
cuDNN version : No CUDA
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
==============================
Apple M3 Max
==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] pyzmq==26.4.0
[pip3] torch==2.7.0
[pip3] torchaudio==2.7.0
[pip3] torchvision==0.22.0
[pip3] transformers==4.52.4
[conda] Could not collect
==============================
vLLM Info
==============================
ROCM Version : Could not collect
Neuron SDK Version : N/A
vLLM Version : 0.9.2.dev44+gc742438f8 (git sha: c742438f8)
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect
==============================
Environment Variables
==============================
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
How would you like to use vllm
I'd like to run Whisper model inference with vLLM. To support transcribing long audio files, I need the Whisper model to return timestamps so that the output can be chunked and then merged. My code looks like the following:
from vllm import LLM, SamplingParams

model_name = "/Users/xxx/.cache/huggingface/hub/models--openai--whisper-base/snapshots/e37978b90ca9030d5170a5c07aadb050351a65bb"
llm = LLM(model_name, ...)

# Encoder gets the audio, decoder gets the Whisper prompt tokens.
decoder_prompt = "<|startoftranscript|>"
input_prompts = [
    {
        "encoder_prompt": {
            "prompt": "",
            "multi_modal_data": {"audio": audio_data},
        },
        "decoder_prompt": decoder_prompt,
    }
]

sampling_params = SamplingParams(decode_with_timestamps=True, ...)
output = llm.generate(input_prompts, sampling_params)
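For reference, Whisper represents timestamps as special tokens such as <|0.00|>, <|0.02|>, ..., interleaved with the text, so if they were being generated they should show up in the decoded output. This is roughly how I intend to consume them; the helper below is my own sketch, not part of vLLM:

import re

# My own sketch (not a vLLM API): split a transcript such as
# "<|0.00|> Hello world.<|4.20|>" into (start, end, text) chunks for merging.
def parse_timestamp_chunks(text):
    pieces = re.split(r"<\|(\d+\.\d+)\|>", text)
    # pieces alternate text, time, text, time, ... because of the capture group
    chunks = []
    for i in range(1, len(pieces) - 2, 2):
        start, segment, end = float(pieces[i]), pieces[i + 1], float(pieces[i + 2])
        if segment.strip():
            chunks.append((start, end, segment.strip()))
    return chunks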
Even though the model's generation_config.json has "return_timestamps": true and decode_with_timestamps=True is passed to SamplingParams, the output doesn't contain any timestamps, just the plain transcript text. I even tried adding WhisperTimeStampLogitsProcessor from transformers as an additional entry in logits_processors in SamplingParams, but unfortunately that didn't work either.
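In case it helps, this is roughly how I tried to plug it in. vLLM's per-request logits_processors expect a callable taking the generated token IDs and a logits tensor for a single sequence, while the HF processor operates on batched (input_ids, scores) tensors, so I wrapped it. The adapter class, the begin_index value, and the omission of my other sampling options are my own; this is an untested sketch:

import torch
from transformers import GenerationConfig, WhisperTimeStampLogitsProcessor

# Wrap the batched HF processor so it matches vLLM's per-sequence
# logits_processors signature: (generated_token_ids, logits) -> logits.
class TimestampProcessorAdapter:
    def __init__(self, gen_config: GenerationConfig, begin_index: int = 1):
        # begin_index is the position of the first freely generated token;
        # this value is a guess and may need adjusting for the Whisper prompt.
        self._proc = WhisperTimeStampLogitsProcessor(gen_config, begin_index=begin_index)

    def __call__(self, token_ids: list[int], logits: torch.Tensor) -> torch.Tensor:
        input_ids = torch.tensor([token_ids], dtype=torch.long)
        return self._proc(input_ids, logits.unsqueeze(0)).squeeze(0)

gen_config = GenerationConfig.from_pretrained(model_name)
sampling_params = SamplingParams(
    logits_processors=[TimestampProcessorAdapter(gen_config)],
)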
According to the HF transformers documentation for Whisper, return_timestamps can be passed to its pipeline (which doesn't exist in vLLM):
from transformers import pipeline

# Build the HF automatic-speech-recognition pipeline for the same Whisper model.
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-base")

generate_kwargs = {
    "max_new_tokens": 448,
    "num_beams": 1,
    "condition_on_prev_tokens": False,
    "compression_ratio_threshold": 1.35,  # zlib compression ratio threshold (in token space)
    "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    "logprob_threshold": -1.0,
    "no_speech_threshold": 0.6,
    "return_timestamps": True,
}
result = pipe(sample, generate_kwargs=generate_kwargs)  # `sample` is the raw audio input
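With return_timestamps=True, the pipeline returns chunk-level timestamps alongside the full text, which is exactly the kind of output I'm after. The values below are invented; only the structure (as documented for the transformers ASR pipeline) matters:

# result looks roughly like:
# {
#     "text": " Hello world. How are you?",
#     "chunks": [
#         {"timestamp": (0.0, 1.5), "text": " Hello world."},
#         {"timestamp": (1.5, 3.2), "text": " How are you?"},
#     ],
# }
for chunk in result["chunks"]:
    start, end = chunk["timestamp"]
    print(f"[{start} - {end}] {chunk['text']}")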
I'm just wondering how to get Whisper to output timestamps with vLLM. Any pointers are welcome! Thanks!
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.