[Usage]: How to let Whisper return timestamps in transcript? #19556

@billhao

Description

Your current environment

==============================
        System Info
==============================
OS                           : macOS 15.5 (arm64)
GCC version                  : Could not collect
Clang version                : 17.0.0 (clang-1700.0.13.5)
CMake version                : Could not collect
Libc version                 : N/A

==============================
       PyTorch Info
==============================
PyTorch version              : 2.7.0
Is debug build               : False
CUDA used to build PyTorch   : None
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.10 (main, Apr  8 2025, 11:35:47) [Clang 16.0.0 (clang-1600.0.26.6)] (64-bit runtime)
Python platform              : macOS-15.5-arm64-arm-64bit

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : False
CUDA runtime version         : No CUDA
CUDA_MODULE_LOADING set to   : N/A
GPU models and configuration : No CUDA
Nvidia driver version        : No CUDA
cuDNN version                : No CUDA
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Apple M3 Max

==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] pyzmq==26.4.0
[pip3] torch==2.7.0
[pip3] torchaudio==2.7.0
[pip3] torchvision==0.22.0
[pip3] transformers==4.52.4
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
Neuron SDK Version           : N/A
vLLM Version                 : 0.9.2.dev44+gc742438f8 (git sha: c742438f8)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1

How would you like to use vllm

I'd like to run Whisper model inference with vLLM. To support transcribing long audio files, I need the Whisper model to return timestamps so that I can chunk the audio and then merge the transcripts. My code looks like this:

from vllm import LLM, SamplingParams

model_name = "/Users/xxx/.cache/huggingface/hub/models--openai--whisper-base/snapshots/e37978b90ca9030d5170a5c07aadb050351a65bb"
llm = LLM(model_name, ...)

# Whisper is an encoder-decoder model: the audio goes to the encoder,
# and the decoder is primed with the start-of-transcript token.
decoder_prompt = "<|startoftranscript|>"
input_prompts = [
    {
        "encoder_prompt": {
            "prompt": "",
            "multi_modal_data": {"audio": audio_data},
        },
        "decoder_prompt": decoder_prompt,
    }
]
sampling_params = SamplingParams(decode_with_timestamps=True, ...)
output = llm.generate(input_prompts, sampling_params)

Even though generation_config.json contains "return_timestamps": true and decode_with_timestamps=True is passed to SamplingParams, the output doesn't contain any timestamps, just plain transcript text.
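
One workaround I'm considering (unverified, just a guess based on how the Whisper tokenizer behaves): Whisper emits timestamps as special tokens like <|0.00|>, and those may be generated but then stripped during detokenization, since SamplingParams defaults to skip_special_tokens=True. A sketch of that idea, assuming the decoder can be primed with the full task prefix without <|notimestamps|> (llm and audio_data as above):

import re

from vllm import SamplingParams

# Assumption (not verified against vLLM's Whisper support): prime the decoder
# with the full task prefix and *omit* <|notimestamps|>, so the model is free
# to emit timestamp tokens such as <|0.00|>.
decoder_prompt = "<|startoftranscript|><|en|><|transcribe|>"
input_prompts = [
    {
        "encoder_prompt": {
            "prompt": "",
            "multi_modal_data": {"audio": audio_data},  # audio_data as above
        },
        "decoder_prompt": decoder_prompt,
    }
]

# skip_special_tokens=False keeps the <|x.xx|> timestamp tokens in the
# detokenized output instead of stripping them.
sampling_params = SamplingParams(
    max_tokens=448,
    temperature=0.0,
    skip_special_tokens=False,
)

outputs = llm.generate(input_prompts, sampling_params)  # llm as above
text = outputs[0].outputs[0].text

# Rough parse of "<|0.00|> some text <|2.40|> ..." into (start, text) pairs.
for start, segment in re.findall(r"<\|(\d+\.\d{2})\|>([^<]*)", text):
    print(float(start), segment.strip())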

I even tried adding WhisperTimeStampLogitsProcessor from transformers as an additional logits processor in SamplingParams.logits_processors. Unfortunately, that didn't work either.
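
My guess is there's a signature mismatch: transformers logits processors expect batched (input_ids, scores) tensors, while vLLM's per-request logits_processors (at least in the V0 engine, as far as I can tell) are called per sequence with a list of generated token ids and a 1-D logits tensor. An adapter along these lines is what I had in mind; this is only a sketch under that assumption, where hf_processor is the WhisperTimeStampLogitsProcessor instance I already constructed, and its begin_index bookkeeping is still an open question since it never sees the decoder prompt tokens here:

import torch

def make_vllm_processor(hf_processor):
    # Wrap a transformers LogitsProcessor so it matches the (token_ids, logits)
    # signature that vLLM's per-request logits_processors expect (assumption).
    def _process(token_ids: list[int], logits: torch.Tensor) -> torch.Tensor:
        # transformers processors work on batched tensors: (1, seq_len) ids
        # and (1, vocab_size) scores.
        input_ids = torch.tensor([token_ids], device=logits.device, dtype=torch.long)
        scores = logits.unsqueeze(0)
        return hf_processor(input_ids, scores).squeeze(0)
    return _process

sampling_params = SamplingParams(
    skip_special_tokens=False,
    logits_processors=[make_vllm_processor(hf_processor)],
)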

According to the HF Transformers documentation for Whisper, return_timestamps can be passed to its pipeline (which has no equivalent in vLLM):

generate_kwargs = {
    "max_new_tokens": 448,
    "num_beams": 1,
    "condition_on_prev_tokens": False,
    "compression_ratio_threshold": 1.35,  # zlib compression ratio threshold (in token space)
    "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    "logprob_threshold": -1.0,
    "no_speech_threshold": 0.6,
    "return_timestamps": True,
}

result = pipe(sample, generate_kwargs=generate_kwargs)
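
For reference, this is roughly how I'd wire up the HF pipeline to get those timestamps (a sketch assuming openai/whisper-base and a local audio.wav; with return_timestamps=True the segment-level timestamps come back under result["chunks"]):

from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="openai/whisper-base")

# generate_kwargs as defined above, including "return_timestamps": True
result = pipe("audio.wav", generate_kwargs=generate_kwargs)

print(result["text"])
for chunk in result["chunks"]:
    start, end = chunk["timestamp"]
    print(start, end, chunk["text"])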

I'm just wondering how to let Whisper output timestamps. Any pointers are welcome! Thanks!

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
