11 changes: 2 additions & 9 deletions vllm/multimodal/profiling.py
@@ -189,16 +189,9 @@ def get_encoder_dummy_data(
         mm_inputs, _ = self.get_and_validate_mm_inputs(seq_len)
         mm_inputs = cast(MultiModalEncDecInputs, mm_inputs)

-        # For encoder-decoder models, use encoder prompt token ids instead of
-        # decoder prompt to construct dummy seq_data for encoder profiling.
-        encoder_prompt_token_ids = mm_inputs["encoder_prompt_token_ids"]
-
-        total_len = len(encoder_prompt_token_ids)
-        num_tokens_to_pad = max(total_len, seq_len) - total_len
-        encoder_prompt_token_ids.extend([0] * num_tokens_to_pad)
Comment on lines -192 to -198
Member
Hmmm, I added this padding because whisper needs the encoder sequence padded. We need to update whisper's profiler to keep the padding for it.
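For reference, a minimal sketch of what keeping the padding in a whisper-specific profiler override might look like. The class and base names here are placeholders rather than vLLM's actual API; only the helpers already visible in this diff (get_and_validate_mm_inputs, cast, MultiModalEncDecInputs, DummyData, SequenceData.from_seqs) are reused.

```python
# Hypothetical sketch, not vLLM's actual API: a whisper-specific profiler
# that re-applies the encoder padding removed in the generic path above.
class WhisperMultiModalProfiler(MultiModalProfiler):  # placeholder class/base names
    def get_encoder_dummy_data(self, seq_len: int) -> DummyData:
        mm_inputs, _ = self.get_and_validate_mm_inputs(seq_len)
        mm_inputs = cast(MultiModalEncDecInputs, mm_inputs)

        # Whisper expects the encoder sequence padded out to seq_len.
        encoder_prompt_token_ids = list(mm_inputs["encoder_prompt_token_ids"])
        total_len = len(encoder_prompt_token_ids)
        num_tokens_to_pad = max(total_len, seq_len) - total_len
        encoder_prompt_token_ids.extend([0] * num_tokens_to_pad)

        return DummyData(
            seq_data=SequenceData.from_seqs(encoder_prompt_token_ids),
            multi_modal_data=None,
            multi_modal_placeholders=None,
        )
```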

Contributor Author
@tjohnson31415, Mar 27, 2025

I'm starting to look at whisper now, trying out openai/whisper-small. I added some debug logs to this function, and it looks like encoder_prompt_token_ids always comes in with a length of 1500 during profiling, so this padding doesn't usually trigger. The default for max multimodal batched tokens is 5120, so max_num_seqs would need to be < 4 for seq_len to exceed 1500, but I get CUDA OOB exceptions if I try with max_num_seqs < 12 🤔
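A quick back-of-the-envelope check of that threshold. The seq_len formula below is an assumption inferred from the numbers in the comment (max_num_batched_tokens // max_num_seqs), not the exact scheduler math:

```python
# Rough check of when the removed padding would actually trigger.
ENCODER_LEN = 1500          # observed len(encoder_prompt_token_ids) for whisper-small
MAX_BATCHED_TOKENS = 5120   # default max multimodal batched tokens

for max_num_seqs in (1, 2, 3, 4, 8, 12):
    seq_len = MAX_BATCHED_TOKENS // max_num_seqs  # assumed relationship
    pad = max(ENCODER_LEN, seq_len) - ENCODER_LEN
    print(f"max_num_seqs={max_num_seqs:2d}  seq_len={seq_len:4d}  pad={pad}")
# Padding is non-zero only for max_num_seqs <= 3, matching the "< 4" observation.
```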


         return DummyData(
-            seq_data=SequenceData.from_seqs(encoder_prompt_token_ids),
+            seq_data=SequenceData.from_seqs(
+                mm_inputs["encoder_prompt_token_ids"]),
             multi_modal_data=None,
             multi_modal_placeholders=None,
         )