[MM][Doc] Add documentation for configurable mm profiling #26200
Conversation
Code Review
This pull request adds documentation for the new configurable multi-modal profiling options. The documentation is clear and provides a good example. However, a TODO note left in the user-facing documentation describes an important limitation of the feature; it should be rephrased so that users are not confused about memory usage.
- `image`: `{"count": int, "width": int, "height": int}`
- `video`: `{"count": int, "num_frames": int, "width": int, "height": int}`
- `audio`: `{"count": int, "length": int}`
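For illustration, here is a minimal sketch of how these per-modality options might be supplied from Python. It assumes the dicts are accepted through the existing `limit_mm_per_prompt` engine argument and uses an arbitrary multimodal model; check the merged documentation for the exact parameter name and format.

```python
# Hedged sketch: the parameter format and model name are illustrative assumptions.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-VL-3B-Instruct",  # any multimodal model
    limit_mm_per_prompt={
        # Profile memory assuming at most one 32-frame 512x512 video per prompt,
        # instead of the model's worst-case defaults.
        "video": {"count": 1, "num_frames": 32, "width": 512, "height": 512},
    },
)
```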
It might be nice to include links to the API docs here?
That way, if this doc ever falls out of sync, the reader can at least see the latest options in the API docs.
For example, for audio you'd use:
[`AudioDummyOptions`][vllm.config.multimodal.AudioDummyOptions]
Documentation preview: https://vllm--26200.org.readthedocs.build/en/26200/
Signed-off-by: wwl2755 <[email protected]>
…to loader * 'loader' of https://github.com/dsxsteven/vllm_splitPR: (778 commits)

- [torchao] Add support for ModuleFqnToConfig using regex (vllm-project#26001)
- Add: Support for multiple hidden layers in Eagle3 (vllm-project#26164)
- Enable `RMSNorm` substitution for Transformers backend (vllm-project#26353)
- [Model] Gemma3: Fix GGUF loading and quantization (vllm-project#26189)
- Bump Flashinfer to v0.4.0 (vllm-project#26326)
- Update Dockerfile and install runai-model-streamer[gcs] package (vllm-project#26464)
- [Core] Relax the LoRA max rank (vllm-project#26461)
- [CI/Build] Fix model nightly tests (vllm-project#26466)
- [Hybrid]: Decouple Kernel Block Size from KV Page Size (vllm-project#24486)
- [Core][KVConnector] Propagate all tokens on resumed preemptions (vllm-project#24926)
- [MM][Doc] Add documentation for configurable mm profiling (vllm-project#26200)
- [Hardware][AMD] Enable FlexAttention backend on ROCm (vllm-project#26439)
- [Bugfix] Incorrect another MM data format in vllm bench throughput (vllm-project#26462)
- [Bugfix] Catch and log invalid token ids in detokenizer #2 (vllm-project#26445)
- [Minor] Change warning->warning_once in preprocess (vllm-project#26455)
- [Bugfix] Set the minimum python version for gpt-oss (vllm-project#26392)
- [Misc] Redact ray runtime env before logging (vllm-project#26302)
- Separate MLAAttention class from Attention (vllm-project#25103)
- [Attention] Register FLASHMLA_SPARSE (vllm-project#26441)
- [Kernels] Modular kernel refactor (vllm-project#24812)
- ...
…ct#26200) Signed-off-by: wwl2755 <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
…ct#26200) Signed-off-by: wwl2755 <[email protected]> Signed-off-by: Dhruvil Bhatt <[email protected]>
…ct#26200) Signed-off-by: wwl2755 <[email protected]>
PR #25631 introduced a configurable multi-modal profiling method, and this PR adds the corresponding usage guidelines to the documentation.
CC: @ywang96 @DarkLight1337 @Isotr0py @hmellor