[Feature]: Streaming multi-modal input/output

This is a tracking issue for enabling streaming multi-modal (MM) input and output.

## Outline
 
Streaming input:

- Support streaming multi-modal inputs at API level.
- Handle streaming inputs in the multi-modal processor.
- Define an interface for models to indicate support for streaming inputs.
- Update the V1 model runner and scheduler to handle partial MM encoding requests (this is the hardest part IMO).
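
To make the last two items more concrete, here is a minimal sketch of what an opt-in flag and per-request bookkeeping for partial encoding could look like. Everything in it is hypothetical and for illustration only: `SupportsStreamingMMInput`, `supports_streaming_mm_input`, `PartialMMRequest`, and the chunk `budget` are assumed names, not existing vLLM APIs, and the real scheduler would need to account for much more (encoder cache, token budgets, preemption).

```python
from dataclasses import dataclass, field
from typing import ClassVar, Protocol


class SupportsStreamingMMInput(Protocol):
    """Hypothetical marker interface; not an existing vLLM class."""

    supports_streaming_mm_input: ClassVar[bool]


def supports_streaming_mm_input(model: object) -> bool:
    # A model opts in by setting the class-level flag to True.
    return getattr(model, "supports_streaming_mm_input", False) is True


@dataclass
class PartialMMRequest:
    """Tracks which chunks of one streamed multi-modal item are encoded."""

    request_id: str
    chunks: list[bytes] = field(default_factory=list)
    num_encoded: int = 0  # chunks already handed to the encoder
    input_closed: bool = False  # True once the client ends the stream

    def add_chunk(self, chunk: bytes, last: bool = False) -> None:
        if self.input_closed:
            raise ValueError("input stream is already closed")
        self.chunks.append(chunk)
        self.input_closed = last

    def schedule(self, budget: int) -> list[bytes]:
        """Return up to `budget` not-yet-encoded chunks and mark them encoded."""
        start = self.num_encoded
        end = min(len(self.chunks), start + budget)
        self.num_encoded = end
        return self.chunks[start:end]

    @property
    def fully_encoded(self) -> bool:
        # Done only when the stream is closed and every chunk was scheduled.
        return self.input_closed and self.num_encoded == len(self.chunks)
```

The point of the sketch is that a request can be scheduled several times while its input is still arriving, so the scheduler has to distinguish "nothing new to encode yet" from "fully encoded".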

Streaming output:

- Implement `RequestOutputKind.DELTA` for multi-modal outputs in V1 output processor.
- Support streaming multi-modal outputs at API level.
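
As a rough illustration of the delta semantics for non-text outputs, the sketch below buffers generated bytes (e.g. audio) and returns only the unsent tail in delta mode. `OutputKind` is a local stand-in whose member names mirror `RequestOutputKind`; `MMOutputBuffer` and its `step` method are hypothetical and are not part of the V1 output processor.

```python
from enum import Enum, auto
from typing import Optional


class OutputKind(Enum):
    """Local stand-in; member names mirror RequestOutputKind."""

    CUMULATIVE = auto()
    DELTA = auto()
    FINAL_ONLY = auto()


class MMOutputBuffer:
    """Hypothetical per-request buffer for generated multi-modal bytes."""

    def __init__(self, kind: OutputKind) -> None:
        self.kind = kind
        self._data = bytearray()
        self._sent = 0  # number of bytes already returned in DELTA mode

    def step(self, new_bytes: bytes, finished: bool) -> Optional[bytes]:
        """Add newly generated bytes and return what should be emitted now."""
        self._data += new_bytes
        if self.kind is OutputKind.FINAL_ONLY:
            # Emit nothing until the request has finished.
            return bytes(self._data) if finished else None
        if self.kind is OutputKind.CUMULATIVE:
            # Emit everything generated so far on every step.
            return bytes(self._data)
        # DELTA: emit only the bytes produced since the previous step.
        delta = bytes(self._data[self._sent:])
        self._sent = len(self._data)
        return delta
```

For example, with `OutputKind.DELTA`, two steps producing `b"ab"` and `b"cd"` return `b"ab"` and then `b"cd"`, whereas `OutputKind.CUMULATIVE` returns `b"ab"` and then `b"abcd"`.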

## Notes

- We are currently waiting for AWS's proposal.
- Take inspiration from https://github.com/vllm-project/vllm/pull/16347?
- See also [#22695](https://github.com/vllm-project/vllm/issues/22695)

[Feature]: Streaming multi-modal input/output #25066

Outline

Notes

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Feature]: Streaming multi-modal input/output #25066

Description

Outline

Notes

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions