Skip to content

Conversation

@JuanmaBM
Copy link
Contributor

@JuanmaBM JuanmaBM commented Jun 4, 2025

This PR adds support for OpenAI-style multimodal request formats, allowing the simulator to correctly parse messages where the content field is an array of typed blocks (text, image_url, etc.) instead of a plain string.

Key Changes:

  • Refactors the Message struct to support []ContentBlock instead of string.
  • Adds parsing logic.
  • Maintains backward compatibility with simple string content if needed.

This enables compatibility with frameworks like Llama Stack and other OpenAI v2 clients using structured content.

Fixes: Multimodal Requests Not Supported

Signed-off-by: Juanma Barea <[email protected]>
@mayabar
Copy link
Collaborator

mayabar commented Jun 10, 2025

Hi @JuanmaBM , thanks for you PR.
Support of multimodal requests in chat completion is an important feature.
According the OpenAI API (https://platform.openai.com/docs/guides/images-vision?api-mode=chat) structure of content in case of array of items seems a little different from one in the implementation.
OpenAI API's example:

"content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            },
        ],

Code:

type contentBlock struct {
	Type     string `json:"type"`
	Text     string `json:"text,omitempty"`
	ImageURL string `json:"image_url,omitempty"`
}

In the code the image_url field is a string instead of an object with the 'url' field.

@JuanmaBM
Copy link
Contributor Author

You're right, @mayabar. However, I don't believe it's necessary to add it as an object since the simulator only provides random/echo responses and doesn't utilize the image_url data. I'm happy to modify it if you feel it's essential to meet the specification :)

@mayabar
Copy link
Collaborator

mayabar commented Jun 11, 2025

Thanks for your comment, @JuanmaBM , in the simulator we want to be compatible with OpenAI API specification, at this stage we don't support all fields in requests data structures, currently, our goal is to support a sub-set of the OpenAI API, but we don't want to add data structure's fields that are not defined in OpenAI API or overload existing fields. This will allow the simulator clients to be able easily to switch to real vLLM

@JuanmaBM
Copy link
Contributor Author

Okay, @mayabar, I've been pretty busy lately, but I'll try to update the PR tomorrow.

Signed-off-by: Juanma Barea <[email protected]>
@JuanmaBM
Copy link
Contributor Author

@mayabar Done.

I was thinking that a schema validator might be useful for the simulator. It could be a desirable feature to verify whether an agent or AI client supports the OpenAI specification. I was considering adding a --schema-validate flag to the simulator parameters. If you think this feature would be useful, I can implement it.

@mayabar
Copy link
Collaborator

mayabar commented Jun 15, 2025

Hi @JuanmaBM , thank you for the update.

Regarding the schema validation - the request body structure is validated by marshaling to an appropriate structure.
Ira (@irar2) is working on another PR that will add support for tools calls. Tool description is a "free" json that should be compliant to the specific schema, in this case we will use schema validation.

mayabar
mayabar previously approved these changes Jun 15, 2025
var sb strings.Builder
for _, block := range mc.Structured {
if block.Type == "text" {
sb.WriteString(block.Text)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add here separator (space), this will allow better tokens number calculation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@mayabar mayabar requested a review from irar2 June 15, 2025 09:00
@mayabar mayabar dismissed their stale review June 15, 2025 09:33

lint fails

Copy link
Collaborator

@mayabar mayabar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix lint failure

Copy link
Collaborator

@shmuelk shmuelk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

/approve

@shmuelk shmuelk merged commit bef085f into llm-d:main Jun 15, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multimodal Requests Not Supported

3 participants