
Conversation

Kludex (Member) commented Mar 15, 2025

There's still a lot to do and decide... It's still not type safe, and it can't use message_history properly.

The main.py in the files already works, though.

(screenshot attached)

@Kludex Kludex marked this pull request as draft March 15, 2025 13:20
github-actions commented

Docs Preview

commit: e8ba35b
Preview URL: https://6c9c1503-pydantic-ai-previews.pydantic.workers.dev

DouweM (Collaborator) commented Apr 30, 2025

@Kludex Are you planning to work on this or are we better off closing it for now?

Kludex (Member, Author) commented Apr 30, 2025

This is still on my radar; I'd prefer to keep it open.

ollz272 commented Jun 30, 2025

Hi, I would love access to this feature. Is there an ETA?

lshamis commented Jul 7, 2025

I think something like this will be necessary sooner rather than later. Many models can, or soon will, generate interleaved multimodal content.

Slightly philosophical question, but why are the output types of an LLM different from those of ToolCall?

DouweM (Collaborator) commented Jul 7, 2025

Slightly philosophical question, but why are the output types of an LLM different from those of ToolCall?

@lshamis Because the types of data LLMs support as input (whether via the user prompt or as a tool call result) are not the same as the types of data they can output. For example, all models support text input and text output, and many support image, video, audio, and document input, but only a handful support image output, and as far as I know none can output e.g. PDF files. So there's necessarily a difference between the types of things we allow tools to output (anything that can be sent back to the model as input) and what models themselves can output.
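
The asymmetry described above (a model's supported input modalities are typically a superset of its output modalities) can be sketched with a toy capability table. The model names and capability sets below are purely illustrative assumptions, not pydantic-ai's actual API or real model data:

```python
# Toy sketch of the input/output modality asymmetry.
# Model names and capability sets here are hypothetical examples.

# What each illustrative model accepts as input vs. can emit as output.
CAPABILITIES: dict[str, tuple[set[str], set[str]]] = {
    "text-only-model": ({"text"}, {"text"}),
    "multimodal-in-model": ({"text", "image", "audio", "document"}, {"text"}),
    "image-gen-model": ({"text", "image"}, {"text", "image"}),
}


def allowed_tool_return_types(model: str) -> set[str]:
    """A tool result is sent back to the model as input, so any
    supported *input* modality is a valid tool return type."""
    inputs, _ = CAPABILITIES[model]
    return inputs


def allowed_output_types(model: str) -> set[str]:
    """What the model itself can produce is a (usually smaller) set."""
    _, outputs = CAPABILITIES[model]
    return outputs


for name, (inputs, outputs) in CAPABILITIES.items():
    # Every model here outputs only a subset of what it accepts as input,
    # which is why tool return types and model output types differ.
    assert outputs <= inputs
```

This is just a way of stating DouweM's point in code: validating a tool's return type against the input set and a model's own output against the (smaller) output set are necessarily two different checks.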

dorukgezici (Contributor) commented

Hey @Kludex and @DouweM, I would be happy to contribute to this, as our startup is completely focused on media generation (images and video). I would appreciate some guidance, though, since this seems like a core overhaul. We currently have custom tools implemented that use the native Google and OpenAI clients.

@DouweM DouweM assigned DouweM and unassigned Kludex Sep 18, 2025
DouweM (Collaborator) commented Sep 18, 2025

@dorukgezici Much appreciated! However, having just spent some time looking into https://platform.openai.com/docs/guides/tools-image-generation and https://ai.google.dev/gemini-api/docs/image-generation, I think this would take me a few hours to implement in a clean way, and someone less familiar with our architecture a lot longer than that to get it working and then get it through the review cycle 😅 So I'm going to have a crack at this tomorrow or on my flight to our team offsite on Sunday :)

DouweM (Collaborator) commented Sep 19, 2025

Closing in favor of #2970

@DouweM DouweM closed this Sep 19, 2025
@Viicos Viicos deleted the playing-with-gemini-images-output branch November 19, 2025 19:21

Development

Successfully merging this pull request may close these issues.

Support for model Gemini Flash 2.0 Image Generation
