Merged
Commits
42 commits
58772e6
Generate FileParts for images generated by Google and OpenAI
DouweM Sep 19, 2025
3089c94
Add Image class
DouweM Sep 19, 2025
09f852b
fix 3.10
DouweM Sep 19, 2025
39ff6c6
fix snapshot with pretty quote
DouweM Sep 19, 2025
bf79068
Refactor output schemas to prepare for allows_image
DouweM Sep 19, 2025
1e26d7a
Support output_type=Image
DouweM Sep 19, 2025
1a62094
Automatically add built-in image generation tool in OpenAI model if I…
DouweM Sep 20, 2025
941021a
fix snapshot
DouweM Sep 20, 2025
cc44d4f
Merge branch 'main' into image-generation
DouweM Oct 1, 2025
bea8fe8
fix errors after merge conflicts
DouweM Oct 1, 2025
dd71619
Simplify handling of response without expected output
DouweM Oct 1, 2025
93a3492
Test interaction of image generation with output and tool calls
DouweM Oct 1, 2025
86d8f1e
streaming
DouweM Oct 1, 2025
e631131
support OpenAI image generation tool options and result metadata
DouweM Oct 1, 2025
288983a
Fake support for ImageGenerationTool on GoogleModel
DouweM Oct 1, 2025
6c7e072
test on vertex
DouweM Oct 1, 2025
5e9af12
add missing cassettes
DouweM Oct 1, 2025
9807c74
fix lint
DouweM Oct 1, 2025
41f3350
Fix a bunch of tests
DouweM Oct 1, 2025
9e6fc4b
Include assistant file parts in OTel messages
DouweM Oct 1, 2025
4c66656
Use global Vertex region for tests, as it has image-preview model
DouweM Oct 1, 2025
ef0d9c1
fix tests
DouweM Oct 1, 2025
8006b83
fix vertex cassettes
DouweM Oct 1, 2025
b353943
fix vertex cassette
DouweM Oct 1, 2025
dda7a48
Raise error when using image output with Temporal
DouweM Oct 1, 2025
a087703
coverage
DouweM Oct 1, 2025
4551fd5
coverage
DouweM Oct 2, 2025
99ebf3d
coverage
DouweM Oct 2, 2025
6b0b1c7
Rename Image to BinaryImage, verify it supports roundtrip serialization
DouweM Oct 2, 2025
215f6ec
Fix test
DouweM Oct 2, 2025
40feb36
Add result.response.{text,thinking,files,tool_calls,builtin_tool_call…
DouweM Oct 2, 2025
960b5ea
Add response property to AgentStream and StreamedResponseSync
DouweM Oct 2, 2025
f212df4
coverage
DouweM Oct 2, 2025
577e327
Use ModelResponse convenience accessors
DouweM Oct 2, 2025
9cc6a04
coverage
DouweM Oct 2, 2025
fe6f026
tweaks
DouweM Oct 2, 2025
52da4f1
Add ModelResponse.images helper property
DouweM Oct 2, 2025
b33f78a
fix cli copy test
DouweM Oct 3, 2025
c353b99
address feedback
DouweM Oct 3, 2025
91b2824
Add docs
DouweM Oct 3, 2025
f394f7a
Add links to model settings docs
DouweM Oct 3, 2025
91e2283
Fix Groq API docs link
DouweM Oct 3, 2025
215 changes: 195 additions & 20 deletions docs/builtin-tools.md

Large diffs are not rendered by default.

44 changes: 41 additions & 3 deletions docs/output.md
@@ -1,10 +1,10 @@
"Output" refers to the final value returned from [running an agent](agents.md#running-agents). This can be either plain text, [structured data](#structured-output), or the result of a [function](#output-functions) called with arguments provided by the model.
"Output" refers to the final value returned from [running an agent](agents.md#running-agents). This can be either plain text, [structured data](#structured-output), an [image](#image-output), or the result of a [function](#output-functions) called with arguments provided by the model.

The output is wrapped in [`AgentRunResult`][pydantic_ai.agent.AgentRunResult] or [`StreamedRunResult`][pydantic_ai.result.StreamedRunResult] so that you can access other data, like [usage][pydantic_ai.usage.RunUsage] of the run and [message history](message-history.md#accessing-messages-from-results).

Both `AgentRunResult` and `StreamedRunResult` are generic in the data they wrap, so typing information about the data returned by the agent is preserved.

A run ends when the model responds with one of the structured output types, or, if no output type is specified or `str` is one of the allowed options, when a plain text response is received. A run can also be cancelled if usage limits are exceeded, see [Usage Limits](agents.md#usage-limits).
A run ends when the model responds with one of the output types, or, if no output type is specified or `str` is one of the allowed options, when a plain text response is received. A run can also be cancelled if usage limits are exceeded, see [Usage Limits](agents.md#usage-limits).

Here's an example using a Pydantic model as the `output_type`, forcing the model to respond with data matching our specification:

@@ -29,7 +29,7 @@ print(result.usage())

_(This example is complete, it can be run "as is")_

## Output data {#structured-output}
## Structured output data {#structured-output}

The [`Agent`][pydantic_ai.Agent] class constructor takes an `output_type` argument that takes one or more types or [output functions](#output-functions). It supports simple scalar types, list and dict types (including `TypedDict`s and [`StructuredDict`s](#structured-dict)), dataclasses and Pydantic models, as well as type unions -- generally everything supported as type hints in a Pydantic model. You can also pass a list of multiple choices.

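For instance, a minimal sketch (model name and prompt are illustrative) that passes a Pydantic model together with `str` as alternative choices:

```py {title="structured_output_choices.py"}
from pydantic import BaseModel

from pydantic_ai import Agent


class CityLocation(BaseModel):
    city: str
    country: str


# The model may respond with structured data matching CityLocation, or with plain text.
agent = Agent('openai:gpt-4o', output_type=[CityLocation, str])

result = agent.run_sync('Where were the 2012 Summer Olympics held?')
print(result.output)
# Either a CityLocation instance or a plain string, depending on what the model returns.
```
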
@@ -470,6 +470,44 @@ print(result.output)

_(This example is complete, it can be run "as is")_

## Image output

Some models can generate images as part of their response, for example models that support the [Image Generation built-in tool](builtin-tools.md#image-generation-tool), or OpenAI models using the [Code Execution built-in tool](builtin-tools.md#code-execution-tool) when told to generate a chart.

To use the generated image as the output of the agent run, you can set `output_type` to [`BinaryImage`][pydantic_ai.messages.BinaryImage]. If no image-generating built-in tool is explicitly specified, the [`ImageGenerationTool`][pydantic_ai.builtin_tools.ImageGenerationTool] will be enabled automatically.

```py {title="image_output.py"}
from pydantic_ai import Agent, BinaryImage

agent = Agent('openai-responses:gpt-5', output_type=BinaryImage)

result = agent.run_sync('Generate an image of an axolotl.')
assert isinstance(result.output, BinaryImage)
```

_(This example is complete, it can be run "as is")_

If an agent does not need to always generate an image, you can use a union of `BinaryImage` and `str`. If the model generates both, the image will take precedence as output and the text will be available on [`ModelResponse.text`][pydantic_ai.messages.ModelResponse.text]:

```py {title="image_output_union.py"}
from pydantic_ai import Agent, BinaryImage

agent = Agent('openai-responses:gpt-5', output_type=BinaryImage | str)

result = agent.run_sync('Tell me a two-sentence story about an axolotl, no image please.')
print(result.output)
"""
Once upon a time, in a hidden underwater cave, lived a curious axolotl named Pip who loved to explore. One day, while venturing further than usual, Pip discovered a shimmering, ancient coin that granted wishes!
"""

result = agent.run_sync('Tell me a two-sentence story about an axolotl with an illustration.')
assert isinstance(result.output, BinaryImage)
print(result.response.text)
"""
Once upon a time, in a hidden underwater cave, lived a curious axolotl named Pip who loved to explore. One day, while venturing further than usual, Pip discovered a shimmering, ancient coin that granted wishes!
"""
```

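For the Code Execution route mentioned above, here's a similar sketch. The prompt is illustrative, and it assumes the `builtin_tools` argument described in the [built-in tools docs](builtin-tools.md):

```py {title="chart_output.py"}
from pydantic_ai import Agent, BinaryImage, CodeExecutionTool

# Pass CodeExecutionTool explicitly so the chart is produced by running code.
agent = Agent(
    'openai-responses:gpt-5',
    builtin_tools=[CodeExecutionTool()],
    output_type=BinaryImage,
)

result = agent.run_sync('Plot y = x**2 for x from 0 to 10 and return the chart as an image.')
assert isinstance(result.output, BinaryImage)
```
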
## Streamed Results

There are two main challenges with streamed results:
14 changes: 6 additions & 8 deletions docs/thinking.md
@@ -14,12 +14,12 @@ You can customize the tags using the [`thinking_tags`][pydantic_ai.profiles.Mode
### OpenAI Responses

The [`OpenAIResponsesModel`][pydantic_ai.models.openai.OpenAIResponsesModel] can generate native thinking parts.
To enable this functionality, you need to set the `openai_reasoning_effort` and `openai_reasoning_summary` fields in the
[`OpenAIResponsesModelSettings`][pydantic_ai.models.openai.OpenAIResponsesModelSettings].
To enable this functionality, you need to set the
[`OpenAIResponsesModelSettings.openai_reasoning_effort`][pydantic_ai.models.openai.OpenAIResponsesModelSettings.openai_reasoning_effort] and [`OpenAIResponsesModelSettings.openai_reasoning_summary`][pydantic_ai.models.openai.OpenAIResponsesModelSettings.openai_reasoning_summary] [model settings](agents.md#model-run-settings).

By default, the unique IDs of reasoning, text, and function call parts from the message history are sent to the model, which can result in errors like `"Item 'rs_123' of type 'reasoning' was provided without its required following item."`
if the message history you're sending does not match exactly what was received from the Responses API in a previous response, for example if you're using a [history processor](message-history.md#processing-message-history).
To disable this, you can set the `openai_send_reasoning_ids` field on [`OpenAIResponsesModelSettings`][pydantic_ai.models.openai.OpenAIResponsesModelSettings] to `False`.
To disable this, turn off the [`OpenAIResponsesModelSettings.openai_send_reasoning_ids`][pydantic_ai.models.openai.OpenAIResponsesModelSettings.openai_send_reasoning_ids] [model setting](agents.md#model-run-settings).

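For example, a minimal sketch of turning that setting off (the model name is illustrative):

```python {title="openai_send_reasoning_ids.py"}
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIResponsesModel, OpenAIResponsesModelSettings

model = OpenAIResponsesModel('gpt-5')
# Don't send reasoning/text/function-call item IDs back to the Responses API,
# e.g. when a history processor has rewritten the message history.
settings = OpenAIResponsesModelSettings(openai_send_reasoning_ids=False)
agent = Agent(model, model_settings=settings)
```
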
```python {title="openai_thinking_part.py"}
from pydantic_ai import Agent
@@ -36,7 +36,7 @@ agent = Agent(model, model_settings=settings)

## Anthropic

To enable thinking, use the `anthropic_thinking` field in the [`AnthropicModelSettings`][pydantic_ai.models.anthropic.AnthropicModelSettings].
To enable thinking, use the [`AnthropicModelSettings.anthropic_thinking`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_thinking] [model setting](agents.md#model-run-settings).

```python {title="anthropic_thinking_part.py"}
from pydantic_ai import Agent
@@ -52,8 +52,7 @@ agent = Agent(model, model_settings=settings)

## Google

To enable thinking, use the `google_thinking_config` field in the
[`GoogleModelSettings`][pydantic_ai.models.google.GoogleModelSettings].
To enable thinking, use the [`GoogleModelSettings.google_thinking_config`][pydantic_ai.models.google.GoogleModelSettings.google_thinking_config] [model setting](agents.md#model-run-settings).

```python {title="google_thinking_part.py"}
from pydantic_ai import Agent
@@ -75,8 +74,7 @@ Groq supports different formats to receive thinking parts:
- `"hidden"`: The thinking part is not included in the text content.
- `"parsed"`: The thinking part has its own structured part in the response which is converted into a [`ThinkingPart`][pydantic_ai.messages.ThinkingPart] object.

To enable thinking, use the `groq_reasoning_format` field in the
[`GroqModelSettings`][pydantic_ai.models.groq.GroqModelSettings]:
To enable thinking, use the [`GroqModelSettings.groq_reasoning_format`][pydantic_ai.models.groq.GroqModelSettings.groq_reasoning_format] [model setting](agents.md#model-run-settings):

```python {title="groq_thinking_part.py"}
from pydantic_ai import Agent
15 changes: 14 additions & 1 deletion pydantic_ai_slim/pydantic_ai/__init__.py
@@ -9,7 +9,14 @@
UserPromptNode,
capture_run_messages,
)
from .builtin_tools import CodeExecutionTool, UrlContextTool, WebSearchTool, WebSearchUserLocation
from .builtin_tools import (
CodeExecutionTool,
ImageGenerationTool,
MemoryTool,
UrlContextTool,
WebSearchTool,
WebSearchUserLocation,
)
from .exceptions import (
AgentRunError,
ApprovalRequired,
@@ -30,11 +37,13 @@
BaseToolCallPart,
BaseToolReturnPart,
BinaryContent,
BinaryImage,
BuiltinToolCallPart,
BuiltinToolReturnPart,
DocumentFormat,
DocumentMediaType,
DocumentUrl,
FilePart,
FileUrl,
FinalResultEvent,
FinishReason,
@@ -131,6 +140,7 @@
'DocumentMediaType',
'DocumentUrl',
'FileUrl',
'FilePart',
'FinalResultEvent',
'FinishReason',
'FunctionToolCallEvent',
@@ -139,6 +149,7 @@
'ImageFormat',
'ImageMediaType',
'ImageUrl',
'BinaryImage',
'ModelMessage',
'ModelMessagesTypeAdapter',
'ModelRequest',
@@ -197,6 +208,8 @@
'WebSearchUserLocation',
'UrlContextTool',
'CodeExecutionTool',
'ImageGenerationTool',
'MemoryTool',
# output
'ToolOutput',
'NativeOutput',
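As a quick check of the new package-root exports (a sketch, separate from the diff above):

```python
# BinaryImage, FilePart, ImageGenerationTool, and MemoryTool are now exported
# from the package root, matching the __all__ additions above.
from pydantic_ai import BinaryImage, FilePart, ImageGenerationTool, MemoryTool

print(BinaryImage, FilePart, ImageGenerationTool, MemoryTool)
```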