
Commit 62eff6b

feat(langchain): add support for image generation tool
1 parent d63a505 commit 62eff6b

6 files changed: +460 -0 lines changed

.changeset/tidy-ligers-rhyme.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+"@langchain/openai": minor
+---
+
+feat(langchain): add support for image generation tool

libs/providers/langchain-openai/README.md

Lines changed: 116 additions & 0 deletions
@@ -292,6 +292,122 @@ Filter operators: `eq` (equals), `ne` (not equal), `gt` (greater than), `gte` (g
 
 For more information, see [OpenAI's File Search Documentation](https://platform.openai.com/docs/guides/tools-file-search).
 
+### Image Generation Tool
+
+The Image Generation tool allows models to generate or edit images using text prompts and optional image inputs. It leverages the GPT Image model and automatically optimizes text inputs for improved performance.
+
+Use Image Generation for:
+
+- **Creating images from text**: Generate images from detailed text descriptions
+- **Editing existing images**: Modify images based on text instructions
+- **Multi-turn image editing**: Iteratively refine images across conversation turns
+- **Various output formats**: Support for PNG, JPEG, and WebP formats
+
+```typescript
+import { ChatOpenAI, tools } from "@langchain/openai";
+
+const model = new ChatOpenAI({ model: "gpt-4o" });
+
+// Basic usage - generate an image
+const response = await model.invoke(
+  "Generate an image of a gray tabby cat hugging an otter with an orange scarf",
+  { tools: [tools.imageGeneration()] }
+);
+
+// Access the generated image (base64-encoded)
+const imageOutput = response.additional_kwargs.tool_outputs?.find(
+  (output) => output.type === "image_generation_call"
+);
+if (imageOutput?.result) {
+  const fs = await import("fs");
+  fs.writeFileSync("output.png", Buffer.from(imageOutput.result, "base64"));
+}
+```
+
+**Custom size and quality** - Configure output dimensions and quality:
+
+```typescript
+const response = await model.invoke("Draw a beautiful sunset over mountains", {
+  tools: [
+    tools.imageGeneration({
+      size: "1536x1024", // Landscape format (also: "1024x1024", "1024x1536", "auto")
+      quality: "high", // Quality level (also: "low", "medium", "auto")
+    }),
+  ],
+});
+```
+
+**Output format and compression** - Choose format and compression level:
+
+```typescript
+const response = await model.invoke("Create a product photo", {
+  tools: [
+    tools.imageGeneration({
+      outputFormat: "jpeg", // Format (also: "png", "webp")
+      outputCompression: 90, // Compression 0-100 (for JPEG/WebP)
+    }),
+  ],
+});
+```
+
+**Transparent background** - Generate images with transparency:
+
+```typescript
+const response = await model.invoke(
+  "Create a logo with transparent background",
+  {
+    tools: [
+      tools.imageGeneration({
+        background: "transparent", // Background type (also: "opaque", "auto")
+        outputFormat: "png",
+      }),
+    ],
+  }
+);
+```
+
+**Streaming with partial images** - Get visual feedback during generation:
+
+```typescript
+const response = await model.invoke("Draw a detailed fantasy castle", {
+  tools: [
+    tools.imageGeneration({
+      partialImages: 2, // Number of partial images (0-3)
+    }),
+  ],
+});
+```
+
+**Force image generation** - Ensure the model uses the image generation tool:
+
+```typescript
+const response = await model.invoke("A serene lake at dawn", {
+  tools: [tools.imageGeneration()],
+  tool_choice: { type: "image_generation" },
+});
+```
+
+**Multi-turn editing** - Refine images across conversation turns:
+
+```typescript
+// First turn: generate initial image (HumanMessage is imported from "@langchain/core/messages")
+const response1 = await model.invoke("Draw a red car", {
+  tools: [tools.imageGeneration()],
+});
+
+// Second turn: edit the image
+const response2 = await model.invoke(
+  [response1, new HumanMessage("Now change the car color to blue")],
+  { tools: [tools.imageGeneration()] }
+);
+```
+
+> **Prompting tips**: Use terms like "draw" or "edit" for best results. For combining images, say "edit the first image by adding this element" instead of "combine" or "merge".
+
+Supported models: `gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `o3`
+
+For more information, see [OpenAI's Image Generation Documentation](https://platform.openai.com/docs/guides/tools-image-generation).
+
 ## Embeddings
 
 This package also adds support for OpenAI's embeddings model.
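
The README snippet added in this hunk looks up the `image_generation_call` entry in `additional_kwargs.tool_outputs` and decodes its base64 `result`. A minimal sketch of that lookup as a reusable helper might look like the following. The `ToolOutput` and `MessageLike` types and the `extractImageBytes` name are assumptions made for illustration, not exports of `@langchain/openai`, and the example runs against a hand-built message rather than a real API response.

```typescript
// Minimal structural types for this sketch only (assumed, not library types).
interface ToolOutput {
  type: string;
  result?: string | null;
}

interface MessageLike {
  additional_kwargs: { tool_outputs?: ToolOutput[] };
}

// Returns the decoded bytes of the first image_generation_call output,
// or undefined when the message carries no generated image.
function extractImageBytes(message: MessageLike): Buffer | undefined {
  const output = message.additional_kwargs.tool_outputs?.find(
    (o) => o.type === "image_generation_call"
  );
  if (!output?.result) return undefined;
  return Buffer.from(output.result, "base64");
}

// Hand-built message standing in for a model response (no API call involved).
const fakeMessage: MessageLike = {
  additional_kwargs: {
    tool_outputs: [
      {
        type: "image_generation_call",
        result: Buffer.from("hello").toString("base64"),
      },
    ],
  },
};

const bytes = extractImageBytes(fakeMessage);
const missing = extractImageBytes({ additional_kwargs: {} });
```

Returning `undefined` instead of throwing keeps the helper usable when the model answers with text only and never calls the tool.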
libs/providers/langchain-openai/src/tools/imageGeneration.ts

Lines changed: 234 additions & 0 deletions
@@ -0,0 +1,234 @@
+import { OpenAI as OpenAIClient } from "openai";
+
+/**
+ * Optional mask for inpainting. Allows you to specify areas of the image
+ * that should be regenerated.
+ */
+export interface ImageGenerationInputMask {
+  /**
+   * Base64-encoded mask image URL.
+   */
+  imageUrl?: string;
+  /**
+   * File ID for the mask image (uploaded via OpenAI File API).
+   */
+  fileId?: string;
+}
+
+/**
+ * Options for the Image Generation tool.
+ */
+export interface ImageGenerationOptions {
+  /**
+   * Background type for the generated image.
+   * - `transparent`: Generate image with transparent background
+   * - `opaque`: Generate image with opaque background
+   * - `auto`: Let the model decide based on the prompt
+   * @default "auto"
+   */
+  background?: "transparent" | "opaque" | "auto";
+
+  /**
+   * Control how much effort the model will exert to match the style and features,
+   * especially facial features, of input images. This parameter is only supported
+   * for `gpt-image-1`. Unsupported for `gpt-image-1-mini`.
+   * - `high`: Higher fidelity to input images
+   * - `low`: Lower fidelity to input images
+   * @default "low"
+   */
+  inputFidelity?: "high" | "low";
+
+  /**
+   * Optional mask for inpainting. Use this to specify areas of an image
+   * that should be regenerated.
+   */
+  inputImageMask?: ImageGenerationInputMask;
+
+  /**
+   * The image generation model to use.
+   * @default "gpt-image-1"
+   */
+  model?: "gpt-image-1" | "gpt-image-1-mini";
+
+  /**
+   * Moderation level for the generated image.
+   * - `auto`: Standard moderation
+   * - `low`: Less restrictive moderation
+   * @default "auto"
+   */
+  moderation?: "auto" | "low";
+
+  /**
+   * Compression level for the output image (0-100).
+   * Only applies to JPEG and WebP formats.
+   * @default 100
+   */
+  outputCompression?: number;
+
+  /**
+   * The output format of the generated image.
+   * @default "png"
+   */
+  outputFormat?: "png" | "webp" | "jpeg";
+
+  /**
+   * Number of partial images to generate in streaming mode (0-3).
+   * When set, the model will return partial images as they are generated,
+   * providing faster visual feedback.
+   * @default 0
+   */
+  partialImages?: number;
+
+  /**
+   * The quality of the generated image.
+   * - `low`: Faster generation, lower quality
+   * - `medium`: Balanced generation time and quality
+   * - `high`: Slower generation, higher quality
+   * - `auto`: Let the model decide based on the prompt
+   * @default "auto"
+   */
+  quality?: "low" | "medium" | "high" | "auto";
+
+  /**
+   * The size of the generated image.
+   * - `1024x1024`: Square format
+   * - `1024x1536`: Portrait format
+   * - `1536x1024`: Landscape format
+   * - `auto`: Let the model decide based on the prompt
+   * @default "auto"
+   */
+  size?: "1024x1024" | "1024x1536" | "1536x1024" | "auto";
+}
+
+/**
+ * OpenAI Image Generation tool type for the Responses API.
+ */
+export type ImageGenerationTool = OpenAIClient.Responses.Tool.ImageGeneration;
+
+/**
+ * Converts input mask options to the API format.
+ */
+function convertInputImageMask(
+  mask: ImageGenerationInputMask | undefined
+): ImageGenerationTool["input_image_mask"] {
+  if (!mask) return undefined;
+  return {
+    image_url: mask.imageUrl,
+    file_id: mask.fileId,
+  };
+}
+
+/**
+ * Creates an Image Generation tool that allows models to generate or edit images
+ * using text prompts and optional image inputs.
+ *
+ * The image generation tool leverages the GPT Image model and automatically
+ * optimizes text inputs for improved performance. When included in a request,
+ * the model can decide when and how to generate images as part of the conversation.
+ *
+ * **Key Features**:
+ * - Generate images from text descriptions
+ * - Edit existing images with text instructions
+ * - Multi-turn image editing by referencing previous responses
+ * - Configurable output options (size, quality, format)
+ * - Streaming support for partial image generation
+ *
+ * **Prompting Tips**:
+ * - Use terms like "draw" or "edit" in your prompt for best results
+ * - For combining images, say "edit the first image by adding this element" instead of "combine"
+ *
+ * @see {@link https://platform.openai.com/docs/guides/tools-image-generation | OpenAI Image Generation Documentation}
+ *
+ * @param options - Configuration options for the Image Generation tool
+ * @returns An Image Generation tool definition to be passed to the OpenAI Responses API
+ *
+ * @example
+ * ```typescript
+ * import { ChatOpenAI, tools } from "@langchain/openai";
+ *
+ * const model = new ChatOpenAI({ model: "gpt-4o" });
+ *
+ * // Basic usage - generate an image
+ * const response = await model.invoke(
+ *   "Generate an image of a gray tabby cat hugging an otter with an orange scarf",
+ *   { tools: [tools.imageGeneration()] }
+ * );
+ *
+ * // Access the generated image
+ * const imageData = response.additional_kwargs.tool_outputs?.find(
+ *   (output) => output.type === "image_generation_call"
+ * );
+ * if (imageData?.result) {
+ *   // imageData.result contains the base64-encoded image
+ *   const fs = await import("fs");
+ *   fs.writeFileSync("output.png", Buffer.from(imageData.result, "base64"));
+ * }
+ *
+ * // With custom options
+ * const customResponse = await model.invoke(
+ *   "Draw a beautiful sunset over mountains",
+ *   {
+ *     tools: [tools.imageGeneration({
+ *       size: "1536x1024", // Landscape format
+ *       quality: "high", // Higher quality output
+ *       outputFormat: "jpeg", // JPEG format
+ *       outputCompression: 90, // 90% compression
+ *     })]
+ *   }
+ * );
+ *
+ * // With transparent background
+ * const transparentResponse = await model.invoke(
+ *   "Create a logo with a transparent background",
+ *   {
+ *     tools: [tools.imageGeneration({
+ *       background: "transparent",
+ *       outputFormat: "png",
+ *     })]
+ *   }
+ * );
+ *
+ * // Force the model to use image generation
+ * const forcedResponse = await model.invoke(
+ *   "A serene lake at dawn",
+ *   {
+ *     tools: [tools.imageGeneration()],
+ *     tool_choice: { type: "image_generation" },
+ *   }
+ * );
+ *
+ * // Enable streaming with partial images
+ * const partialResponse = await model.invoke(
+ *   "Draw a detailed fantasy castle",
+ *   {
+ *     tools: [tools.imageGeneration({
+ *       partialImages: 2, // Get 2 partial images during generation
+ *     })]
+ *   }
+ * );
+ * ```
+ *
+ * @remarks
+ * - Supported models: gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3
+ * - Image generation uses the `gpt-image-1` model by default; the `model` option can select `gpt-image-1-mini`
+ * - The model will automatically revise prompts for improved performance
+ * - Access the revised prompt via the `revised_prompt` field in the output
+ * - Multi-turn editing is supported by passing previous response messages
+ */
+export function imageGeneration(
+  options?: ImageGenerationOptions
+): ImageGenerationTool {
+  return {
+    type: "image_generation",
+    background: options?.background,
+    input_fidelity: options?.inputFidelity,
+    input_image_mask: convertInputImageMask(options?.inputImageMask),
+    model: options?.model,
+    moderation: options?.moderation,
+    output_compression: options?.outputCompression,
+    output_format: options?.outputFormat,
+    partial_images: options?.partialImages,
+    quality: options?.quality,
+    size: options?.size,
+  };
+}
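
The `imageGeneration` function above maps camelCase options onto the snake_case fields of the tool definition and assigns every field, including the ones the caller left unset. A reduced, self-contained sketch of the same mapping can show what the resulting object looks like. `SketchOptions` and `sketchImageGeneration` are stand-ins written for this illustration and cover only four of the fields; the serialization step assumes the request body is produced with `JSON.stringify`.

```typescript
// Local stand-in for ImageGenerationOptions, reduced to four fields (illustrative only).
interface SketchOptions {
  size?: "1024x1024" | "1024x1536" | "1536x1024" | "auto";
  quality?: "low" | "medium" | "high" | "auto";
  outputFormat?: "png" | "webp" | "jpeg";
  outputCompression?: number;
}

// Same camelCase -> snake_case mapping style as imageGeneration() in the diff.
function sketchImageGeneration(options?: SketchOptions) {
  return {
    type: "image_generation" as const,
    size: options?.size,
    quality: options?.quality,
    output_format: options?.outputFormat,
    output_compression: options?.outputCompression,
  };
}

const tool = sketchImageGeneration({
  outputFormat: "jpeg",
  outputCompression: 90,
});

// Unset options still exist as keys whose value is undefined...
const hasSizeKey = "size" in tool;

// ...but JSON.stringify omits properties whose value is undefined.
const serialized = JSON.stringify(tool);
console.log(serialized);
// {"type":"image_generation","output_format":"jpeg","output_compression":90}
```

Under that assumption, passing every field through unconditionally keeps the factory short without sending explicit nulls for unset options.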

libs/providers/langchain-openai/src/tools/index.ts

Lines changed: 8 additions & 0 deletions
@@ -37,9 +37,17 @@ export type {
   FileSearchHybridSearchWeights,
 } from "./fileSearch.js";
 
+import { imageGeneration } from "./imageGeneration.js";
+export type {
+  ImageGenerationTool,
+  ImageGenerationOptions,
+  ImageGenerationInputMask,
+} from "./imageGeneration.js";
+
 export const tools = {
   webSearch,
   mcp,
   codeInterpreter,
   fileSearch,
+  imageGeneration,
 };
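
The option types exported here document numeric ranges (`outputCompression` 0-100, `partialImages` 0-3) but type them as plain `number`, and the factory in this commit forwards them unchecked. Callers who want an early, local error could add a small pre-flight check along these lines. `checkImageOptionRanges` and `NumericImageOptions` are hypothetical names that are not part of `@langchain/openai`, and requiring `partialImages` to be an integer is an assumption based on it being described as a count.

```typescript
// Hypothetical subset of the options that carry documented numeric ranges.
interface NumericImageOptions {
  outputCompression?: number;
  partialImages?: number;
}

// Returns a list of human-readable problems; an empty list means the values are in range.
function checkImageOptionRanges(options: NumericImageOptions): string[] {
  const problems: string[] = [];
  const { outputCompression, partialImages } = options;

  if (
    outputCompression !== undefined &&
    !(outputCompression >= 0 && outputCompression <= 100)
  ) {
    problems.push("outputCompression must be between 0 and 100");
  }

  if (
    partialImages !== undefined &&
    !(Number.isInteger(partialImages) && partialImages >= 0 && partialImages <= 3)
  ) {
    // Integer requirement is an assumption of this sketch.
    problems.push("partialImages must be an integer between 0 and 3");
  }

  return problems;
}

const ok = checkImageOptionRanges({ outputCompression: 90, partialImages: 2 });
const tooCompressed = checkImageOptionRanges({ outputCompression: 101 });
const tooManyPartials = checkImageOptionRanges({ partialImages: 4 });
const notANumber = checkImageOptionRanges({ outputCompression: NaN });
```

Collecting problems into an array rather than throwing on the first one lets a caller report every out-of-range value at once.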
