Commit 7737d25

feat(client): add Realtime API support

1 parent 20f179d, commit 7737d25

16 files changed: +777 −8 lines changed

README.md

Lines changed: 14 additions & 0 deletions

@@ -388,6 +388,20 @@ (added after the existing `.withResponse();` example, before the "## Microsoft Azure OpenAI" section)

## Realtime API Beta

The Realtime API enables you to build low-latency, multi-modal conversational experiences. It currently supports text and audio as both input and output, as well as [function calling](https://platform.openai.com/docs/guides/function-calling) through a `WebSocket` connection.

```ts
import { OpenAIRealtimeWebSocket } from 'openai/beta/realtime/websocket';

const rt = new OpenAIRealtimeWebSocket({ model: 'gpt-4o-realtime-preview-2024-12-17' });

rt.on('response.text.delta', (event) => process.stdout.write(event.delta));
```

For more information see [realtime.md](realtime.md).

examples/azure.ts renamed to examples/azure/chat.ts

Lines changed: 2 additions & 1 deletion

```diff
@@ -2,6 +2,7 @@
 import { AzureOpenAI } from 'openai';
 import { getBearerTokenProvider, DefaultAzureCredential } from '@azure/identity';
+import 'dotenv/config';

 // Corresponds to your Model deployment within your OpenAI resource, e.g. gpt-4-1106-preview
 // Navigate to the Azure OpenAI Studio to deploy a model.
@@ -13,7 +14,7 @@ const azureADTokenProvider = getBearerTokenProvider(credential, scope);

 // Make sure to set AZURE_OPENAI_ENDPOINT with the endpoint of your Azure resource.
 // You can find it in the Azure Portal.
-const openai = new AzureOpenAI({ azureADTokenProvider });
+const openai = new AzureOpenAI({ azureADTokenProvider, apiVersion: '2024-10-01-preview' });

 async function main() {
   console.log('Non-streaming:');
```
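The new `dotenv/config` import loads variables such as `AZURE_OPENAI_ENDPOINT` from a local `.env` file into `process.env`. As an illustrative aside (this helper is not part of the SDK), failing fast when a required variable is missing might look like:

```ts
// Illustrative helper, not part of the SDK: read a required environment
// variable and fail fast with a clear message if it is unset.
// `import 'dotenv/config'` would populate process.env from `.env` first.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example usage with the variable the Azure examples rely on:
// const endpoint = requireEnv('AZURE_OPENAI_ENDPOINT');
```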
examples/azure/realtime/websocket.ts

Lines changed: 60 additions & 0 deletions

```ts
import { OpenAIRealtimeWebSocket } from 'openai/beta/realtime/websocket';
import { AzureOpenAI } from 'openai';
import { DefaultAzureCredential, getBearerTokenProvider } from '@azure/identity';
import 'dotenv/config';

async function main() {
  const cred = new DefaultAzureCredential();
  const scope = 'https://cognitiveservices.azure.com/.default';
  const deploymentName = 'gpt-4o-realtime-preview-1001';
  const azureADTokenProvider = getBearerTokenProvider(cred, scope);
  const client = new AzureOpenAI({
    azureADTokenProvider,
    apiVersion: '2024-10-01-preview',
    deployment: deploymentName,
  });
  const rt = await OpenAIRealtimeWebSocket.azure(client);

  // access the underlying `WebSocket` instance
  rt.socket.addEventListener('open', () => {
    console.log('Connection opened!');
    rt.send({
      type: 'session.update',
      session: {
        modalities: ['text'],
        model: 'gpt-4o-realtime-preview',
      },
    });

    rt.send({
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{ type: 'input_text', text: 'Say a couple paragraphs!' }],
      },
    });

    rt.send({ type: 'response.create' });
  });

  rt.on('error', (err) => {
    // in a real-world scenario this should be logged somewhere, as you
    // likely want to continue processing events regardless of any errors
    throw err;
  });

  rt.on('session.created', (event) => {
    console.log('session created!', event.session);
    console.log();
  });

  rt.on('response.text.delta', (event) => process.stdout.write(event.delta));
  rt.on('response.text.done', () => console.log());

  rt.on('response.done', () => rt.close());

  rt.socket.addEventListener('close', () => console.log('\nConnection closed!'));
}

main();
```

examples/azure/realtime/ws.ts

Lines changed: 60 additions & 0 deletions

```ts
import { DefaultAzureCredential, getBearerTokenProvider } from '@azure/identity';
import { OpenAIRealtimeWS } from 'openai/beta/realtime/ws';
import { AzureOpenAI } from 'openai';
import 'dotenv/config';

async function main() {
  const cred = new DefaultAzureCredential();
  const scope = 'https://cognitiveservices.azure.com/.default';
  const deploymentName = 'gpt-4o-realtime-preview-1001';
  const azureADTokenProvider = getBearerTokenProvider(cred, scope);
  const client = new AzureOpenAI({
    azureADTokenProvider,
    apiVersion: '2024-10-01-preview',
    deployment: deploymentName,
  });
  const rt = await OpenAIRealtimeWS.azure(client);

  // access the underlying `ws.WebSocket` instance
  rt.socket.on('open', () => {
    console.log('Connection opened!');
    rt.send({
      type: 'session.update',
      session: {
        modalities: ['text'],
        model: 'gpt-4o-realtime-preview',
      },
    });

    rt.send({
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{ type: 'input_text', text: 'Say a couple paragraphs!' }],
      },
    });

    rt.send({ type: 'response.create' });
  });

  rt.on('error', (err) => {
    // in a real-world scenario this should be logged somewhere, as you
    // likely want to continue processing events regardless of any errors
    throw err;
  });

  rt.on('session.created', (event) => {
    console.log('session created!', event.session);
    console.log();
  });

  rt.on('response.text.delta', (event) => process.stdout.write(event.delta));
  rt.on('response.text.done', () => console.log());

  rt.on('response.done', () => rt.close());

  rt.socket.on('close', () => console.log('\nConnection closed!'));
}

main();
```

examples/package.json

Lines changed: 5 additions & 3 deletions

```diff
@@ -6,14 +6,16 @@
   "license": "MIT",
   "private": true,
   "dependencies": {
+    "@azure/identity": "^4.2.0",
+    "dotenv": "^16.4.7",
     "express": "^4.18.2",
     "next": "^14.1.1",
     "openai": "file:..",
-    "zod-to-json-schema": "^3.21.4",
-    "@azure/identity": "^4.2.0"
+    "zod-to-json-schema": "^3.21.4"
   },
   "devDependencies": {
     "@types/body-parser": "^1.19.3",
-    "@types/express": "^4.17.19"
+    "@types/express": "^4.17.19",
+    "@types/web": "^0.0.194"
   }
 }
```

examples/realtime/websocket.ts

Lines changed: 48 additions & 0 deletions

```ts
import { OpenAIRealtimeWebSocket } from 'openai/beta/realtime/websocket';

async function main() {
  const rt = new OpenAIRealtimeWebSocket({ model: 'gpt-4o-realtime-preview-2024-12-17' });

  // access the underlying `WebSocket` instance
  rt.socket.addEventListener('open', () => {
    console.log('Connection opened!');
    rt.send({
      type: 'session.update',
      session: {
        modalities: ['text'],
        model: 'gpt-4o-realtime-preview',
      },
    });

    rt.send({
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{ type: 'input_text', text: 'Say a couple paragraphs!' }],
      },
    });

    rt.send({ type: 'response.create' });
  });

  rt.on('error', (err) => {
    // in a real-world scenario this should be logged somewhere, as you
    // likely want to continue processing events regardless of any errors
    throw err;
  });

  rt.on('session.created', (event) => {
    console.log('session created!', event.session);
    console.log();
  });

  rt.on('response.text.delta', (event) => process.stdout.write(event.delta));
  rt.on('response.text.done', () => console.log());

  rt.on('response.done', () => rt.close());

  rt.socket.addEventListener('close', () => console.log('\nConnection closed!'));
}

main();
```

examples/realtime/ws.ts

Lines changed: 48 additions & 0 deletions

```ts
import { OpenAIRealtimeWS } from 'openai/beta/realtime/ws';

async function main() {
  const rt = new OpenAIRealtimeWS({ model: 'gpt-4o-realtime-preview-2024-12-17' });

  // access the underlying `ws.WebSocket` instance
  rt.socket.on('open', () => {
    console.log('Connection opened!');
    rt.send({
      type: 'session.update',
      session: {
        modalities: ['text'],
        model: 'gpt-4o-realtime-preview',
      },
    });

    rt.send({
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{ type: 'input_text', text: 'Say a couple paragraphs!' }],
      },
    });

    rt.send({ type: 'response.create' });
  });

  rt.on('error', (err) => {
    // in a real-world scenario this should be logged somewhere, as you
    // likely want to continue processing events regardless of any errors
    throw err;
  });

  rt.on('session.created', (event) => {
    console.log('session created!', event.session);
    console.log();
  });

  rt.on('response.text.delta', (event) => process.stdout.write(event.delta));
  rt.on('response.text.done', () => console.log());

  rt.on('response.done', () => rt.close());

  rt.socket.on('close', () => console.log('\nConnection closed!'));
}

main();
```

package.json

Lines changed: 6 additions & 0 deletions

```diff
@@ -29,6 +29,7 @@
     "@swc/core": "^1.3.102",
     "@swc/jest": "^0.2.29",
     "@types/jest": "^29.4.0",
+    "@types/ws": "^8.5.13",
     "@types/node": "^20.17.6",
     "typescript-eslint": "^8.24.0",
     "@typescript-eslint/eslint-plugin": "^8.24.0",
@@ -47,6 +48,7 @@
     "tsc-multi": "https://github.com/stainless-api/tsc-multi/releases/download/v1.1.3/tsc-multi.tgz",
     "tsconfig-paths": "^4.0.0",
     "typescript": "^4.8.2",
+    "ws": "^8.18.0",
     "zod": "^3.23.8"
   },
   "imports": {
@@ -71,9 +73,13 @@
   },
   "bin": "./bin/cli",
   "peerDependencies": {
+    "ws": "^8.18.0",
     "zod": "^3.23.8"
   },
   "peerDependenciesMeta": {
+    "ws": {
+      "optional": true
+    },
     "zod": {
       "optional": true
     }
```
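`ws` is declared as an optional peer dependency: Node users who want the `ws`-backed transport install it (plus `@types/ws`), while environments with a native `WebSocket` global need nothing extra. A sketch of that runtime-detection pattern (illustrative only, not the SDK's actual resolution logic):

```ts
// Illustrative only: probe for the optional `ws` package at runtime and
// fall back to the platform's built-in WebSocket when it isn't installed.
async function resolveWebSocketImpl(): Promise<'ws' | 'native' | 'none'> {
  try {
    const specifier = 'ws'; // dynamic specifier: no hard install-time dependency
    await import(specifier);
    return 'ws';
  } catch {
    return typeof (globalThis as any).WebSocket !== 'undefined' ? 'native' : 'none';
  }
}

resolveWebSocketImpl().then((impl) => console.log(`using WebSocket implementation: ${impl}`));
```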

realtime.md

Lines changed: 86 additions & 0 deletions

## Realtime API beta

The Realtime API enables you to build low-latency, multi-modal conversational experiences. It currently supports text and audio as both input and output, as well as [function calling](https://platform.openai.com/docs/guides/function-calling) through a `WebSocket` connection.

The Realtime API works through a combination of client-sent events and server-sent events. Clients can send events to do things like update session configuration or send text and audio inputs. Server events confirm when audio responses have completed, or when a text response from the model has been received. A full event reference can be found [here](https://platform.openai.com/docs/api-reference/realtime-client-events) and a guide can be found [here](https://platform.openai.com/docs/guides/realtime).

This SDK supports accessing the Realtime API through the [WebSocket API](https://developer.mozilla.org/en-US/docs/Web/API/WebSocket) or with [ws](https://github.com/websockets/ws).
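The examples here all drive a text turn with three client events sent in order. Their shapes can be sketched as plain objects (the interfaces below are illustrative simplifications, not the SDK's own types):

```ts
// Illustrative, simplified shapes of the client events used in the examples;
// the SDK ships its own richer types.
interface SessionUpdateEvent {
  type: 'session.update';
  session: { modalities: string[]; model: string };
}

interface ConversationItemCreateEvent {
  type: 'conversation.item.create';
  item: {
    type: 'message';
    role: 'user' | 'assistant' | 'system';
    content: { type: 'input_text'; text: string }[];
  };
}

interface ResponseCreateEvent {
  type: 'response.create';
}

// A minimal text turn: configure the session, add a user message,
// then ask the model to respond.
const turn: (SessionUpdateEvent | ConversationItemCreateEvent | ResponseCreateEvent)[] = [
  { type: 'session.update', session: { modalities: ['text'], model: 'gpt-4o-realtime-preview' } },
  {
    type: 'conversation.item.create',
    item: { type: 'message', role: 'user', content: [{ type: 'input_text', text: 'Hello!' }] },
  },
  { type: 'response.create' },
];

console.log(turn.map((e) => e.type).join(' -> '));
// -> session.update -> conversation.item.create -> response.create
```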
A basic text-based example with `ws`:

```ts
// requires `yarn add ws @types/ws`
import { OpenAIRealtimeWS } from 'openai/beta/realtime/ws';

const rt = new OpenAIRealtimeWS({ model: 'gpt-4o-realtime-preview-2024-12-17' });

// access the underlying `ws.WebSocket` instance
rt.socket.on('open', () => {
  console.log('Connection opened!');
  rt.send({
    type: 'session.update',
    session: {
      modalities: ['text'],
      model: 'gpt-4o-realtime-preview',
    },
  });

  rt.send({
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'user',
      content: [{ type: 'input_text', text: 'Say a couple paragraphs!' }],
    },
  });

  rt.send({ type: 'response.create' });
});

rt.on('error', (err) => {
  // in a real-world scenario this should be logged somewhere, as you
  // likely want to continue processing events regardless of any errors
  throw err;
});

rt.on('session.created', (event) => {
  console.log('session created!', event.session);
  console.log();
});

rt.on('response.text.delta', (event) => process.stdout.write(event.delta));
rt.on('response.text.done', () => console.log());

rt.on('response.done', () => rt.close());

rt.socket.on('close', () => console.log('\nConnection closed!'));
```
To use the web API `WebSocket` implementation, replace `OpenAIRealtimeWS` with `OpenAIRealtimeWebSocket` and adjust any `rt.socket` access:

```ts
import { OpenAIRealtimeWebSocket } from 'openai/beta/realtime/websocket';

const rt = new OpenAIRealtimeWebSocket({ model: 'gpt-4o-realtime-preview-2024-12-17' });
// ...
rt.socket.addEventListener('open', () => {
  // ...
});
```

A full example can be found [here](https://github.com/openai/openai-node/blob/master/examples/realtime/websocket.ts).
### Realtime error handling

When an error is encountered, either on the client side or returned from the server through the [`error` event](https://platform.openai.com/docs/guides/realtime-model-capabilities#error-handling), the `error` event listener will be fired. However, if you haven't registered an `error` event listener, an `unhandled Promise rejection` error will be thrown instead.

It is **highly recommended** that you register an `error` event listener and handle errors appropriately, as the underlying connection is typically still usable.

```ts
const rt = new OpenAIRealtimeWS({ model: 'gpt-4o-realtime-preview-2024-12-17' });
rt.on('error', (err) => {
  // in a real-world scenario this should be logged somewhere, as you
  // likely want to continue processing events regardless of any errors
  throw err;
});
```
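The handler above rethrows purely to keep the example short. In production you usually want to log the error and keep processing, since the connection typically remains usable. A self-contained sketch of that pattern, using Node's `EventEmitter` as a stand-in for the realtime client (which exposes an EventEmitter-style interface):

```ts
import { EventEmitter } from 'node:events';

// Stand-in for the realtime client; any EventEmitter-style interface works.
const rt = new EventEmitter();

const seenErrors: Error[] = [];

// Log (here: collect) errors instead of throwing, so later events still flow.
rt.on('error', (err: Error) => {
  seenErrors.push(err);
  console.error('realtime error:', err.message);
});

rt.on('response.text.delta', (delta: string) => process.stdout.write(delta));

rt.emit('error', new Error('transient decode failure'));
rt.emit('response.text.delta', 'still streaming\n'); // processing continues
```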
