88 changes: 87 additions & 1 deletion docs/openapi.json
@@ -411,7 +411,7 @@
"streaming_query"
],
"summary": "Streaming Query Endpoint Handler",
-"description": "Handle request to the /streaming_query endpoint.\n\nThis endpoint receives a query request, authenticates the user,\nselects the appropriate model and provider, and streams\nincremental response events from the Llama Stack backend to the\nclient. Events include start, token updates, tool calls, turn\ncompletions, errors, and end-of-stream metadata. Optionally\nstores the conversation transcript if enabled in configuration.\n\nReturns:\n StreamingResponse: An HTTP streaming response yielding\n SSE-formatted events for the query lifecycle.\n\nRaises:\n HTTPException: Returns HTTP 500 if unable to connect to the\n Llama Stack server.",
+"description": "Handle request to the /streaming_query endpoint using Agent API.\n\nThis is a wrapper around streaming_query_endpoint_handler_base that provides\nthe Agent API specific retrieve_response and response generator functions.\n\nReturns:\n StreamingResponse: An HTTP streaming response yielding\n SSE-formatted events for the query lifecycle.\n\nRaises:\n HTTPException: Returns HTTP 500 if unable to connect to the\n Llama Stack server.",
"operationId": "streaming_query_endpoint_handler_v1_streaming_query_post",
"requestBody": {
"content": {
@@ -1306,6 +1306,92 @@
}
}
},
"/v2/streaming_query": {
"post": {
"tags": [
"streaming_query_v2"
],
"summary": "Streaming Query Endpoint Handler V2",
"description": "Handle request to the /streaming_query endpoint using Responses API.\n\nThis is a wrapper around streaming_query_endpoint_handler_base that provides\nthe Responses API specific retrieve_response and response generator functions.\n\nReturns:\n StreamingResponse: An HTTP streaming response yielding\n SSE-formatted events for the query lifecycle.\n\nRaises:\n HTTPException: Returns HTTP 500 if unable to connect to the\n Llama Stack server.",
"operationId": "streaming_query_endpoint_handler_v2_v2_streaming_query_post",
Comment on lines +1314 to +1316

⚠️ Potential issue | 🟡 Minor

Fix duplicated “v2_v2” in operationId.

Stabilize the name to match the pattern used elsewhere; clients may depend on it.

-"operationId": "streaming_query_endpoint_handler_v2_v2_streaming_query_post",
+"operationId": "streaming_query_endpoint_handler_v2_streaming_query_post",
📝 Committable suggestion

Suggested change
-"summary": "Streaming Query Endpoint Handler V2",
-"description": "Handle request to the /streaming_query endpoint using Responses API.\n\nThis is a wrapper around streaming_query_endpoint_handler_base that provides\nthe Responses API specific retrieve_response and response generator functions.\n\nReturns:\n StreamingResponse: An HTTP streaming response yielding\n SSE-formatted events for the query lifecycle.\n\nRaises:\n HTTPException: Returns HTTP 500 if unable to connect to the\n Llama Stack server.",
-"operationId": "streaming_query_endpoint_handler_v2_v2_streaming_query_post",
+"summary": "Streaming Query Endpoint Handler V2",
+"description": "Handle request to the /streaming_query endpoint using Responses API.\n\nThis is a wrapper around streaming_query_endpoint_handler_base that provides\nthe Responses API specific retrieve_response and response generator functions.\n\nReturns:\n StreamingResponse: An HTTP streaming response yielding\n SSE-formatted events for the query lifecycle.\n\nRaises:\n HTTPException: Returns HTTP 500 if unable to connect to the\n Llama Stack server.",
+"operationId": "streaming_query_endpoint_handler_v2_streaming_query_post",
🤖 Prompt for AI Agents
In docs/openapi.json around lines 1314 to 1316, the operationId contains a
duplicated segment "v2_v2"
("streaming_query_endpoint_handler_v2_v2_streaming_query_post"); update it to
remove the duplicate so it matches the project's operationId pattern (e.g.,
"streaming_query_endpoint_handler_v2_streaming_query_post") ensuring the new
operationId is unique and consistent with other endpoints.
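
For context on where the repeated segment may come from: if this spec is generated by FastAPI with its default operation ID scheme (handler function name plus path, non-word characters replaced by underscores, then the HTTP method), the `v2_v2` would simply be the handler name suffix followed by the `/v2` path prefix. The sketch below re-implements that scheme in plain Python under that assumption. `default_operation_id` is a hypothetical helper, and the handler names and the `/v1/streaming_query` path are inferred from the operationIds in this diff, not taken from the source code.

```python
import re


def default_operation_id(handler_name: str, path: str, method: str) -> str:
    """Build an operation ID as: handler name + path, sanitized, + method.

    Assumption: this mirrors the generator's default naming scheme.
    """
    raw = f"{handler_name}{path}"
    # Replace every non-word character (such as "/") with an underscore.
    sanitized = re.sub(r"\W", "_", raw)
    return f"{sanitized}_{method.lower()}"


# Hypothetical handler names, inferred from the operationIds in the diff.
v1_id = default_operation_id(
    "streaming_query_endpoint_handler", "/v1/streaming_query", "POST"
)
v2_id = default_operation_id(
    "streaming_query_endpoint_handler_v2", "/v2/streaming_query", "POST"
)

print(v1_id)  # streaming_query_endpoint_handler_v1_streaming_query_post
print(v2_id)  # streaming_query_endpoint_handler_v2_v2_streaming_query_post
```

If that assumption holds, a hand edit to docs/openapi.json would likely be overwritten the next time the spec is regenerated; renaming the handler or setting an explicit operation ID on the route would be the more durable way to drop the repeated segment.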

"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/QueryRequest"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Streaming response with Server-Sent Events",
"content": {
"application/json": {
"schema": {
"type": "string",
"example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n"
}
},
"text/plain": {
"schema": {
"type": "string",
"example": "Hello world!\n\n---\n\nReference: https://example.com/doc"
}
}
}
},
"400": {
"description": "Missing or invalid credentials provided by client",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/UnauthorizedResponse"
}
}
}
},
"401": {
"description": "Unauthorized: Invalid or missing Bearer token for k8s auth",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/UnauthorizedResponse"
}
}
}
},
"403": {
"description": "User is not authorized",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ForbiddenResponse"
}
}
}
},
"500": {
"description": "Internal Server Error",
"detail": {
"response": "Unable to connect to Llama Stack",
"cause": "Connection error."
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/readiness": {
"get": {
"tags": [
46 changes: 39 additions & 7 deletions docs/openapi.md
@@ -227,14 +227,10 @@ Returns:

> **Streaming Query Endpoint Handler**

-Handle request to the /streaming_query endpoint.
+Handle request to the /streaming_query endpoint using Agent API.

-This endpoint receives a query request, authenticates the user,
-selects the appropriate model and provider, and streams
-incremental response events from the Llama Stack backend to the
-client. Events include start, token updates, tool calls, turn
-completions, errors, and end-of-stream metadata. Optionally
-stores the conversation transcript if enabled in configuration.
+This is a wrapper around streaming_query_endpoint_handler_base that provides
+the Agent API specific retrieve_response and response generator functions.

Returns:
StreamingResponse: An HTTP streaming response yielding
@@ -587,6 +583,42 @@ Returns:
| 429 | The quota has been exceeded | [QuotaExceededResponse](#quotaexceededresponse) |
| 500 | Internal Server Error | |
| 422 | Validation Error | [HTTPValidationError](#httpvalidationerror) |
## POST `/v2/streaming_query`

> **Streaming Query Endpoint Handler V2**

Handle request to the /streaming_query endpoint using Responses API.

This is a wrapper around streaming_query_endpoint_handler_base that provides
the Responses API specific retrieve_response and response generator functions.

Returns:
StreamingResponse: An HTTP streaming response yielding
SSE-formatted events for the query lifecycle.

Raises:
HTTPException: Returns HTTP 500 if unable to connect to the
Llama Stack server.





### 📦 Request Body

[QueryRequest](#queryrequest)

### ✅ Responses

| Status Code | Description | Component |
|-------------|-------------|-----------|
| 200 | Streaming response with Server-Sent Events | string, string |
| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
| 401 | Unauthorized: Invalid or missing Bearer token for k8s auth | [UnauthorizedResponse](#unauthorizedresponse) |
| 403 | User is not authorized | [ForbiddenResponse](#forbiddenresponse) |
| 500 | Internal Server Error | |
| 422 | Validation Error | [HTTPValidationError](#httpvalidationerror) |
## GET `/readiness`

> **Readiness Probe Get Method**
46 changes: 39 additions & 7 deletions docs/output.md
@@ -227,14 +227,10 @@ Returns:

> **Streaming Query Endpoint Handler**

-Handle request to the /streaming_query endpoint.
+Handle request to the /streaming_query endpoint using Agent API.

-This endpoint receives a query request, authenticates the user,
-selects the appropriate model and provider, and streams
-incremental response events from the Llama Stack backend to the
-client. Events include start, token updates, tool calls, turn
-completions, errors, and end-of-stream metadata. Optionally
-stores the conversation transcript if enabled in configuration.
+This is a wrapper around streaming_query_endpoint_handler_base that provides
+the Agent API specific retrieve_response and response generator functions.

Returns:
StreamingResponse: An HTTP streaming response yielding
@@ -587,6 +583,42 @@ Returns:
| 429 | The quota has been exceeded | [QuotaExceededResponse](#quotaexceededresponse) |
| 500 | Internal Server Error | |
| 422 | Validation Error | [HTTPValidationError](#httpvalidationerror) |
## POST `/v2/streaming_query`

> **Streaming Query Endpoint Handler V2**

Handle request to the /streaming_query endpoint using Responses API.

This is a wrapper around streaming_query_endpoint_handler_base that provides
the Responses API specific retrieve_response and response generator functions.

Returns:
StreamingResponse: An HTTP streaming response yielding
SSE-formatted events for the query lifecycle.

Raises:
HTTPException: Returns HTTP 500 if unable to connect to the
Llama Stack server.





### 📦 Request Body

[QueryRequest](#queryrequest)

### ✅ Responses

| Status Code | Description | Component |
|-------------|-------------|-----------|
| 200 | Streaming response with Server-Sent Events | string, string |
| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
| 401 | Unauthorized: Invalid or missing Bearer token for k8s auth | [UnauthorizedResponse](#unauthorizedresponse) |
| 403 | User is not authorized | [ForbiddenResponse](#forbiddenresponse) |
| 500 | Internal Server Error | |
| 422 | Validation Error | [HTTPValidationError](#httpvalidationerror) |
## GET `/readiness`

> **Readiness Probe Get Method**
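
As a reading aid for the 200 response documented for `/v2/streaming_query`: the example payload in docs/openapi.json is a sequence of `data: {...}` frames separated by blank lines. A minimal Python sketch of splitting such a payload into events follows. `parse_sse_events` is a hypothetical helper, the sample string is copied from the spec example, and the parsing is deliberately simplified: it handles only `data:` fields, not the full Server-Sent Events grammar.

```python
import json

# Sample payload copied from the 200 response example in docs/openapi.json.
EXAMPLE_STREAM = (
    'data: {"event": "start", "data": {"conversation_id": '
    '"123e4567-e89b-12d3-a456-426614174000"}}\n\n'
    'data: {"event": "token", "data": {"id": 0, "token": "Hello"}}\n\n'
    'data: {"event": "end", "data": {"referenced_documents": [], '
    '"truncated": null, "input_tokens": 0, "output_tokens": 0}, '
    '"available_quotas": {}}\n\n'
)


def parse_sse_events(raw: str) -> list[dict]:
    """Split a buffered SSE payload into decoded JSON events (simplified)."""
    events = []
    # Frames are separated by a blank line.
    for frame in raw.split("\n\n"):
        data_lines = [
            line[len("data:"):].strip()
            for line in frame.splitlines()
            if line.startswith("data:")
        ]
        if data_lines:
            events.append(json.loads("\n".join(data_lines)))
    return events


events = parse_sse_events(EXAMPLE_STREAM)
# Concatenate the token events to rebuild the streamed text.
text = "".join(e["data"]["token"] for e in events if e["event"] == "token")
print([e["event"] for e in events])
print(text)
```

A real client would read the response incrementally rather than buffering the whole body; the buffered form is used here only to keep the sketch self-contained.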