Closed as not planned
Labels
enhancement (New feature or request)
Description
The Feature
Instead of returning the response body directly to the user, quickly upload it to a fast S3-compatible bucket (such as GCS or R2) and return a presigned URL to the client via a redirect. This would only work for non-streaming responses.
Redirect following has been supported in OpenAI's Python client since openai/openai-python#1100, and fetch in JavaScript follows redirects by default.
PoC:
from fastapi import FastAPI
from fastapi.responses import RedirectResponse

app = FastAPI()

@app.post("/chat/completions")
async def redirect_to_webhook():
    return RedirectResponse(url="https://webhook.site/removed-removed-removed-removed-removed", status_code=303)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=8000)
Then when using:
#!/usr/bin/env python3.11
# -*- coding: utf-8 -*-
# Author: David Manouchehri
import asyncio
import openai
import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
c_handler = logging.StreamHandler()
logger.addHandler(c_handler)
client = openai.AsyncOpenAI(
api_key="FAKE",
base_url="http://localhost:8000",
)
async def main():
    response = await client.chat.completions.create(
        model="gemini-1.5-pro-preview-0409",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What’s in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                        }
                    }
                ]
            }
        ],
        temperature=0.0,
    )
    logger.info(response)

if __name__ == "__main__":
    asyncio.run(main())
Then this results in a GET request with these headers:
connection: close
x-stainless-async: async:asyncio
x-stainless-runtime-version: 3.11.9
x-stainless-runtime: CPython
x-stainless-arch: arm64
x-stainless-os: MacOS
x-stainless-package-version: 1.28.0
x-stainless-lang: python
user-agent: AsyncOpenAI/Python 1.28.0
content-type: application/json
accept: application/json
accept-encoding: gzip, deflate, br
host: webhook.site
content-length:
Note to self: do not presign the GET URL with the Authorization header.
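The 303 → GET behavior the PoC relies on can be reproduced with only the standard library; urllib, like httpx in the OpenAI client, re-issues a 303-redirected POST as a body-less GET. A self-contained sketch (the `/result` endpoint stands in for the presigned URL):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the request body, then redirect as the PoC does.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(303)
        self.send_header("Location", "/result")
        self.end_headers()

    def do_GET(self):
        # Stand-in for the content behind the presigned URL.
        body = json.dumps({"object": "chat.completion"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = HTTPServer(("localhost", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# POST to the endpoint; urllib follows the 303 and re-issues it as a GET.
resp = urlopen(f"http://localhost:{port}/chat/completions", data=b"{}")
result = json.loads(resp.read())
server.shutdown()
print(result)  # → {'object': 'chat.completion'}
```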
Motivation, pitch
For large responses:
- This might reduce the load put on LiteLLM.
- If there's a slow client, LiteLLM would not need to hold the connection open, e.g. making scaling on serverless platforms more efficient.
Twitter / LinkedIn details