Description
Describe the feature you'd like
Like many other inference libraries in Python (e.g. the OpenAI client), provide a real awaitable version of predict for real-time SageMaker inference endpoints. This would help Python applications that use FastAPI and asyncio deliver real-time responses without blocking the main event loop. Please note that this feature is different from the one currently available here, where predictions are written to an S3 bucket. This feature would work exactly like https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html#sagemaker.predictor.Predictor.predict, but with an await in real asyncio style.
SageMaker is an amazing library, and having this feature would make it far better suited to production environments built on FastAPI.
How would this feature be used? Please describe.
Currently, the sync version looks like this:
response = predictor.predict(input_data)
The async version might look like this:
response = await predictor.apredict(input_data)
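To illustrate the benefit, here is a minimal sketch of how such a method could be used to fan out several endpoint invocations concurrently (apredict is the proposed, hypothetical method name, not an existing API):

```python
import asyncio


async def predict_batch(predictor, inputs):
    # With an awaitable apredict, multiple endpoint invocations can be
    # awaited concurrently instead of serially blocking the event loop.
    # apredict is the proposed (hypothetical) method, not an existing API.
    return await asyncio.gather(*(predictor.apredict(x) for x in inputs))
```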
Describe alternatives you've considered
I considered subclassing the Predictor and adding the async version myself.
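For reference, a minimal sketch of that workaround, assuming the hypothetical method name apredict and simply offloading the existing blocking predict call to a worker thread (this keeps the event loop free but is not truly non-blocking at the HTTP layer):

```python
import asyncio

from sagemaker.predictor import Predictor


class AsyncCapablePredictor(Predictor):
    """Hypothetical subclass adding an awaitable predict variant."""

    async def apredict(self, data, **kwargs):
        # Run the blocking predict() in the default thread-pool executor
        # so the event loop is not blocked while waiting on the endpoint.
        # This is a stopgap; a real implementation would use async I/O.
        return await asyncio.to_thread(self.predict, data, **kwargs)
```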
Additional context
For modern Python applications built on top of FastAPI and asyncio, it is crucial to use async calls to avoid blocking the server's main event loop (especially in scalable applications). Having a real awaitable predict would therefore keep the event loop free in applications that leverage SageMaker.
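For concreteness, a sketch of the FastAPI use case this enables, again assuming the hypothetical apredict method; the endpoint name and serializers are illustrative and depend on the deployed model:

```python
from fastapi import FastAPI
from sagemaker.deserializers import JSONDeserializer
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer

app = FastAPI()

# Illustrative endpoint name; serializers depend on the deployed model.
predictor = Predictor(
    endpoint_name="my-endpoint",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)


@app.post("/predict")
async def predict(payload: dict):
    # With an awaitable apredict, this handler never blocks the event
    # loop while waiting on the SageMaker endpoint, so the server can
    # keep handling other requests concurrently.
    return await predictor.apredict(payload)
```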
Thanks a lot!