### Describe the feature you'd like
Being able to deploy Hugging Face multimodal models to a SageMaker endpoint.
Currently, only language models that take a text prompt as input are supported.
Multimodal models such as Llava or CLIP require both a prompt and an image as input, and this is currently not supported.
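For comparison, a minimal sketch of the text-only flow that works today (the `HF_MODEL_ID` / `HF_TASK` environment variables are the documented way to pull a Hub model; the model id and instance type here are only examples):

```python
import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel

# current text-only flow: the container pulls the model from the Hugging Face Hub
huggingface_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # example model
        "HF_TASK": "text-classification",
    },
    role=sagemaker.get_execution_role(),
    transformers_version="4.28.1",
    pytorch_version="2.0.0",
    py_version="py310",
)

predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")

# the payload is text only; there is no supported way to also pass an image
print(predictor.predict({"inputs": "SageMaker is a great service."}))
```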
### How would this feature be used? Please describe.
This is how the feature would be used by the end user:
```python
import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="some-repo/some-multimodal-model",  # <<----- specify the model
    role=sagemaker.get_execution_role(),
    transformers_version="4.28.1",
    pytorch_version="2.0.0",
    py_version="py310",
    model_server_workers=1,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=900,  # increase timeout for large models
    model_data_download_timeout=900,  # increase timeout for large models
)
# deploy() prints progress until the endpoint is in service:
# ----------------!
```
### Call Llava
```python
import base64

# request payload: prompt plus base64-encoded image
data = {
    "image": "some base64 encoded image",  # <---- specify the image
    "question": "Describe the image and color details.",
}

output = predictor.predict(data)
print(output)
```
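The `image` field above is just a placeholder; one way to produce it from a local file (hypothetical path) would be:

```python
import base64

# encode a local image (hypothetical path) into the base64 string expected above
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

data["image"] = image_b64
```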
### Describe alternatives you've considered
You can package the model yourself and provide a custom inference.py script, but then you have to download the model and build a model.tar.gz, which takes a lot of time. See the sketch of this workaround below.
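For reference, the workaround roughly amounts to shipping a handler like the following inside the archive. This is only a sketch: the `AutoModelForVision2Seq`/`AutoProcessor` classes, dtype, and generation settings are assumptions, and a real Llava handler needs a transformers version that ships the model.

```python
# code/inference.py -- sketch of the manual workaround: a custom handler packaged
# into model.tar.gz next to the model weights
import base64
import io

import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor


def model_fn(model_dir):
    """Load processor and model from the unpacked model.tar.gz."""
    processor = AutoProcessor.from_pretrained(model_dir)
    model = AutoModelForVision2Seq.from_pretrained(model_dir, torch_dtype=torch.float16)
    model.to("cuda" if torch.cuda.is_available() else "cpu")
    return model, processor


def predict_fn(data, model_and_processor):
    """Decode the base64 image, run generation, and return the answer."""
    model, processor = model_and_processor
    image = Image.open(io.BytesIO(base64.b64decode(data["image"])))
    inputs = processor(text=data["question"], images=image, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return {"answer": processor.batch_decode(output_ids, skip_special_tokens=True)[0]}
```

The archive (weights plus `code/inference.py`) then has to be built, uploaded to S3, and passed to `HuggingFaceModel` via `model_data`, which is exactly the slow, manual part this feature request would remove.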
### Additional context
I came up with this idea when I created a tar.gz for llava with an inference.py and made it available to the world. See my LinkedIn post here: https://www.linkedin.com/posts/vincent-claes-0b346337_aws-sagemaker-huggingface-activity-7141776348963885056-Uv0g?utm_source=share&utm_medium=member_desktop